bufferize smtp parsing in got-notify-email

On Wed, Mar 20, 2024 at 07:49:54PM +0100, Omar Polo wrote: > On 2024/03/20 17:56:14 +0100, Stefan Sperling <stsp@stsp.name> wrote: > > > +#define SMTP_LINE_MAX 65535 > > > > Is this limit part of the SMTP spec? Or where does it come from? > > I've taken this from smtpd. right now I'm not finding the maximum value > from a cursory read of RFC5321. Fair enough. If it works for smtpd then it will work here. > > > for (;;) { > > > - error = got_poll_read_full_timeout(s, &len, buf, 1, 1, > > > - smtp_timeout); > > > + endl = memmem(smtp_buf, smtp_buflen, "\r\n", 2); > > > + if (endl != NULL) > > > + break; > > > + > > > + if (smtp_buflen == sizeof(smtp_buf)) > > > + errx(1, "line too long"); > > > > It seems odd to be doing this length check after already using > > smtp_buflen in the above memmem call. Wouldn't it make more sense > > to run this check after smtp_buflen is incremented at the bottom of > > the loop? > > I think this needs to stay here. Imagine the corner case of having the > buffer almost full but no CRLF in it. We poll(), read some more data > and fill the buffer: we should still check for a full line before > failing. > > It may make more sense to read this loop starting with the got_poll_fd() > call and then wrapping around until here. Yes, I was trying to see if there's a way to write this loop in a way that can be read from top to bottom without raising eyebrows. But fine, let's leave it as is. > actually we can't get EAGAIN here since we just polled with POLLIN, > which guarantees us that there's some data to be read without blocking, > so I've removed the errno check. Very good. > > It seems you have essentially reimplemented got_poll_read_full_timeout() > > and added the EAGAIN check. Can we implement this change by keeping > > got_poll_read_full_timeout() and extend that function with the EAGAIN > > check if it is needed? > > it's not a read_full() since we keep reading only until a full line has > been read, not until the buffer is completely filled. > > > I would like to avoid having variants of this > > poll loop spread across the entire code base, since it is extra effort > > to review such loops for correctness over and over again. Better to > > have it in one place if possible. > > I agree, but if you're fine with it I'd wait before adding an interface > like this to pollfd.c. I'd like to attempt to write the http notifier > first and then see what kind of interface could work for both cases. Alright then. Maybe we can later wrap this code as got_poll_read() which would support partial reads, as opposed to read_full() which tries to fill the entire buffer. > > > + /* > > > + * Normally we would get just one reply, but the regress doesn't > > > + * use a real SMTP server and queues all the replies upfront. > > > + */ > > > + linelen += 2; > > > + if (smtp_buflen == linelen) > > > > Maybe write >= here just to make sure we don't drop into the else > > clause and do the memmove with bogus length values? I understand > > this cannot happen if the buffer smtp response with "\r\n" (and > > we wouldn't get here otherwise), but still. > > In the normal case, you'd be right, but we break the regress like this. > A real smtp server will send just one reply, so at each call we only > read one line and process it. However, the regress queues up all the > replies, so on the first read() we fill the buffer with all the replies, > then we just consume the buffer. > > The if is actually a micro-optimization, I could just remove it and do > the memmove unconditionally. at worst, it'll be done with a length of > zero. I guess calling memmove with zero length would be fine. But it doesn't really matter. > Also, if we'll switch to using EHLO instead of HELO (which is actually > deprecated AFAICS) being able to process multiple lines of reply will > come in handy. I don't really understand SMTP. I looked at SMTP page on wikipedia when I implemented this and went by the example there, and it worked :) If EHLO offers some advantages then I suppose we should use it. By the way, I was wondering whether we should use smtpd.conf-style relay host URLs instead of 'hostname X port Y" to keep existing configs backwards compatible in case we add support for things like smtp+tls? In the long run we might end up adding TLS support and authentication because some people will want to avoid running a local MTA. In which case this tool will have to become smarter about SMTP anyway.

2024-03-20 16:56 Stefan Sperling:
bufferize smtp parsing in got-notify-email
- 2024-03-20 18:49 Omar Polo:
  bufferize smtp parsing in got-notify-email
- - 2024-03-21 08:16 Stefan Sperling:
    bufferize smtp parsing in got-notify-email
  - - 2024-03-21 09:12 Omar Polo:
      bufferize smtp parsing in got-notify-email