"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

Stefan Sperling <stsp@stsp.name>
Re: bufferize smtp parsing in got-notify-email
Omar Polo <op@omarpolo.com>
Thu, 21 Mar 2024 09:16:36 +0100

Download raw body.

On Wed, Mar 20, 2024 at 07:49:54PM +0100, Omar Polo wrote:
> On 2024/03/20 17:56:14 +0100, Stefan Sperling <stsp@stsp.name> wrote:
> > > +#define SMTP_LINE_MAX	65535
> > 
> > Is this limit part of the SMTP spec? Or where does it come from?
> I've taken this from smtpd.  right now I'm not finding the maximum value
> from a cursory read of RFC5321.

Fair enough. If it works for smtpd then it will work here.

> > >  	for (;;) {
> > > -		error = got_poll_read_full_timeout(s, &len, buf, 1, 1,
> > > -		    smtp_timeout);
> > > +		endl = memmem(smtp_buf, smtp_buflen, "\r\n", 2);
> > > +		if (endl != NULL)
> > > +			break;
> > > +
> > > +		if (smtp_buflen == sizeof(smtp_buf))
> > > +			errx(1, "line too long");
> > 
> > It seems odd to be doing this length check after already using
> > smtp_buflen in the above memmem call. Wouldn't it make more sense
> > to run this check after smtp_buflen is incremented at the bottom of
> > the loop?
> I think this needs to stay here.  Imagine the corner case of having the
> buffer almost full but no CRLF in it.  We poll(), read some more data
> and fill the buffer: we should still check for a full line before
> failing.
> It may make more sense to read this loop starting with the got_poll_fd()
> call and then wrapping around until here.

Yes, I was trying to see if there's a way to write this loop in a way that
can be read from top to bottom without raising eyebrows.
But fine, let's leave it as is.

> actually we can't get EAGAIN here since we just polled with POLLIN,
> which guarantees us that there's some data to be read without blocking,
> so I've removed the errno check.

Very good.
> > It seems you have essentially reimplemented got_poll_read_full_timeout()
> > and added the EAGAIN check. Can we implement this change by keeping
> > got_poll_read_full_timeout() and extend that function with the EAGAIN
> > check if it is needed?
> it's not a read_full() since we keep reading only until a full line has
> been read, not until the buffer is completely filled.
> > I would like to avoid having variants of this
> > poll loop spread across the entire code base, since it is extra effort
> > to review such loops for correctness over and over again. Better to
> > have it in one place if possible.
> I agree, but if you're fine with it I'd wait before adding an interface
> like this to pollfd.c.  I'd like to attempt to write the http notifier
> first and then see what kind of interface could work for both cases.

Alright then. Maybe we can later wrap this code as got_poll_read() which
would support partial reads, as opposed to read_full() which tries to
fill the entire buffer.

> > > +	/*
> > > +	 * Normally we would get just one reply, but the regress doesn't
> > > +	 * use a real SMTP server and queues all the replies upfront.
> > > +	 */
> > > +	linelen += 2;
> > > +	if (smtp_buflen == linelen)
> > 
> > Maybe write >= here just to make sure we don't drop into the else
> > clause and do the memmove with bogus length values? I understand
> > this cannot happen if the buffer smtp response with "\r\n" (and
> > we wouldn't get here otherwise), but still.
> In the normal case, you'd be right, but we break the regress like this.
> A real smtp server will send just one reply, so at each call we only
> read one line and process it.  However, the regress queues up all the
> replies, so on the first read() we fill the buffer with all the replies,
> then we just consume the buffer.
> The if is actually a micro-optimization, I could just remove it and do
> the memmove unconditionally.  at worst, it'll be done with a length of
> zero.

I guess calling memmove with zero length would be fine.
But it doesn't really matter.

> Also, if we'll switch to using EHLO instead of HELO (which is actually
> deprecated AFAICS) being able to process multiple lines of reply will
> come in handy.

I don't really understand SMTP.
I looked at SMTP page on wikipedia when I implemented this and went
by the example there, and it worked :)
If EHLO offers some advantages then I suppose we should use it.

By the way, I was wondering whether we should use smtpd.conf-style
relay host URLs instead of 'hostname X port Y" to keep existing
configs backwards compatible in case we add support for things like
smtp+tls? In the long run we might end up adding TLS support and
authentication because some people will want to avoid running a local MTA.
In which case this tool will have to become smarter about SMTP anyway.