From: Omar Polo <op@omarpolo.com>
Subject: Re: struct vs buffers for got_imsg_tree_entry
To: Stefan Sperling <stsp@stsp.name>
Cc: gameoftrees@openbsd.org
Date: Thu, 02 Feb 2023 12:22:27 +0100

On 2023/02/01 19:37:45 +0100, Stefan Sperling <stsp@stsp.name> wrote:
> On Wed, Feb 01, 2023 at 05:55:15PM +0100, Omar Polo wrote:
> > Not as straightforward as the other diffs. The struct
> > got_parse_tree_entry keeps a pointer to the ID in the parsed tree
> > buffer so we need to reconstruct the got_object_id from that.  I've
> > done so in the function send_tree_entries_batch, although we could
> > also do that directly when parsing.  I went with the former to avoid
> > going thru the rabbit hole of changing got_parsed_tree_entry too.
> > We'll get to that later.
> > 
> > This is needed because I'd like to transmit only structs got_object_id
> > over imsg and don't want to run into issue should that struct be
> > changed (*wink*).  The downside is that we introduce a memcpy
> > (actually, double, imsg_add is another implcit memcpy) that might
> > impact runtime.
> 
> struct got_imsg_tree_entry already used got_object_id before
> this commit: e033d8037031ecafbf4a99c69c25f1be778ef9d3 
> 
> I would like to keep tree entry parsing free of memcpy as much
> as possible in got-read-tree and especially got-read-pack.
> Considerable work went into this before, with repeated profiling etc.

I see.  I suspected that but wasn't sure.

> Could we use a special-case abstraction just for this layer?
> When adding sha256 support we could grow the ID array to the new
> maximum size, and keep pointers to the ID buffer as it is done now?
> Both SHA1 and SHA256 would then fit into this buffer:
> 
>  -	char id[SHA1_DIGEST_LENGTH];
>  +	char id[SHA256_DIGEST_LENGTH];
> 
> And we might perhaps need a flag or length field to tell us how
> much valid data can be found in the buffer. Though such a field
> would probably be redundant global repository format info, which
> we would need to keep around anyway. In the parent process there would
> be a struct repository available which would tell us which ID format
> is being used. And the helpers would likewise need this information
> to parse objects correctly, so they should also be able to tell how
> to populate this particular struct correctly.

Agreed; after all we can deduce the info, or in the worst case add a
"idlen" field.  The most important thing IMHO is to 1) send
got_object_id so the libexec process know what kind of data to expect
and 2) don't read SHA1_DIGEST_LENGTH bytes as `struct got_object_id'.

consider this patch dropped then, we'll revisit this part when
actually getting to sha256.