From: Omar Polo Subject: Re: struct vs buffers for got_imsg_tree_entry To: Stefan Sperling Cc: gameoftrees@openbsd.org Date: Thu, 02 Feb 2023 12:22:27 +0100 On 2023/02/01 19:37:45 +0100, Stefan Sperling wrote: > On Wed, Feb 01, 2023 at 05:55:15PM +0100, Omar Polo wrote: > > Not as straightforward as the other diffs. The struct > > got_parse_tree_entry keeps a pointer to the ID in the parsed tree > > buffer so we need to reconstruct the got_object_id from that. I've > > done so in the function send_tree_entries_batch, although we could > > also do that directly when parsing. I went with the former to avoid > > going thru the rabbit hole of changing got_parsed_tree_entry too. > > We'll get to that later. > > > > This is needed because I'd like to transmit only structs got_object_id > > over imsg and don't want to run into issue should that struct be > > changed (*wink*). The downside is that we introduce a memcpy > > (actually, double, imsg_add is another implcit memcpy) that might > > impact runtime. > > struct got_imsg_tree_entry already used got_object_id before > this commit: e033d8037031ecafbf4a99c69c25f1be778ef9d3 > > I would like to keep tree entry parsing free of memcpy as much > as possible in got-read-tree and especially got-read-pack. > Considerable work went into this before, with repeated profiling etc. I see. I suspected that but wasn't sure. > Could we use a special-case abstraction just for this layer? > When adding sha256 support we could grow the ID array to the new > maximum size, and keep pointers to the ID buffer as it is done now? > Both SHA1 and SHA256 would then fit into this buffer: > > - char id[SHA1_DIGEST_LENGTH]; > + char id[SHA256_DIGEST_LENGTH]; > > And we might perhaps need a flag or length field to tell us how > much valid data can be found in the buffer. Though such a field > would probably be redundant global repository format info, which > we would need to keep around anyway. In the parent process there would > be a struct repository available which would tell us which ID format > is being used. And the helpers would likewise need this information > to parse objects correctly, so they should also be able to tell how > to populate this particular struct correctly. Agreed; after all we can deduce the info, or in the worst case add a "idlen" field. The most important thing IMHO is to 1) send got_object_id so the libexec process know what kind of data to expect and 2) don't read SHA1_DIGEST_LENGTH bytes as `struct got_object_id'. consider this patch dropped then, we'll revisit this part when actually getting to sha256.