"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: object enumeration in got-read-pack
To:
Omar Polo <op@omarpolo.com>
Cc:
gameoftrees@openbsd.org
Date:
Tue, 24 May 2022 22:06:28 +0200

Download raw body.

Thread
On Tue, May 24, 2022 at 07:45:13PM +0200, Omar Polo wrote:
> unfortunately I broke this again with the arc4random diff... apologize.
> I'm attaching a rebased diff (I just had to propagate the `seed' into
> some more functions via the new load_packed_obj_arg struct.)  The diff
> doesn't adress _any_ of my comments, it's just your diff rebased hoping
> it saves you some time :)

No worries.

This patch still has some design-level problems and I don't consider it
done yet. The current design works well for the case where we want to pack
every object that is referenced, but not so well when we want to pack only
a subset of objects; got-read-pack could end up doing a lot of unnecessary
work in that case. It only knows commits to start enumerating from, but has
no boundary commits that would tell it where to stop enumerating; it just
keeps going until it runs out of parent commits to traverse.
I still need to think about this some more.

> some comments/questions inline.

> > +	if (tree == NULL) {
> 
> I got_object_id_dup fails in this if body aren't we leaking those dup'ed
> ids?
> 
> If i'm reading correctly we're coming here from load_packed_object_ids
> which allocates load_packed_obj_arg on the stack and bails out on error.

Yes, a->dpath at least will leak when the first got_object_id_dup(id)
fails. Nice catch, I'll come up with a way to prevent this.

> 
> > +		free(a->id);
> > +		a->id = got_object_id_dup(id);
> > +		if (a->id == NULL)
> > +			return got_error_from_errno("got_object_id_dup");
> > +
> > +		free(a->dpath);
> > +		a->dpath = strdup(dpath);
> > +		if (a->dpath == NULL)
> > +			return got_error_from_errno("strdup");
> > +
> > +		a->mtime = mtime;
> > +		return NULL;
> > +	}
> nit: maybe drop braces here?

I tend to keep braces around multi-line statements, just as a matter
of habit. But they aren't needed for pure indent, indeed.

> > +	if (best_packidx_path) {
> > +		err = got_repo_get_packidx(best_packidx, best_packidx_path,
> > +		    repo);
> > +	}

> nit: I think you can leave this in a single line (it would be 70
> characters long) now that it's not nested anymore.  FWIW i sometimes
> find easier to mentally parse blocks like these if they're on a single
> line.

Right, thanks!

> > +		if (te_offset + sizeof(ite) + ite.namelen >
> > +		    datalen) {
> > +			err = got_error(GOT_ERR_PRIVSEP_LEN);
> > +			break;
> > +		}

> > +	if (imsg_add(wbuf, tree_id->sha1, SHA1_DIGEST_LENGTH) == -1) {
> > +		err = got_error_from_errno("imsg_add ENUMERATED_TREE");
> 
> we shouldn't call ibuf_free in the imsg_add error case, right?
> 
> (the other imsg_add calls are fine, these one here are probably a
> leftover)

Yes, correct. I will remove the ibuf_free() calls on error in a
later revision of this patch.

> > +		ibuf_free(wbuf);
> > +		return err;
> > +	}
> > +	if (imsg_add(wbuf, &nentries, sizeof(nentries)) == -1) {
> > +		err = got_error_from_errno("imsg_add ENUMERATED_TREE");
> > +		ibuf_free(wbuf);
> > +		return err;
> > +	}
> > +	if (imsg_add(wbuf, path, path_len) == -1) {
> > +		err = got_error_from_errno("imsg_add ENUMERATED_TREE");
> > +		ibuf_free(wbuf);
> > +		return err;
> > +	}