"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Omar Polo <op@omarpolo.com>
Subject:
Re: check RB_INSERT() return value
To:
Stefan Sperling <stsp@stsp.name>
Cc:
gameoftrees@openbsd.org
Date:
Thu, 14 Apr 2022 08:59:49 +0200

Download raw body.

Thread
Stefan Sperling <stsp@stsp.name> wrote:
> Turns out RB_INSERT() has a return value which I overlooked.
> This return value can tell us whether an element we are trying
> to insert is already present in the RB tree.
> 
> The patch below simply raises an error in such situations. Our existing
> code should never allow this condition to occur. If we hit this error
> then there is a bug that must fixed.
> 
> In the future, we could use this to optimize code which performs
> a presence check before doing an insertion.
> For example, assuming we propagate RB_INSERT() errors up via the
> function got_object_idset_add(), instead of having code which does:
> 
> 	if (!got_object_idset_contains(idset, id)) {
> 		err = got_object_idset_add(idset, id, NULL);
> 		if (err)
> 			goto done;
> 	}
> 
> ... we could have code which does the same thing with just
> a single walk of the RB tree instead of two walks:
> 
> 	err = got_object_idset_add(idset, id, NULL);
> 	if (err && err->code != GOT_ERR_OBJ_EXISTS)
> 		goto done;
> 
> I have a WIP patch which makes some changes of the above nature
> in pack_create.c, but it does not help with performance (yet?).
> It is possible that we spend so much time looking things up that
> the overhead related to additions is too small to make a difference.
> In principle it is always a good idea to avoid doing any extra
> work which we can avoid doing.

of course

> In any case, as a first step, we should check the return value.
> 
> ok?

I like the idea, and the patch looks fine, modulo a minor nit.  With a
bit more of context on got_object_idset_add:

 	new = malloc(sizeof(*new));
 	if (new == NULL)
 		return got_error_from_errno("malloc");
 
 	memcpy(&new->id, id, sizeof(new->id));
 	new->data = data;
 
-	RB_INSERT(got_object_idset_tree, &set->entries, new)
+	if (RB_INSERT(got_object_idset_tree, &set->entries, new) != NULL)
+		return got_error(GOT_ERR_OBJ_EXISTS);

shouldn't we free `new' here before returning the error?  At the moment
this is innocuos since we check that element isn't present before adding
it, but later this would turn into a leak.

> diff 3d1a1e4cbc91e39df83eb35eb3df3cc87719a0b5 096ea7a61d69620491f66b05fccd7e0e72f61f2b
> blob - 15a27715685e7d137844ebe2be87a33d550d6f96
> blob + 8ae16da40c07f4dcb3133f9ff08328a10605722b
> --- include/got_error.h
> +++ include/got_error.h
> @@ -167,6 +167,7 @@
>  #define GOT_ERR_NO_PATCH	149
>  #define GOT_ERR_HUNK_FAILED	150
>  #define GOT_ERR_PATCH_FAILED	151
> +#define GOT_ERR_FILEIDX_DUP_ENTRY 152
>  
>  struct got_error {
>          int code;
> blob - 19d6e558696c99c50807ee4f730af001911d1f1f
> blob + 29541c0234b7f96a4638157fdd9017888571a76c
> --- lib/error.c
> +++ lib/error.c
> @@ -215,6 +215,7 @@ static const struct got_error got_errors[] = {
>  	{ GOT_ERR_NO_PATCH, "no patch found" },
>  	{ GOT_ERR_HUNK_FAILED, "hunk failed to apply" },
>  	{ GOT_ERR_PATCH_FAILED, "patch failed to apply" },
> +	{ GOT_ERR_FILEIDX_DUP_ENTRY, "duplicate file index entry" },
>  };
>  
>  static struct got_custom_error {
> blob - 6ea3deaf8e0753eaf31a2c0f81af1a25b313be30
> blob + b05a8137d02606a3c4fee8f04809b0ac5cd71e8d
> --- lib/fileindex.c
> +++ lib/fileindex.c
> @@ -265,7 +265,9 @@ add_entry(struct got_fileindex *fileindex, struct got_
>  	if (fileindex->nentries >= GOT_FILEIDX_MAX_ENTRIES)
>  		return got_error(GOT_ERR_NO_SPACE);
>  
> -	RB_INSERT(got_fileindex_tree, &fileindex->entries, ie);
> +	if (RB_INSERT(got_fileindex_tree, &fileindex->entries, ie) != NULL)
> +		return got_error_path(ie->path, GOT_ERR_FILEIDX_DUP_ENTRY);
> +
>  	fileindex->nentries++;
>  	return NULL;
>  }
> blob - 37ea8e262784f9521e53f7db7f9ff11232dce3d6
> blob + aee8d02645d4856a33f795cd0888ee6ae12d64ac
> --- lib/object_idset.c
> +++ lib/object_idset.c
> @@ -102,7 +102,9 @@ got_object_idset_add(struct got_object_idset *set, str
>  	memcpy(&new->id, id, sizeof(new->id));
>  	new->data = data;
>  
> -	RB_INSERT(got_object_idset_tree, &set->entries, new);
> +	if (RB_INSERT(got_object_idset_tree, &set->entries, new) != NULL)
> +		return got_error(GOT_ERR_OBJ_EXISTS);
> +
>  	set->totelem++;
>  	return NULL;
>  }