"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Kyle Ackerman <kack@kyleackerman.net>
Subject:
Re: ref_resolve() leak
To:
Stefan Sperling <stsp@stsp.name>
Cc:
"gameoftrees@openbsd.org" <gameoftrees@openbsd.org>
Date:
Tue, 12 Nov 2024 06:43:03 +0000

Download raw body.

Thread




On Monday, November 11th, 2024 at 5:13 PM, Stefan Sperling <stsp@stsp.name> wrote:

> 
> 
> On Sun, Nov 10, 2024 at 05:01:16PM +0000, Kyle Ackerman wrote:
> 
> > While I was tracking down a memory leak in ref_resolve(), I did this refactor to plug this leak in `got send`. Currently, got_send_pack() only uses got_pathlist_insert to keep a pathlist without any duplicates. The source of the leak is when a duplicate is to be inserted, got_pathlist_insert returns NULL and the caller leaks it (expecting it to be added to the list). So in order to plug this leak, we could
> > - change the current behavior of the function (which the prototype's comment makes a point that the caller must free the memory)
> > - pro: other function could preform the same leak (this would plug it)
> > - con: a lot of work.
> > - check to see if that path already exists within the pathlist before adding
> > - pro: would prevent duplicates (plugging the leak)
> > - con: we now are adding another linear operation
> > - do the following
> > Creating a new pathlist struct that is in a RB tree (rather than a TAILQ). We can use got_path_cmp() to compare each node in the tree. Since we can't have any duplicates in a RB tree, we don't need to search the tree separate from inserting into the tree.
> > - pro: This plugs the leak. This also makes find_ref() O(log n) rather than O(n) (which may be more important once we get the bitset implementation going).
> > - con: Slight refactor (mainly limited to send.c). Most of this code will be refactored for multi_ack support.
> 
> 
> It would be better to fix the memory leak by reparing existing misuse
> of the current API first. Adding the new API which avoids the problem
> by design should be done as a separate step.
> 
> The pathlist API design dates from before the 0.1 release. When this API
> was introduced I was working on the codebase alone and was focussing more
> on getting things working than making things fast and optimal.
> 
> I don't recall anymore why this code didn't use a tree originally.
> Maybe it was just because using a list is a simple way to get started.
> It would be worth a shot trying to implement the existing API on top
> of a tree, to see if there are any actual use case which a tree cannot
> satisfy. I hope we can get away with a tree in all cases.
> 
> I have spotted one apparent bug in your diff:
> 
> > const struct got_error *
> > +got_pathtree_insert(struct got_pathtree_entry **inserted,
> > + struct got_pathtree *pathlist, const char *path, void *data)
> > +{
> > + struct got_pathtree_entry *new;
> > + size_t path_len = strlen(path);
> > +
> > + if (inserted)
> > + *inserted = NULL;
> > +
> > + new = malloc(sizeof(*new));
> > + if (new == NULL)
> > + return got_error_from_errno("malloc");
> > + new->path = path;
> > + new->path_len = path_len;
> > + new->data = data;
> > + if(RB_INSERT(got_pathtree, pathlist, new)){
> > + free(data);
> 
> 
> The above line frees 'data', which is a pointer owned by the caller.
> It should not be freed here. The caller should free it if necessary.
> 
> > + free(new);
> > + return NULL;
> > + }
> > + if (inserted)
> > + *inserted = new;
> > + return NULL;
> > +}

Here is a diff to repair misuse of the current API. It seems like most other functions create **inserted and check to see if it is NULL to determine if it was added to the pathlist.

diff /home/kyle/src/got-fork
commit - e37a5589c16266235a9b0d3b6d7be7ec67b46390
path + /home/kyle/src/got-fork
blob - 02b0eae943c002b268adbb1836eda2ac7816e499
file + lib/fetch.c
--- lib/fetch.c
+++ lib/fetch.c
@@ -313,9 +313,11 @@ got_fetch_pack(struct got_object_id **pack_hash, struc
 		if (packfile_size_cur <= ssizeof(pack_hdr) + SHA1_DIGEST_LENGTH)
 			packfile_size_cur = 0;
 		if (!done && refname && id) {
-			err = got_pathlist_insert(NULL, refs, refname, id);
+			err = got_pathlist_insert(&pe, refs, refname, id);
 			if (err)
 				goto done;
+			if (pe == NULL)
+				free(id);
 		} else if (!done && server_progress) {
 			char *p;
 			/*
blob - 1976fa9212d7b93d58eab0f3844f52e5f8d48f06
file + lib/send.c
--- lib/send.c
+++ lib/send.c
@@ -172,6 +172,7 @@ insert_sendable_ref(struct got_pathlist_head *refs, co
 	const struct got_error *err;
 	struct got_reference *ref;
 	struct got_object_id *id = NULL;
+	struct got_pathlist_entry *new = NULL;
 	int obj_type;
 
 	err = got_ref_open(&ref, repo, refname, 0);
@@ -200,11 +201,11 @@ insert_sendable_ref(struct got_pathlist_head *refs, co
 		goto done;
 	}
 
-	err = got_pathlist_insert(NULL, refs, target_refname, id);
+	err = got_pathlist_insert(&new, refs, target_refname, id);
 done:
 	if (ref)
 		got_ref_close(ref);
-	if (err)
+	if (err || new == NULL /* duplicate */)
 		free(id);
 	return err;
 }


I have left the instances of got_pathlist_insert() with NULL as the id and inserted (as we don't need to check to free id if it is NULL). Can also confirm this plugs the memory leak.