From: Kyle Ackerman Subject: Re: ref_resolve() leak To: Stefan Sperling Cc: "gameoftrees@openbsd.org" Date: Tue, 12 Nov 2024 06:43:03 +0000 On Monday, November 11th, 2024 at 5:13 PM, Stefan Sperling wrote: > > > On Sun, Nov 10, 2024 at 05:01:16PM +0000, Kyle Ackerman wrote: > > > While I was tracking down a memory leak in ref_resolve(), I did this refactor to plug this leak in `got send`. Currently, got_send_pack() only uses got_pathlist_insert to keep a pathlist without any duplicates. The source of the leak is when a duplicate is to be inserted, got_pathlist_insert returns NULL and the caller leaks it (expecting it to be added to the list). So in order to plug this leak, we could > > - change the current behavior of the function (which the prototype's comment makes a point that the caller must free the memory) > > - pro: other function could preform the same leak (this would plug it) > > - con: a lot of work. > > - check to see if that path already exists within the pathlist before adding > > - pro: would prevent duplicates (plugging the leak) > > - con: we now are adding another linear operation > > - do the following > > Creating a new pathlist struct that is in a RB tree (rather than a TAILQ). We can use got_path_cmp() to compare each node in the tree. Since we can't have any duplicates in a RB tree, we don't need to search the tree separate from inserting into the tree. > > - pro: This plugs the leak. This also makes find_ref() O(log n) rather than O(n) (which may be more important once we get the bitset implementation going). > > - con: Slight refactor (mainly limited to send.c). Most of this code will be refactored for multi_ack support. > > > It would be better to fix the memory leak by reparing existing misuse > of the current API first. Adding the new API which avoids the problem > by design should be done as a separate step. > > The pathlist API design dates from before the 0.1 release. When this API > was introduced I was working on the codebase alone and was focussing more > on getting things working than making things fast and optimal. > > I don't recall anymore why this code didn't use a tree originally. > Maybe it was just because using a list is a simple way to get started. > It would be worth a shot trying to implement the existing API on top > of a tree, to see if there are any actual use case which a tree cannot > satisfy. I hope we can get away with a tree in all cases. > > I have spotted one apparent bug in your diff: > > > const struct got_error * > > +got_pathtree_insert(struct got_pathtree_entry **inserted, > > + struct got_pathtree *pathlist, const char *path, void *data) > > +{ > > + struct got_pathtree_entry *new; > > + size_t path_len = strlen(path); > > + > > + if (inserted) > > + *inserted = NULL; > > + > > + new = malloc(sizeof(*new)); > > + if (new == NULL) > > + return got_error_from_errno("malloc"); > > + new->path = path; > > + new->path_len = path_len; > > + new->data = data; > > + if(RB_INSERT(got_pathtree, pathlist, new)){ > > + free(data); > > > The above line frees 'data', which is a pointer owned by the caller. > It should not be freed here. The caller should free it if necessary. > > > + free(new); > > + return NULL; > > + } > > + if (inserted) > > + *inserted = new; > > + return NULL; > > +} Here is a diff to repair misuse of the current API. It seems like most other functions create **inserted and check to see if it is NULL to determine if it was added to the pathlist. diff /home/kyle/src/got-fork commit - e37a5589c16266235a9b0d3b6d7be7ec67b46390 path + /home/kyle/src/got-fork blob - 02b0eae943c002b268adbb1836eda2ac7816e499 file + lib/fetch.c --- lib/fetch.c +++ lib/fetch.c @@ -313,9 +313,11 @@ got_fetch_pack(struct got_object_id **pack_hash, struc if (packfile_size_cur <= ssizeof(pack_hdr) + SHA1_DIGEST_LENGTH) packfile_size_cur = 0; if (!done && refname && id) { - err = got_pathlist_insert(NULL, refs, refname, id); + err = got_pathlist_insert(&pe, refs, refname, id); if (err) goto done; + if (pe == NULL) + free(id); } else if (!done && server_progress) { char *p; /* blob - 1976fa9212d7b93d58eab0f3844f52e5f8d48f06 file + lib/send.c --- lib/send.c +++ lib/send.c @@ -172,6 +172,7 @@ insert_sendable_ref(struct got_pathlist_head *refs, co const struct got_error *err; struct got_reference *ref; struct got_object_id *id = NULL; + struct got_pathlist_entry *new = NULL; int obj_type; err = got_ref_open(&ref, repo, refname, 0); @@ -200,11 +201,11 @@ insert_sendable_ref(struct got_pathlist_head *refs, co goto done; } - err = got_pathlist_insert(NULL, refs, target_refname, id); + err = got_pathlist_insert(&new, refs, target_refname, id); done: if (ref) got_ref_close(ref); - if (err) + if (err || new == NULL /* duplicate */) free(id); return err; } I have left the instances of got_pathlist_insert() with NULL as the id and inserted (as we don't need to check to free id if it is NULL). Can also confirm this plugs the memory leak.