From: Stefan Sperling Subject: Re: ref_resolve() leak To: Kyle Ackerman Cc: "gameoftrees@openbsd.org" Date: Tue, 12 Nov 2024 00:13:43 +0100 On Sun, Nov 10, 2024 at 05:01:16PM +0000, Kyle Ackerman wrote: > While I was tracking down a memory leak in ref_resolve(), I did this refactor to plug this leak in `got send`. Currently, got_send_pack() only uses got_pathlist_insert to keep a pathlist without any duplicates. The source of the leak is when a duplicate is to be inserted, got_pathlist_insert returns NULL and the caller leaks it (expecting it to be added to the list). So in order to plug this leak, we could > - change the current behavior of the function (which the prototype's comment makes a point that the caller must free the memory) > - pro: other function could preform the same leak (this would plug it) > - con: a lot of work. > - check to see if that path already exists within the pathlist before adding > - pro: would prevent duplicates (plugging the leak) > - con: we now are adding another linear operation > - do the following > Creating a new pathlist struct that is in a RB tree (rather than a TAILQ). We can use got_path_cmp() to compare each node in the tree. Since we can't have any duplicates in a RB tree, we don't need to search the tree separate from inserting into the tree. > - pro: This plugs the leak. This also makes find_ref() O(log n) rather than O(n) (which may be more important once we get the bitset implementation going). > - con: Slight refactor (mainly limited to send.c). Most of this code will be refactored for multi_ack support. It would be better to fix the memory leak by reparing existing misuse of the current API first. Adding the new API which avoids the problem by design should be done as a separate step. The pathlist API design dates from before the 0.1 release. When this API was introduced I was working on the codebase alone and was focussing more on getting things working than making things fast and optimal. I don't recall anymore why this code didn't use a tree originally. Maybe it was just because using a list is a simple way to get started. It would be worth a shot trying to implement the existing API on top of a tree, to see if there are any actual use case which a tree cannot satisfy. I hope we can get away with a tree in all cases. I have spotted one apparent bug in your diff: > const struct got_error * > +got_pathtree_insert(struct got_pathtree_entry **inserted, > + struct got_pathtree *pathlist, const char *path, void *data) > +{ > + struct got_pathtree_entry *new; > + size_t path_len = strlen(path); > + > + if (inserted) > + *inserted = NULL; > + > + new = malloc(sizeof(*new)); > + if (new == NULL) > + return got_error_from_errno("malloc"); > + new->path = path; > + new->path_len = path_len; > + new->data = data; > + if(RB_INSERT(got_pathtree, pathlist, new)){ > + free(data); The above line frees 'data', which is a pointer owned by the caller. It should not be freed here. The caller should free it if necessary. > + free(new); > + return NULL; > + } > + if (inserted) > + *inserted = new; > + return NULL; > +}