Download raw body.
improve cleaning of redundant pack files
Stefan Sperling <stsp@stsp.name> wrote: > I found a server running out of disk in /var/www/got/public because > repositories kept accumulating pack files every time gotadmin cleanup > was run by my sync script. > > All those pack files were redundant and should have been removed. > But this didn't happen. > > We use the timestamp of the last modified reference as a guide > for our modication timetamp safety check. All refs in affected > repositories were packed in the packed-refs file, which was last > modified about two days ago, so the modication threshold was set to > roughly two days ago minus 10 minutes. Any pack file newer than this > will be kept, even if it is 100% redundant, and even if gotadmin cleanup > itself created this pack. > > My sync script runs 'gotadmin cleanup' every 5 minutes. > So every 5 minutes a new redundant pack is created. These will linger > until a reference is updated. But if no such update happens within > 2 days then the disk runs full. > > We can fix this by applying the modification timestamp check only to > pack files which contain unreferenced objects. This way, if a pack file > is truely redundant, i.e. only contains objects which are referenced and > hence are part of the large pack file created by gotadmin cleanup, then > this pack file will be removed. > > There is an existing bug exposed by this tweak: > If a commit is discovered via a tag then we need to add the commit's ID > to the list of traversed object IDs, but we fail to do so (first chunk > in the diff below). I have seen this happen on the Dulwich repository. > Which grew to over 3 GB within 2 days, while it is actually around 15 MB > when repacked properly... ouch. > > ok? Love it! Sorry for the delay in reviewing but wanted to spend some time on this. I really like it. I think I should have done something like this back when we changed cleanup to also produce a packfile, we suffered a bit from my previous implementation that was just trying to remove redundant packfiles without creating anything new. To be fair, I'd prefer if this could be split into three, since it's a delicate part of the codebase, one for the missing ID colouring (ooos!), one for the path_len addition and a last one for the rest, but your call. Just one minor micro-nit below: > @@ -1433,6 +1453,16 @@ repo_purge_redundant_packfiles(struct got_repository * > sorted = calloc(npacks, sizeof(*sorted)); > if (sorted == NULL) > return got_error_from_errno("calloc"); > + please remove this trailing witespace :p > + err = got_object_id_str(&id_str, large_pack_hash); > + if (err) > + goto done; > + large_idxpath_len = asprintf(&large_idxpath, "%s/pack-%s.idx", > + GOT_OBJECTS_PACK_DIR, id_str); > + if (large_idxpath_len == -1) { > + err = got_error_from_errno("asprintf"); > + goto done; > + } ok op@ Thanks, Omar Polo
improve cleaning of redundant pack files