From: Stefan Sperling
Subject: Re: got fetch downloads too much
To: Christian Weisgerber
Cc: gameoftrees@openbsd.org
Date: Wed, 6 Oct 2021 20:30:33 +0200

On Wed, Oct 06, 2021 at 07:19:15PM +0200, Christian Weisgerber wrote:
> Stefan Sperling:
>
> > Is it reproducible? You could try "rewinding" those refs to the
> > commits which you started fetching with, e.g. with 'got ref'.
>
> $ got ref -c 7f4bb501768b9b3f856f91fbc8b9c80a2a4aaa04 refs/remotes/origin/main
> $ got ref -c f7e56b48260646f8b1d5d619a92ca1b2c2d26efc refs/remotes/origin/stable/12
>
> > And then fetch again. Does it still fetch more than it should?
>
> It does:
>
> Connecting to "origin" anongit@git.freebsd.org
> server: Enumerating objects: 82599, done.
> server: Counting objects: 100% (19209/19209), done.
> server: Compressing objects: 100% (12254/12254), done.
> server: Total 82599 (delta 13980), reused 6955 (delta 6955), pack-reused 63390
> 75M fetched; indexing 100%; resolving deltas 100%
> Fetched 8bbc0e2ea7ac9610a329d754bff75fc96724f1cb.pack
> Updated refs/remotes/origin/main: da3278ded3b2647d26da26788bab8363e502a144
> Updated refs/remotes/origin/stable/12: 6fea4b82e7b86ac680d5615f8361863353737325
> Updated refs/remotes/origin/stable/13: b1cca74367374bbb9cdc881c671a9f9525dca313
>
> got.conf:
>
> remote "origin" {
> 	server anongit@git.freebsd.org
> 	protocol ssh
> 	repository "/src.git"
> 	fetch-all-branches yes
> }

My guess is that you did not yet rebase the corresponding branches
in refs/heads before fetching again?
In that case, the current code keeps fetching the objects again in order
to get "new" objects for e.g. refs/heads/main.

Here is a quick hack to prevent this.
See the comment added by the patch for details.

diff 0c079dbc7e6ed9857d4a13d908cc6858d57c81ec /home/stsp/src/got
blob - 96308310d28fd9186f083ce3c0028cceb9cd61b5
file + libexec/got-fetch-pack/got-fetch-pack.c
--- libexec/got-fetch-pack/got-fetch-pack.c
+++ libexec/got-fetch-pack/got-fetch-pack.c
@@ -65,6 +65,29 @@ static const struct got_capability got_capabilities[]
 	{ GOT_CAPA_SIDE_BAND_64K, NULL },
 };
 
+static int
+match_remote_object_id(struct got_pathlist_head *have_refs,
+    struct got_object_id *their_id)
+{
+	struct got_pathlist_entry *pe;
+
+	/*
+	 * This only matches the tip commit of each of our references.
+	 * A better approach might be to ask the main process if their
+	 * object ID is contained anywhere in our repository. However,
+	 * searching the tips is good enough to avoid repeated downloads
+	 * in case we have cached the current tip in refs/remotes/ but
+	 * the corresponding branch in refs/heads/ has not been rebased.
+	 */
+	TAILQ_FOREACH(pe, have_refs, entry) {
+		struct got_object_id *id = pe->data;
+		if (got_object_id_cmp(id, their_id) == 0)
+			return 1;
+	}
+
+	return 0;
+}
+
 static void
 match_remote_ref(struct got_pathlist_head *have_refs,
     struct got_object_id *my_id, char *refname)
@@ -457,6 +480,14 @@ fetch_pack(int fd, int packfd, uint8_t *pack_sha1,
 			err = got_error(GOT_ERR_BAD_OBJ_ID_STR);
 			goto done;
 		}
+		if (match_remote_object_id(have_refs, &want[nref])) {
+			if (chattygot) {
+				fprintf(stderr,
+				    "%s: not fetching %s, we already have %s\n",
+				    getprogname(), refname, id_str);
+			}
+			continue;
+		}
 		match_remote_ref(have_refs, &have[nref], refname);
 		err = send_fetch_ref(ibuf, &want[nref], refname);
 		if (err)
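
For anyone who wants to play with the idea outside of the got tree, here
is a rough standalone sketch of the same tip-matching check. It is only an
illustration: struct object_id, struct local_ref, make_id(), match_tip()
and count_wanted() are made-up stand-ins, not got's real types or
functions, and a plain array takes the place of the pathlist.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ID_LEN 20	/* SHA1 digest length in bytes */

/* Hypothetical stand-in for an object ID; not got's real struct. */
struct object_id {
	unsigned char sha1[ID_LEN];
};

/* Hypothetical stand-in for one local reference and its tip commit. */
struct local_ref {
	const char *name;
	struct object_id tip;
};

/* Fill an ID with one repeated byte, just to get distinct demo values. */
struct object_id
make_id(unsigned char fill)
{
	struct object_id id;

	memset(id.sha1, fill, ID_LEN);
	return id;
}

/* Return 1 if their_id equals the tip of any local reference, else 0. */
int
match_tip(const struct local_ref *refs, size_t nrefs,
    const struct object_id *their_id)
{
	size_t i;

	for (i = 0; i < nrefs; i++) {
		if (memcmp(refs[i].tip.sha1, their_id->sha1, ID_LEN) == 0)
			return 1;
	}

	return 0;
}

/*
 * Count how many of the server's advertised tips we would still ask for.
 * Tips that already match one of our reference tips are skipped, which
 * mirrors the 'continue' added to the loop in the patch.
 */
int
count_wanted(const struct local_ref *refs, size_t nrefs,
    const struct object_id *theirs, size_t ntheirs)
{
	size_t i;
	int nwanted = 0;

	assert(theirs != NULL || ntheirs == 0);

	for (i = 0; i < ntheirs; i++) {
		if (match_tip(refs, nrefs, &theirs[i]))
			continue;	/* already have this tip */
		nwanted++;
	}

	return nwanted;
}
```

With refs/heads/main still at an old commit and refs/remotes/origin/main
already at the server's tip, count_wanted() skips that tip because one of
our references points at it. Like the patch, this only looks at reference
tips, so a commit we have somewhere deeper in history is still requested.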