"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Stefan Sperling <stsp@stsp.name>
Subject:
Re: got fetch downloads too much
To:
Christian Weisgerber <naddy@mips.inka.de>, gameoftrees@openbsd.org
Date:
Wed, 6 Oct 2021 21:09:46 +0200

Download raw body.

Thread
On Wed, Oct 06, 2021 at 08:30:33PM +0200, Stefan Sperling wrote:
> My guess is that you did not not yet rebase the corresponding branches
> in refs/heads before fetching again?
> 
> In that case, the current code keeps fetching the objects again in order
> to get "new" objects for e.g. refs/heads/main. Here is a quick hack to
> prevent this. See the comment added by the patch for details.

Ignore the previous patch. I found a better way to handle this.
And unlike the previous patch, this new one does not break the tests...

The real problem is that we are sending the server an incomplete
picture of the object IDs we have at the tip of our references.
We are only telling the server about tips of references we want to fetch.
Whil, we should be sending the tips of *all* our references.

With this change the server can do a full scan for the objects required,
understands that we don't need to download anything, and sends us an empty
pack file. Even when we ask for more objects to fetch for branches in
our refs/heads namespace which have not been rebased yet. Because the
server now sees the object ID which our refs/remotes/origin/main reference
points at, and takes this into consideration when computing the set of
objects it needs to send:

  server: Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
  Already up-to-date

And I found two cosmetic issues when fetching an empty pack.

The first is that got fetch prints a final:

  Fetched 029d08823bd8a8eab510ad6ac75c823cfd3ed31e.pack

Even if the pack was empty and has therefore not been indexed and
already deleted from disk. This is fixed by setting *pack_hash to
NULL in case the downloaded pack file is empty.

And we need to tweak pack file size progress reporting a bit because
otherwise we see some bogus download progress for the packfile's header:

  server: Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
  32B fetchedAlready up-to-date

ok?

diff 6b6ae798ff6f768ecbbaeb20b6d4697f605e06bc /home/stsp/src/got
blob - 3b8efe373216c218b36b32818d61e5a06d485b73
file + lib/fetch.c
--- lib/fetch.c
+++ lib/fetch.c
@@ -298,6 +298,9 @@ got_fetch_pack(struct got_object_id **pack_hash, struc
 		    &packfile_size_cur, (*pack_hash)->sha1, &fetchibuf);
 		if (err != NULL)
 			goto done;
+		/* Don't report size progress for an empty pack file. */
+		if (packfile_size_cur <= ssizeof(pack_hdr) + SHA1_DIGEST_LENGTH)
+			packfile_size_cur = 0;
 		if (!done && refname && id) {
 			err = got_pathlist_insert(NULL, refs, refname, id);
 			if (err)
@@ -404,8 +407,11 @@ got_fetch_pack(struct got_object_id **pack_hash, struc
 	 * If the pack file contains no objects, we may only need to update
 	 * references in our repository. The caller will take care of that.
 	 */
-	if (nobj == 0)
+	if (nobj == 0) {
+		free(*pack_hash);
+		*pack_hash = NULL;
 		goto done;
+	}
 
 	if (lseek(packfd, 0, SEEK_SET) == -1) {
 		err = got_error_from_errno("lseek");
blob - 96308310d28fd9186f083ce3c0028cceb9cd61b5
file + libexec/got-fetch-pack/got-fetch-pack.c
--- libexec/got-fetch-pack/got-fetch-pack.c
+++ libexec/got-fetch-pack/got-fetch-pack.c
@@ -57,7 +57,6 @@
 
 struct got_object *indexed;
 static int chattygot;
-static struct got_object_id zhash = {.sha1={0}};
 
 static const struct got_capability got_capabilities[] = {
 	{ GOT_CAPA_AGENT, "got/" GOT_VERSION_STR },
@@ -516,10 +515,9 @@ fetch_pack(int fd, int packfd, uint8_t *pack_sha1,
 	if (nwant == 0)
 		goto done;
 
-	for (i = 0; i < nref; i++) {
-		if (got_object_id_cmp(&have[i], &zhash) == 0)
-			continue;
-		got_sha1_digest_to_str(have[i].sha1, hashstr, sizeof(hashstr));
+	TAILQ_FOREACH(pe, have_refs, entry) {
+		struct got_object_id *id = pe->data;
+		got_sha1_digest_to_str(id->sha1, hashstr, sizeof(hashstr));
 		n = snprintf(buf, sizeof(buf), "have %s\n", hashstr);
 		if (n >= sizeof(buf)) {
 			err = got_error(GOT_ERR_NO_SPACE);