
From: Omar Polo <op@omarpolo.com>
Subject: first draft of gotadmin load
To: gameoftrees@openbsd.org
Date: Sun, 09 Jul 2023 13:55:57 +0200

Here's a first draft of the load command.  It is intended to be the
counterpart of `gotadmin dump' and, just like dump, it is meant to
handle fast-export streams in the future, but at the moment it only
handles git bundles.

There's some code lifted from got.c and fetch.c: since a git bundle is
just a pack file with a plaintext header, `gotadmin load' is very
similar to fetch.
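
To show how little there is to that header, here is a minimal standalone sketch (not code from the patch) that walks a v2 bundle held in memory: the signature line, optional prerequisite lines starting with `-', one "<id> <refname>" line per reference, and an empty line after which the pack file begins.  It assumes SHA-1 object IDs (40 hex digits) and the whole bundle in one buffer; struct bundle_hdr and parse_bundle_hdr() are hypothetical names, and unlike got_repo_load() it does not validate the hex digits or the reference names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BUNDLE_V2_SIG	"# v2 git bundle\n"
#define OID_HEXLEN	40	/* SHA-1 object IDs only */

struct bundle_hdr {
	int	nprereqs;	/* "-<id> [comment]" lines */
	int	nrefs;		/* "<id> <refname>" lines */
	size_t	pack_off;	/* offset where the pack file starts */
};

/*
 * Walk the plaintext header of a v2 bundle held in buf[0..len).
 * Returns 0 on success, -1 if the header is malformed or the data
 * after it does not look like a pack file.
 */
static int
parse_bundle_hdr(const char *buf, size_t len, struct bundle_hdr *hdr)
{
	size_t siglen = strlen(BUNDLE_V2_SIG);
	size_t off, linelen;
	const char *nl;

	assert(buf != NULL && hdr != NULL);
	memset(hdr, 0, sizeof(*hdr));

	if (len < siglen || memcmp(buf, BUNDLE_V2_SIG, siglen) != 0)
		return -1;

	for (off = siglen;; off += linelen + 1) {
		if (off >= len)
			return -1;	/* header never terminated */
		nl = memchr(buf + off, '\n', len - off);
		if (nl == NULL)
			return -1;
		linelen = (size_t)(nl - (buf + off));

		if (linelen == 0) {
			/* an empty line ends the header */
			hdr->pack_off = off + 1;
			break;
		}

		if (buf[off] == '-') {
			/* prerequisite: "-<id>", optionally a comment */
			if (linelen < 1 + OID_HEXLEN)
				return -1;
			hdr->nprereqs++;
		} else {
			/* reference: "<id> <refname>" */
			if (linelen < OID_HEXLEN + 2 ||
			    buf[off + OID_HEXLEN] != ' ')
				return -1;
			hdr->nrefs++;
		}
	}

	/* a pack file starts with the 4-byte signature "PACK" */
	if (len - hdr->pack_off < 4 ||
	    memcmp(buf + hdr->pack_off, "PACK", 4) != 0)
		return -1;
	return 0;
}
```

In got_repo_load() the same split happens implicitly: the header is consumed line by line with getline(3), and copypack() then reads the rest of the same stream as the pack file.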

There are still a few open questions in this initial design:

 - Is it fine for the default behaviour to load all the refs?  Should I
   add -a instead to load everything and require either -a or -b?

 - I'd like to verify the pack file more.  For instance, I can imagine a
   case where the header advertises an object that's not available in
   the pack.  We should catch this case.

 - Should it only fast-forward the references by default?  At the
   moment it happily overwrites the (selected) references with whatever
   the dump advertises.

 - I'm parsing the bundle header in the main process.  This is usually
   a no-go, but the header is very simple and adding a libexec helper
   just for it felt like overkill.  I will write one, however, if we
   want to stick to 'no parsing in the main process'.
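
Regarding the pack verification question above, one possible approach: pack index files keep object IDs sorted, so once the pack has been indexed, every ID advertised in the bundle header could be looked up with a binary search and the load refused if one is missing.  The sketch below assumes the IDs have already been read into flat arrays of raw 20-byte SHA-1 values (reading a real .idx file is left out), and find_missing_oid() is a hypothetical helper.  An advertised object missing from the pack might already exist in the repository, so a real check would probably also consult the repository before failing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define SHA1_LEN	20

/* bsearch(3) comparator for raw SHA-1 object IDs */
static int
oid_cmp(const void *a, const void *b)
{
	return memcmp(a, b, SHA1_LEN);
}

/*
 * pack_ids holds npack IDs sorted in memcmp() order, as in a pack
 * index; adv holds the nadv IDs advertised by the bundle header.
 * Returns the index of the first advertised ID not found in the
 * pack, or -1 if all of them are present.
 */
static long
find_missing_oid(const unsigned char *pack_ids, size_t npack,
    const unsigned char *adv, size_t nadv)
{
	size_t i;

	assert(npack == 0 || pack_ids != NULL);

	for (i = 0; i < nadv; i++) {
		if (bsearch(adv + i * SHA1_LEN, pack_ids, npack, SHA1_LEN,
		    oid_cmp) == NULL)
			return (long)i;
	}
	return -1;
}
```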

I'm happy to address these and other concerns before committing, but
since the diff is already quite long, working on it in-tree is also
an option.
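
On the fast-forward question in the list above (see also the XXX comment in cmd_load()), the core test is whether the commit a reference currently points to is the advertised commit or one of its ancestors.  The sketch below shows that ancestry walk on a toy in-memory commit graph in which commits are small integers with at most two parents; struct graph and is_fast_forward() are hypothetical and do not use got's commit graph API.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_COMMITS	64
#define NO_PARENT	(-1)

/* Toy commit graph: commit c has up to two parents in parents[c]. */
struct graph {
	int	parents[MAX_COMMITS][2];
};

/*
 * Return true if old_tip is new_tip or one of its ancestors, i.e. if
 * moving a reference from old_tip to new_tip is a fast-forward.
 */
static bool
is_fast_forward(const struct graph *g, int old_tip, int new_tip)
{
	bool seen[MAX_COMMITS];
	int stack[MAX_COMMITS];
	int sp = 0, c, i, p;

	assert(old_tip >= 0 && old_tip < MAX_COMMITS);
	assert(new_tip >= 0 && new_tip < MAX_COMMITS);

	memset(seen, 0, sizeof(seen));
	seen[new_tip] = true;
	stack[sp++] = new_tip;

	/* depth-first walk from the new tip towards the roots */
	while (sp > 0) {
		c = stack[--sp];
		if (c == old_tip)
			return true;
		for (i = 0; i < 2; i++) {
			p = g->parents[c][i];
			if (p != NO_PARENT && !seen[p]) {
				seen[p] = true;	/* push each commit once */
				stack[sp++] = p;
			}
		}
	}
	return false;
}
```

A plain walk like this is enough for a yes/no answer; on large histories a real implementation would probably want to stop early, which this sketch does not attempt.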

After this, I'm planning to take a short break from dump/load and
improve cleanup so that packfiles with unreachable objects can be
deleted.  This is quite important since one can load a huge dump but
only use it for a reference or two, and gotadmin cleanup won't notice
that some packs are redundant since it only looks for unique objects.
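
The difference between the two notions of redundancy can be sketched as follows: instead of asking whether a pack's objects also exist elsewhere, a reachability-based cleanup would mark every object reachable from the references and then classify each pack.  The toy object graph, the PACK_REPACK category for packs that are only partly reachable, and the names mark() and classify_pack() are assumptions made for illustration, not a description of what gotadmin cleanup does or will do.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_OBJS	32
#define MAX_EDGES	4

/* Toy object graph: object o references edges[o][0..nedges[o]). */
struct objgraph {
	int	nedges[MAX_OBJS];
	int	edges[MAX_OBJS][MAX_EDGES];
};

enum pack_verdict { PACK_KEEP, PACK_REPACK, PACK_DELETE };

/* Mark obj and everything reachable from it. */
static void
mark(const struct objgraph *g, int obj, bool *reachable)
{
	int i;

	assert(obj >= 0 && obj < MAX_OBJS);
	if (reachable[obj])
		return;
	reachable[obj] = true;
	for (i = 0; i < g->nedges[obj]; i++)
		mark(g, g->edges[obj][i], reachable);
}

/*
 * Classify a pack holding objs[0..nobjs): keep it if every object is
 * reachable, delete it if none is, otherwise its live objects would
 * have to be repacked before it could go.
 */
static enum pack_verdict
classify_pack(const int *objs, int nobjs, const bool *reachable)
{
	int i, live = 0;

	for (i = 0; i < nobjs; i++) {
		if (reachable[objs[i]])
			live++;
	}
	if (live == nobjs)
		return PACK_KEEP;
	if (live == 0)
		return PACK_DELETE;
	return PACK_REPACK;
}
```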


diffstat refs/heads/main refs/heads/load
 M  gotadmin/Makefile              |    1+  1-
 M  gotadmin/gotadmin.1            |   42+  0-
 M  gotadmin/gotadmin.c            |  294+  0-
 M  include/got_error.h            |    1+  0-
 A  include/got_repository_load.h  |   26+  0-
 M  lib/error.c                    |    1+  0-
 A  lib/load.c                     |  532+  0-
 M  regress/cmdline/Makefile       |    4+  1-
 A  regress/cmdline/load.sh        |  130+  0-

9 files changed, 1031 insertions(+), 2 deletions(-)
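
The least obvious part of lib/load.c below is copypack(): a pack file ends with a 20-byte SHA-1 checksum of everything before it, and since the length of the stream is not known in advance, the last 20 bytes read so far must be held back from the hash until more data shows they are not the trailer.  The following standalone sketch isolates that hold-back logic; it copies the released bytes into a buffer where copypack() would call got_hash_update() and write(2), and struct holdback and holdback_feed() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TRAILER_LEN	20	/* size of a SHA-1 pack checksum */

struct holdback {
	unsigned char	 held[TRAILER_LEN];	/* possible trailer bytes */
	size_t		 nheld;
	unsigned char	*out;	/* caller-provided, large enough */
	size_t		 outlen;
};

/* Stand-in for got_hash_update() + write() in copypack(). */
static void
emit(struct holdback *hb, const unsigned char *p, size_t n)
{
	memcpy(hb->out + hb->outlen, p, n);
	hb->outlen += n;
}

/*
 * Feed n bytes of the stream.  Everything except the most recent
 * TRAILER_LEN bytes is emitted; those stay in hb->held until later
 * data proves they are not the trailer.
 */
static void
holdback_feed(struct holdback *hb, const unsigned char *p, size_t n)
{
	size_t total = hb->nheld + n, release, from_held, from_p;

	assert(hb->nheld <= TRAILER_LEN);

	if (total <= TRAILER_LEN) {
		/* not enough data yet to release anything */
		memcpy(hb->held + hb->nheld, p, n);
		hb->nheld = total;
		return;
	}

	/* the oldest `release' bytes can no longer be trailer bytes */
	release = total - TRAILER_LEN;
	from_held = release < hb->nheld ? release : hb->nheld;
	from_p = release - from_held;
	emit(hb, hb->held, from_held);
	emit(hb, p, from_p);

	/* keep the remaining held bytes followed by the rest of p */
	memmove(hb->held, hb->held + from_held, hb->nheld - from_held);
	memcpy(hb->held + (hb->nheld - from_held), p + from_p, n - from_p);
	hb->nheld = TRAILER_LEN;
}
```

Unlike copypack(), this version also copes with chunks shorter than the trailer.  copypack() can assume that only the final read is short because fread(3) returns a short count only at end of file or on error, which also means a read error looks like EOF there unless ferror(3) is checked after the loop.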

diff refs/heads/main refs/heads/load
commit - a3287e9971f6990ab19426fccf0b41a9b6bc4b68
commit + 30d0e2f62d7512579a7503ae85c0c3757fd01b92
blob - 80e118616d09947850bdc2b12829cac8eb2c8b3f
blob + 4485f4876de9c6581701463287d7bb44c2dec812
--- gotadmin/Makefile
+++ gotadmin/Makefile
@@ -12,7 +12,7 @@ SRCS=		gotadmin.c \
 		sigs.c buf.c date.c object_open_privsep.c \
 		read_gitconfig_privsep.c read_gotconfig_privsep.c \
 		pack_create_privsep.c pollfd.c reference_parse.c object_qid.c \
-		dump.c
+		dump.c load.c
 MAN =		${PROG}.1
 
 CPPFLAGS = -I${.CURDIR}/../include -I${.CURDIR}/../lib
blob - f19fef0cdaa1a53fa46ff731dbd91266b8419ab5
blob + 9324fb22df1a2d3a0fe90f3db35a40004c1bad82
--- gotadmin/gotadmin.1
+++ gotadmin/gotadmin.1
@@ -385,7 +385,49 @@ be excluded.
 If a reference appears in both the included and excluded lists, it will
 be excluded.
 .El
+.It Xo
+.Cm load
+.Op Fl lnq
+.Op Fl b Ar reference
+.Op Fl r Ar repository-path
+.Op Ar file
+.Xc
+Read a repository dump stream from standard input or
+.Ar file .
+.Pp
+The options for
+.Cm gotadmin load
+are as follows:
+.Bl -tag -width Ds
+.It Fl b Ar reference
+Load only the specified
+.Ar reference
+from the dump.
+This option may be specified multiple times to build a list of
+references to load.
+If not provided, all the references will be loaded.
+.It Fl l
+List references available for loading from the dump and exit
+immediately.
+Cannot be used together with
+.Fl b
+and
+.Fl n .
+.It Fl n
+Attempt to load the dump but do not install the new pack file or update
+any references.
+Can be used to verify the integrity of the dump.
+.It Fl q
+Suppress progress reporting output.
+.It Fl r Ar repository-path
+Use the repository at the specified path.
+If not specified, assume the repository is located at or above the
+current working directory.
+If this directory is a
+.Xr got 1
+work tree, use the repository path associated with this work tree.
 .El
+.El
 .Sh EXIT STATUS
 .Ex -std gotadmin
 .Sh SEE ALSO
blob - 78fd2d69dfbdea474b964eb51ee58209f1d0734d
blob + 0069679a7240714805fbec1d30f1878010083bea
--- gotadmin/gotadmin.c
+++ gotadmin/gotadmin.c
@@ -40,6 +40,7 @@
 #include "got_repository.h"
 #include "got_repository_admin.h"
 #include "got_repository_dump.h"
+#include "got_repository_load.h"
 #include "got_gotconfig.h"
 #include "got_path.h"
 #include "got_privsep.h"
@@ -88,6 +89,7 @@ __dead static void	usage_dump(void);
 __dead static void	usage_listpack(void);
 __dead static void	usage_cleanup(void);
 __dead static void	usage_dump(void);
+__dead static void	usage_load(void);
 
 static const struct got_error*		cmd_init(int, char *[]);
 static const struct got_error*		cmd_info(int, char *[]);
@@ -96,6 +98,7 @@ static const struct got_error*		cmd_dump(int, char *[]
 static const struct got_error*		cmd_listpack(int, char *[]);
 static const struct got_error*		cmd_cleanup(int, char *[]);
 static const struct got_error*		cmd_dump(int, char *[]);
+static const struct got_error*		cmd_load(int, char *[]);
 
 static const struct gotadmin_cmd gotadmin_commands[] = {
 	{ "init",	cmd_init,	usage_init,	"" },
@@ -105,6 +108,7 @@ static const struct gotadmin_cmd gotadmin_commands[] =
 	{ "listpack",	cmd_listpack,	usage_listpack,	"ls" },
 	{ "cleanup",	cmd_cleanup,	usage_cleanup,	"cl" },
 	{ "dump",	cmd_dump,	usage_dump,	"" },
+	{ "load",	cmd_load,	usage_load,	"" },
 };
 
 static void
@@ -1543,3 +1547,293 @@ cmd_dump(int argc, char *argv[])
 
 	return error;
 }
+
+__dead static void
+usage_load(void)
+{
+	fprintf(stderr, "usage: %s load [-lnq] [-b reference] "
+	    "[-r repository-path] [file]\n",
+	    getprogname());
+	exit(1);
+}
+
+static const struct got_error *
+load_progress(void *arg, off_t packfile_size, int nobj_total,
+    int nobj_indexed, int nobj_loose, int nobj_resolved)
+{
+	return pack_index_progress(arg, packfile_size, nobj_total,
+	    nobj_indexed, nobj_loose, nobj_resolved);
+}
+
+static int
+is_wanted_ref(struct got_pathlist_head *wanted, const char *ref)
+{
+	struct got_pathlist_entry *pe;
+
+	if (TAILQ_EMPTY(wanted))
+		return 1;
+
+	TAILQ_FOREACH(pe, wanted, entry) {
+		if (strcmp(pe->path, ref) == 0)
+			return 1;
+	}
+
+	return 0;
+}
+
+static const struct got_error *
+create_ref(const char *refname, struct got_object_id *id,
+    int verbosity, struct got_repository *repo)
+{
+	const struct got_error *err = NULL;
+	struct got_reference *ref;
+	char *id_str;
+
+	err = got_object_id_str(&id_str, id);
+	if (err)
+		return err;
+
+	err = got_ref_alloc(&ref, refname, id);
+	if (err)
+		goto done;
+
+	err = got_ref_write(ref, repo);
+	got_ref_close(ref);
+
+	if (err == NULL && verbosity >= 0)
+		printf("Created reference %s: %s\n", refname, id_str);
+done:
+	free(id_str);
+	return err;
+}
+
+static const struct got_error *
+update_ref(struct got_reference *ref, struct got_object_id *new_id,
+    int replace_tags, int verbosity, struct got_repository *repo)
+{
+	const struct got_error *err = NULL;
+	char *new_id_str = NULL;
+	struct got_object_id *old_id = NULL;
+
+	err = got_object_id_str(&new_id_str, new_id);
+	if (err)
+		goto done;
+
+	if (!replace_tags &&
+	    strncmp(got_ref_get_name(ref), "refs/tags/", 10) == 0) {
+		err = got_ref_resolve(&old_id, repo, ref);
+		if (err)
+			goto done;
+		if (got_object_id_cmp(old_id, new_id) == 0)
+			goto done;
+		if (verbosity >= 0) {
+			printf("Rejecting update of existing tag %s: %s\n",
+			    got_ref_get_name(ref), new_id_str);
+		}
+		goto done;
+	}
+
+	if (got_ref_is_symbolic(ref)) {
+		if (verbosity >= 0) {
+			printf("Replacing reference %s: %s\n",
+			    got_ref_get_name(ref),
+			    got_ref_get_symref_target(ref));
+		}
+		err = got_ref_change_symref_to_ref(ref, new_id);
+		if (err)
+			goto done;
+		err = got_ref_write(ref, repo);
+		if (err)
+			goto done;
+	} else {
+		err = got_ref_resolve(&old_id, repo, ref);
+		if (err)
+			goto done;
+		if (got_object_id_cmp(old_id, new_id) == 0)
+			goto done;
+
+		err = got_ref_change_ref(ref, new_id);
+		if (err)
+			goto done;
+		err = got_ref_write(ref, repo);
+		if (err)
+			goto done;
+	}
+
+	if (verbosity >= 0)
+		printf("Updated %s: %s\n", got_ref_get_name(ref),
+		    new_id_str);
+done:
+	free(old_id);
+	free(new_id_str);
+	return err;
+}
+
+static const struct got_error *
+cmd_load(int argc, char *argv[])
+{
+	const struct got_error *error = NULL;
+	struct got_repository *repo = NULL;
+	struct got_pathlist_head include_args;
+	struct got_pathlist_head available_refs;
+	struct got_pathlist_entry *pe;
+	struct got_pack_progress_arg ppa;
+	FILE *in = stdin;
+	int *pack_fds = NULL;
+	char *repo_path = NULL;
+	int list_refs_only = 0;
+	int noop = 0;
+	int verbosity = 0;
+	int ch;
+
+	TAILQ_INIT(&include_args);
+	TAILQ_INIT(&available_refs);
+
+#ifndef PROFILE
+	if (pledge("stdio rpath wpath cpath fattr flock proc exec "
+	    "sendfd unveil", NULL) == -1)
+		err(1, "pledge");
+#endif
+
+	while ((ch = getopt(argc, argv, "b:lnqr:")) != -1) {
+		switch (ch) {
+		case 'b':
+			error = got_pathlist_append(&include_args,
+			    optarg, NULL);
+			if (error)
+				return error;
+			break;
+		case 'l':
+			list_refs_only = 1;
+			break;
+		case 'n':
+			noop = 1;
+			break;
+		case 'q':
+			verbosity = -1;
+			break;
+		case 'r':
+			repo_path = realpath(optarg, NULL);
+			if (repo_path == NULL)
+				return got_error_from_errno2("realpath",
+				    optarg);
+			got_path_strip_trailing_slashes(repo_path);
+			break;
+		default:
+			usage_load();
+			/* NOTREACHED */
+		}
+	}
+	argc -= optind;
+	argv += optind;
+
+	if (list_refs_only && !TAILQ_EMPTY(&include_args))
+		errx(1, "-b and -l are mutually exclusive");
+
+	if (list_refs_only && noop)
+		errx(1, "-n and -l are mutually exclusive");
+
+	if (argc > 1)
+		usage_load();
+	if (argc == 1) {
+		in = fopen(argv[0], "re");
+		if (in == NULL)
+			return got_error_from_errno2("fopen", argv[0]);
+	}
+
+	if (repo_path == NULL) {
+		error = get_repo_path(&repo_path);
+		if (error)
+			goto done;
+	}
+	error = got_repo_pack_fds_open(&pack_fds);
+	if (error != NULL)
+		goto done;
+	error = got_repo_open(&repo, repo_path, NULL, pack_fds);
+	if (error)
+		goto done;
+
+	error = apply_unveil(got_repo_get_path_git_dir(repo), 0);
+	if (error)
+		goto done;
+
+	memset(&ppa, 0, sizeof(ppa));
+	ppa.out = stdout;
+	ppa.verbosity = verbosity;
+
+	error = got_repo_load(in, &available_refs, repo, list_refs_only, noop,
+	    load_progress, &ppa, check_cancelled, NULL);
+	if (verbosity >= 0)    /* XXX printed_something is always zero */
+		printf("\n");
+	if (error)
+		goto done;
+
+	if (list_refs_only) {
+		TAILQ_FOREACH(pe, &available_refs, entry) {
+			const char *refname = pe->path;
+			struct got_object_id *id = pe->data;
+			char *idstr;
+
+			error = got_object_id_str(&idstr, id);
+			if (error)
+				goto done;
+
+			printf("%s: %s\n", refname, idstr);
+			free(idstr);
+		}
+		goto done;
+	}
+
+	if (noop)
+		goto done;
+
+	/* Update references */
+	TAILQ_FOREACH(pe, &available_refs, entry) {
+		const struct got_error *unlock_err;
+		struct got_reference *ref;
+		const char *refname = pe->path;
+		struct got_object_id *id = pe->data;
+
+		if (!is_wanted_ref(&include_args, pe->path))
+			continue;
+
+		error = got_ref_open(&ref, repo, refname, 1);
+		if (error) {
+			if (error->code != GOT_ERR_NOT_REF)
+				goto done;
+			error = create_ref(refname, id, verbosity, repo);
+			if (error)
+				goto done;
+		} else {
+			/* XXX: check advances only and add -f to force? */
+			error = update_ref(ref, id, 1, verbosity, repo);
+			unlock_err = got_ref_unlock(ref);
+			if (unlock_err && error == NULL)
+				error = unlock_err;
+			got_ref_close(ref);
+			if (error)
+				goto done;
+		}
+	}
+
+ done:
+	if (in != stdin && fclose(in) == EOF && error == NULL)
+		error = got_error_from_errno("fclose");
+
+	if (repo)
+		got_repo_close(repo);
+
+	if (pack_fds) {
+		const struct got_error *pack_err;
+
+		pack_err = got_repo_pack_fds_close(pack_fds);
+		if (error == NULL)
+			error = pack_err;
+	}
+
+	got_pathlist_free(&include_args, GOT_PATHLIST_FREE_NONE);
+	got_pathlist_free(&available_refs, GOT_PATHLIST_FREE_ALL);
+	free(repo_path);
+
+	return error;
+}
blob - 6ca6d041c1b17b5198707cc33b19f9fefccf07e0
blob + 105744fb64ab3e71162d690183b908dcd64cc259
--- include/got_error.h
+++ include/got_error.h
@@ -185,6 +185,7 @@
 #define GOT_ERR_GID		168
 #define GOT_ERR_NO_PROG		169
 #define GOT_ERR_MERGE_COMMIT_OUT_OF_DATE 170
+#define GOT_ERR_BUNDLE_FORMAT 171
 
 struct got_error {
         int code;
blob - /dev/null
blob + a86f846bc1419f0944d15233239fca593d6526dc (mode 644)
--- /dev/null
+++ include/got_repository_load.h
@@ -0,0 +1,26 @@
+/*
+ * Copyright (c) 2023 Omar Polo <op@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+/* A callback function which gets invoked with progress information. */
+typedef const struct got_error *(*got_load_progress_cb)(void *, off_t,
+    int, int, int, int);
+
+/*
+ * Load a bundle in the repository.
+ */
+const struct got_error *
+got_repo_load(FILE *, struct got_pathlist_head *, struct got_repository *,
+    int, int, got_load_progress_cb, void *, got_cancel_cb, void *);
blob - 14daf3a57c9be57d4f707ce25da1a357e09ada88
blob + 9be6fcbe0d2e6af91f5c95d80f653bc3f359d729
--- lib/error.c
+++ lib/error.c
@@ -235,6 +235,7 @@ static const struct got_error got_errors[] = {
 	{ GOT_ERR_MERGE_COMMIT_OUT_OF_DATE, "merging cannot proceed because "
 	    "the work tree is no longer up-to-date; merge must be aborted "
 	    "and retried" },
+	{ GOT_ERR_BUNDLE_FORMAT, "unknown git bundle version" },
 };
 
 static struct got_custom_error {
blob - /dev/null
blob + 544f59bad76e1479b2badc43c1532f06d70db9d9 (mode 644)
--- /dev/null
+++ lib/load.c
@@ -0,0 +1,532 @@
+/*
+ * Copyright (c) 2023 Omar Polo <op@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include <sys/queue.h>
+#include <sys/socket.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <sys/tree.h>
+#include <sys/types.h>
+#include <sys/uio.h>
+#include <sys/wait.h>
+
+#include <endian.h>
+#include <limits.h>
+#include <sha1.h>
+#include <sha2.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+#include <imsg.h>
+
+#include "got_error.h"
+#include "got_cancel.h"
+#include "got_object.h"
+#include "got_opentemp.h"
+#include "got_path.h"
+#include "got_reference.h"
+#include "got_repository.h"
+#include "got_repository_load.h"
+
+#include "got_lib_delta.h"
+#include "got_lib_hash.h"
+#include "got_lib_object.h"
+#include "got_lib_object_cache.h"
+#include "got_lib_pack.h"
+#include "got_lib_ratelimit.h"
+#include "got_lib_repository.h"
+#include "got_lib_privsep.h"
+
+#define GIT_BUNDLE_SIGNATURE_V2 "# v2 git bundle\n"
+
+#ifndef nitems
+#define nitems(_a)	(sizeof((_a)) / sizeof((_a)[0]))
+#endif
+
+#ifndef ssizeof
+#define ssizeof(_x) ((ssize_t)(sizeof(_x)))
+#endif
+
+static const struct got_error *
+temp_file(int *fd, char **path, const char *ext, struct got_repository *repo)
+{
+	const struct got_error *err;
+	char p[PATH_MAX];
+	int r;
+
+	*path = NULL;
+
+	r = snprintf(p, sizeof(p), "%s/%s/loading",
+	    got_repo_get_path_git_dir(repo), GOT_OBJECTS_PACK_DIR);
+	if (r < 0 || (size_t)r >= sizeof(p))
+		return got_error_from_errno("snprintf");
+
+	err = got_opentemp_named_fd(path, fd, p, ext);
+	if (err)
+		return err;
+
+	if (fchmod(*fd, GOT_DEFAULT_FILE_MODE) == -1)
+		return got_error_from_errno("fchmod");
+
+	return NULL;
+}
+
+static const struct got_error *
+load_report_progress(got_load_progress_cb progress_cb, void *progress_arg,
+    struct got_ratelimit *rl, off_t packsiz, int nobj_total,
+    int nobj_indexed, int nobj_loose, int nobj_resolved)
+{
+	const struct got_error *err;
+	int elapsed;
+
+	if (progress_cb == NULL)
+		return NULL;
+
+	err = got_ratelimit_check(&elapsed, rl);
+	if (err || !elapsed)
+		return err;
+
+	return progress_cb(progress_arg, packsiz, nobj_total, nobj_indexed,
+	    nobj_loose, nobj_resolved);
+}
+
+static const struct got_error *
+copypack(FILE *in, int outfd, off_t *tot,
+    struct got_object_id *id, struct got_ratelimit *rl,
+    got_load_progress_cb progress_cb, void *progress_arg,
+    got_cancel_cb cancel_cb, void *cancel_arg)
+{
+	const struct got_error *err;
+	struct got_hash hash;
+	struct got_object_id expected_id;
+	char buf[BUFSIZ], sha1buf[SHA1_DIGEST_LENGTH];
+	size_t r, sha1buflen = 0;
+
+	*tot = 0;
+	got_hash_init(&hash, GOT_HASH_SHA1);
+
+	for (;;) {
+		err = cancel_cb(cancel_arg);
+		if (err)
+			return err;
+
+		r = fread(buf, 1, sizeof(buf), in);
+		if (r == 0)
+			break;
+
+		/*
+		 * An expected SHA1 checksum sits at the end of the
+		 * pack file.  Since we don't know the file size ahead
+		 * of time we have to keep SHA1_DIGEST_LENGTH bytes
+		 * buffered and avoid mixing those bytes into our hash
+		 * computation until we know for sure that additional
+		 * pack file data bytes follow.
+		 *
+		 * We can assume that BUFSIZ is greater than
+		 * SHA1_DIGEST_LENGTH and that a short read means that
+		 * we've reached EOF.
+		 */
+
+		if (r >= sizeof(sha1buf)) {
+			*tot += sha1buflen;
+			got_hash_update(&hash, sha1buf, sha1buflen);
+			if (write(outfd, sha1buf, sha1buflen) == -1)
+				return got_error_from_errno("write");
+
+			r -= sizeof(sha1buf);
+			memcpy(sha1buf, &buf[r], sizeof(sha1buf));
+			sha1buflen = sizeof(sha1buf);
+
+			*tot += r;
+			got_hash_update(&hash, buf, r);
+			if (write(outfd, buf, r) == -1)
+				return got_error_from_errno("write");
+
+			err = load_report_progress(progress_cb, progress_arg,
+			    rl, *tot, 0, 0, 0, 0);
+			if (err)
+				return err;
+
+			continue;
+		}
+
+		if (sha1buflen == 0)
+			return got_error(GOT_ERR_BAD_PACKFILE);
+
+		/* short read, we've reached EOF */
+		*tot += r;
+		got_hash_update(&hash, sha1buf, r);
+		if (write(outfd, sha1buf, r) == -1)
+			return got_error_from_errno("write");
+
+		memmove(&sha1buf[0], &sha1buf[r], sizeof(sha1buf) - r);
+		memcpy(&sha1buf[sizeof(sha1buf) - r], buf, r);
+		break;
+	}
+
+	if (sha1buflen == 0)
+		return got_error(GOT_ERR_BAD_PACKFILE);
+
+	got_hash_final_object_id(&hash, id);
+
+	/* XXX SHA256 */
+	memset(&expected_id, 0, sizeof(expected_id));
+	memcpy(&expected_id.sha1, sha1buf, sizeof(expected_id.sha1));
+
+	if (got_object_id_cmp(id, &expected_id) != 0)
+		return got_error(GOT_ERR_PACKIDX_CSUM);
+
+	/* re-add the expected hash at the end of the pack */
+	if (write(outfd, sha1buf, sizeof(sha1buf)) == -1)
+		return got_error_from_errno("write");
+
+	*tot += sizeof(sha1buf);
+	/* final, unthrottled progress report */
+	if (progress_cb == NULL)
+		return NULL;
+
+	return progress_cb(progress_arg, *tot, 0, 0, 0, 0);
+}
+
+const struct got_error *
+got_repo_load(FILE *in, struct got_pathlist_head *refs_found,
+    struct got_repository *repo, int list_refs_only, int noop,
+    got_load_progress_cb progress_cb, void *progress_arg,
+    got_cancel_cb cancel_cb, void *cancel_arg)
+{
+	const struct got_error *err = NULL;
+	struct got_object_id id;
+	struct got_object *obj;
+	struct got_packfile_hdr pack_hdr;
+	struct got_ratelimit rl;
+	struct imsgbuf idxibuf;
+	const char *repo_path;
+	char *packpath = NULL, *idxpath = NULL;
+	char *tmppackpath = NULL, *tmpidxpath = NULL;
+	int packfd = -1, idxfd = -1;
+	char *spc, *refname, *id_str = NULL;
+	char *line = NULL;
+	size_t linesize = 0;
+	ssize_t linelen;
+	size_t i;
+	ssize_t n;
+	off_t packsiz;
+	int tmpfds[3] = {-1, -1, -1};
+	int imsg_idxfds[2] = {-1, -1};
+	int ch, done, nobj, idxstatus;
+	pid_t idxpid;
+
+	got_ratelimit_init(&rl, 0, 500);
+
+	repo_path = got_repo_get_path_git_dir(repo);
+
+	linelen = getline(&line, &linesize, in);
+	if (linelen == -1) {
+		err = got_ferror(in, GOT_ERR_IO);
+		goto done;
+	}
+
+	if (strcmp(line, GIT_BUNDLE_SIGNATURE_V2) != 0) {
+		err = got_error(GOT_ERR_BUNDLE_FORMAT);
+		goto done;
+	}
+
+	/* Parse the prerequisite */
+	for (;;) {
+		ch = fgetc(in);
+		if (ch != '-') {
+			if (ch != EOF)
+				ungetc(ch, in);
+			break;
+		}
+
+		linelen = getline(&line, &linesize, in);
+		if (linelen == -1) {
+			err = got_ferror(in, GOT_ERR_IO);
+			goto done;
+		}
+
+		if (line[linelen - 1] == '\n')
+			line[linelen - 1] = '\0';
+
+		if (!got_parse_object_id(&id, line, GOT_HASH_SHA1)) {
+			err = got_error_path(line, GOT_ERR_BAD_OBJ_ID_STR);
+			goto done;
+		}
+
+		err = got_object_open(&obj, repo, &id);
+		if (err)
+			goto done;
+		got_object_close(obj);
+	}
+
+	/* Read references */
+	for (;;) {
+		struct got_object_id *id;
+		char *dup;
+
+		linelen = getline(&line, &linesize, in);
+		if (linelen == -1) {
+			err = got_ferror(in, GOT_ERR_IO);
+			goto done;
+		}
+		if (line[linelen - 1] == '\n')
+			line[linelen - 1] = '\0';
+		if (*line == '\0')
+			break;
+
+		spc = strchr(line, ' ');
+		if (spc == NULL) {
+			err = got_error(GOT_ERR_IO);
+			goto done;
+		}
+		*spc = '\0';
+
+		refname = spc + 1;
+		if (!got_ref_name_is_valid(refname)) {
+			err = got_error(GOT_ERR_BAD_REF_DATA);
+			goto done;
+		}
+
+		id = malloc(sizeof(*id));
+		if (id == NULL) {
+			err = got_error_from_errno("malloc");
+			goto done;
+		}
+
+		if (!got_parse_object_id(id, line, GOT_HASH_SHA1)) {
+			free(id);
+			err = got_error(GOT_ERR_BAD_OBJ_ID_STR);
+			goto done;
+		}
+
+		dup = strdup(refname);
+		if (dup == NULL) {
+			free(id);
+			err = got_error_from_errno("strdup");
+			goto done;
+		}
+
+		err = got_pathlist_append(refs_found, dup, id);
+		if (err) {
+			free(id);
+			free(dup);
+			goto done;
+		}
+	}
+
+	if (list_refs_only)
+		goto done;
+
+	err = temp_file(&packfd, &tmppackpath, ".pack", repo);
+	if (err)
+		goto done;
+
+	err = temp_file(&idxfd, &tmpidxpath, ".idx", repo);
+	if (err)
+		goto done;
+
+	err = copypack(in, packfd, &packsiz, &id, &rl,
+	    progress_cb, progress_arg, cancel_cb, cancel_arg);
+	if (err)
+		goto done;
+
+	if (lseek(packfd, 0, SEEK_SET) == -1) {
+		err = got_error_from_errno("lseek");
+		goto done;
+	}
+
+	/* Safety checks on the pack file's content. */
+	if (packsiz < ssizeof(pack_hdr) + SHA1_DIGEST_LENGTH) {
+		err = got_error_msg(GOT_ERR_BAD_PACKFILE, "short pack file");
+		goto done;
+	}
+
+	n = read(packfd, &pack_hdr, ssizeof(pack_hdr));
+	if (n == -1) {
+		err = got_error_from_errno("read");
+		goto done;
+	}
+	if (n != ssizeof(pack_hdr)) {
+		err = got_error(GOT_ERR_IO);
+		goto done;
+	}
+	if (pack_hdr.signature != htobe32(GOT_PACKFILE_SIGNATURE)) {
+		err = got_error_msg(GOT_ERR_BAD_PACKFILE,
+		    "bad pack file signature");
+		goto done;
+	}
+	if (pack_hdr.version != htobe32(GOT_PACKFILE_VERSION)) {
+		err = got_error_msg(GOT_ERR_BAD_PACKFILE,
+		    "bad pack file version");
+		goto done;
+	}
+	nobj = be32toh(pack_hdr.nobjects);
+	if (nobj == 0 &&
+	    packsiz > ssizeof(pack_hdr) + SHA1_DIGEST_LENGTH) {
+		err = got_error_msg(GOT_ERR_BAD_PACKFILE,
+		    "bad pack file with zero objects");
+		goto done;
+	}
+	if (nobj != 0 &&
+	    packsiz <= ssizeof(pack_hdr) + SHA1_DIGEST_LENGTH) {
+		err = got_error_msg(GOT_ERR_BAD_PACKFILE,
+		    "empty pack file with non-zero object count");
+		goto done;
+	}
+
+	/* nothing to do if there are no objects. */
+	if (nobj == 0)
+		goto done;
+
+	for (i = 0; i < nitems(tmpfds); i++) {
+		tmpfds[i] = got_opentempfd();
+		if (tmpfds[i] == -1) {
+			err = got_error_from_errno("got_opentempfd");
+			goto done;
+		}
+	}
+
+	if (lseek(packfd, 0, SEEK_SET) == -1) {
+		err = got_error_from_errno("lseek");
+		goto done;
+	}
+
+	if (socketpair(AF_UNIX, SOCK_STREAM, PF_UNSPEC, imsg_idxfds) == -1) {
+		err = got_error_from_errno("socketpair");
+		goto done;
+	}
+	idxpid = fork();
+	if (idxpid == -1) {
+		err = got_error_from_errno("fork");
+		goto done;
+	} else if (idxpid == 0)
+		got_privsep_exec_child(imsg_idxfds,
+		    GOT_PATH_PROG_INDEX_PACK, tmppackpath);
+	if (close(imsg_idxfds[1]) == -1) {
+		err = got_error_from_errno("close");
+		goto done;
+	}
+	imsg_idxfds[1] = -1;
+	imsg_init(&idxibuf, imsg_idxfds[0]);
+
+	err = got_privsep_send_index_pack_req(&idxibuf, id.sha1, packfd);
+	if (err)
+		goto done;
+	packfd = -1;
+
+	err = got_privsep_send_index_pack_outfd(&idxibuf, idxfd);
+	if (err)
+		goto done;
+	idxfd = -1;
+
+	for (i = 0; i < nitems(tmpfds); i++) {
+		err = got_privsep_send_tmpfd(&idxibuf, tmpfds[i]);
+		if (err != NULL)
+			goto done;
+		tmpfds[i] = -1;
+	}
+
+	done = 0;
+	while (!done) {
+		int nobj_total, nobj_indexed, nobj_loose, nobj_resolved;
+
+		err = got_privsep_recv_index_progress(&done, &nobj_total,
+		    &nobj_indexed, &nobj_loose, &nobj_resolved, &idxibuf);
+		if (err)
+			goto done;
+		if (nobj_indexed != 0) {
+			err = load_report_progress(progress_cb, progress_arg,
+			    &rl, packsiz, nobj_total, nobj_indexed,
+			    nobj_loose, nobj_resolved);
+			if (err)
+				goto done;
+		}
+	}
+	if (close(imsg_idxfds[0]) == -1) {
+		err = got_error_from_errno("close");
+		goto done;
+	}
+	imsg_idxfds[0] = -1;
+	if (waitpid(idxpid, &idxstatus, 0) == -1) {
+		err = got_error_from_errno("waitpid");
+		goto done;
+	}
+
+	if (noop)
+		goto done;
+
+	err = got_object_id_str(&id_str, &id);
+	if (err)
+		goto done;
+
+	if (asprintf(&packpath, "%s/%s/pack-%s.pack", repo_path,
+	    GOT_OBJECTS_PACK_DIR, id_str) == -1) {
+		err = got_error_from_errno("asprintf");
+		goto done;
+	}
+
+	if (asprintf(&idxpath, "%s/%s/pack-%s.idx", repo_path,
+	    GOT_OBJECTS_PACK_DIR, id_str) == -1) {
+		err = got_error_from_errno("asprintf");
+		goto done;
+	}
+
+	if (rename(tmppackpath, packpath) == -1) {
+		err = got_error_from_errno3("rename", tmppackpath, packpath);
+		goto done;
+	}
+	free(tmppackpath);
+	tmppackpath = NULL;
+
+	if (rename(tmpidxpath, idxpath) == -1) {
+		err = got_error_from_errno3("rename", tmpidxpath, idxpath);
+		goto done;
+	}
+	free(tmpidxpath);
+	tmpidxpath = NULL;
+
+ done:
+	free(line);
+	free(packpath);
+	free(idxpath);
+	free(id_str);
+
+	if (tmppackpath && unlink(tmppackpath) == -1 && err == NULL)
+		err = got_error_from_errno2("unlink", tmppackpath);
+	if (packfd != -1 && close(packfd) == -1 && err == NULL)
+		err = got_error_from_errno("close");
+	free(tmppackpath);
+
+	if (tmpidxpath && unlink(tmpidxpath) == -1 && err == NULL)
+		err = got_error_from_errno2("unlink", tmpidxpath);
+	if (idxfd != -1 && close(idxfd) == -1 && err == NULL)
+		err = got_error_from_errno("close");
+	free(tmpidxpath);
+
+	if (imsg_idxfds[0] != -1 && close(imsg_idxfds[0]) == -1 && err == NULL)
+		err = got_error_from_errno("close");
+	if (imsg_idxfds[1] != -1 && close(imsg_idxfds[1]) == -1 && err == NULL)
+		err = got_error_from_errno("close");
+
+	for (i = 0; i < nitems(tmpfds); ++i)
+		if (tmpfds[i] != -1 && close(tmpfds[i]) == -1 && err == NULL)
+			err = got_error_from_errno("close");
+
+	return err;
+}
blob - 0b66aa7ed80c3e660be33ffdb8bb00ac61dcacd8
blob + 8625a7dfe8434206263677cae6d47f88935f4a43
--- regress/cmdline/Makefile
+++ regress/cmdline/Makefile
@@ -1,7 +1,7 @@
 REGRESS_TARGETS=checkout update status log add rm diff blame branch tag \
 	ref commit revert cherrypick backout rebase init import histedit \
 	integrate merge stage unstage cat clone fetch send tree patch pack \
-	cleanup dump
+	cleanup dump load
 NOOBJ=Yes
 
 GOT_TEST_ROOT=/tmp
@@ -102,4 +102,7 @@ dump:
 dump:
 	./dump.sh -q -r "$(GOT_TEST_ROOT)"
 
+load:
+	./load.sh -q -r "$(GOT_TEST_ROOT)"
+
 .include <bsd.regress.mk>
blob - /dev/null
blob + 7b7bbe844754e7af7ee9a6c3e88f168a902dbafe (mode 755)
--- /dev/null
+++ regress/cmdline/load.sh
@@ -0,0 +1,130 @@
+#!/bin/sh
+#
+# Copyright (c) 2023 Omar Polo <op@openbsd.org>
+#
+# Permission to use, copy, modify, and distribute this software for any
+# purpose with or without fee is hereby granted, provided that the above
+# copyright notice and this permission notice appear in all copies.
+#
+# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+
+. ./common.sh
+
+test_load_bundle() {
+	local testroot=`test_init test_load_bundle`
+
+	# generate a bundle with all the history of the repository
+	(cd "$testroot/repo" && git bundle create -q "$testroot/bundle" master)
+
+	# then load it in an empty repository
+	(cd "$testroot/" && gotadmin init -b master repo2) >/dev/null
+	(cd "$testroot/repo2" && gotadmin load "$testroot/bundle") \
+		>/dev/null
+	if [ $? -ne 0 ]; then
+		echo "failed to load the bundle" >&2
+		test_done "$testroot" 1
+		return 1
+	fi
+
+	(cd "$testroot/repo"  && got log -p >$testroot/repo.log)
+	(cd "$testroot/repo2" && got log -p >$testroot/repo2.log)
+	if ! cmp -s $testroot/repo.log $testroot/repo2.log; then
+		diff -u $testroot/repo.log $testroot/repo2.log
+		test_done "$testroot" 1
+		return 1
+	fi
+
+	base=$(git_show_head "$testroot/repo")
+
+	echo "modified alpha in master" >$testroot/repo/alpha
+	git_commit "$testroot/repo" -m "edit alpha in master"
+
+	(cd "$testroot/repo" && git bundle create -q \
+		"$testroot/bundle" "$base..master")
+
+	(cd "$testroot/repo2" && gotadmin load "$testroot/bundle") >/dev/null
+	if [ $? -ne 0 ]; then
+		echo "failed to load incremental bundle" >&2
+		test_done "$testroot" 1
+		return 1
+	fi
+
+	(cd "$testroot/repo"  && got log -p >$testroot/repo.log)
+	(cd "$testroot/repo2" && got log -p >$testroot/repo2.log)
+	if ! cmp -s $testroot/repo.log $testroot/repo2.log; then
+		diff -u $testroot/repo.log $testroot/repo2.log
+		test_done "$testroot" 1
+		return 1
+	fi
+
+	test_done "$testroot" 0
+}
+
+test_load_branch_from_bundle() {
+	local testroot=`test_init test_load_branch_from_bundle`
+
+	echo "modified alpha in master" >$testroot/repo/alpha
+	git_commit "$testroot/repo" -m "edit alpha in master"
+
+	master_commit="$(git_show_head "$testroot/repo")"
+
+	(cd "$testroot/repo" && git checkout -q -b newbranch)
+
+	for i in `seq 1`; do
+		echo "alpha edit #$i" > $testroot/repo/alpha
+		git_commit "$testroot/repo" -m "edit alpha"
+	done
+
+	newbranch_commit="$(git_show_head "$testroot/repo")"
+
+	(cd "$testroot/repo" && gotadmin dump -q >$testroot/bundle)
+
+	(cd "$testroot/" && gotadmin init -b newbranch repo2) >/dev/null
+
+	# check that the references in the bundle are what we expect
+	(cd "$testroot/repo2" && gotadmin load -l "$testroot/bundle") \
+		>$testroot/stdout
+
+	cat <<EOF >$testroot/stdout.expected
+HEAD: $newbranch_commit
+refs/heads/master: $master_commit
+refs/heads/newbranch: $newbranch_commit
+EOF
+	if ! cmp -s "$testroot/stdout" "$testroot/stdout.expected"; then
+		diff -u "$testroot/stdout" "$testroot/stdout.expected"
+		test_done "$testroot" 1
+		return 1
+	fi
+
+	(cd "$testroot/repo2" && gotadmin load -q -b refs/heads/newbranch \
+		<$testroot/bundle)
+	if [ $? -ne 0 ]; then
+		echo "gotadmin load failed unexpectedly" >&2
+		test_done "$testroot" 1
+		return 1
+	fi
+
+	# now that the bundle is loaded, delete the master branch in the
+	# original repo too, so that the references shown by got log match.
+	(cd "$testroot/repo" && got branch -d master) >/dev/null
+
+	(cd "$testroot/repo"  && got log -p >$testroot/repo.log)
+	(cd "$testroot/repo2" && got log -p >$testroot/repo2.log)
+	if ! cmp -s $testroot/repo.log $testroot/repo2.log; then
+		diff -u $testroot/repo.log $testroot/repo2.log
+		test_done "$testroot" 1
+		return 1
+	fi
+
+	test_done "$testroot" 0
+}
+
+test_parseargs "$@"
+run_test test_load_bundle
+run_test test_load_branch_from_bundle