"GOT", but the "O" is a cute, smiling pufferfish. Index | Thread | Search

From:
Yang Zhong <yzhong@freebsdfoundation.org>
Subject:
Re: change got_worktree_init, open_worktree to use fds
To:
Yang Zhong <yzhong@freebsdfoundation.org>, gameoftrees@openbsd.org
Date:
Tue, 8 Dec 2020 10:31:25 -0800

Download raw body.

Thread
Here's a small patch demonstrating the use of the new fd field in
got_worktree. I modified got_fileindex_entry_update to take in an
fd, which required all of its calls to be modified as well. As you
said, fileindexes have their paths stored relative to the worktree's
path already, so most of the time I can pass in a pre-existing
relative path instead of the absolute one.

This required me to add worktree fd fields to a few helpers as well
as got_fileindex_entry_update itself. There is also one spot where
I used got_path_skip_common_ancestor to create a relative path, in
update_fileindex_after_commit. I thought that ct->path would work
(replacing ct->ondisk_path), but it did not; I'm probably missing
some of the nuances in this case.

On Sun, Dec 6, 2020 at 6:39 PM Stefan Sperling <stsp@stsp.name> wrote:
>
> On Sun, Dec 06, 2020 at 05:59:45PM -0800, Yang Zhong wrote:
> > >   err = got_path_skip_common_ancestor(&child,
> > >       got_worktree_get_root_path(worktree), abspath_in_worktree);
> > >   if (err)
> > >         goto done;
> > >   if (openat(worktree_fd, child, ...) == -1) {
> > >         err = got_error_from_errno2("openat", abspath_in_worktree);
> > >         free(child);
> > >         goto done;
> > >   }
> > >   free(child);
> > >
> > > We can derive relative paths from absolute ones in this way.
> > > Any function which has access to struct got_worktree and an absolute path
> > > that points inside this worktree should be able to use openat() to open the
> > > corresponding file. Provided an fd and worktree root abspath are both stored
> > > in struct got_worktree. Similarly for got_repository and the tempdir.
> >
> > I definitely see the benefit of this solution. I'm concerned that adding
> > this extra step to every open() call will add too much noise, but I'll try
> > this solution in a section of code first, to be sure. Likely, some places
> > will still make more sense with relative paths (such as in the first patch
> > I wrote).
>
> Yes, I agree this approach may not always be the best choice. Your PoC
> patch shows that quite a few calls to open() exist, and perhaps not all
> of them can be handled the same way. We will have to look at actual code
> changes to make an informed decision on a case-by-case basis.
>
> Some code paths, such as the work tree status crawl, already have relative
> paths available. In the work tree's case, they are read from the file index
> where paths are stored relative to the work tree's root directory.
>
> > > (I also hope that capsicum will be OK with multiple path components in the
> > > path argument of openat(), like this: openat(open("/one"), "two/three");
> > > If not, an fd for every intermediate directory would be required, which
> > > would be very awkward to handle.)
> >
> > Yes, capsicum is fine with this. It only requires that the path have no ".."s
> > in it.
>
> That seems fine, then.
>
> This means paths obtained directly from user input are not safe to use.
> But I suppose such input will already be converted to absolute paths by
> realpath(3) before capsicum is entered. Converting absolute paths obtained
> from realpath(3) to relative ones should ensure that ".." never occurs under
> normal conditions.
diff --git a/lib/fileindex.c b/lib/fileindex.c
index b46dbb29..ce162817 100644
--- a/lib/fileindex.c
+++ b/lib/fileindex.c
@@ -88,15 +88,15 @@ got_fileindex_perms_to_st(struct got_fileindex_entry *ie)
 
 const struct got_error *
 got_fileindex_entry_update(struct got_fileindex_entry *ie,
-    const char *ondisk_path, uint8_t *blob_sha1, uint8_t *commit_sha1,
+    int wt_fd, const char *ondisk_path, uint8_t *blob_sha1, uint8_t *commit_sha1,
     int update_timestamps)
 {
 	struct stat sb;
 
-	if (lstat(ondisk_path, &sb) != 0) {
+	if (fstatat(wt_fd, ondisk_path, &sb, AT_SYMLINK_NOFOLLOW) != 0) {
 		if (!((ie->flags & GOT_FILEIDX_F_NO_FILE_ON_DISK) &&
 		    errno == ENOENT))
-			return got_error_from_errno2("lstat", ondisk_path);
+			return got_error_from_errno2("fstatat", ondisk_path);
 		sb.st_mode = GOT_DEFAULT_FILE_MODE;
 	} else {
 		if (sb.st_mode & S_IFDIR)
diff --git a/lib/got_lib_fileindex.h b/lib/got_lib_fileindex.h
index f1005832..537374d7 100644
--- a/lib/got_lib_fileindex.h
+++ b/lib/got_lib_fileindex.h
@@ -108,7 +108,7 @@ uint16_t got_fileindex_perms_from_st(struct stat *);
 mode_t got_fileindex_perms_to_st(struct got_fileindex_entry *);
 
 const struct got_error *got_fileindex_entry_update(struct got_fileindex_entry *,
-    const char *, uint8_t *, uint8_t *, int);
+    int, const char *, uint8_t *, uint8_t *, int);
 const struct got_error *got_fileindex_entry_alloc(struct got_fileindex_entry **,
     const char *);
 void got_fileindex_entry_free(struct got_fileindex_entry *);
diff --git a/lib/worktree.c b/lib/worktree.c
index 8bf4b7ff..8994bf43 100644
--- a/lib/worktree.c
+++ b/lib/worktree.c
@@ -1175,7 +1175,7 @@ done:
 static const struct got_error *
 create_fileindex_entry(struct got_fileindex_entry **new_iep,
     struct got_fileindex *fileindex, struct got_object_id *base_commit_id,
-    const char *ondisk_path, const char *path, struct got_object_id *blob_id)
+    int wt_fd, const char *path, struct got_object_id *blob_id)
 {
 	const struct got_error *err = NULL;
 	struct got_fileindex_entry *new_ie;
@@ -1186,7 +1186,7 @@ create_fileindex_entry(struct got_fileindex_entry **new_iep,
 	if (err)
 		return err;
 
-	err = got_fileindex_entry_update(new_ie, ondisk_path,
+	err = got_fileindex_entry_update(new_ie, wt_fd, path,
 	    blob_id->sha1, base_commit_id->sha1, 1);
 	if (err)
 		goto done;
@@ -1861,11 +1861,11 @@ done:
  * we had to run a full content comparison to find out.
  */
 static const struct got_error *
-sync_timestamps(char *ondisk_path, unsigned char status,
+sync_timestamps(int wt_fd, const char *path, unsigned char status,
     struct got_fileindex_entry *ie, struct stat *sb)
 {
 	if (status == GOT_STATUS_NO_CHANGE && stat_info_differs(ie, sb))
-		return got_fileindex_entry_update(ie, ondisk_path,
+		return got_fileindex_entry_update(ie, wt_fd, path,
 		    ie->blob_sha1, ie->commit_sha1, 1);
 
 	return NULL;
@@ -1918,7 +1918,7 @@ update_blob(struct got_worktree *worktree,
 		if (got_fileindex_entry_has_commit(ie) &&
 		    memcmp(ie->commit_sha1, worktree->base_commit_id->sha1,
 		    SHA1_DIGEST_LENGTH) == 0) {
-			err = sync_timestamps(ondisk_path, status, ie, &sb);
+			err = sync_timestamps(worktree->root_fd, path, status, ie, &sb);
 			if (err)
 				goto done;
 			err = (*progress_cb)(progress_arg, GOT_STATUS_EXISTS,
@@ -1928,7 +1928,7 @@ update_blob(struct got_worktree *worktree,
 		if (got_fileindex_entry_has_blob(ie) &&
 		    memcmp(ie->blob_sha1, te->id.sha1,
 		    SHA1_DIGEST_LENGTH) == 0) {
-			err = sync_timestamps(ondisk_path, status, ie, &sb);
+			err = sync_timestamps(worktree->root_fd, path, status, ie, &sb);
 			goto done;
 		}
 	}
@@ -1987,17 +1987,17 @@ update_blob(struct got_worktree *worktree,
 		 * Otherwise, a future status walk would treat them as
 		 * unmodified files again.
 		 */
-		err = got_fileindex_entry_update(ie, ondisk_path,
+		err = got_fileindex_entry_update(ie, worktree->root_fd, path,
 		    blob->id.sha1, worktree->base_commit_id->sha1,
 		    update_timestamps);
 	} else if (status == GOT_STATUS_MODE_CHANGE) {
-		err = got_fileindex_entry_update(ie, ondisk_path,
+		err = got_fileindex_entry_update(ie, worktree->root_fd, path,
 		    blob->id.sha1, worktree->base_commit_id->sha1, 0);
 	} else if (status == GOT_STATUS_DELETE) {
 		err = (*progress_cb)(progress_arg, GOT_STATUS_MERGE, path);
 		if (err)
 			goto done;
-		err = got_fileindex_entry_update(ie, ondisk_path,
+		err = got_fileindex_entry_update(ie, worktree->root_fd, path,
 		    blob->id.sha1, worktree->base_commit_id->sha1, 0);
 		if (err)
 			goto done;
@@ -2020,11 +2020,11 @@ update_blob(struct got_worktree *worktree,
 			goto done;
 
 		if (ie) {
-			err = got_fileindex_entry_update(ie, ondisk_path,
+			err = got_fileindex_entry_update(ie, worktree->root_fd, path,
 			    blob->id.sha1, worktree->base_commit_id->sha1, 1);
 		} else {
 			err = create_fileindex_entry(&ie, fileindex,
-			    worktree->base_commit_id, ondisk_path, path,
+			    worktree->base_commit_id, worktree->root_fd, path,
 			    &blob->id);
 		}
 		if (err)
@@ -2124,7 +2124,7 @@ delete_blob(struct got_worktree *worktree, struct got_fileindex *fileindex,
 		 * Preserve the working file and change the deleted blob's
 		 * entry into a schedule-add entry.
 		 */
-		err = got_fileindex_entry_update(ie, ondisk_path, NULL, NULL,
+		err = got_fileindex_entry_update(ie, worktree->root_fd, ie->path, NULL, NULL,
 		    0);
 	} else {
 		err = (*progress_cb)(progress_arg, GOT_STATUS_DELETE, ie->path);
@@ -2933,7 +2933,7 @@ merge_file_cb(void *arg, struct got_blob_object *blob1,
 				goto done;
 			if (status == GOT_STATUS_DELETE) {
 				err = got_fileindex_entry_update(ie,
-				    ondisk_path, blob2->id.sha1,
+				    a->worktree->root_fd, path2, blob2->id.sha1,
 				    a->worktree->base_commit_id->sha1, 0);
 				if (err)
 					goto done;
@@ -2955,7 +2955,7 @@ merge_file_cb(void *arg, struct got_blob_object *blob1,
 			err = got_fileindex_entry_alloc(&ie, path2);
 			if (err)
 				goto done;
-			err = got_fileindex_entry_update(ie, ondisk_path,
+			err = got_fileindex_entry_update(ie, a->worktree->root_fd, path2,
 			    NULL, NULL, 1);
 			if (err) {
 				got_fileindex_entry_free(ie);
@@ -3775,7 +3775,7 @@ schedule_addition(void *arg, unsigned char status, unsigned char staged_status,
 	err = got_fileindex_entry_alloc(&ie, relpath);
 	if (err)
 		goto done;
-	err = got_fileindex_entry_update(ie, ondisk_path, NULL, NULL, 1);
+	err = got_fileindex_entry_update(ie, a->worktree->root_fd, relpath, NULL, NULL, 1);
 	if (err) {
 		got_fileindex_entry_free(ie);
 		goto done;
@@ -4581,8 +4581,8 @@ revert_file(void *arg, unsigned char status, unsigned char staged_status,
 				goto done;
 			if (status == GOT_STATUS_DELETE ||
 			    status == GOT_STATUS_MODE_CHANGE) {
-				err = got_fileindex_entry_update(ie,
-				    ondisk_path, blob->id.sha1,
+				err = got_fileindex_entry_update(ie, a->worktree->root_fd,
+				    relpath, blob->id.sha1,
 				    a->worktree->base_commit_id->sha1, 1);
 				if (err)
 					goto done;
@@ -5285,18 +5285,25 @@ done:
 }
 
 static const struct got_error *
-update_fileindex_after_commit(struct got_pathlist_head *commitable_paths,
+update_fileindex_after_commit(struct got_worktree *worktree, struct got_pathlist_head *commitable_paths,
     struct got_object_id *new_base_commit_id, struct got_fileindex *fileindex,
     int have_staged_files)
 {
 	const struct got_error *err = NULL;
 	struct got_pathlist_entry *pe;
+	char *relpath = NULL;
 
 	TAILQ_FOREACH(pe, commitable_paths, entry) {
 		struct got_fileindex_entry *ie;
 		struct got_commitable *ct = pe->data;
 
 		ie = got_fileindex_entry_get(fileindex, pe->path, pe->path_len);
+
+		err = got_path_skip_common_ancestor(&relpath,
+		    worktree->root_path, ct->ondisk_path);
+		if (err)
+			goto done;
+
 		if (ie) {
 			if (ct->status == GOT_STATUS_DELETE ||
 			    ct->staged_status == GOT_STATUS_DELETE) {
@@ -5306,32 +5313,38 @@ update_fileindex_after_commit(struct got_pathlist_head *commitable_paths,
 				got_fileindex_entry_stage_set(ie,
 				    GOT_FILEIDX_STAGE_NONE);
 				got_fileindex_entry_staged_filetype_set(ie, 0);
-				err = got_fileindex_entry_update(ie,
-				    ct->ondisk_path, ct->staged_blob_id->sha1,
+
+				err = got_fileindex_entry_update(ie, worktree->root_fd,
+				    relpath, ct->staged_blob_id->sha1,
 				    new_base_commit_id->sha1,
 				    !have_staged_files);
 			} else
-				err = got_fileindex_entry_update(ie,
-				    ct->ondisk_path, ct->blob_id->sha1,
+				err = got_fileindex_entry_update(ie, worktree->root_fd,
+				    relpath, ct->blob_id->sha1,
 				    new_base_commit_id->sha1,
 				    !have_staged_files);
 		} else {
 			err = got_fileindex_entry_alloc(&ie, pe->path);
 			if (err)
-				break;
-			err = got_fileindex_entry_update(ie, ct->ondisk_path,
+				goto done;
+			err = got_fileindex_entry_update(ie, worktree->root_fd, relpath,
 			    ct->blob_id->sha1, new_base_commit_id->sha1, 1);
 			if (err) {
 				got_fileindex_entry_free(ie);
-				break;
+				goto done;
 			}
 			err = got_fileindex_entry_add(fileindex, ie);
 			if (err) {
 				got_fileindex_entry_free(ie);
-				break;
+				goto done;
 			}
 		}
+		free(relpath); //NOTE add done
+		relpath = NULL;
 	}
+done:
+	if (relpath)
+		free(relpath);
 	return err;
 }
 
@@ -5662,7 +5675,7 @@ got_worktree_commit(struct got_object_id **new_commit_id,
 	if (err)
 		goto done;
 
-	err = update_fileindex_after_commit(&commitable_paths, *new_commit_id,
+	err = update_fileindex_after_commit(worktree, &commitable_paths, *new_commit_id,
 	    fileindex, have_staged_files);
 	sync_err = sync_fileindex(fileindex, fileindex_path);
 	if (sync_err && err == NULL)
@@ -6248,7 +6261,7 @@ rebase_commit(struct got_object_id **new_commit_id,
 	if (err)
 		goto done;
 
-	err = update_fileindex_after_commit(&commitable_paths, *new_commit_id,
+	err = update_fileindex_after_commit(worktree, &commitable_paths, *new_commit_id,
 	    fileindex, 0);
 	sync_err = sync_fileindex(fileindex, fileindex_path);
 	if (sync_err && err == NULL)