From: Stefan Sperling Subject: Re: got remove recursion To: Tracey Emery Cc: gameoftrees@openbsd.org Date: Thu, 12 Dec 2019 19:02:08 +0100 On Wed, Dec 11, 2019 at 03:36:15PM -0700, Tracey Emery wrote: > Hello, > > The diff below adds recursion to got remove. New test added and all > tests pass. TODO and man page updated. Small fix for incorrect usage. > > I'm not really keen on the ignore_nochange arg name. If someone has a > better idea, let me know. Maybe 'report_unchanged'? > Comments? Ok? There's a bit of an oddity in your patch which gets exposed by the regress test you've added. Your approach of adding this functionality seems to rely on non-obvious side-effects. Which leaves me worried that this approach may be too brittle in the long term. At the top-level you're passing the work tree's root dir as relative path "" to worktree_status (this seems fine): (gdb) list 2995 char *ondisk_path; 2996 if (asprintf(&ondisk_path, "%s/%s", worktree->root_path, 2997 pe->path) == -1) 2998 return got_error_from_errno("asprintf"); 2999 sda.ondisk_path = ondisk_path; 3000 err = worktree_status(worktree, pe->path, fileindex, repo, 3001 schedule_for_deletion, &sda, NULL, NULL, 0, 1); 3002 free(ondisk_path); 3003 if (err) 3004 break; (gdb) p pe->path $44 = 0xb97cb4b0ba0 "" (gdb) p ondisk_path $45 = 0xb982ec7f280 "/tmp/got-test-rm_directory-VOqpla1g/wt/" (gdb) Now ondisk_path is the absolute version of pe->path, and this absolute path gets passed down via sda.ondisk_path. Now, later on, schedule_for_deletion() is called for each file found while traversing this directory. Then two get_file_status() calls happen; one with the work tree's root path (ondisk_path) and another get_file_status() call for the file which is about to be deleted: #0 schedule_for_deletion (arg=0x7f7ffffcd818, status=126 '~', staged_status=32 ' ', relpath=0xb97cb4b0370 "alpha", blob_id=0x7f7ffffcd3f0, staged_blob_id=0x0, commit_id=0x7f7ffffcd3d8) at /home/stsp/src/got/got/../lib/worktree.c:2935 2935 got_worktree_get_root_path(a->worktree), (gdb) list 2910,2945 2910 struct got_object_id *commit_id) 2911 { 2912 struct schedule_deletion_args *a = arg; 2913 const struct got_error *err = NULL; 2914 struct got_fileindex_entry *ie = NULL; 2915 struct stat sb; 2916 char *path = NULL; 2917 2918 ie = got_fileindex_entry_get(a->fileindex, relpath, strlen(relpath)); 2919 if (ie == NULL) 2920 return got_error(GOT_ERR_BAD_PATH); 2921 2922 staged_status = get_staged_status(ie); 2923 if (staged_status != GOT_STATUS_NO_CHANGE) { 2924 if (staged_status == GOT_STATUS_DELETE) 2925 return NULL; 2926 return got_error_path(relpath, GOT_ERR_FILE_STAGED); 2927 } 2928 2929 err = get_file_status(&status, &sb, ie, a->ondisk_path, a->repo); 2930 if (err) 2931 return err; 2932 2933 if (S_ISDIR(sb.st_mode)) { 2934 if (asprintf(&path, "%s/%s", 2935 got_worktree_get_root_path(a->worktree), 2936 relpath) == -1) { 2937 err = got_error_from_errno("asprintf"); 2938 return err; 2939 } 2940 2941 err = get_file_status(&status, &sb, ie, path, a->repo); 2942 if (err) 2943 return err; 2944 } else 2945 path = strdup(a->ondisk_path); (gdb) p relpath $40 = 0xb97cb4b0370 "alpha" (gdb) p a->ondisk_path $41 = 0xb982ec7f280 "/tmp/got-test-rm_directory-VOqpla1g/wt/" (gdb) p a->worktree->root_path $42 = 0xb984434e7c0 "/tmp/got-test-rm_directory-VOqpla1g/wt" (gdb) p status $43 = 126 '~' Now here come the inconsistencies: Note that status is '~' which means "obstructed"; generally this is used to represent a situation where a regular file was expected at a given path but some other thing was found, such as a directory, /dev device node, or named socket. This value of status is not used. At some point, someone will be looking at this and wonder what is going on. It seems wrong to obtain a file index entry for a relpath "alpha", and then call get_file_status() with this same file index entry and an on-disk path that does not point at the corresponding file on disk. In the above case, the on-disk path points at the work tree's root directory instead of the ie's file's path. This is why status becomes '~'. Technically, get_file_status() should not even be called with a path that does not correspond to a file index entry (which by definition contains files only, not directories). We should try to somehow improve the clarity of this code but I don't have a good suggestion right now.