From: Martin Pieuchot
Subject: Re: diff algo implementation ("duff")
To: neeels
Cc: gameoftrees@openbsd.org
Date: Wed, 22 Jan 2020 12:56:33 +0100

On 22/01/20(Wed) 04:32, neeels wrote:
> [...]
> The first step would probably be some automated profiling with large amounts of
> test data, so that we get feedback on whether things improve or are getting
> worse. Any simple ideas?

What kind of test cases do you want? Huge diffs in terms of lines
added/removed, like sys/dev/drm or Mesa updates? Or moving a function in a
driver file and making sure the diff is as small as possible?

> [...]
> I hope that the coding style matches got, and that the structuring and massive
> amount of code comments allow diff hacking fun to come up also for others than
> me?

Ideally we should be able to replace all the consumers of the current diff
implementation in base OpenBSD with yours. diff(1) might be the easiest one
to pick to look for regressions/performance/improvements.

That means on top of the algorithms you implemented we could start looking
at the integration of these new pieces of code :o)

Did you look at the current diffreg() interface? It is used in:

	usr.bin/cvs
	usr.bin/diff
	usr.bin/rcs
	got/lib/

Maybe we could add some data in OpenBSD's regress/usr.bin/diff and do
profiling with that. I'm not sure how to compare the outputs since they
might depend on the algorithm used.

> Another thing, so far it is called just "diff", which is asking for huge name
> conflicts and confusion with previous diff.
> I have had "duff" as a local alias for a unidiff (diff -u) for a long time, so
> I think I want to name this project "duff", and make unidiff output the
> default (in case the so far simplistic cmdline tool becomes install-worthy...)

I'd love to see your project become a drop-in replacement for OpenBSD's
diff(1). So I don't see any problem with naming it "diff" :o) Being
compliant with POSIX and with the commonly used arguments is the tricky
part.
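On comparing outputs: two diffs produced by different algorithms can both be correct while differing textually, so a byte-for-byte comparison would flag false regressions. One algorithm-independent check is to verify that each edit script turns the old file into the new one, and then compare how many lines each one adds or removes. A minimal sketch in C, assuming a hypothetical in-memory edit-script representation; `struct edit`, `edit_script_reproduces()` and `edit_script_cost()` are illustration names only, not part of diffreg() or of the new implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical edit-script representation, for illustration only. */
enum edit_kind { EDIT_KEEP, EDIT_DELETE, EDIT_INSERT };

struct edit {
	enum edit_kind	 kind;
	const char	*line;	/* only used for EDIT_INSERT */
};

/*
 * Apply an edit script to the lines of the old file (a) and check that
 * the result equals the lines of the new file (b).  KEEP consumes one
 * line of each file and requires them to match, DELETE consumes one line
 * of a, INSERT emits its own line, which must match the next line of b.
 * Any valid diff passes this check, whichever algorithm produced it.
 */
bool
edit_script_reproduces(const char **a, size_t na, const char **b, size_t nb,
    const struct edit *ed, size_t ned)
{
	size_t i = 0, j = 0, k;

	for (k = 0; k < ned; k++) {
		switch (ed[k].kind) {
		case EDIT_KEEP:
			if (i >= na || j >= nb || strcmp(a[i], b[j]) != 0)
				return false;
			i++;
			j++;
			break;
		case EDIT_DELETE:
			if (i >= na)
				return false;
			i++;
			break;
		case EDIT_INSERT:
			/* An insertion must carry the line it adds. */
			assert(ed[k].line != NULL);
			if (j >= nb || strcmp(ed[k].line, b[j]) != 0)
				return false;
			j++;
			break;
		default:
			return false;
		}
	}
	/* Both files must be fully consumed. */
	return i == na && j == nb;
}

/* Added plus removed lines: among valid scripts, smaller is better. */
size_t
edit_script_cost(const struct edit *ed, size_t ned)
{
	size_t k, cost = 0;

	for (k = 0; k < ned; k++)
		if (ed[k].kind != EDIT_KEEP)
			cost++;
	return cost;
}
```

For example, with old lines "a b c" and new lines "a c b", both "keep a, delete b, keep c, insert b" and "keep a, insert c, keep b, delete c" pass the check with a cost of 2, even though their unified output would look different. A regress test could then require validity and only compare costs between implementations.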