From: Martin Pieuchot <mpi@openbsd.org>
Subject: Re: diff algo implementation ("duff")
To: neeels <got@kleinekatze.de>
Cc: gameoftrees@openbsd.org
Date: Wed, 22 Jan 2020 12:56:33 +0100

On 22/01/20(Wed) 04:32, neeels wrote:
> [...] 
> The first step would probably be some automated profiling with large amounts of
> test data, so that we get feedback on whether things improve or are getting
> worse. Any simple ideas?

What kind of test cases do you want?  Huge diffs in terms of lines
added/removed like sys/dev/drm or Mesa updates?  Or moving a function in
a driver file and making sure the diff is as small as possible?

> [...] 
> I hope that the coding style matches got, and that the structuring and massive
> amount of code comments allow diff hacking fun to come up also for others than
> me?

Ideally we should be able to replace all the consumers of the current diff
implementation in base OpenBSD with yours.  diff(1) might be the easiest
one to pick to look for regressions/performance/improvements.

That means on top of the algorithms you implemented we could start
looking at the integration of these new pieces of code :o)  Did you look
at the current diffreg() interface?  It is used in:

	usr.bin/cvs
	usr.bin/diff
	usr.bin/rcs
	got/lib/

Maybe we could add some data in OpenBSD's regress/usr.bin/diff and do
profiling with that.  I'm not sure how to compare the outputs since it
might depend on the algorithm used.
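
One possible way around the output comparison problem, as a rough
sketch only: instead of comparing diff text byte for byte, check that
each edit script is valid (it walks both files completely and its
"equal" runs really match) and then compare a cost such as lines
added plus lines removed.  The sketch below uses Python's difflib as
a stand-in algorithm; it does not use diffreg() or the new code, and
all function names in it are made up for illustration.

```python
import difflib


def difflib_script(a, b):
    """Edit script from difflib as (tag, i1, i2, j1, j2) tuples."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return matcher.get_opcodes()


def naive_script(a, b):
    """Worst-case baseline: remove every old line, add every new one."""
    return [("replace", 0, len(a), 0, len(b))]


def is_valid(a, b, script):
    """True if the script covers both files and 'equal' runs match."""
    ia = ib = 0
    for tag, i1, i2, j1, j2 in script:
        if (i1, j1) != (ia, ib):
            return False          # gap or overlap between chunks
        if tag == "equal" and a[i1:i2] != b[j1:j2]:
            return False          # claims lines are equal when they differ
        ia, ib = i2, j2
    return ia == len(a) and ib == len(b)


def cost(script):
    """Lines removed plus lines added, ignoring unchanged runs."""
    total = 0
    for tag, i1, i2, j1, j2 in script:
        if tag != "equal":
            total += (i2 - i1) + (j2 - j1)
    return total


if __name__ == "__main__":
    # "Moved function" test case: the g block moves above the f block.
    old = ["f1", "f2", "g1", "g2", "h1"]
    new = ["g1", "g2", "f1", "f2", "h1"]
    for name, algo in (("difflib", difflib_script), ("naive", naive_script)):
        script = algo(old, new)
        print(name, "valid:", is_valid(old, new, script),
              "cost:", cost(script))
```

With this, two algorithms can produce different hunks and both still
pass, and the cost gives a number to track for the "diff as small as
possible" kind of test case.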

> Another thing, so far it is called just "diff", which is asking for huge name
> conflicts and confusion with previous diff.
> I have had "duff" as a local alias for a unidiff (diff -u) for a long time, so
> I think I want to name this project "duff", and make unidiff output the
> default (in case the so far simplistic cmdline tool becomes install-worthy...)

I'd love to see your project become a drop-in replacement for OpenBSD's
diff(1).  So I don't see any problem with naming it "diff" :o)  Being
compliant with POSIX and with the commonly used arguments is the tricky
part.