Anyone have a good algorithms for minimum 2-dimens...
# thinking-together
b
Anyone have a good algorithms for minimum 2-dimensional diff? Input is 2 trimmed and clean CSVs (files don’t end in \n; lines don’t end in ,; no quotes or escaping). (one note: There often is a varying number of columns in rows) Example, given the 2 inputs below, write efficient diff and patch methods and a nice diff encoding.
Copy code
rank,color
1,blue
2,red
3,green 

rank,color
1,blue
2,orange
3,purple
A nice diff format would also be helpful. Can’t seem to find much on this yet, but there must be some stuff out there in the Spreadsheet/Finance world.
💯 1
e
I don't know any csv diff algorithms but there are a bunch of "CSV diff" implementations on github, maybe one of them works for you? Other idea: here's a cool article about diffing here: https://neil.fraser.name/writing/diff/:
Any difference algorithm could theoretically process any input, regardless of whether it is split by characters, words or lines ()*. However, some difference algorithms are much more efficient at handling small tokens such as characters, others are much more efficient at handling large tokens such as lines.
*: or rows? So if you treat the grid as a sequence (cells instead of characters, using row width instead of line length, and having nth element at Y * Width + X), perhaps one of the sequence diffs algos could yields better results than the others?
👍 1
b
In case anyone else ever is ever looking at this sort of problem. The good resources I was pointed to are: https://en.wikipedia.org/wiki/Polygon_covering ..Daff: http://paulfitz.github.io/daff/ ... and https://specs.frictionlessdata.io/tabular-diff/
👍 2
e
interesting, how's the polygon covering related? Is it more about the presentation of the diff than the diffing itself?
I wish daff had a brief description of the algorithm it uses. It seems highly concerned with "finding alignment" https://github.com/paulfitz/daff/blob/master/coopy/CompareTable.hx