Mapping line numbers across two diff files using emacs/python/winmerge
Question
Consider the following two files that are slightly different:
foo (old version):
<Line 1> a
<Line 2> b
<Line 3> c
<Line 4> d
foo (new version):
<Line 1> a
<Line 2> e
<Line 3> b
<Line 4> c
<Line 5> f
<Line 6> d
As you can see, the characters e and f are introduced in the new file.
I have a set of line numbers corresponding to the older file, say 1, 3, and 4 (corresponding to the letters a, c, and d).
Is there a way to do a mapping across these two files, so that I can get the line numbers of the corresponding characters in the newer file?
For example, the result would be:
Old file line numbers (1,3,4) ===> New File line numbers (1,4,6)
Unfortunately, I have only Emacs (with a working ediff), Python, and WinMerge at my disposal.
Solution
What you need is a string searching algorithm where you have multiple patterns (the lines from the old version of foo) that you want to search for within a text (the new version of foo). The Rabin-Karp algorithm is one such algorithm for this sort of task. I've adapted it to your problem:
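def linematcher(haystack, needles, lineNumbers):
    # Collect the requested lines (1-based) from the old file.
    # Note: lineNumbers is assumed to be sorted in ascending order.
    with open(needles) as f:
        needles = [line.strip() for n, line in enumerate(f, 1) if n in lineNumbers]
    # Hash the needles so that most non-matching lines are rejected cheaply.
    hsubs = set(hash(s) for s in needles)
    with open(haystack) as f:
        for n, lineWithNewline in enumerate(f, 1):
            line = lineWithNewline.strip()
            hs = hash(line)
            if hs in hsubs and line in needles:
                print("{0} ===> {1}".format(lineNumbers[needles.index(line)], n))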
Assuming your two files are called old_foo.txt and new_foo.txt, you would call this function like this:
linematcher('new_foo.txt', 'old_foo.txt', [1, 3, 4])
When I tried it on your data, it printed:
1 ===> 1
3 ===> 4
4 ===> 6
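Note that linematcher pairs lines purely by their text, so if the same text appears on several lines the mapping can be ambiguous. If that matters, one possible alternative is to let Python's standard difflib module align the two files and read the line mapping off the matching blocks. The following is only a rough sketch of that idea, not a drop-in replacement: the function name map_line_numbers is made up for illustration, and it takes the two files as lists of lines already read into memory.

```python
import difflib

def map_line_numbers(old_lines, new_lines, wanted):
    """Map 1-based line numbers in old_lines to 1-based numbers in new_lines.

    Old lines that have no aligned counterpart in the new file map to None.
    (Hypothetical helper, written for illustration only.)
    """
    # autojunk=False turns off difflib's "popular element" heuristic.
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    mapping = {}
    for block in matcher.get_matching_blocks():
        # block.a and block.b are 0-based start indices; block.size is the run length.
        for offset in range(block.size):
            mapping[block.a + offset + 1] = block.b + offset + 1
    return {n: mapping.get(n) for n in wanted}

old = ["a", "b", "c", "d"]
new = ["a", "e", "b", "c", "f", "d"]
print(map_line_numbers(old, new, [1, 3, 4]))  # {1: 1, 3: 4, 4: 6}
```

For real files you would read the lines first, for example with open('old_foo.txt').read().splitlines(), and pass the resulting lists in.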
OTHER TIPS
You can do it all in Emacs:
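(require 'cl-lib)

(defun get-joint-index (file-a index file-b)
  ;; Map each line number in INDEX (for FILE-A) to the number of the
  ;; line in FILE-B that has the same text.
  (let ((table (make-hash-table :test #'equal)))
    (cl-flet ((line () (buffer-substring-no-properties
                        (line-beginning-position) (line-end-position))))
      ;; Record text -> line number for every line of FILE-B.
      (with-temp-buffer (insert-file-contents file-b)
        (cl-loop for i from 1 do (puthash (line) i table)
                 while (zerop (forward-line))))
      ;; Look up the text of each requested line of FILE-A.
      (with-temp-buffer (insert-file-contents file-a)
        (cl-loop for i in index
                 do (goto-char (point-min)) (forward-line (1- i))
                 collect (gethash (line) table))))))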
To run,
M-:(get-joint-index "/tmp/old" '(1 3 4) "/tmp/new")
-> (1 4 6)