I am trying to output the difference between two text files using the difflib library in Python 2, with HtmlDiff generating an HTML file.

import difflib

V1 = 'This has four words'
V2 = 'This has more than four words'
res = difflib.HtmlDiff().make_table(V1, V2)
text_file = open(OUTPUT, "w")
text_file.write(res)
text_file.close()

However, the output HTML looks like this in a browser:

The display compares the texts one character at a time, which makes it completely unreadable.

What should I modify to make the comparison more human-friendly (e.g. full sentences on each side)?

If the inputs are passed as lists of lines, the output is formatted line by line, but the differences are not displayed:

V1 = ['This has four words']
V2 = ['This has more than four words']
res = difflib.HtmlDiff().make_table(V1, V2)
text_file = open(OUTPUT, "w")
text_file.write(res)
text_file.close()

Resulting HTML (as viewed in a browser):

You seem to be reading V1 from the file opened with encoding utf-8, then re-reading the file into V1 opened without encoding. Are you sure you need both of these? Same for V2? – DisappointedByUnaccountableMod May 25, 2020 at 19:43

OK, well that probably explains the missing utf-8 decoding: you removed the encoding on the open. You are using Python 3, aren't you? If you give your code simple plain ASCII text to compare, does it produce better output? – DisappointedByUnaccountableMod May 25, 2020 at 19:48

@barny that did solve the encoding problem, however the output still has the same problem. I updated the code to be clearer and easier to reproduce (this is Python 2) – hirschme May 25, 2020 at 20:15

"Not displaying the differences": so you want markup? Try stackoverflow.com/questions/774316/… – DisappointedByUnaccountableMod May 25, 2020 at 20:38

To get inline markup you can use difflib.SequenceMatcher, as in the function defined in this answer: https://stackoverflow.com/a/788780/2318649

Applied to your example, that gives this code:

import difflib

def show_diff(seqm):
    """Unify operations between two compared strings.
    seqm is a difflib.SequenceMatcher instance whose a & b are strings"""
    # function adapted from https://stackoverflow.com/questions/774316/python-difflib-highlighting-differences-inline
    output = []
    for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
        if opcode == 'equal':
            output.append(seqm.a[a0:a1])
        elif opcode == 'insert':
            output.append("<ins>" + seqm.b[b0:b1] + "</ins>")
        elif opcode == 'delete':
            output.append("<del>" + seqm.a[a0:a1] + "</del>")
        elif opcode == 'replace':
            # show a replacement as a deletion followed by an insertion
            output.append("<del>" + seqm.a[a0:a1] + "</del>")
            output.append("<ins>" + seqm.b[b0:b1] + "</ins>")
        else:
            # str.format instead of an f-string so this also runs on Python 2
            raise RuntimeError("unexpected opcode {}".format(opcode))
    return ''.join(output)

V1 = 'This has four words but fewer than eleven'
V2 = 'This has more than four words'
sm = difflib.SequenceMatcher(None, V1, V2)
html = "<html><body>" + show_diff(sm) + "</body></html>"
# the with block closes the file once the write is done
with open("output.html", "wt") as f:
    f.write(html)

which writes output.html, with deleted text wrapped in <del> and inserted text wrapped in <ins>.
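The function above compares the two strings character by character. If word-level markup reads better, one possible variation is to split both strings into words before handing them to SequenceMatcher. This is only a sketch, assuming Python 3 and whitespace-separated words; the name show_word_diff is hypothetical and not part of difflib or of the linked answer.

```python
import difflib
from html import escape  # Python 3 only


def show_word_diff(old, new):
    """Return old vs new as HTML, with <del>/<ins> around changed words.

    Hypothetical helper: compares lists of words instead of characters.
    """
    a, b = old.split(), new.split()
    sm = difflib.SequenceMatcher(None, a, b)
    out = []
    for opcode, a0, a1, b0, b1 in sm.get_opcodes():
        removed = escape(' '.join(a[a0:a1]))
        added = escape(' '.join(b[b0:b1]))
        if opcode == 'equal':
            out.append(removed)
        if opcode in ('delete', 'replace'):
            out.append('<del>' + removed + '</del>')
        if opcode in ('insert', 'replace'):
            out.append('<ins>' + added + '</ins>')
    return ' '.join(out)


print(show_word_diff('This has four words', 'This has more than four words'))
# prints: This has <ins>more than</ins> four words
```

Note that split() collapses runs of whitespace, so the original spacing is not preserved in the result.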

This is an old question, but I struggled with it myself for a few days. I was getting this:

Before fixing anything else, I finally pieced something together. It looks like this:

html = difflib.HtmlDiff().make_file(a.split(' '), b.split(' '), fromdesc="original", todesc="modified")

after adding a simple split on spaces, so that each word is compared as one unit.
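For reference, a self-contained version of that one-liner might look like the sketch below. The values of a and b are assumptions taken from the question's example sentences. Note that make_file returns a complete HTML page including the stylesheet that colors the changes, whereas make_table (used in the question) returns only the table, so the highlighting is not styled unless you supply that CSS yourself.

```python
import difflib

# Example inputs, assumed here for illustration (taken from the question).
a = 'This has four words'
b = 'This has more than four words'

# Splitting on spaces makes each word one "line" of the diff,
# so HtmlDiff compares words instead of single characters.
html = difflib.HtmlDiff().make_file(
    a.split(' '), b.split(' '), fromdesc="original", todesc="modified"
)
```

The resulting html string can then be written to a file and opened in a browser, as in the question.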
