I am comparing two text files, but in certain cases the result is wrong. I have verified that it happens only when the files exceed 200 lines, and even then only in some cases.
From the comparison I keep only the lines that were added in file2 (the ones that start with "+ ").
Input
These are the two text files that I compare.
I am sharing them as Google Drive links because they are very long:
- file1
- file2
Output
What it returns is:
different
EL BOSQUE ENCANTADO,El Bosque Encantado,"PRECIO
12,00€"
When it should return:
different
CIBELES DE CINE,Galería de Cristal de CentroCentro,"PRECIO
7,00€"
By testing I have seen that if I delete a trailing line break from the end of the files, so that they are left at 200 lines, the result is correct, but as soon as they go over 200 lines the result is wrong.
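One thing in difflib that changes at exactly 200 lines is the autojunk heuristic of SequenceMatcher: once the second sequence has at least 200 items, any item that makes up more than about 1% of it is marked as "popular" and is no longer used to anchor matches. Whether that is what happens with these two files is an assumption. The sketch below uses made-up lines (a repeated PRECIO line plus unique titles) rather than the real data, and popular_lines is a hypothetical helper name. It only shows that the heuristic switches on at 200 lines.

```python
import difflib


def popular_lines(lines):
    """Return the lines that SequenceMatcher's autojunk heuristic marks as popular."""
    # The heuristic only inspects the second sequence, so the first can be empty.
    matcher = difflib.SequenceMatcher(None, [], lines)  # autojunk=True by default
    return matcher.bpopular


# Made-up data (not the real files): 10 copies of one line plus unique titles.
repeated = ["PRECIO\n"] * 10
lines_199 = repeated + [f"title {i}\n" for i in range(189)]
lines_200 = repeated + [f"title {i}\n" for i in range(190)]

print(popular_lines(lines_199))  # set()          -> 199 lines, heuristic is off
print(popular_lines(lines_200))  # {'PRECIO\n'}   -> 200 lines, heuristic is on
```

In a CSV with quoted multi-line fields, lines such as PRECIO or a common price are repeated often, so they are likely candidates for this treatment once the file reaches 200 lines.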
Code
# LIBRARIES
import difflib
import sys

# WE LOOK FOR THE DIFFERENCES AND PRINT THEM.
with open('file1.txt', encoding='utf8') as file_1, open('file2.txt', encoding='utf8') as file_2:
    diff = difflib.Differ()
    result = diff.compare(file_1.readlines(), file_2.readlines())
    # Keep only the lines that Differ marks as added in file2.
    result = [line for line in result if line.startswith("+ ")]
    print(''.join(result))
sys.exit(1)
What am I doing wrong? Can it be done in another way that makes it correct?
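One possible cause, not verified against the linked files, is the autojunk heuristic of difflib.SequenceMatcher, which only activates when the second sequence has 200 or more items and then stops using very frequent lines to anchor matches. difflib.Differ does not offer a parameter to turn it off, so one way to test this is to call SequenceMatcher directly with autojunk=False. The following is a minimal sketch under that assumption; added_lines is a hypothetical helper name, and it returns the same kind of lines as the "+ " filter, without the prefix.

```python
import difflib


def added_lines(old_lines, new_lines):
    """Return the lines of new_lines that are not matched to any line of old_lines."""
    # autojunk=False disables the heuristic that kicks in at 200+ lines.
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both contribute new lines from the second sequence.
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return added
```

With the files from the question it could be called as added_lines(file_1.readlines(), file_2.readlines()) inside the with block, printing ''.join(...) of the result. Turning autojunk off can make the comparison slower on very long inputs.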