I am trying to find a more appropriate algorithm for measuring string similarity.
I have always used Levenshtein or Smith-Waterman, but there are many other algorithms that I did not know existed and have never tried, so I would like your suggestions.
To clarify: characters are usually in the same or nearby positions, and the space between two words is sometimes missing and sometimes present.
A few characters of a string may differ, and there may be extra or missing words and/or lines. This also happens with empty lines, so maybe I should do some preprocessing and remove them.
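To make the preprocessing idea concrete, here is a minimal sketch, assuming Python. It drops empty lines and all whitespace, so a missing space between two words no longer counts as a difference. The helper names `normalize` and `similarity` are made up for illustration, and the standard library's `difflib.SequenceMatcher` is only a stand-in scorer, not Levenshtein or Smith-Waterman.

```python
import difflib


def normalize(text: str) -> str:
    """Drop empty lines and all whitespace (hypothetical helper).

    After this step "FINAL STRIKE" and "FINALSTRIKE" are identical.
    """
    lines = [line.strip() for line in text.splitlines()]
    non_empty = [line for line in lines if line]
    # str.split() with no arguments splits on any run of whitespace.
    return "".join("".join(line.split()) for line in non_empty)


def similarity(a: str, b: str) -> float:
    """Score in [0, 1] from SequenceMatcher on the normalized texts.

    SequenceMatcher is a stand-in here; any edit-distance based
    scorer could be applied to the normalized strings instead.
    """
    matcher = difflib.SequenceMatcher(
        None, normalize(a), normalize(b), autojunk=False
    )
    return matcher.ratio()


if __name__ == "__main__":
    first = "FINAL STRIKE }\n\n\nTalk about clutch!"
    second = "FINALSTRIKE\nTalk about clutch! }"
    print(similarity(first, second))
```

Removing whitespace entirely is a deliberate trade-off: it hides the missing-space noise, but it also discards word boundaries, so it may not suit cases where those matter.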
Some examples of the texts (three consecutive empty lines act as the separator between blocks of text):
1556 + @ @ N Q =, 84%1m
N a
f TRAINER ACHIEVEMENT\“
'{m‘}’fi
e -
Test A
FINAL STRIKE) ¢
Dealt the decisive blow fi k
v
a,
Hioukaa
ST HITTER
Dealt a whg . ‘ 4
LOOKING FOR FRIENDS?
/ TRAINERACHIEVEMENTS. @)
.V
Yas ” e : |
FINAL STRIKE } E
Talk about clutch!
‘g%‘
Ces272021 3
STYLE SAMANT
Check out t E— .“.”o.i":
‘TO SUMMARY
/ TRAINERACHIEVEMENTS @)
Yasscream ,
FM ) {
Talk about clutch!
Ces272021 t ‘
STYLE SAy_é NT
checcout ') TO SUMMARY | ;.;;i
1808 d # @ N Q = 64%m
/ TRAINERACHIEVEMENTS @)
Yau ' |
FINAL STRIKE }
Talk about clutch!
R
Ces272021 )
STYLE SAMA;NT
Check out t — ".."o.’o":
'OTO SUMMARY
/ TRAINERACHIEVEMENTS @)
Yasscream ,
FM : {
Talk about clutch!
Ces272021 7
STYLE SAMA NT
checcout ') TO SUMMARY | ;.;;:
1808 d # @ N Q =:64%m
/ TRAINERACHIEVEMENTS @)
Yas .“ S ' i"
FINALSTRIKES
Talk about clutch! }
y
Ces272021 IR
Vo
STYLE SAM%NT
Check out t m— ..o;i
.)TO SUMMARY
- i
1809 # @ * N Q = 64%m
/ TRAINERACHIEVEMENTS @)
Yas : R : ”
FINAL STRIKE J
Talk about clutch!
Ces272021
STYLE SAVA. NT
Check out t — ".."o.’o":
'O TO SUMMARY
/ TRAINERACHIEVEMENTS @)
4 A i
Yas ;'W:_fi“‘ e ’ ,,
FINAL STRIKE }
Talk about clutch!
fldv
Ces272021 %’
STYLE SAMAN T,
Check out ' . P
/ TRAINERACHIEVEMENTS @)
Ve *i
Yas ” e : |
FINAL STRIKE } i
Talk about clutch!
F b
Ces272021
STYLE SA¥L§. NT
Check out t E— .“.”o.i":
‘TO SUMMARY
/ TRAINERACHIEVEMENTS @)
Yasscream ,
FM : {
Talk about clutch!
Ces272021 7
STYLE SAMA NT
checcout ') TO SUMMARY | ;.;;:
/ TRAINERACHIEVEMENTS @)
Yas X . . ‘(
FINAL STRIKE }
Talk about clutch!
W ¢,
Ces272021 -
STYLE SAMA u T
Check out t T ..o;i
.)TO SUMMARY
Thanks in advance.