I tried difflib's get_close_matches(), but I want more fine-grained control, so I set out to do something similar myself. Looking at the source, get_close_matches() just computes ratio() for each candidate and returns the best one (or the best n).
However, when I call ratio() on each candidate myself, the match that get_close_matches() picks (which is the correct one) doesn't have the highest ratio.
I'm not sure whether I'm using it incorrectly or something else is going on.
Here is a test program showing what’s going on:
<code>import difflib
test_string = "ifthecalleridentifieshimselforherselfasinventoranapplicantoranauthorizedrepresentativeoftheassigneeofrecordaskforthecorrespondenceaddressofrecordandinformcallerthathisorherassociationwiththeapplicationmustbeverifiedbeforeanyinformationconcerningtheapplicationcanbereleasedandthatheorshewillbecalledback"
test_list = ["ifthecalleridentifiestheirselfasaninventoranapplicantoranauthorizedrepresentativeoftheassigneeofrecordaskforthecorrespondenceaddressofrecordandinformcallerthattheirassociationwiththeapplicationmustbeverifiedbeforeanyinformationconcerningtheapplicationcanbereleasedandthattheywillbecalledback",
"2ifthecalleridentifiedtheirselfasaninventorapplicantoranauthorizedrepresentativeoftheassigneeofrecordpatentdataportalshouldbeusedtoverifythecorrespondenceaddressofrecord"]
print("Ratio of string to first list element:", difflib.SequenceMatcher(None, test_string, test_list[0]).ratio())
print("Ratio of string to second list element:", difflib.SequenceMatcher(None, test_string, test_list[1]).ratio())
print(difflib.get_close_matches(test_string, test_list, n=1, cutoff=0.1))
</code>
And here are the results I’m getting:
<code>Ratio of string to first list element: 0.4924114671163575
Ratio of string to second list element: 0.5520169851380042
['ifthecalleridentifiestheirselfasaninventoranapplicantoranauthorizedrepresentativeoftheassigneeofrecordaskforthecorrespondenceaddressofrecordandinformcallerthattheirassociationwiththeapplicationmustbeverifiedbeforeanyinformationconcerningtheapplicationcanbereleasedandthattheywillbecalledback']
</code>
The ratio for the second element is higher, but get_close_matches() correctly returns the first element.
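One check that might help narrow this down is to score each candidate in both argument orders, with and without the autojunk heuristic. This is only a sketch: `score_candidates` is a hypothetical helper name, not part of difflib. The assumption, as far as I can tell from the difflib source, is that get_close_matches() sets the query word as the second sequence and each candidate as the first, which is the reverse of the order in my test program.

```python
import difflib

def score_candidates(query, candidates, query_first=True, autojunk=True):
    """Return (ratio, candidate) pairs sorted best-first.

    query_first=True  -> SequenceMatcher(None, query, candidate)
    query_first=False -> SequenceMatcher(None, candidate, query)
    """
    scored = []
    for cand in candidates:
        # ratio() can depend on which string is passed as the second sequence.
        a, b = (query, cand) if query_first else (cand, query)
        matcher = difflib.SequenceMatcher(None, a, b, autojunk=autojunk)
        scored.append((matcher.ratio(), cand))
    return sorted(scored, reverse=True)

# Short illustrative inputs (not the strings from the test program above).
query = "apple"
candidates = ["ape", "apple", "peach"]
for query_first in (True, False):
    for autojunk in (True, False):
        best = score_candidates(query, candidates, query_first, autojunk)[0]
        print(query_first, autojunk, best)
```

Running this with test_string and test_list in place of the short inputs should show whether the argument order or the autojunk setting accounts for the difference. The autojunk heuristic only kicks in when the second sequence is at least 200 items long, so it may treat the two candidates differently here.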