Loading the Presidio analyzer engine (`AnalyzerEngine`) takes some time. I want to filter out specific names, but the names are different for every document. I don’t understand how to do this seemingly simple task with Presidio.
I can pass an `allow_list` to `analyze()`, but there is no matching `deny_list` parameter.
So I need a recognizer that can take a different deny list on each call.
But I don’t want to recreate the analyzer engine each time, since that would reload the NLP models, which never change.
How can I build a recognizer in Presidio that takes a different deny list (or, better, a deny dictionary) each time it is called?
I found the function `my_analyzer_first.registry.remove_recognizer`, but it does not seem to work: once I add a recognizer, it appears to change the analyzer permanently.
When I run the code below, “Bob” is still recognized as a patient even after the recognizer has been removed.
```python
from presidio_analyzer import AnalyzerEngine, PatternRecognizer

# Load the analyzer (and its NLP model) once.
my_analyzer_first = AnalyzerEngine(
    supported_languages=["en"], default_score_threshold=0.5
)


def on_the_fly_without_loading(text: str, deny_list: list[str]):
    # Temporarily register a deny-list recognizer for this call only.
    denylist_recognizer = PatternRecognizer(supported_entity="[PATIENT]", deny_list=deny_list)
    my_analyzer_first.registry.add_recognizer(denylist_recognizer)
    entities = my_analyzer_first.analyze(text=text, language="en")
    # Intended to undo the add_recognizer() call above.
    my_analyzer_first.registry.remove_recognizer(denylist_recognizer)
    return entities


def sanity_check(text: str):
    # Same text, no deny list: "Bob" should not come back as [PATIENT] here.
    entities = my_analyzer_first.analyze(text=text, language="en")
    return entities


if __name__ == "__main__":
    text = 4 * "I went to the zoo and said hello to Bob the tiger."
    deny_list = ["Bob"]
    print(on_the_fly_without_loading(text, deny_list))
    print(sanity_check(text))
```
I’d expect “Bob” to be detected as `[PATIENT]` only in the first call, but it is detected in both.
Comparing `my_analyzer_first.registry` before and after calling `my_analyzer_first.registry.remove_recognizer(denylist_recognizer)`, I found that the call doesn’t change it at all!