I’m looking for a Large Language Model (LLM) that can effectively extract medical terms like symptoms, tests, and conditions from unstructured text. However, due to strict data privacy concerns, I need a solution that can be run entirely offline without sending data to an external API.
Specific Requirements:
Offline: Model should be able to run locally without an internet connection.
Medical NER: Model should be trained or fine-tuned for recognizing medical terminology.
Data Privacy: No sensitive patient data should be transmitted to any external service.
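To make the offline and privacy requirements checkable rather than just stated, a guard along the following lines could wrap any candidate pipeline during evaluation. This is only a minimal sketch using the standard library: the name `no_network` is my own, and it only catches connections opened through Python's `socket` module, so it is a sanity check and not a security guarantee.

```python
import socket
from contextlib import contextmanager


@contextmanager
def no_network():
    """Make any Python-level attempt to open a connection raise inside the block."""

    def _blocked(*args, **kwargs):
        raise RuntimeError("network access attempted while the offline guard is active")

    # Remember the originals so they can be restored afterwards.
    original_connect = socket.socket.connect
    original_create = socket.create_connection
    socket.socket.connect = _blocked
    socket.create_connection = _blocked
    try:
        yield
    finally:
        socket.socket.connect = original_connect
        socket.create_connection = original_create
```

Running a candidate model's inference call inside `with no_network():` would then fail loudly if the library tried to download weights or call home.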
Possible Solutions (but not limited to):
spaCy: Are there specific spaCy models or pipelines (other than en_core_med7_lg) that are suitable for this task and available for offline use?
Other LLMs: Are there any open-source LLMs focused on medical NER that I could deploy locally?
Custom Training: Could I fine-tune a general-purpose pretrained model such as BERT or GPT on a medical dataset and then run it offline?
What I’ve Tried:
I’ve explored some cloud-based medical NER APIs, but they don’t meet my privacy requirements. I’ve also looked into spaCy, but the available medical models seem limited for offline use.
Additional Considerations:
Performance: The model should ideally have reasonable accuracy and efficiency for real-world medical text.
Ease of Use: A straightforward way to load and use the model within a Python environment would be preferred.
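To illustrate the kind of interface I have in mind, here is a rough sketch that assumes a token-classification model has already been downloaded into a local folder and that the Hugging Face `transformers` library is installed. The helper names `load_local_ner` and `extract_terms` and the path `./models/medical-ner` are hypothetical placeholders, and the entity labels returned would depend entirely on whichever model is used.

```python
import os

# Ask the Hugging Face libraries not to contact the Hub at all.
# These are set before transformers is imported.
os.environ["HF_HUB_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"


def load_local_ner(model_dir):
    """Build a token-classification pipeline from files already on disk."""
    # Imported lazily so the environment variables above are in place first.
    from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

    tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)
    model = AutoModelForTokenClassification.from_pretrained(model_dir, local_files_only=True)
    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model,
        tokenizer=tokenizer,
        aggregation_strategy="simple",
    )


def extract_terms(ner, text):
    """Return (entity_label, matched_text) pairs from the pipeline output."""
    return [(ent["entity_group"], ent["word"]) for ent in ner(text)]
```

Usage would then look something like `ner = load_local_ner("./models/medical-ner")` followed by `extract_terms(ner, "Patient reports chest pain; ECG ordered.")`, where the folder name is a placeholder for wherever the model files live.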
Any suggestions, recommendations, or code examples would be greatly appreciated!