I am currently working on my master’s thesis, which is the second part of a series. In part 1, titled ‘Adverse Drug Event Detection Using NLP Techniques’, I attempted to reproduce the results of a paper but had to adapt the methodology to a different dataset (n2c2) because of data availability constraints. My focus was on Named Entity Recognition (NER) for identifying Adverse Drug Events (ADEs) in clinical text, for which I fine-tuned a DeBERTa model.
As this is my maiden journey into academic research, I encountered challenges with model accuracy and am looking for state-of-the-art approaches to improve my results. I have come across the following options:
Data augmentation techniques.
Advanced model architectures or fine-tuning strategies.
Integration of external medical knowledge bases to improve context understanding.
If anyone has experience with similar research or insights into the latest NLP advancements in the medical domain, your input would be invaluable. Additionally, if there are common pitfalls or essential considerations that a first-time thesis writer should be aware of, I am eager to learn.
For reference, my current work builds on the original research paper, and I have adapted its methodology to the specifics of the n2c2 dataset.
Thank you in advance for your time and assistance!
When I attempted to reproduce the results of the prior study, my accuracy came out lower than expected. Here’s what I’ve tried so far:
Fine-tuning DeBERTa with standard hyperparameters on the n2c2 dataset.
Basic data preprocessing for NER tasks (tokenization, BIO tagging).
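A minimal sketch of what the BIO tagging step amounts to, in case it helps pin down the preprocessing. It is an illustration under assumptions, not the exact pipeline: it assumes whitespace tokenization and character-offset annotations, and the function name `bio_tags`, the example sentence, and the labels `ADE` and `Drug` are made up for the example. With a subword tokenizer such as DeBERTa's, these word-level tags would still need to be aligned to subword pieces.

```python
import re

def bio_tags(text, spans):
    """Tag whitespace-delimited tokens with BIO labels.

    spans: list of (start, end, label) character offsets, end exclusive.
    Hypothetical helper for illustration only.
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for start, end, label in spans:
            # A token belongs to an entity if it overlaps the annotated span.
            if tok_start < end and tok_end > start:
                # The first overlapping token gets B-, later ones get I-.
                tag = ("B-" if tok_start <= start else "I-") + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags

# Made-up example sentence; offsets are computed rather than hard-coded.
text = "Patient developed a severe rash after starting amoxicillin."
ade, drug = "severe rash", "amoxicillin"
spans = [
    (text.index(ade), text.index(ade) + len(ade), "ADE"),
    (text.index(drug), text.index(drug) + len(drug), "Drug"),
]
tokens, tags = bio_tags(text, spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Note that the trailing period stays attached to the last token here; how punctuation and span boundaries are handled at this step can change which tokens count as part of an entity.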
I expected to achieve comparable results to the initial study, which used a different dataset, but my model’s performance is lagging, particularly in precision and recall.
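To be explicit about what precision and recall mean here, the sketch below computes them at the entity level from BIO sequences. It assumes strict exact-match scoring (an entity counts only if both boundaries and the label match); a relaxed overlap criterion would give different numbers. The function names `extract_entities` and `entity_prf` and the toy tag sequences are hypothetical, not taken from the study or the dataset.

```python
def extract_entities(tags):
    """Return the set of (start, end, label) token spans in one BIO sequence."""
    entities, start, label = set(), None, None
    # A trailing "O" sentinel flushes an entity that ends at the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if start is not None and not continues:
            entities.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return entities

def entity_prf(gold, pred):
    """Strict exact-match precision, recall, and F1 for one sequence pair."""
    gold_ents, pred_ents = extract_entities(gold), extract_entities(pred)
    tp = len(gold_ents & pred_ents)
    precision = tp / len(pred_ents) if pred_ents else 0.0
    recall = tp / len(gold_ents) if gold_ents else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Toy example: the prediction truncates the two-token ADE span to one token.
gold = ["O", "B-ADE", "I-ADE", "O", "B-Drug"]
pred = ["O", "B-ADE", "O", "O", "B-Drug"]
precision, recall, f1 = entity_prf(gold, pred)
print(precision, recall, f1)
```

In this toy case the truncated span counts as both a false positive and a false negative, which is why boundary errors hurt precision and recall at the same time under strict matching. For a full corpus, the true-positive and entity counts would be summed across sentences before dividing.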