I am looking to fine-tune a summarization model from the Hugging Face Hub to convert an extractive summary of a scientific research paper into an abstractive one. I already have a model that generates extractive summaries from a paper, but I need training data for a second model that takes the extractive summary as input and generates the corresponding abstractive summary as output.
So far, most of the datasets I have seen use the entire paper as the input and an abstractive summary as the target, whereas I need the extractive summary to be the model's input.
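To make the format concrete, here is a minimal sketch of the (input, target) pairs I have in mind. The field names `full_text` and `abstract`, the `build_pairs` helper, and the `first_two_sentences` stand-in for my extractive model are all hypothetical and not taken from any particular dataset.

```python
from typing import Callable, Dict, Iterable, List


def build_pairs(
    papers: Iterable[Dict[str, str]],
    extractive_summarize: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Turn (full text, abstract) records into (extractive summary, abstract) pairs."""
    pairs = []
    for paper in papers:
        pairs.append(
            {
                # Model input: the extractive summary of the paper body.
                "input": extractive_summarize(paper["full_text"]),
                # Model target: the human-written, multi-sentence abstract.
                "target": paper["abstract"],
            }
        )
    return pairs


def first_two_sentences(text: str) -> str:
    """Placeholder for the real extractive model: keep the first two sentences."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return ". ".join(sentences[:2]) + "."
```

A dataset that already ships pairs in roughly this shape, with multi-sentence targets, is what I am after.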
The closest dataset I have found is this one:
https://huggingface.co/datasets/allenai/scitldr/viewer?row=10
It provides the abstract, introduction, and conclusion as the model input (which is somewhat like an extractive summary), but the target is only a one-line summary, whereas I want the output abstract to span multiple lines.
Could you kindly suggest any dataset you know of that would help me with this? Any help is much appreciated. Thanks a lot!