I am working on a prototype involving some very large SharePoint sites and document libraries. These libraries contain all sorts of documents (Excel, Word, email, PDF, PowerPoint) and can hold both clinical (medical) and non-clinical data. Much of this content was dumped from legacy on-prem servers into SharePoint.
I want to use Copilot to scan these libraries and return a list with each document's name, URL/path, and classification (clinical or non-clinical). The long-term plan is to add a data classification label to each document and use that label to drive our security and retention policies.
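To make the desired output concrete, here is a rough Python sketch of the report shape I have in mind. It does not call Copilot or SharePoint at all: `classify_text` is a hypothetical keyword-based stand-in for whatever would actually do the classification, and `CLINICAL_HINTS`, `DocumentRecord`, `build_report`, and `to_csv` are names I made up for illustration.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical keyword list, purely illustrative. A real classifier
# (Copilot or otherwise) would need far more than simple keyword matching.
CLINICAL_HINTS = ("patient", "diagnosis", "medication", "lab result")


@dataclass
class DocumentRecord:
    name: str
    url: str
    classification: str  # "Clinical" or "Non-clinical"


def classify_text(text: str) -> str:
    """Placeholder classifier: flags text containing any clinical hint."""
    lowered = text.lower()
    if any(hint in lowered for hint in CLINICAL_HINTS):
        return "Clinical"
    return "Non-clinical"


def build_report(documents):
    """documents: iterable of (name, url, extracted_text) tuples."""
    return [
        DocumentRecord(name, url, classify_text(text))
        for name, url, text in documents
    ]


def to_csv(records) -> str:
    """Render the report as CSV with one row per document."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Name", "URL", "Classification"])
    for record in records:
        writer.writerow([record.name, record.url, record.classification])
    return buf.getvalue()
```

In other words, the end result would be one row per document with three columns (name, URL, classification), which could later feed the labeling step.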
As of today, we default to the highest security, archiving, and retention settings for everything, because clinical and non-clinical documents live in the same space. I was hoping to train Copilot to help me do that classification, or at the very least a large part of it.
Is this even possible? If so, what would be a good starting point?