- The previous generation of file search is macOS Spotlight and Linux Tracker
- What we want now is vector search, which uses how close words are in meaning to get
better search results, plus natural-language queries to ask more specific questions
.
What open-source code is available, for example from Facebook, OpenAI, or others, that can:
.
- Extract text from files in more complicated formats, like PDF or DOCX
- Process a list of directory trees separately, each about a terabyte in size
- Save to an embedded, file-based database similar to SQLite, partitioned by directory tree so each partition can be updated independently
- Provide a search function that merges hits across partitions, possibly via a web interface
- Run on Apple M-series processors and Linux/Intel, though the AI part can be limited to the M-series GPU or Neural Engine
- Preferably be controllable from multithreaded Go on general-purpose hardware, i.e. a C++ library rather than version-specific Python
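
As a rough illustration of the partition and merge requirements above, a minimal Go sketch might look like the following. Everything in it is an assumption made for illustration: the names `Partition`, `Hit`, and `SearchAll` are hypothetical, the in-memory map stands in for an on-disk per-directory index, the embedding vectors are assumed to be computed elsewhere, and the search is plain brute-force cosine similarity rather than any particular library's API.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"sync"
)

// Hit is one search result from a single partition (hypothetical type).
type Hit struct {
	Path  string
	Score float64
}

// Partition is a hypothetical stand-in for one per-directory index.
// A real system would back this with an on-disk file, not a map.
type Partition struct {
	Root string
	Docs map[string][]float64 // file path -> precomputed embedding
}

// cosine returns the cosine similarity of a and b, or 0 when the
// lengths differ or either vector has zero norm.
func cosine(a, b []float64) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK sorts hits by descending score (path breaks ties) and keeps at most k.
func topK(hits []Hit, k int) []Hit {
	sort.Slice(hits, func(i, j int) bool {
		if hits[i].Score != hits[j].Score {
			return hits[i].Score > hits[j].Score
		}
		return hits[i].Path < hits[j].Path
	})
	if k < 0 {
		k = 0
	}
	if len(hits) > k {
		hits = hits[:k]
	}
	return hits
}

// Search scores every document in the partition (brute force).
func (p *Partition) Search(query []float64, k int) []Hit {
	hits := make([]Hit, 0, len(p.Docs))
	for path, vec := range p.Docs {
		hits = append(hits, Hit{Path: path, Score: cosine(query, vec)})
	}
	return topK(hits, k)
}

// SearchAll queries each partition in its own goroutine and merges
// the per-partition top-k lists into one global top-k list.
func SearchAll(parts []*Partition, query []float64, k int) []Hit {
	results := make([][]Hit, len(parts))
	var wg sync.WaitGroup
	for i, p := range parts {
		wg.Add(1)
		go func(i int, p *Partition) {
			defer wg.Done()
			results[i] = p.Search(query, k) // each goroutine writes its own slot
		}(i, p)
	}
	wg.Wait()
	var merged []Hit
	for _, r := range results {
		merged = append(merged, r...)
	}
	return topK(merged, k)
}

func main() {
	parts := []*Partition{
		{Root: "/data/a", Docs: map[string][]float64{
			"/data/a/x.pdf":  {1, 0},
			"/data/a/y.docx": {0, 1},
		}},
		{Root: "/data/b", Docs: map[string][]float64{
			"/data/b/z.pdf": {1, 1},
		}},
	}
	for _, h := range SearchAll(parts, []float64{1, 0}, 2) {
		fmt.Printf("%.3f %s\n", h.Score, h.Path)
	}
}
```

Because each partition only returns its own top k, the merge step never has to hold more than k hits per partition, and a partition can be rebuilt or swapped out without touching the others. With the toy data in `main`, the hit from `/data/a/x.pdf` ranks first and `/data/b/z.pdf` second.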
.
I have already built this for audio around Whisper. What can be done for documents?
Thanks,