Now that Lucene supports joins (at index time and at query time), can one use Lucene as a database (a NoSQL one, with eventual consistency)?
Note: I have been pondering this for some time. The idea comes around again and again, and there is proof that one actually can do it, to some extent (RavenDB). Besides, I think referential integrity is overrated: I never use foreign keys in an RDBMS when I work on a big, fast-changing project with a small team; in my experience they are a pure maintenance headache and kill productivity.
The only hurdle I see in adopting this mindset is the lack of transactions. Yes, you can have two-phase commits (with, say, MongoDB, and Lucene uses them for some of its internals), but that is a lot of work, and I do not know how it would be possible with Lucene.
Yes. I’ve done projects in the past where we essentially used Lucene as a data store in lieu of a database. This was long before NoSQL was hot. Really, there’s no fixed definition of what qualifies as a NoSQL database, so anything that stores and retrieves data qualifies. Things like dbm files have been around forever.
The main downside I see is updates. The only way to do an update in the Lucene ecosystem has been to read the whole document, modify it, delete the original, and write the contents back as a new document. Some syntactic sugar has been added to Solr to make this easier. Lucene itself doesn’t support per-field updates, though some kind of stacked-update feature appears to be in the works.
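To make the update cycle concrete, here is a minimal sketch in plain Python. It is a toy in-memory index, not Lucene's API: `TinyIndex` and all of its methods are hypothetical names invented for illustration. It only shows the read, modify, delete, re-add sequence described above.

```python
from collections import defaultdict


class TinyIndex:
    """Toy in-memory inverted index (hypothetical; not Lucene's API)."""

    def __init__(self):
        self.docs = {}                     # internal doc id -> stored fields
        self.postings = defaultdict(set)   # (field, value) -> internal doc ids
        self.next_id = 0

    def add(self, fields):
        """Store a document and index every field/value pair."""
        doc_id = self.next_id
        self.next_id += 1
        self.docs[doc_id] = dict(fields)
        for item in fields.items():
            self.postings[item].add(doc_id)
        return doc_id

    def delete(self, doc_id):
        """Remove a document and all of its postings."""
        fields = self.docs.pop(doc_id)
        for item in fields.items():
            self.postings[item].discard(doc_id)

    def find(self, field, value):
        """Return the internal ids of documents matching field == value."""
        return sorted(self.postings.get((field, value), ()))

    def update(self, key_field, key_value, changes):
        """No in-place edit: read the whole document, modify the copy,
        delete the original, and add the result as a new document."""
        new_ids = []
        for doc_id in self.find(key_field, key_value):
            fields = dict(self.docs[doc_id])   # read the whole document
            fields.update(changes)             # modify it
            self.delete(doc_id)                # delete the original
            new_ids.append(self.add(fields))   # write back as a new document
        return new_ids


index = TinyIndex()
index.add({"id": "42", "title": "Lucene as a database", "status": "draft"})
new_ids = index.update("id", "42", {"status": "published"})
```

Note that the updated document gets a new internal id, and that changing one field still means rewriting every field. That is why this approach gets expensive for large or frequently changing documents.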
Otherwise, for read-only data, you won’t get a better general-purpose way of slicing, dicing, and analyzing data. We’ve done a lot of work for clients who use Lucene (through Solr) just for the fact that every column is indexed, and it’s easy to look up, filter, group, or facet on anything. That’s why tools like Kibana (built on ElasticSearch, another service built on top of Lucene) are so powerful even though they have relatively little to do with full-text search.
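As a rough illustration of why "every column is indexed" matters, the sketch below builds a postings map for every field of a few made-up log records, then filters and facets on arbitrary fields. It is a toy in plain Python, not how Lucene or Solr implement this; the sample rows and the function names `build_index`, `filter_rows`, and `facet` are assumptions for the example.

```python
from collections import Counter, defaultdict

# Made-up sample data, purely for illustration.
rows = [
    {"host": "web1", "level": "error", "service": "api"},
    {"host": "web2", "level": "error", "service": "api"},
    {"host": "web1", "level": "info", "service": "api"},
    {"host": "web1", "level": "error", "service": "auth"},
]


def build_index(rows):
    """Index every field of every row: (field, value) -> set of row ids."""
    index = defaultdict(set)
    for row_id, row in enumerate(rows):
        for field, value in row.items():
            index[(field, value)].add(row_id)
    return index


def filter_rows(index, **criteria):
    """Return ids of rows matching all field=value criteria."""
    sets = [index.get((field, value), set()) for field, value in criteria.items()]
    return set.intersection(*sets) if sets else set()


def facet(rows, row_ids, field):
    """Count the values of `field` among the selected rows."""
    return Counter(rows[i][field] for i in row_ids if field in rows[i])


index = build_index(rows)
errors = filter_rows(index, level="error")
errors_by_host = facet(rows, errors, "host")
api_errors = filter_rows(index, level="error", service="api")
```

Because no field is special, the same two functions answer "errors per host", "services per level", or any other combination without a schema change or an extra index.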