I want to implement Lucene for full text search. I have a table with 6 million records in an sql database. Each minute around thousand new rows will be added from the application.
Index creation in Lucene takes a lot of time. Each time I delete or create a new index from the database because the user needs to see the updated records on search.
How can I have fast enough search capability?
1) I need updated and history values on search
2) The maximum delay should be 2 seconds
2
Lucene indexes don’t have to be recreated from scratch each time they’re changed. Much like your database tables, you can insert new records, delete old ones and affect an update using a delete/add pair.
Most databases have a mechanism which will allow calling external code, and you can leverage this in triggers that run on INSERT
, UPDATE
and DELETE
to maintain your index. Alternately, you can do the updates outside the database as your batches of data arrive, but you risk having (bigger) synchronization problems if something fails mid-stream.
Also, consider whether or not using Lucene is even necessary. Oracle, PostgreSQL and MySQL (and probably others) all have pre-integrated full-text search that may be good enough for your application and won’t need much additional work on your part.
1