I am currently working on a big data project, utilizing Spark for stream processing.
The system functions as follows: scheduled scrapers retrieve data from the web, which is then stored in a database. Real-time similarity checks and complex processing occur to provide users with the desired results promptly.
In short, a user defines their preferences, including various personal information stored in the database, triggering a thorough similarity check to generate top relevant recommendations in real time.
Given this scenario, what database is likely to be a suitable choice?”
I wrote that hadoop is very helpful in such situations but I will be needing another database layer like Hbase to interact with my web app.