My tech stack consists of:

- Hadoop (HDFS) as distributed storage for raw data;
- HBase as a NoSQL database running on top of Hadoop/HDFS;
- Hive as a data warehouse on top of HDFS, applying structure (tables) to otherwise unstructured data;
- Spark as an in-memory batch processing and execution engine over Hive; and
- Spark SQL for running queries.
Upsert operations may occur while I am reading messages from Kafka.
What is the best way to ingest messages from Kafka with this stack? That is, into which layer (HDFS, HBase, Hive) should I insert the incoming messages so that Spark batch processing sees up-to-date data, while preserving the ACID principle in all layers?
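For concreteness, here is a minimal sketch of the kind of ingestion job I have in mind, using Spark Structured Streaming with `foreachBatch`. The broker address, topic name, table name, and checkpoint path are all placeholders, and the write step is a plain append (a real upsert would need HBase row-key overwrites or an ACID table format):

```python
# Sketch only: assumes a Spark cluster with Hive support and the
# spark-sql-kafka connector on the classpath. All names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("kafka-ingest-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read the Kafka topic as a stream.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "events")                      # placeholder topic
       .load())

# Kafka delivers key/value as binary; cast to strings for downstream use.
events = raw.selectExpr("CAST(key AS STRING) AS k",
                        "CAST(value AS STRING) AS v",
                        "timestamp")

def upsert_batch(batch_df, batch_id):
    # Placeholder sink: a plain Hive table is append-only, so this is not
    # a true upsert. Real upserts would go to HBase (overwrite by row key)
    # or to an ACID-capable table (Hive ACID / Hudi / Delta) via MERGE.
    batch_df.write.mode("append").saveAsTable("staging.events")

query = (events.writeStream
         .foreachBatch(upsert_batch)
         .option("checkpointLocation", "/tmp/checkpoints/events")  # placeholder
         .start())
```

My uncertainty is about the sink inside `upsert_batch`: which layer should it target so that later Spark batch jobs see the most up-to-date state?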