From this link:
While batch and ETL jobs run on Hive and Spark, near real-time interactive queries run on Presto.
This is a sentiment I have seen echoed elsewhere as well. I would like to understand what the harm is in using Presto for batch and OLAP jobs with fixed (i.e., non-ad-hoc) queries. Surely if it can handle ad-hoc queries, it should be able to handle non-ad-hoc queries as well? Is it because Presto has high memory requirements and needs everything to fit in memory (just a guess; I am not sure this assumption is correct)? But then, isn't Spark also an in-memory execution engine with similar memory requirements? The documentation on these tools is fairly dense, and I have read conflicting accounts on different sites, so I am asking here.
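For concreteness, this is the kind of fixed, scheduled batch query I have in mind (table and column names are made up for illustration); as far as I can tell, roughly the same statement could be submitted to Hive, Spark SQL, or Presto:

```sql
-- Hypothetical daily aggregation, run on a schedule rather than ad hoc.
-- Table and column names are invented for illustration.
INSERT INTO daily_revenue
SELECT
    order_date,
    country,
    SUM(amount) AS total_revenue,
    COUNT(*)    AS order_count
FROM orders
WHERE order_date = DATE '2023-01-01'
GROUP BY order_date, country;
```

So the question is why a query like this, when it is part of a batch/ETL pipeline rather than typed interactively, is usually routed to Hive or Spark instead of Presto.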