I’m looking for advice on stream processing engines that allow me to do the following:
- Required: Write a query that joins an event stream with a historical table in Snowflake
- Required: Executes in near real time (under 5 seconds), even when a query involves 300M rows
- Highly desired: Gives me a way of doing dbt-like DAGs, where I can execute a DAG of actions (including external api calls) based on results of the query
- Highly desired: allows me to write queries in standard SQL
- Desired: true real time (large queries executing with subsecond latency)
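To make the first requirement concrete, here is a hedged sketch of the kind of query I have in mind, written in Flink-style SQL (which supports temporal joins between a stream and a versioned/lookup table). All table and column names here are placeholders, and I'm assuming the Snowflake table would be mirrored or exposed to the engine via some connector:

```sql
-- Sketch only: enrich a live event stream with a historical table.
-- `events` is a stream; `customer_history` stands in for the
-- Snowflake table (mirrored via CDC/JDBC or similar).
SELECT
  e.event_id,
  e.customer_id,
  e.event_time,
  h.lifetime_value
FROM events AS e
JOIN customer_history FOR SYSTEM_TIME AS OF e.proc_time AS h
  ON e.customer_id = h.customer_id;
```

The key question is which engines can run something like this over ~300M rows of history within the latency budget.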
What are the best options out there? It seems like Apache Flink enables this, but there also seem to be a number of other projects out there that may enable some or all of what I’m describing, including:
- ksqlDB (formerly KSQL)
- Arroyo
- Proton
- Kafka Streams
- Snowflake’s Snowpipe Streaming
- Benthos
- RisingWave
- Spark Streaming
- Apache Beam
- Timely Dataflow and derivatives (Materialize, Bytewax, etc.)
Any recommendations on the best tool for the job? Are there interesting alternatives that I haven’t named?
I’ve researched the available streaming systems, but without hands-on experience it’s hard to tell whether they enable this functionality.