For my final-year project, I need to use Apache Spark and Kafka for data streaming. I have already set up a PySpark environment in Anaconda, but I have no clue about Kafka, as there are no guides that integrate Kafka and Spark together in Anaconda. Most of what I have seen uses Docker for the whole Spark, Kafka, and Jupyter setup. I also need to use MLlib from PySpark. Which is better for this case: Anaconda or Docker?
I have searched all over the web, but there is no up-to-date guide for this particular setup, and I'm not sure whether doing the setup in Anaconda or Docker is better.
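For context, here is a sketch of the kind of Anaconda-based setup I have in mind. The Kafka connector is not bundled with PySpark, so my understanding is that it has to be supplied at launch time via `--packages`; the exact connector version here (3.5.0, Scala 2.12) is an assumption and would need to match the installed Spark version:

```shell
# Install PySpark into the conda environment (this part I already have).
conda install -c conda-forge pyspark

# Launch a streaming job, pulling in the Kafka connector at runtime.
# The package coordinate must match the Spark/Scala versions in use;
# "my_streaming_job.py" is a hypothetical script name.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.0 \
  my_streaming_job.py
```

If a Docker-based setup is the better choice, I assume the equivalent would be baking the connector and a Kafka broker into the compose stack instead.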