I am curious if Spark can utilize previous results to help execution, for the example below:
Inside same session, I first create a view1 with some contents. Because there is no action, nothing is triggered. I then create view2 with df=sqlContext.sql("select * from view1 where col=something"), df.createOrReplaceTempView("view2")
. Still nothing is triggered since no action.
Then I run: sqlContext.sql(“select * from view1 limit 1000”).collect(). This will trigger action and get results.
Now I want to run sqlContext.sql(“select * from view2 limit 1000”).collect().
My question is, will Spark internally optimize its execution for the 2nd query (1000 rows from view2) by using the results from 1st query (1000 rows from view1)? In theory, part of the results for 2nd query can be retrieved from 1st query, but IDK if Spark is really so intelligent.
Background: I am triaging the performance of Livy for interactive query.
Thanks.
I successfully run Livy.