Dependency DAG
Description
Pretty straightforward: I am reading some Parquet files from disk using Polars; these are the source of the data. I then do some moderately heavy processing (a few million rows) to generate an intermediate data frame, and from that I generate two results which need to be written back to a database.
Technology Stack
- Ubuntu 22.04
- Python 3.10
- Polars 1.2.1
Question
Polars recommends using lazy evaluation as far as possible to optimise execution. Now, the final results (result_1 and result_2) obviously have to be materialised.
But if I call collect() on these two in sequence:
#! /usr/bin/env python3
# encoding: utf-8
import polars as pl
...
result_1.collect() # Materialise result 1
result_2.collect() # Materialise result 2
Is the transformation from the source to the intermediate frame (the common ancestor) repeated for each call? If so, that is clearly undesirable, and I would have to materialise the intermediate frame once and then do the rest of the processing in eager mode.
Is there any documentation from Polars on the expected behaviour and the recommended practice for this scenario?