I have a pl.DataFrame
with a start_date
and end_date
column. I need to compute the difference between those two columns and add new columns representing the result in days
, weeks
, months
and years
.
I would be fine to get an approximate result, meaning dividing the days by 7 / 30 / 365.
My problem is to convert the duration[ns]
type into an integer type.
<code>import datetime
import polars as pl
df = pl.DataFrame(
{"start_date": datetime.date(2024, 1, 1), "end_date": datetime.date(2024, 7, 31)}
)
df = df.with_columns((pl.col("end_date") - pl.col("start_date")).alias("days"))
print(df)
shape: (1, 3)
┌────────────┬────────────┬──────────────┐
│ start_date ┆ end_date ┆ days │
│ --- ┆ --- ┆ --- │
│ date ┆ date ┆ duration[ms] │
╞════════════╪════════════╪══════════════╡
│ 2024-01-01 ┆ 2024-07-31 ┆ 212d │
└────────────┴────────────┴──────────────┘
</code>
<code>import datetime
import polars as pl
df = pl.DataFrame(
{"start_date": datetime.date(2024, 1, 1), "end_date": datetime.date(2024, 7, 31)}
)
df = df.with_columns((pl.col("end_date") - pl.col("start_date")).alias("days"))
print(df)
shape: (1, 3)
┌────────────┬────────────┬──────────────┐
│ start_date ┆ end_date ┆ days │
│ --- ┆ --- ┆ --- │
│ date ┆ date ┆ duration[ms] │
╞════════════╪════════════╪══════════════╡
│ 2024-01-01 ┆ 2024-07-31 ┆ 212d │
└────────────┴────────────┴──────────────┘
</code>
import datetime
import polars as pl
df = pl.DataFrame(
{"start_date": datetime.date(2024, 1, 1), "end_date": datetime.date(2024, 7, 31)}
)
df = df.with_columns((pl.col("end_date") - pl.col("start_date")).alias("days"))
print(df)
shape: (1, 3)
┌────────────┬────────────┬──────────────┐
│ start_date ┆ end_date ┆ days │
│ --- ┆ --- ┆ --- │
│ date ┆ date ┆ duration[ms] │
╞════════════╪════════════╪══════════════╡
│ 2024-01-01 ┆ 2024-07-31 ┆ 212d │
└────────────┴────────────┴──────────────┘
1
you can use dt.total_days()
to extract hours from Duration datatype:
<code>df.with_columns(
days = (pl.col.end_date - pl.col.start_date).total_days()
)
┌────────────┬────────────┬──────┐
│ start_date ┆ end_date ┆ days │
│ --- ┆ --- ┆ --- │
│ date ┆ date ┆ i64 │
╞════════════╪════════════╪══════╡
│ 2024-01-01 ┆ 2024-07-31 ┆ 212 │
└────────────┴────────────┴──────┘
</code>
<code>df.with_columns(
days = (pl.col.end_date - pl.col.start_date).total_days()
)
┌────────────┬────────────┬──────┐
│ start_date ┆ end_date ┆ days │
│ --- ┆ --- ┆ --- │
│ date ┆ date ┆ i64 │
╞════════════╪════════════╪══════╡
│ 2024-01-01 ┆ 2024-07-31 ┆ 212 │
└────────────┴────────────┴──────┘
</code>
df.with_columns(
days = (pl.col.end_date - pl.col.start_date).total_days()
)
┌────────────┬────────────┬──────┐
│ start_date ┆ end_date ┆ days │
│ --- ┆ --- ┆ --- │
│ date ┆ date ┆ i64 │
╞════════════╪════════════╪══════╡
│ 2024-01-01 ┆ 2024-07-31 ┆ 212 │
└────────────┴────────────┴──────┘