I’ve managed to accurately (albeit untidily) create a growth accounting model that flags football players based on their activity. For example, if a player didn’t have a record last year and they do this year, they are flagged as ‘new’. Other flags include ‘churned’, ‘retained’, ‘resurrected’, ‘stale’ and ‘inactive’.
When I run these queries in the Snowflake DW they work properly, although I have to enter the years manually: for the yesterday CTE I would type 2007 and, for the today CTE it is compared against in the same query, 2008.
I have been using dbt to create all of my models so far and am now trying to iterate over seasons, with the yesterday CTE filtering on {{ season - 1 }} and the today CTE filtering on {{ season }}. The current season is also set as {{ season }} later on, in another CTE.
I’m pretty new to dbt, but I understand that macros are a good place to start with this kind of thing. My first attempt was to store and execute a query for each season, as seen below in the MACRO code, and then call the macro in the MODEL.
However, when I do this, either the connection to Snowflake doesn’t seem to work properly or, when the code was slightly different, it provided only the first line of a DDL statement to create or replace a transient table (transient being the standard for Snowflake).
I also tried printing each query and separating them with a UNION ALL but it didn’t seem to like that either.
Running out of ideas now so would be grateful for any help!
MACRO
{% macro generate_seasonal_model(start_season, end_season) %}
    {% for season in range(start_season, end_season + 1) %}
        {% set season_query %}
            WITH yesterday as (
                SELECT *
                FROM player_activity_growth_accounting
                WHERE current_season = {{ season - 1 }}
            ),
            today as (
                SELECT
                    player_id
                    , COUNT(1) as number_of_events
                    , MAX(season_year_end) AS current_season
                    , MAX(
                        CASE
                            WHEN (
                                COALESCE(ABS(goals_scored), 0) + COALESCE(ABS(penalties_scored),0) + COALESCE(ABS(total_goals),0) + COALESCE(ABS(total_assists),0) +
                                COALESCE(ABS(away_yellow_cards),0) + COALESCE(ABS(home_yellow_cards),0) + COALESCE(ABS(total_yellow_cards),0) +
                                COALESCE(ABS(total_goals_conceeded),0) + COALESCE(ABS(total_minutes_played),0) + COALESCE(ABS(away_cleansheets),0) +
                                COALESCE(ABS(home_cleansheets),0) + COALESCE(ABS(total_cleansheets),0) + COALESCE(ABS(total_wins),0) + COALESCE(ABS(total_draws),0) +
                                COALESCE(ABS(total_lost),0) + COALESCE(ABS(total_appearances),0) + COALESCE(ABS(total_lineups),0)
                            ) > 0 THEN season_year_end
                            ELSE null
                        END
                    ) AS active_season
                    , CASE
                        WHEN (
                            COALESCE(ABS(goals_scored), 0) + COALESCE(ABS(penalties_scored),0) + COALESCE(ABS(total_goals),0) + COALESCE(ABS(total_assists),0) +
                            COALESCE(ABS(away_yellow_cards),0) + COALESCE(ABS(home_yellow_cards),0) + COALESCE(ABS(total_yellow_cards),0) +
                            COALESCE(ABS(total_goals_conceeded),0) + COALESCE(ABS(total_minutes_played),0) + COALESCE(ABS(away_cleansheets),0) +
                            COALESCE(ABS(home_cleansheets),0) + COALESCE(ABS(total_cleansheets),0) + COALESCE(ABS(total_wins),0) + COALESCE(ABS(total_draws),0) +
                            COALESCE(ABS(total_lost),0) + COALESCE(ABS(total_appearances),0) + COALESCE(ABS(total_lineups),0)
                        ) > 0 THEN 1
                        ELSE 0
                    END AS is_active
                FROM
                    {{ ref('stg_player_statistics') }}
                WHERE season_year_end = {{ season }}
                GROUP BY player_id, is_active
            ),
            combined as (
                SELECT
                    COALESCE(y.player_id, t.player_id) as player_id
                    , {{ season }} as current_season
                    , COALESCE(t.is_active, 0) as is_active
                    , COALESCE(y.first_active_season, t.active_season) as first_active_season
                    , COALESCE(t.active_season, y.last_active_season) as last_active_season
                    , CASE
                        WHEN y.all_seasons_active IS NULL AND t.active_season IS NOT NULL THEN ARRAY_CONSTRUCT(t.active_season)
                        WHEN t.active_season IS NULL THEN y.all_seasons_active
                        ELSE ARRAY_CAT(y.all_seasons_active, ARRAY_CONSTRUCT(t.active_season))
                    END AS all_seasons_active
                    , y.last_active_season as last_known_active_season
                FROM yesterday y
                FULL OUTER JOIN today t
                    ON y.player_id = t.player_id
            )
            SELECT
                l.player_id
                , l.current_season
                , l.first_active_season
                , l.last_active_season
                , l.all_seasons_active
                , CASE
                    WHEN l.is_active = 1 AND l.first_active_season = l.current_season THEN 'new'
                    WHEN l.is_active = 1 AND l.current_season - l.last_known_active_season = 1 THEN 'retained'
                    WHEN l.is_active = 1 AND l.current_season - l.last_known_active_season > 1 THEN 'resurrected'
                    WHEN COALESCE(l.is_active, 0) = 0 AND l.current_season - l.last_active_season = 1 THEN 'churned'
                    WHEN COALESCE(l.is_active, 0) = 0 AND l.current_season - l.last_active_season > 1 THEN 'stale'
                    WHEN COALESCE(l.is_active, 0) = 0 AND l.first_active_season IS NULL THEN 'inactive'
                END AS seasonal_active_state
            FROM combined l
        {% endset %}
        {% do run_query(season_query) %}
    {% endfor %}
{% endmacro %}
MODEL
{% set seasons = {'start_season': 2007, 'end_season': 2025} %}
{{ generate_seasonal_model(**seasons) }}
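A note on why the model above may compile to an empty body: as far as I understand it, `run_query` executes its statement while dbt renders the Jinja and hands the result back to the template. It does not write any SQL into the model, so the `create or replace transient table ... as` wrapper has nothing to wrap. A macro that is meant to feed a model has to emit the SQL text itself. Below is a minimal sketch of that shape, not a working replacement. The macro name `seasonal_events_union` is hypothetical, and it only covers a cut-down per-season aggregation, because the yesterday CTE reads the previous season's output and so cannot simply be unioned.

```sql
{# Hypothetical macro: emits one SELECT per season as text, joined by UNION ALL. #}
{# Nothing is executed here; the rendered text becomes the model's SQL. #}
{% macro seasonal_events_union(start_season, end_season) %}
    {% for season in range(start_season, end_season + 1) %}
        SELECT
            player_id
            , {{ season }} AS current_season
            , COUNT(1) AS number_of_events
        FROM {{ ref('stg_player_statistics') }}
        WHERE season_year_end = {{ season }}
        GROUP BY player_id
        {# loop.last is true on the final iteration, so no trailing UNION ALL #}
        {% if not loop.last %}UNION ALL{% endif %}
    {% endfor %}
{% endmacro %}
```

In a model file this would be called as `{{ seasonal_events_union(2007, 2025) }}`, and the compiled SQL can be inspected under `target/compiled` after `dbt compile`.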
The main areas that have been tested so far are:
- Variations of iterating over seasons in a macro; and
- UNION ALL with each season’s query concatenated together.
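Since each season's rows are built from the previous season's rows, one direction that might fit better than a loop inside a single query is an incremental model that processes one season per run. The sketch below rests on several assumptions: the model file is named `player_activity_growth_accounting`, the season is passed in as a dbt variable called `season`, `player_id` is numeric, and the column list is cut down to the minimum needed to show the pattern.

```sql
-- models/player_activity_growth_accounting.sql (file name is an assumption)
{{ config(materialized='incremental') }}

{# 'season' is an assumed variable name, supplied with --vars at run time #}
{% set season = var('season') | int %}

WITH yesterday AS (
    {% if is_incremental() %}
    -- Later runs: read last season's rows from the table this model already built
    SELECT player_id, current_season, last_active_season
    FROM {{ this }}
    WHERE current_season = {{ season - 1 }}
    {% else %}
    -- First run: the table does not exist yet, so use a typed, empty row set
    SELECT
        NULL::NUMBER AS player_id
        , NULL::NUMBER AS current_season
        , NULL::NUMBER AS last_active_season
    FROM (SELECT 1) AS seed
    WHERE 1 = 0
    {% endif %}
),
today AS (
    SELECT
        player_id
        , MAX(season_year_end) AS active_season
    FROM {{ ref('stg_player_statistics') }}
    WHERE season_year_end = {{ season }}
    GROUP BY player_id
)
SELECT
    COALESCE(y.player_id, t.player_id) AS player_id
    , {{ season }} AS current_season
    , COALESCE(t.active_season, y.last_active_season) AS last_active_season
FROM yesterday y
FULL OUTER JOIN today t
    ON y.player_id = t.player_id
```

Each season would then be a separate invocation, run in order, for example `dbt run --select player_activity_growth_accounting --vars '{season: 2008}'`. With no `unique_key` configured, re-running the same season would append duplicate rows, so this sketch assumes every season is run exactly once.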