inside GA4 there is a session_id which is used to identify the set of events that a user has made on site, I am using a case function to define my location names so I can merge it with a dataset of organic lead goals for a client. The challenge is that there are conversion events that occur on sections of the site where the visitor is not under a locations directory, so I need to search for a location in the rows of data and apply it to the whole session if there is one. I am using a case function to assign when a page_location contains the ‘/location/…’. The next step is identifying the first non null value for a location and apply the location based on session_id so essentially:
- Location meets case
- Find first location value for each session.
- Assign location value for each session on every row for that session.
I’ve been trying to accomplish this with the query below, with some help, but I must admit I am struggling validating my approach. Any help on validating what I am trying to do if it works would help significantly.
-- add create or overwrite table organic_leads_v5_step_2
CREATE OR REPLACE TABLE bigquery-inspired-closets.organic_leads_report.organic_leads_v5_step_3 AS(
-- Create a Common Table Expression (CTE) to get the first non-null location_name for each session
WITH first_location_per_session AS (
SELECT
ga_session_id,
location_name,
ROW_NUMBER() OVER(PARTITION BY ga_session_id ORDER BY event_timestamp ASC) AS row_num
FROM `bigquery.data`
WHERE location_name IS NOT NULL
),
-- Select the first non-null location_name per session
location_key AS (
SELECT
ga_session_id,
location_name AS location_name_key
FROM first_location_per_session
WHERE row_num = 1
)
-- Merge the location_name_key with the original data
SELECT
original.*,
PARSE_DATE('%Y-%m-%d',
FORMAT("%s-%s-%s",
SUBSTR(original.event_date, 1, 4),
SUBSTR(original.event_date, 5, 2),
SUBSTR(original.event_date, 7, 2))) AS formatted_event_date,
lk.location_name_key
FROM `bigquery.data` AS original
LEFT JOIN location_key AS lk
ON original.ga_session_id = lk.ga_session_id)
Here is a truncated example of what the data looks like:
page_location url.com/location/place
ga_session_id 12341234
event_name page_view
event_date 20230908
event_timestamp 1694171148103529
session_source google
session_medium organic
session_campaign (not set)