We also use EXTRACT(DATETIME FROM TIMESTAMP_MICROS( … (see the method below) to convert the event_timestamp column to a date in the property's time zone.
Here's the query we're running in BigQuery against the GA4 export:
SELECT
params.value.string_value,
-- count(event_timestamp) as Events
-- EXTRACT(YEAR FROM PARSE_DATE('%Y%m%d', event_date)) AS year,
-- EXTRACT(MONTH FROM PARSE_DATE('%Y%m%d', event_date)) AS month,
-- EXTRACT(DAY FROM PARSE_DATE('%Y%m%d', event_date)) AS day,
COUNT(DISTINCT IF(stream_id = '2653072043', user_pseudo_id, NULL)) AS ios_users,
COUNT(DISTINCT IF(stream_id = '2350467728', user_pseudo_id, NULL)) AS android_users,
--COUNT(DISTINCT IF(stream_id IN ('2653072043', '2350467728'), user_pseudo_id, NULL)) AS total_monthly_users
FROM `mproov-58674.analytics_263332939.events_*`, UNNEST(event_params) AS params
WHERE
EXTRACT(MONTH FROM TIMESTAMP_MICROS(event_timestamp) AT TIME ZONE 'America/Los_Angeles') = 4
AND EXTRACT(YEAR FROM TIMESTAMP_MICROS(event_timestamp) AT TIME ZONE 'America/Los_Angeles') = 2024
AND EXTRACT(DAY FROM TIMESTAMP_MICROS(event_timestamp) AT TIME ZONE 'America/Los_Angeles') = 20
# these alternative date filters yield the same results
#_TABLE_SUFFIX BETWEEN '20240420' AND '20240420'
#PARSE_DATE('%Y%m%d', event_date) BETWEEN DATE '2024-04-20' AND DATE '2024-04-21'
AND event_name = 'session_start'
#AND (stream_id = '2653072043' or stream_id = '2350467728')
#AND platform = 'iOS'
#AND platform = 'Android'
AND params.key = 'schoolId'
AND (params.value.string_value = '40' OR params.value.string_value = '41' OR params.value.string_value = '42')
GROUP BY params.value.string_value
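For reference, the three EXTRACT predicates in the query above can be collapsed into a single date comparison. This is a minimal sketch rather than a confirmed fix: it assumes the property time zone is America/Los_Angeles (as in the query above), the `_TABLE_SUFFIX` range padded by one day on each side is an assumption meant to limit the scan without dropping events near midnight, and the `school_id` alias is introduced here only for readability.

```sql
-- Sketch: same cohort count, filtered with one date predicate.
-- Assumption: the GA4 property time zone is America/Los_Angeles.
SELECT
  params.value.string_value AS school_id,  -- alias added for readability
  COUNT(DISTINCT IF(stream_id = '2653072043', user_pseudo_id, NULL)) AS ios_users,
  COUNT(DISTINCT IF(stream_id = '2350467728', user_pseudo_id, NULL)) AS android_users
FROM `mproov-58674.analytics_263332939.events_*`, UNNEST(event_params) AS params
WHERE
  -- Assumption: padding the suffix range by a day keeps events logged near midnight.
  _TABLE_SUFFIX BETWEEN '20240419' AND '20240421'
  -- Convert the microsecond timestamp to a calendar date in the property time zone.
  AND DATE(TIMESTAMP_MICROS(event_timestamp), 'America/Los_Angeles') = DATE '2024-04-20'
  AND event_name = 'session_start'
  AND params.key = 'schoolId'
  AND params.value.string_value IN ('40', '41', '42')
GROUP BY school_id
```

Filtering on one DATE value also avoids the case where month, year, and day are checked independently, which makes the intended day easier to read and to change.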
It's especially odd because when we look only at overall daily, weekly, and monthly active users (different date ranges in our BigQuery queries) and at daily, weekly, and monthly new downloads, the GA4 results DO match BigQuery.
Why do discrepancies appear only when we look at specific cohorts (params.key = 'schoolId'), and how do we reconcile them? We know something goes wrong when we filter by date in the WHERE clause, but we don't know why, or what the workaround query would be.
We fixed the time zone settings but still can’t match up our daily active users per specific cohorts in BigQuery to Google Analytics 4.
Everything matches between GA4 and BigQuery when we aren’t looking at the school cohorts and just look at the total app active users or total app downloads across any time range.
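One check that might help narrow this down is to look at how the schoolId parameter is actually recorded. The sketch below is a diagnostic only and assumes nothing about the cause: it lists which events carry a `schoolId` parameter on the day in question and whether the value lands in `string_value` or `int_value` (both are fields of the GA4 export's `event_params.value` record). If `session_start` rarely carries the parameter, or the value is stored as an integer, the cohort filter above would undercount.

```sql
-- Diagnostic sketch: where and how is schoolId recorded on 2024-04-20?
SELECT
  event_name,
  COUNTIF(params.value.string_value IS NOT NULL) AS string_valued,  -- rows with a string value
  COUNTIF(params.value.int_value IS NOT NULL) AS int_valued,        -- rows with an integer value
  COUNT(DISTINCT user_pseudo_id) AS users
FROM `mproov-58674.analytics_263332939.events_*`, UNNEST(event_params) AS params
WHERE
  _TABLE_SUFFIX = '20240420'
  AND params.key = 'schoolId'
GROUP BY event_name
ORDER BY users DESC
```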
Below is an example of a query I wrote to get new downloads at the week level; it matches the GA4 results exactly.
SELECT
EXTRACT(YEAR FROM PARSE_DATE('%Y%m%d', event_date)) AS year,
EXTRACT(WEEK FROM PARSE_DATE('%Y%m%d', event_date)) AS week,
COUNT(DISTINCT IF(stream_id = '2653072043', user_pseudo_id, NULL)) AS ios_downloads,
COUNT(DISTINCT IF(stream_id = '2350467728', user_pseudo_id, NULL)) AS android_downloads,
COUNT(DISTINCT IF(stream_id IN ('2653072043', '2350467728'), user_pseudo_id, NULL)) AS total_weekly_downloads
FROM
`analytics_263332939.events_*`
WHERE
event_name = 'first_open'
GROUP BY
year, week
GROUP BY
year, week
ORDER BY
year, week;
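The weekly query above groups on event_date, while the cohort query filters on event_timestamp. A rough way to see whether the two notions of "day" agree is sketched below; it assumes the property time zone is America/Los_Angeles and uses the fully qualified table name from the cohort query. The `local_date` alias is introduced here for illustration.

```sql
-- Sketch: compare the exported event_date with the date derived from event_timestamp.
SELECT
  event_date,
  -- Assumption: America/Los_Angeles is the property time zone.
  DATE(TIMESTAMP_MICROS(event_timestamp), 'America/Los_Angeles') AS local_date,
  COUNT(*) AS events
FROM `mproov-58674.analytics_263332939.events_*`
WHERE
  _TABLE_SUFFIX BETWEEN '20240419' AND '20240421'
GROUP BY event_date, local_date
ORDER BY event_date, local_date
```

Rows where `local_date` does not correspond to `event_date` would indicate that the two date filters select different sets of events.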