I am experiencing a problem with the differences between the data I see in BigQuery and the data GA4 shows. Although sessions and events appear the same, I am having trouble categorizing channels. I am using the default defined columns within GA4, as shown below. Perhaps someone has an idea on how to fix this?
CASE
WHEN (source IS NULL OR source = '(direct)' OR source = '') AND (medium IS NULL OR medium IN ('(not set)', '(none)', '')) THEN 'Direct'
WHEN REGEXP_CONTAINS(name, 'cross-network') THEN 'Cross-network'
WHEN (REGEXP_CONTAINS(source, 'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
OR REGEXP_CONTAINS(name, '^(.*(([^a-df-z]|^)shop|shopping).*)$'))
AND REGEXP_CONTAINS(medium, '^(.*cp.*|ppc|retargeting|paid.*)$') THEN 'Paid Shopping'
WHEN (REGEXP_CONTAINS(source, 'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
AND REGEXP_CONTAINS(medium, '^(.*cp.*|ppc|retargeting|paid.*)$')) OR gclid IS NOT NULL THEN 'Paid Search'
WHEN REGEXP_CONTAINS(source, 'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
AND REGEXP_CONTAINS(medium, '^(.*cp.*|ppc|retargeting|paid.*)$') THEN 'Paid Social'
WHEN REGEXP_CONTAINS(source, 'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
AND REGEXP_CONTAINS(medium, '^(.*cp.*|ppc|retargeting|paid.*)$') THEN 'Paid Video'
WHEN medium IN ('display', 'banner', 'expandable', 'interstitial', 'cpm') THEN 'Display'
WHEN REGEXP_CONTAINS(source, 'alibaba|amazon|google shopping|shopify|etsy|ebay|stripe|walmart')
OR REGEXP_CONTAINS(name, '^(.*(([^a-df-z]|^)shop|shopping).*)$') THEN 'Organic Shopping'
WHEN REGEXP_CONTAINS(source, 'badoo|facebook|fb|instagram|linkedin|pinterest|tiktok|twitter|whatsapp')
OR medium IN ('social', 'social-network', 'social-media', 'sm', 'social network', 'social media') THEN 'Organic Social'
WHEN REGEXP_CONTAINS(source, 'dailymotion|disneyplus|netflix|youtube|vimeo|twitch|vimeo|youtube')
OR REGEXP_CONTAINS(medium, '^(.*video.*)$') THEN 'Organic Video'
WHEN REGEXP_CONTAINS(source, 'baidu|bing|duckduckgo|ecosia|google|yahoo|yandex')
OR medium = 'organic' THEN 'Organic Search'
WHEN REGEXP_CONTAINS(medium, '^(.*cp.*|ppc|retargeting|paid.*)$') THEN 'Paid Other'
WHEN REGEXP_CONTAINS(source, 'email|e-mail|e_mail|e mail')
OR REGEXP_CONTAINS(medium, 'email|e-mail|e_mail|e mail') THEN 'Email'
WHEN medium = 'affiliate' THEN 'Affiliates'
WHEN medium = 'referral' THEN 'Referral'
WHEN medium = 'audio' THEN 'Audio'
WHEN medium = 'sms' THEN 'SMS'
WHEN medium LIKE '%push%' OR REGEXP_CONTAINS(medium, 'mobile|notification') THEN 'Mobile Push Notifications'
ELSE 'Unassigned'
END
If anyone has suggestions for this, I would love to hear them.
I have figured out several possibilities, but I still cannot figure out the problem. In fact, I keep seeing a difference between the data in BigQuery and GA4, sometimes only 5%, but in some cases it can go up to 30%.
2