Suppose I have a Hive query whose WHERE clause applies two filters:
- period = '24May2024' and
- status = 'active'
Will Hive shuffle all the partitions in order to apply the second filter (status = 'active')?
For example:
with
a as
(select * from table_1
where period = '24May2024' and status = 'active')
, b as
(select * from table_2)
select a.*, b.*
from a join b on a.id = b.id
Someone recommended that we do this, but I want to make sure: will filtering the active records inside "a" shuffle less data, or will putting the filter at the end of the main statement be better and speed up the query?
with
a as
(select * from table_1
where period = '24May2024')
, b as
(select * from table_2)
select a.*, b.*
from a join b on a.id = b.id
where a.status = 'active'
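One way I assume this could be checked is to run both versions through EXPLAIN and compare the plans. The sketch below does that for the two queries above; table_1 and table_2 are the same placeholder names, and I am assuming that status is a column of table_1 (hence a.status in the second query). If both plans show a Filter Operator containing status = 'active' directly under the TableScan of table_1, before any Reduce Output Operator, then the predicate has presumably been pushed down and both versions should shuffle the same amount of data.

```sql
-- Version 1: both filters inside the CTE "a"
EXPLAIN
WITH
a AS
(SELECT * FROM table_1
 WHERE period = '24May2024' AND status = 'active')
, b AS
(SELECT * FROM table_2)
SELECT a.*, b.*
FROM a JOIN b ON a.id = b.id;

-- Version 2: status filter moved to the end of the main statement
-- (a.status assumes the column belongs to table_1)
EXPLAIN
WITH
a AS
(SELECT * FROM table_1
 WHERE period = '24May2024')
, b AS
(SELECT * FROM table_2)
SELECT a.*, b.*
FROM a JOIN b ON a.id = b.id
WHERE a.status = 'active';
```

Predicate pushdown in Hive is controlled by hive.optimize.ppd, so the comparison only says something about the current setting of that property.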