I have a table which describe place where product is selling as below:
PRODUCT | Position |
---|---|
A | 1, 2, 3 |
B | 2 |
C | 7,8,9, 10 |
D | 2,3,4 |
E | 7,8,9 |
I want to get Products with match two or more selling positions in hive sql
Expect result:
A,D
C,E
2
- You can first UNNEST the positions, this is done in CTE
positions_exploded
- Then it is SELF JOINED on the condition that positions should be same and product pairs should be counted only once.
- Finally we count the pairs which has matching positions for 2 or more.
I have tried in postgres as I dont have hive, however I think you need to change some of the syntax for hive, you can explode like shown in this example
Fiddle in Postgres
WITH positions_exploded AS (
SELECT
product,
unnest(position) AS pos
FROM
product_position
)
,
matched_products AS (
SELECT
p1.product AS product1,
p2.product AS product2,
COUNT(*) AS common_positions
FROM
positions_exploded p1
JOIN
positions_exploded p2
ON
p1.pos = p2.pos AND p1.product < p2.product
GROUP BY
p1.product, p2.product
HAVING
COUNT(*) >= 2
)
SELECT
product1,
product2
FROM
matched_products;
Output
product1 | product2 |
---|---|
A | D |
C | E |
You can also concat the output in comma separated values like
STRING_AGG(product1 || ',' || product2, ', ') AS product_pairs