How to delete repeating records with amount differing only by 5%? by amount in the range amout-5% to amount+5%?
Here is an example table on hive
[# a.uk b.uk AMOUNT START_DATE FINISHED_DATE source
1 01 2 61.00 07.06.2020 08.07.2020 a
2 01 2 67.01 07.06.2020 08.07.2020 a
3 01 2 68.00 08.06.2020 09.07.2020 a
4 01 2 66.10 07.06.2020 08.07.2020 a
5 02 3 100.00 05.05.2022 01.01.2024 b
...]
Here the duplicate of entry #1 is #2 and #4. No. 3 has a distinct START_DATE and FINISHED_DATE, so it is a unique record.
I have tried using cte table
///
...
COUNT(*) OVER (
PARTITION BY a.uk, b.uk, AMOUNT, START_DATE, FINISHED_DATE, source
ORDER BY AMOUNT
RANGE BETWEEN 0.05 * AMOUNT PRECEDING AND 0.05 * AMOUNT FOLLOWING
) AS count_within_range
FROM temp_table t1
) t
WHERE t.count_within_range > 1
...
///
But, hive throws an error ‘cannot recognize input near ‘*’ ‘.’ in windowframeboundary‘
New contributor
Liquid skyru is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
1