I have table with over 35 million email records.
I need to get all of the records between a date range and also exist at least 3 times in the table.
1 email may have more than one property but will never have the same property twice.
The table looks like this:
PROPERTY | LAST_SENT_DATE | |
---|---|---|
email1 | prop1 | 2024-06-25 |
email2 | prop1 | 2024-06-16 |
email3 | prop2 | 2024-06-02 |
email4 | prop3 | 2024-06-25 |
email1 | prop2 | 2024-06-26 |
email1 | prop3 | 2024-06-10 |
So I want to write a query that will return the emails in June but exist more than 3 times.
SELECT * FROM `myTable`
WHERE
(
(`last_sent_date` BETWEEN '2024-06-01' AND '2024-06-30')
AND
(//...SOME PROPERTY COUNT HERE)
)
The results should look like this:
PROPERTY | LAST_SENT_DATE | |
---|---|---|
email1 | prop1 | 2024-06-25 |
email1 | prop2 | 2024-06-26 |
email1 | prop3 | 2024-06-10 |
I’m guessing it needs to do some kind of COUNT on PROPERTY, but I’m not sure where or how to add it.
5
WITH
cte AS (
SELECT email, property, last_sent_date,
COUNT(*) OVER (PARTITION BY email) cnt
FROM test
WHERE last_sent_date BETWEEN '2024-06-01' AND '2024-06-30'
)
SELECT email, property, last_sent_date
FROM cte
WHERE cnt >= 3;
property | last_sent_date | |
---|---|---|
email1 | prop1 | 2024-06-25 |
email1 | prop2 | 2024-06-26 |
email1 | prop3 | 2024-06-10 |
fiddle
4
Well, it’s pretty much straightforward, you can use an EXIST
clause. this is the more efficient way because it allows the database to stop searching as soon it finds a matching row:
Here is the code:
SELECT mt.*
FROM `myTable` mt
WHERE mt.`last_sent_date` BETWEEN '2024-06-01' AND '2024-06-30'
AND EXISTS (
SELECT 1
FROM `myTable` mt2
WHERE mt2.`email` = mt.`email`
GROUP BY mt2.`email`
HAVING COUNT(DISTINCT mt2.`property`) > 3
)
This query uses an EXIST
clause to check if there is at least one row in the subquery is matches the email
column.
Just advice: consider creating an index on the email
and last_sent_date
columns to improve the performance.
5
Find below some of the options to get your expected result with the data provided:
-- S a m p l e D a t a :
Create Table tbl ( EMAIL Text, PROPERTY Text, LAST_SENT_DATE Date );
Insert Into tbl
Select 'email1', 'prop1', '2024-06-25' Union All
Select 'email2', 'prop1', '2024-06-16' Union All
Select 'email3', 'prop2', '2024-06-02' Union All
Select 'email4', 'prop3', '2024-06-25' Union All
Select 'email1', 'prop2', '2024-06-26' Union All
Select 'email1', 'prop3', '2024-06-10';
1. Inner Join with filtered data
SELECT t.*
FROM ( Select EMAIL
From tbl
Where LAST_SENT_DATE Between '2024-06-01' And '2024-06-30'
Group By EMAIL
Having Count(EMAIL) >= 3 ) e
INNER JOIN tbl t ON(t.EMAIL = e.EMAIL)
2. Filtered using Where Clause’s IN() condition
SELECT t.*
FROM tbl t
WHERE t.EMAIL IN( Select EMAIL
From tbl
Where LAST_SENT_DATE Between '2024-06-01' And '2024-06-30'
Group By EMAIL
Having Count(EMAIL) >= 3
)
3. Filtered with EXISTS(corelated subquery)
SELECT t.*
FROM tbl t
WHERE LAST_SENT_DATE Between '2024-06-01' And '2024-06-30' And
EXISTS ( Select 1
From tbl
Where EMAIL = t.EMAIL
Group By EMAIL
Having Count(EMAIL) >= 3
)
… all of the above sql statements have the same outcome:
/* R e s u l t :
EMAIL PROPERTY LAST_SENT_DATE
------ ---------- --------------
email1 prop1 2024-06-25
email1 prop2 2024-06-26
email1 prop3 2024-06-10 */
See the fiddle here.
NOTE:
With real data the results will not be the same depending on an open question about number of rows with the same EMAIL should be only within the date range or it can be counted together with rows before the range / after the range / before and after the range. Where condition filtering the date range should be moved accordingly (and/or adjusted) to get what you realy want.
Code no 1. and 2. filter the rows counting EMAIL appearances just within the date range. Code no. 3 does the counting over all the data.