Using nested lag or any other window function

I am working on a transaction dataset with one date column and one identifier column. I want to retroactively create a boolean column that indicates whether a particular activity (e.g. a marketing campaign) should have been sent for that identifier, given that a campaign is only sent if the previous one was sent more than 30 days earlier. I can use the lag function, but the problem is that the starting point keeps changing.

For example, a campaign is sent on the first transaction date, and no campaign is sent for any transaction within the next 30 days. If the next transaction then happens on the 35th day after the first transaction, a campaign is sent again, and the new 30-day counter starts from that 35th day.

I haven't been able to figure out how to achieve this. I am using Redshift SQL (table for reference below).

Identifier	Date (YYYY-MM-DD time)	desired column
A	2024-06-30 11:22:03	FALSE
A	2024-06-16 14:02:36	FALSE
A	2024-06-15 8:15:57	TRUE
A	2024-05-24 14:30:57	FALSE
A	2024-05-10 4:45:06	FALSE
A	2024-05-08 15:19:48	FALSE
A	2024-05-07 15:09:14	FALSE
A	2024-05-06 4:16:39	FALSE
A	2024-05-04 10:37:16	TRUE
A	2024-04-08 5:02:00	FALSE
A	2024-04-06 9:03:23	FALSE
A	2024-03-30 11:05:55	TRUE
A	2024-03-16 8:39:56	FALSE
A	2024-03-15 14:06:10	FALSE
A	2024-02-28 16:55:28	TRUE

Thanks for the help!

You are looking for an iteration. The first date is 2024-02-28. Starting from this date, you look for the first date that falls after its 30-day range. That date is 2024-03-30. Then you again look for the first date after that date's 30-day range, and so on.

Iteration is done with recursive queries in SQL.

with recursive
  starters (identifier, ts) as
  (
    select identifier, min(ts)
    from mytable
    group by identifier
     union all
    select t.identifier, min(t.ts)
    from starters s
    join mytable t on t.identifier = s.identifier
                  and t.ts > s.ts + interval '30' day
    group by t.identifier
  )
select 
  t.*,
  exists 
  (
    select null
    from starters s
    where s.identifier = t.identifier
    and s.ts = t.ts
  ) as flag
from mytable t 
order by t.identifier, t.ts;
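The iteration described above can also be checked outside the database. The following is a minimal Python sketch, not Redshift SQL; the function name `flag_campaigns` and the variable names are hypothetical, and the timestamps are the sample rows from the question for identifier A. It walks the rows in ascending order and restarts the 30-day counter each time a campaign is sent.

```python
from datetime import datetime, timedelta

def flag_campaigns(timestamps, window=timedelta(days=30)):
    """Return (ts, flag) pairs in ascending order for ONE identifier.

    flag is True when a campaign is sent, i.e. for the first row and for
    every row that lies more than `window` after the last campaign.
    """
    result = []
    last_sent = None
    for ts in sorted(timestamps):
        send = last_sent is None or ts > last_sent + window
        if send:
            last_sent = ts  # the counter restarts here
        result.append((ts, send))
    return result

# Sample rows for identifier A, taken from the question.
sample = [
    "2024-06-30 11:22:03", "2024-06-16 14:02:36", "2024-06-15 08:15:57",
    "2024-05-24 14:30:57", "2024-05-10 04:45:06", "2024-05-08 15:19:48",
    "2024-05-07 15:09:14", "2024-05-06 04:16:39", "2024-05-04 10:37:16",
    "2024-04-08 05:02:00", "2024-04-06 09:03:23", "2024-03-30 11:05:55",
    "2024-03-16 08:39:56", "2024-03-15 14:06:10", "2024-02-28 16:55:28",
]
rows = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in sample]

flags = flag_campaigns(rows)
sent = [ts.strftime("%Y-%m-%d") for ts, flag in flags if flag]
print(sent)  # ['2024-02-28', '2024-03-30', '2024-05-04', '2024-06-15']
```

The four flagged dates match the TRUE rows in the desired column of the question, which is the result the recursive query is meant to reproduce per identifier.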

If Redshift cannot yet handle aggregation in recursive CTEs, as you say in the comments below, use another way to get the top row per group. A row is the top one when no better row exists, so you may be able to solve this with NOT EXISTS.

The part

select identifier, min(ts)
from mytable
group by identifier

can also be written as

select identifier, ts
from mytable t
where not exists
(
  select null
  from mytable better
  where better.identifier = t.identifier
  and better.ts < t.ts
)

for instance.
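The same idea should carry over to the recursive step: the next starter is a row beyond the 30-day window for which no earlier row beyond that window exists. Here is a small Python sketch of that equivalence, only to illustrate the logic; the function names are hypothetical and the four timestamps are a subset of the sample data.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)

def next_starter_min(timestamps, starter):
    """Aggregate form: the earliest timestamp beyond the window, or None."""
    later = [t for t in timestamps if t > starter + WINDOW]
    return min(later) if later else None

def next_starter_not_exists(timestamps, starter):
    """NOT EXISTS form: a row beyond the window with no earlier row beyond it."""
    for t in timestamps:
        beyond_window = t > starter + WINDOW
        has_better = any(starter + WINDOW < b < t for b in timestamps)
        if beyond_window and not has_better:
            return t
    return None

ts = [
    datetime(2024, 2, 28, 16, 55, 28),
    datetime(2024, 3, 16, 8, 39, 56),
    datetime(2024, 3, 30, 11, 5, 55),
    datetime(2024, 4, 6, 9, 3, 23),
]
start = ts[0]
print(next_starter_min(ts, start))         # 2024-03-30 11:05:55
print(next_starter_not_exists(ts, start))  # 2024-03-30 11:05:55
```

Both forms pick the same row, so the aggregate in the recursive member can in principle be replaced by a filter. Whether Redshift accepts that filter inside a recursive CTE is something to verify against your cluster.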

I tried a somewhat hacky solution for now, which worked:

with min_ts as
(
    select
        identifier,
        min(ts) as min_ts
    from my_table
    group by 1
),
step_2 as
(
    select
        a.*,
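        -- 2592000 seconds = 30 days; the result is the number of whole 30-day periods since the first transaction
        floor(datediff(second, b.min_ts, a.ts)/2592000::DOUBLE PRECISION) as floor_days_between_ts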
    from my_table a
    inner join min_ts b
    on a.identifier = b.identifier
),
step_3 as
(
    select
        identifier,
        floor_days_between_ts,
        min(ts) as min_ts
    from step_2
    group by 1,2
)
select
    a.*,
    case when a.ts = b.min_ts then True::bool
         else False::bool end as campaign
from step_2 a
left join step_3 b
on a.identifier = b.identifier and a.floor_days_between_ts = b.floor_days_between_ts
order by a.identifier, a.ts desc
;
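One caveat worth checking: this query assigns each row to a fixed 30-day bucket counted from the identifier's first transaction, which is not quite the same as a counter that restarts at each campaign. The Python sketch below compares the two rules on the sample data from the question. It is a rough check under my reading of the query, with hypothetical function names, and it ignores ties between identical timestamps.

```python
from datetime import datetime, timedelta

WINDOW_SECONDS = 2592000  # 30 days, the constant used in the query above

def bucket_sent(timestamps):
    """Fixed buckets from the first transaction: first row of each bucket."""
    first = min(timestamps)
    seen, sent = set(), []
    for ts in sorted(timestamps):
        bucket = int((ts - first).total_seconds() // WINDOW_SECONDS)
        if bucket not in seen:
            seen.add(bucket)
            sent.append(ts)
    return sent

def rolling_sent(timestamps):
    """Rolling rule: the counter restarts at every campaign that is sent."""
    sent = []
    for ts in sorted(timestamps):
        if not sent or ts > sent[-1] + timedelta(seconds=WINDOW_SECONDS):
            sent.append(ts)
    return sent

sample = [
    "2024-06-30 11:22:03", "2024-06-16 14:02:36", "2024-06-15 08:15:57",
    "2024-05-24 14:30:57", "2024-05-10 04:45:06", "2024-05-08 15:19:48",
    "2024-05-07 15:09:14", "2024-05-06 04:16:39", "2024-05-04 10:37:16",
    "2024-04-08 05:02:00", "2024-04-06 09:03:23", "2024-03-30 11:05:55",
    "2024-03-16 08:39:56", "2024-03-15 14:06:10", "2024-02-28 16:55:28",
]
rows = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in sample]

rolling = rolling_sent(rows)
extra = [ts for ts in bucket_sent(rows) if ts not in rolling]
print(extra)  # [datetime.datetime(2024, 6, 30, 11, 22, 3)]
```

On this data the bucket rule additionally flags 2024-06-30, because that row opens the fifth 30-day bucket even though it is only 15 days after the 2024-06-15 campaign. The two rules agree only while each campaign happens to fall near the start of its bucket.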

One option is to find reference dates. You could try combining the dates and the differences in days between them, using a few CTEs…

--    S a m p l e    D a t a :
Create Table tbl (identifier Varchar(6), dt TimeStamp);
Insert Into tbl VALUES 
('A',   '2024-06-30 11:22:03'),
('A',   '2024-06-16 14:02:36'),
('A',   '2024-06-15 08:15:57'),
('A',   '2024-05-24 14:30:57'), 
('A',   '2024-05-10 04:45:06'),
('A',   '2024-05-08 15:19:48'), 
('A',   '2024-05-07 15:09:14'), 
('A',   '2024-05-06 04:16:39'), 
('A',   '2024-05-04 10:37:16'),
('A',   '2024-04-08 05:02:00'),
('A',   '2024-04-06 09:03:23'), 
('A',   '2024-03-30 11:05:55'), 
('A',   '2024-03-16 08:39:56'), 
('A',   '2024-03-15 14:06:10'),
('A',   '2024-02-28 16:55:28');

… remove the time part and combine the dates (practically a cross join) to get all the day differences between them …

WITH
  dates  as
    ( Select    Row_Number() Over(Partition By d.identifier, d.dt_0 Order By d.dt_1) as rn_0, 
                d.identifier, d.dt_0, d.dt_1, d.diff
      From      ( Select t1.identifier, Date_Trunc('day', t1.dt) as dt_1, 
                         Case When DATE_PART('day', Date_Trunc('day', t1.dt) - Date_Trunc('day', t2.dt)) >= 30 
                              Then Date_Trunc('day', t2.dt)
                         End as dt_0, 
                         DATE_PART('day', Date_Trunc('day', t1.dt) - Date_Trunc('day', t2.dt)) as diff,
                         Min(Date_Trunc('day', t1.dt)) Over(Partition By t1.identifier) as start_dt
                   From   tbl t1
                   Inner Join tbl t2 ON(t1.identifier = t2.identifier) 
                ) d
      Where     diff >= 30 
   ),

… filtering the resultset …

  grid as 
    ( Select  Distinct d0.*, d1.dt_0 as dt_00, d1.dt_1 as dt_11
      From    dates d0
      Inner Join dates d1 on(d1.identifier = d0.identifier And 
                             d1.dt_0 = d0.dt_1 And 
                            d0.rn_0 = 1 And d1.rn_0 = 1
                           )
      Where d0.rn_0 = 1 And d0.dt_1 = d1.dt_0
   ),

… getting the reference dates using UNION (not UNION ALL) to get rid of duplicates …

  reff_dates as
   ( Select identifier, Min(dt_0) as dt_0 From grid Group By identifier UNION
     Select identifier, Min(dt_1)  From grid Group By identifier UNION
     Select identifier, Min(dt_11) From grid Group By identifier UNION
     Select Distinct identifier, dt_11 From grid
  )

… the reference dates are left joined to your data, and a Case expression sets the new column to 'TRUE' or 'FALSE' …

--      M a i n    S Q L :
Select  t.identifier, t.dt,
        Case When r.dt_0 Is Null Then 'FALSE'
        Else 'TRUE'
        End as new_column
From    tbl t
Left Join reff_dates r ON( r.identifier = t.identifier And r.dt_0 = Date_Trunc('day', t.dt) )
Order By t.dt Desc

/*      R e s u l t :
identifier  dt                      new_column
---------   ----------------------  ----------
A           2024-06-30 11:22:03     FALSE
A           2024-06-16 14:02:36     FALSE
A           2024-06-15 08:15:57     TRUE
A           2024-05-24 14:30:57     FALSE
A           2024-05-10 04:45:06     FALSE
A           2024-05-08 15:19:48     FALSE
A           2024-05-07 15:09:14     FALSE
A           2024-05-06 04:16:39     FALSE
A           2024-05-04 10:37:16     TRUE
A           2024-04-08 05:02:00     FALSE
A           2024-04-06 09:03:23     FALSE
A           2024-03-30 11:05:55     TRUE
A           2024-03-16 08:39:56     FALSE
A           2024-03-15 14:06:10     FALSE
A           2024-02-28 16:55:28     TRUE         */

See the fiddle here.
