For my data in question, I’m tracking student long-term absences from a PHD program. This could be due to an internship, medical issue, or personal reasons. The data looks like this:
Student ID | Action | Effective Date |
---|---|---|
A1 | Leave | 2024-01-01 |
A1 | Return | 2024-03-01 |
A1 | Leave | 2024-06-01 |
A1 | Return | 2024-08-01 |
A1 | Leave | 2025-01-01 |
A1 | Return | 2025-06-01 |
A2 | Leave | 2024-01-01 |
A2 | Return | 2024-05-01 |
Assume students can leave and return 3 separate times maximum
The end result of what I want is:
Student ID | Absence Number | Leave Date | Return Date |
---|---|---|---|
A1 | 1 | 2024-01-01 | 2024-03-01 |
A1 | 2 | 2024-06-01 | 2024-08-01 |
A1 | 3 | 2025-01-01 | 2025-06-01 |
A2 | 1 | 2024-01-01 | 2024-05-01 |
Wondering how to best achieve this.. I know CTEs are an option.. but the confusing part is how to get the first leave date vs the second leave date etc. Would I use RANK or ROW_NUMBER here, or something else?
0
Here’s an option using the window function row_number()
and a conditional aggregation
Example
with cte as (
Select *
,Grp = row_number() over (partition by [Student ID],Action order by [Effective Date])
From @YourTable
)
Select [Student ID]
,[Leave Date] = max(case when Action='Leave' then [Effective Date] end)
,[Return Date] = max(case when Action='Return' then [Effective Date] end)
From cte
Group By [Student ID]
,Grp
Order By [Student ID],[Leave Date]
Results
Student ID Leave Date Return Date
A1 2024-01-01 2024-03-01
A1 2024-06-01 2024-08-01
A2 2024-01-01 2024-05-01
A3 2025-01-01 NULL
1
here is one way including the absence_count :
with cte as (
select *
, row_number() over (partition by student_id,action order by effective_date) as rn
from temp_student_actions
)
select student_id
, count(*) over (partition by student_id rows unbounded preceding) no_of_absence
, max(case when action='Leave' then effective_date end) leave_date
, max(case when action='Return' then effective_date end) return_date
from cte
group by student_id, rn
order by student_id, leave_date
output:
student_id | no_of_absence | leave_date | return_date |
---|---|---|---|
A1 | 1 | 2024-01-01 | 2024-03-01 |
A1 | 2 | 2024-06-01 | 2024-08-01 |
A1 | 3 | 2025-01-01 | 2025-06-01 |
A2 | 1 | 2024-01-01 | 2024-05-01 |
fiddle
Yes, there is always a way to achieve this. Here is how it can be done by using the combination of ROW_NUMBER()
and LAG()
functions in the SQL server.
Here is the query
WITH Absences AS (
SELECT StudentID, Action, EffectiveDate,
ROW_NUMBER() OVER (PARTITION BY StudentID, Action ORDER BY EffectiveDate) AS RowNum,
CASE Action WHEN 'Leave' THEN 1 ELSE -1 END AS ActionSign
FROM YourTable
)
SELECT StudentID,
SUM(ActionSign) OVER (PARTITION BY StudentID ORDER BY EffectiveDate) AS AbsenceNumber,
CASE Action WHEN 'Leave' THEN EffectiveDate END AS LeaveDate,
LAG(CASE Action WHEN 'Return' THEN EffectiveDate END, 1, NULL) OVER (PARTITION BY StudentID ORDER BY EffectiveDate) AS ReturnDate
FROM Absences
WHERE Action IN ('Leave', 'Return')
ORDER BY StudentID, EffectiveDate;
The first part is CTE I called it Absences
assigns a row to each row within each student’s leave/return sequence using ROW_NUMBER()
The main query uses a window function (SUM
and OVER
) to calculate the absence number for each student.
The CASE
statement filters and LAG
function is use to get the previous data.
Here is the output.
StudentID AbsenceNumber LeaveDate ReturnDate
A1 1 2024-01-01 2024-03-01
A1 2 2024-06-01 2024-08-01
A1 3 2025-01-01 2025-06-01
A2 1 2024-01-01 2024-05-01