I am onboarding data onto a platform that can visualize data only at hour grain or day grain. I provided it with a connection string to my DB; the system pulls data from it, shows it as a graph, and supports a bunch of slice-and-dice operations. The problem is that its chart-based visualization works only at hour or day grain, while my data arrives once every 4 hours. The platform treats this as missing data, and the graph comes up as dots instead of a continuous line. So I thought of exposing a view that gap-fills the data at hour grain, carrying the last value forward, which the platform can then use. This is presenting some issues.
DB schema: record_date, segment_key, segment_value, avg_consumption, avg_charge … (more numeric columns)
Each segment_key, segment_value combination is updated every 4 hours, and I was able to form a gap-filling query following the guide Fill time gaps and imputing missing values.
But that solution uses a temporary table, not a view, and the original table can contain a lot of data (a few years' worth), which can make this non-performant. Also, the start/end times for the temp table cannot be hard-coded, as this is rolling data.
Is there a more elegant way to gap-fill without using any joins? Or, if joins are required, one that is still performant?
I tried a query following the Microsoft guide Fill time gaps and imputing missing values, but it reads all the data from the table before filling and exposing it.
From the description, there are two parts to this problem:
- Get a contiguous range of dates/times – even if any point in the range is not present in the data.
- In the result, for any missing point, pull the latest non-NULL value from previous points in the range.
For Azure SQL Database, SQL Server 2022, and the four people using Managed Instance, it is quite easy to generate a contiguous series of dates or times using GENERATE_SERIES. It is going to be both simpler and more efficient than a solution that loops to populate a #temp table. You could create a table-valued function, depending on the interval you want, and pass in start/end dates. This one handles days or hours, and could easily be expanded to handle minutes, weeks, months, etc.:
CREATE OR ALTER FUNCTION dbo.GetInterval
(
  @part  varchar(4),
  @start date,
  @end   date
)
RETURNS TABLE WITH SCHEMABINDING
AS
RETURN
(
  SELECT period_start = s,
         period_end = CASE @part
           WHEN 'day'  THEN DATEADD(DAY,  1, s)
           WHEN 'hour' THEN DATEADD(HOUR, 1, s)
         END
  FROM
  (
    SELECT s = DATEADD(HOUR, Value - 1, CONVERT(datetime, @start))
      FROM GENERATE_SERIES(1, 1 + DATEDIFF(HOUR, @start, @end))
     WHERE @part = 'hour'
    UNION ALL
    SELECT s = DATEADD(DAY, Value - 1, @start)
      FROM GENERATE_SERIES(1, 1 + DATEDIFF(DAY, @start, @end))
     WHERE @part = 'day'
  ) AS x
);
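For example (a quick smoke test; the dates here are just illustrative), the hour branch returns one row per hour from @start's midnight through @end's midnight, inclusive:

```sql
-- hourly buckets: 25 rows, 2024-01-01 00:00 through 2024-01-02 00:00
SELECT period_start, period_end
  FROM dbo.GetInterval('hour', '20240101', '20240102');

-- daily buckets: 5 rows, 2024-01-01 through 2024-01-05
SELECT period_start, period_end
  FROM dbo.GetInterval('day', '20240101', '20240105');
```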
Now for the data. You didn’t provide any, so I’ll make some up:
CREATE TABLE #sales
(
record_date date,
segment_key int,
segment_value int,
consumption decimal(10,2),
charge decimal(12,2)
);
INSERT #sales VALUES
('20240101',1,1,10,40),('20240101',1,2,20,80),
('20240101',1,1,20,70),('20240101',1,2,30,100),
('20240102',1,1,10,90),('20240102',1,2,25,150),
/* no rows at all for Jan 3 */
('20240104',1,1,12,48), /* missing segment_value */
('20240105',1,1,15,75),('20240105',1,2,11,44);
SELECT *
FROM #sales
ORDER BY record_date;
Output:

| record_date | segment_key | segment_value | consumption | charge |
|-------------|-------------|---------------|-------------|--------|
| 2024-01-01  | 1           | 1             | 10.00       | 40.00  |
| 2024-01-01  | 1           | 2             | 20.00       | 80.00  |
| 2024-01-01  | 1           | 1             | 20.00       | 70.00  |
| 2024-01-01  | 1           | 2             | 30.00       | 100.00 |
| 2024-01-02  | 1           | 1             | 10.00       | 90.00  |
| 2024-01-02  | 1           | 2             | 25.00       | 150.00 |
| 2024-01-04  | 1           | 1             | 12.00       | 48.00  |
| 2024-01-05  | 1           | 1             | 15.00       | 75.00  |
| 2024-01-05  | 1           | 2             | 11.00       | 44.00  |
I’m also going to make up your desired results, because you didn’t provide those, either. I’m guessing you want aggregates per combo for every {period}, and if a row is missing, take the most recent one; by day that would be:

| record_date | segment_key | segment_value | avg_consumption | avg_charge |
|-------------|-------------|---------------|-----------------|------------|
| 2024-01-01  | 1           | 1             | 15.00           | 55.00      |
| 2024-01-01  | 1           | 2             | 25.00           | 90.00      |
| 2024-01-02  | 1           | 1             | 10.00           | 90.00      |
| 2024-01-02  | 1           | 2             | 25.00           | 150.00     |
| 2024-01-03  | 1           | 1             | 10.00           | 90.00      |
| 2024-01-03  | 1           | 2             | 25.00           | 150.00     |
| 2024-01-04  | 1           | 1             | 12.00           | 48.00      |
| 2024-01-04  | 1           | 2             | 25.00           | 150.00     |
| 2024-01-05  | 1           | 1             | 15.00           | 75.00      |
| 2024-01-05  | 1           | 2             | 11.00           | 44.00      |
The problem now is twofold: you need to fill in gaps for days that are missing altogether, and you need to fill in gaps for segment combos that are missing. An ordinary aggregate query:
SELECT record_date, segment_key, segment_value,
avg_consumption = AVG(consumption),
avg_charge = AVG(charge)
FROM #sales
GROUP BY record_date, segment_key, segment_value;
will produce these results, with nothing for January 3rd and no (1, 2) row for January 4th:

| record_date | segment_key | segment_value | avg_consumption | avg_charge |
|-------------|-------------|---------------|-----------------|------------|
| 2024-01-01  | 1           | 1             | 15.00           | 55.00      |
| 2024-01-01  | 1           | 2             | 25.00           | 90.00      |
| 2024-01-02  | 1           | 1             | 10.00           | 90.00      |
| 2024-01-02  | 1           | 2             | 25.00           | 150.00     |
| 2024-01-04  | 1           | 1             | 12.00           | 48.00      |
| 2024-01-05  | 1           | 1             | 15.00           | 75.00      |
| 2024-01-05  | 1           | 2             | 11.00           | 44.00      |
To fill those gaps, you must first generate a list of all possible segment combos and all possible dates. You can generate that full set using:
DECLARE @start date = '20240101', @end date = '20240105';
;WITH all_possible_segments AS
(
SELECT segment_key, segment_value
FROM #sales
WHERE record_date >= @start
AND record_date < DATEADD(DAY, 1, @end)
GROUP BY segment_key, segment_value
),
all_possible_dates AS
(
SELECT period_start, next_period_start = period_end
FROM dbo.GetInterval('day', @start, @end)
)
SELECT * FROM all_possible_segments
CROSS JOIN all_possible_dates;
In this case it produces 10 rows (2 combos × 5 days), like so:

| segment_key | segment_value | period_start | next_period_start |
|-------------|---------------|--------------|-------------------|
| 1           | 1             | 2024-01-01   | 2024-01-02        |
| 1           | 1             | 2024-01-02   | 2024-01-03        |
| 1           | 1             | 2024-01-03   | 2024-01-04        |
| 1           | 1             | 2024-01-04   | 2024-01-05        |
| 1           | 1             | 2024-01-05   | 2024-01-06        |
| 1           | 2             | 2024-01-01   | 2024-01-02        |
| 1           | 2             | 2024-01-02   | 2024-01-03        |
| 1           | 2             | 2024-01-03   | 2024-01-04        |
| 1           | 2             | 2024-01-04   | 2024-01-05        |
| 1           | 2             | 2024-01-05   | 2024-01-06        |
Now you can outer join to your actual data, e.g.:
DECLARE @start date = '20240101', @end date = '20240105';
;WITH all_possible_segments AS
(
SELECT segment_key, segment_value
FROM #sales
WHERE record_date >= @start
AND record_date < DATEADD(DAY, 1, @end)
GROUP BY segment_key, segment_value
),
all_possible_dates AS
(
SELECT period_start, next_period_start = period_end
FROM dbo.GetInterval('day', @start, @end)
)
SELECT dates.period_start,
combos.segment_key,
combos.segment_value,
avg_consumption = AVG(sales.consumption),
avg_charge = AVG(sales.charge)
FROM all_possible_segments AS combos
CROSS JOIN all_possible_dates AS dates
LEFT OUTER JOIN #sales AS sales
ON sales.segment_key = combos.segment_key
AND sales.segment_value = combos.segment_value
AND sales.record_date >= dates.period_start
AND sales.record_date < dates.next_period_start
GROUP BY dates.period_start,
combos.segment_key,
combos.segment_value;
But this still produces NULL for those missing rows (or 0 if you use COALESCE()), which makes:

| period_start | segment_key | segment_value | avg_consumption | avg_charge |
|--------------|-------------|---------------|-----------------|------------|
| 2024-01-01   | 1           | 1             | 15.00           | 55.00      |
| 2024-01-01   | 1           | 2             | 25.00           | 90.00      |
| 2024-01-02   | 1           | 1             | 10.00           | 90.00      |
| 2024-01-02   | 1           | 2             | 25.00           | 150.00     |
| 2024-01-03   | 1           | 1             | NULL            | NULL       |
| 2024-01-03   | 1           | 2             | NULL            | NULL       |
| 2024-01-04   | 1           | 1             | 12.00           | 48.00      |
| 2024-01-04   | 1           | 2             | NULL            | NULL       |
| 2024-01-05   | 1           | 1             | 15.00           | 75.00      |
| 2024-01-05   | 1           | 2             | 11.00           | 44.00      |
To fill in any missing row’s value with the previous value (which is a bit weird because it misrepresents the actual data, unless you were filling in rolling SUMs, but if making the line look continuous is paramount, so be it), you can use the new IGNORE NULLS option for LAST_VALUE(). This says: pull the value from a different (usually the most recent) row that has a non-NULL value:
DECLARE @start date = '20240101', @end date = '20240105';
;WITH all_possible_segments AS
(
SELECT segment_key, segment_value
FROM #sales
WHERE record_date >= @start
AND record_date < DATEADD(DAY, 1, @end)
GROUP BY segment_key, segment_value
),
all_possible_dates AS
(
SELECT period_start, next_period_start = period_end
FROM dbo.GetInterval('day', @start, @end)
)
SELECT period_start, segment_key, segment_value,
avg_consumption = LAST_VALUE(avg_cons) IGNORE NULLS OVER
(PARTITION BY segment_key, segment_value ORDER BY period_start),
avg_charge = LAST_VALUE(avg_chrg) IGNORE NULLS OVER
(PARTITION BY segment_key, segment_value ORDER BY period_start)
FROM
(
SELECT dates.period_start,
combos.segment_key,
combos.segment_value,
avg_cons = AVG(sales.consumption),
avg_chrg = AVG(sales.charge)
FROM all_possible_segments AS combos
CROSS JOIN all_possible_dates AS dates
LEFT OUTER JOIN #sales AS sales
ON sales.segment_key = combos.segment_key
AND sales.segment_value = combos.segment_value
AND sales.record_date >= dates.period_start
AND sales.record_date < dates.next_period_start
GROUP BY dates.period_start,
combos.segment_key,
combos.segment_value
) AS grp
ORDER BY period_start,
segment_key,
segment_value;
Output:

| period_start | segment_key | segment_value | avg_consumption | avg_charge |
|--------------|-------------|---------------|-----------------|------------|
| 2024-01-01   | 1           | 1             | 15.00           | 55.00      |
| 2024-01-01   | 1           | 2             | 25.00           | 90.00      |
| 2024-01-02   | 1           | 1             | 10.00           | 90.00      |
| 2024-01-02   | 1           | 2             | 25.00           | 150.00     |
| 2024-01-03   | 1           | 1             | 10.00           | 90.00      |
| 2024-01-03   | 1           | 2             | 25.00           | 150.00     |
| 2024-01-04   | 1           | 1             | 12.00           | 48.00      |
| 2024-01-04   | 1           | 2             | 25.00           | 150.00     |
| 2024-01-05   | 1           | 1             | 15.00           | 75.00      |
| 2024-01-05   | 1           | 2             | 11.00           | 44.00      |
If this doesn’t perform well, it’s probably because the underlying table isn’t properly indexed to support range queries (e.g. covering and leading on date/time), or your range was too large to do anything but scan. Or the performance problem is not the query at all, but simply trying to return / render too many rows. If you get three months of minute intervals and aggregate against 400 products, well, that’s a lot of data to return (over 50 million rows) no matter how fast the query executes to find those rows.
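As a sketch of such an index, using the column names from the question (the index name is made up, and whether the INCLUDE columns pay off depends on your workload):

```sql
-- hypothetical index: leading on the date column so the range
-- predicate can seek, covering the measures to avoid lookups
CREATE INDEX ix_consumption_record_date
  ON dbo.consumption (record_date, segment_key, segment_value)
  INCLUDE (avg_consumption, avg_charge);
```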
(Compare timings of SELECT to SELECT … INTO #trash to see if this might be contributing to the problem.)
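And since the original goal was a view (which can't take parameters) over a rolling window, one option is to compute the bounds inside the view and feed them into the same query. A sketch, assuming the real table is called dbo.consumption with the schema from the question, a hypothetical view name, and an arbitrary 30-day rolling window at hour grain:

```sql
CREATE OR ALTER VIEW dbo.ConsumptionGapFilled
AS
WITH bounds AS
(
  -- rolling window: last 30 days up to today's midnight (adjust to taste;
  -- widen the end bound if you also want the current day's hours)
  SELECT start_date = CONVERT(date, DATEADD(DAY, -30, GETDATE())),
         end_date   = CONVERT(date, GETDATE())
),
all_possible_segments AS
(
  SELECT c.segment_key, c.segment_value
    FROM dbo.consumption AS c
   CROSS JOIN bounds AS b
   WHERE c.record_date >= b.start_date
     AND c.record_date < DATEADD(DAY, 1, b.end_date)
   GROUP BY c.segment_key, c.segment_value
),
all_possible_dates AS
(
  SELECT i.period_start, next_period_start = i.period_end
    FROM bounds AS b
   CROSS APPLY dbo.GetInterval('hour', b.start_date, b.end_date) AS i
)
SELECT period_start, segment_key, segment_value,
       avg_consumption = LAST_VALUE(avg_cons) IGNORE NULLS OVER
         (PARTITION BY segment_key, segment_value ORDER BY period_start),
       avg_charge = LAST_VALUE(avg_chrg) IGNORE NULLS OVER
         (PARTITION BY segment_key, segment_value ORDER BY period_start)
  FROM
  (
    SELECT dates.period_start,
           combos.segment_key,
           combos.segment_value,
           avg_cons = AVG(c.avg_consumption),
           avg_chrg = AVG(c.avg_charge)
      FROM all_possible_segments AS combos
     CROSS JOIN all_possible_dates AS dates
      LEFT OUTER JOIN dbo.consumption AS c
        ON c.segment_key = combos.segment_key
       AND c.segment_value = combos.segment_value
       AND c.record_date >= dates.period_start
       AND c.record_date < dates.next_period_start
     GROUP BY dates.period_start,
           combos.segment_key,
           combos.segment_value
  ) AS grp;
```

The platform can then SELECT from dbo.ConsumptionGapFilled with no hard-coded dates, since the window rolls with GETDATE().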
For readers on older versions: well, you can't use IGNORE NULLS, but there are other ideas, at least for replacing GENERATE_SERIES, here:
- Time Slots for a schedule between two Datetimes