hirokichi
Regular Visitor

Performance of counting overlapped datetime

Hi there.

The dataset has three columns (ID, start time, and end time) and tens of thousands of rows.
For each row, I would like to count how many rows (including the row itself) have a time range that contains that row's start time.
I built the queries below based on an existing answer, but their performance is poor.
How can I write a faster query?

#DataSet

Download

#Example

ID      StartDateTime        EndDateTime          Overlap (expected)
1       2021/11/25 00:00:00  2021/11/25 02:00:00  1
2       2021/11/25 01:00:00  2021/11/25 02:00:00  2
3       2021/11/25 03:00:00  2021/11/25 08:00:00  1
4       2021/11/25 04:00:00  2021/11/25 07:00:00  2
5       2021/11/25 05:00:00  2021/11/25 08:00:00  3
...
100000  ...                  ...                  ...
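Written as a formula, the expected Overlap value of row i (start S_i, end E_i) counts every row j, including i itself, whose range contains S_i. Assuming S_j <= E_j holds for every row, any row that ends before S_i also starts before it, so the count can be split into two one-sided counts:

```latex
\begin{aligned}
\text{Overlap}_i
  &= \bigl|\{\, j : S_j \le S_i \le E_j \,\}\bigr| \\
  &= \bigl|\{\, j : S_j \le S_i \,\}\bigr| \;-\; \bigl|\{\, j : E_j < S_i \,\}\bigr|
\end{aligned}
```

Checked against the five example rows only: for row 5 (start 05:00), five rows start at or before 05:00 and two rows (1 and 2, ending at 02:00) end before it, so 5 - 2 = 3; for row 3 (start 03:00), the counts are 3 and 2, giving 1. Each one-sided count only needs a sorted list of start times or end times rather than a comparison against every other row.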

 

#Query1

let
    // Encoding=932 is the Japanese Shift-JIS code page
    source = Csv.Document(File.Contents("C:\SampleTime.csv"),[Delimiter=",", Columns=3, Encoding=932, QuoteStyle=QuoteStyle.None]),
    header = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
    convertType = Table.TransformColumnTypes(header,{{"Id", Int64.Type}, {"StartDate", type datetime}, {"EndDate", type datetime}}),
    // Cross join: attach the whole table to every row, giving n × n rows after the expand
    addColumn = Table.AddColumn(convertType, "Temp", each convertType),
    expandColumn = Table.ExpandTableColumn(addColumn, "Temp", {"StartDate", "EndDate"}, {"Temp.StartDate", "Temp.EndDate"}),
    convertType1 = Table.TransformColumnTypes(expandColumn,{{"Temp.StartDate", type datetime}, {"Temp.EndDate", type datetime}}),
    // Keep the pairs where the other row's range contains this row's StartDate
    addColumn1 = Table.AddColumn(convertType1, "Compare", each ([Temp.StartDate] <= [StartDate]) and ([StartDate] <= [Temp.EndDate])),
    filterRow = Table.SelectRows(addColumn1, each ([Compare] = true)),
    // Count the matching rows per Id
    groupBy = Table.Group(filterRow, {"Id"}, {{"Concurrent", each Table.RowCount(_), Int64.Type}})
in
    groupBy

#Query2

let
    source = Csv.Document(File.Contents("C:\SampleTime.csv"),[Delimiter=",", Columns=3, Encoding=932, QuoteStyle=QuoteStyle.None]),
    header = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
    convertType = Table.TransformColumnTypes(header,{{"Id", Int64.Type}, {"StartDate", type datetime}, {"EndDate", type datetime}}),
    // For each row, count the rows x whose range [x[StartDate], x[EndDate]] contains this row's StartDate.
    // This scans the whole table once per row (n × n comparisons in total).
    addColumn = Table.AddColumn(convertType, "Overlap", each Table.RowCount(Table.SelectRows(convertType, (x) => x[StartDate] <= [StartDate] and [StartDate] <= x[EndDate])))
in
    addColumn

Best regards,

1 ACCEPTED SOLUTION
Vera_33
Resident Rockstar

Hi @hirokichi

Group By is slow. Did those two queries above give the result you wanted? If so, how about buffering the table in Query 2?

let
    source = Csv.Document(File.Contents("C:\SampleTime.csv"),[Delimiter=",", Columns=3, Encoding=932, QuoteStyle=QuoteStyle.None]),
    header = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
    // Table.Buffer keeps the typed table in memory, so the per-row Table.SelectRows below
    // reuses it instead of possibly re-evaluating the CSV steps for every row
    convertType = Table.Buffer(Table.TransformColumnTypes(header,{{"Id", Int64.Type}, {"StartDate", type datetime}, {"EndDate", type datetime}})),
    addColumn = Table.AddColumn(convertType, "Overlap", each Table.RowCount(Table.SelectRows(convertType, (x) => x[StartDate] <= [StartDate] and [StartDate] <= x[EndDate])))
in
    addColumn

Or try Query 3:

let
    source = Csv.Document(File.Contents("C:\SampleTime.csv"),[Delimiter=",", Columns=3, Encoding=932, QuoteStyle=QuoteStyle.None]),
    header = Table.PromoteHeaders(source, [PromoteAllScalars=true]),
    // Buffer the typed table so it is evaluated once and held in memory
    convertType = Table.Buffer(Table.TransformColumnTypes(header,{{"Id", Int64.Type}, {"StartDate", type datetime}, {"EndDate", type datetime}})),
    // Buffered list of {StartDate, EndDate} pairs, one per row
    Custom = List.Buffer(Table.AddColumn(convertType, "Custom", each {[StartDate], [EndDate]})[Custom]),
    // For each row x, count the pairs whose range contains x[StartDate]
    #"Added Custom1" = Table.AddColumn(convertType, "Overlap", (x) => List.Count(List.Select(List.Transform(Custom, each _{0} <= x[StartDate] and _{1} >= x[StartDate]), each _ = true)))
in
    #"Added Custom1"
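Query 3 still compares each row with every other row (roughly n × n comparisons), so its run time grows quickly with the file size. A different technique, not taken from this thread, relies on the identity Overlap = (number of rows starting at or before t) - (number of rows ending strictly before t), where t is the row's StartDate: sort all start and end times once, then answer each count with a binary search. This is a sketch under stated assumptions, not benchmarked on the real file: every row has non-null StartDate and EndDate with EndDate >= StartDate, and the step names Starts, Ends, CountWhile and Search are hypothetical. It uses the five #Example rows inline so the output can be checked; for the real data, replace the Source step with the Csv.Document and Table.TransformColumnTypes steps from the queries above.

```powerquery
let
    // The five rows from #Example. For the real file, replace this step with the
    // Csv.Document / PromoteHeaders / TransformColumnTypes steps shown above.
    Source = #table(
        type table [Id = Int64.Type, StartDate = datetime, EndDate = datetime],
        {
            {1, #datetime(2021, 11, 25, 0, 0, 0), #datetime(2021, 11, 25, 2, 0, 0)},
            {2, #datetime(2021, 11, 25, 1, 0, 0), #datetime(2021, 11, 25, 2, 0, 0)},
            {3, #datetime(2021, 11, 25, 3, 0, 0), #datetime(2021, 11, 25, 8, 0, 0)},
            {4, #datetime(2021, 11, 25, 4, 0, 0), #datetime(2021, 11, 25, 7, 0, 0)},
            {5, #datetime(2021, 11, 25, 5, 0, 0), #datetime(2021, 11, 25, 8, 0, 0)}
        }
    ),
    // Sort all start times and all end times once (ascending) and keep them in memory
    Starts = List.Buffer(List.Sort(Source[StartDate])),
    Ends = List.Buffer(List.Sort(Source[EndDate])),
    // Hypothetical helper: binary search in an ascending list where isBefore(item) is
    // true for a leading run of items and false afterwards; returns the run's length
    CountWhile = (sorted as list, isBefore as function) as number =>
        let
            Search = (lo as number, hi as number) as number =>
                if lo >= hi then
                    lo
                else
                    let
                        mid = Number.IntegerDivide(lo + hi, 2)
                    in
                        if isBefore(sorted{mid}) then @Search(mid + 1, hi) else @Search(lo, mid)
        in
            Search(0, List.Count(sorted)),
    // Overlap = (rows starting at or before t) - (rows ending strictly before t),
    // with t = this row's StartDate; assumes EndDate >= StartDate on every row
    AddOverlap = Table.AddColumn(
        Source,
        "Overlap",
        (row) =>
            let
                t = row[StartDate]
            in
                CountWhile(Starts, (s) => s <= t) - CountWhile(Ends, (e) => e < t),
        Int64.Type
    )
in
    AddOverlap
```

On the five sample rows this should give Overlap values 1, 2, 1, 2, 3, matching the expected column. The recursive Search uses M's @ scoping operator to refer to itself; its depth is about log2(n), around 17 levels for 100,000 rows, so recursion depth should not be a concern.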
    

Thank you for your advice.
The queries are much faster now!
