Hi there,
The dataset has columns for ID, start time, and end time, and contains tens of thousands of rows.
To count the number of simultaneous events per minute, I wrote the queries below, but their performance is poor.
Is there a better way?
#DataSet
#Query
SampleTime
let
    // Load the CSV and set column types
    Source = Csv.Document(File.Contents("C:\SampleTime.csv"), [Delimiter=",", Columns=3, Encoding=65001, QuoteStyle=QuoteStyle.None]),
    Header = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    ConvertType = Table.TransformColumnTypes(Header, {{"Id", Int64.Type}, {"StartDate", type datetime}, {"EndDate", type datetime}}),
    // Round StartDate down and EndDate up to whole minutes (24*60 minutes per day)
    Custom = Table.AddColumn(ConvertType, "Round_StartDate", each DateTime.From(Number.RoundDown(Number.From([StartDate]) * (24*60)) / (24*60))),
    Custom1 = Table.AddColumn(Custom, "Round_EndDate", each DateTime.From(Number.RoundUp(Number.From([EndDate]) * (24*60)) / (24*60)))
in
    Custom1
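(For readers outside Power Query: the two rounding steps above multiply the datetime's serial value by 1440 minutes per day, round, and divide back. A minimal Python sketch of the same floor/ceil-to-minute logic, assuming naive datetimes; the function names are illustrative, not from the M code.)

```python
from datetime import datetime, timedelta

def floor_to_minute(dt: datetime) -> datetime:
    # Drop seconds and microseconds, analogous to Number.RoundDown on minutes
    return dt.replace(second=0, microsecond=0)

def ceil_to_minute(dt: datetime) -> datetime:
    # Round up to the next whole minute, analogous to Number.RoundUp;
    # a datetime already on a minute boundary is returned unchanged
    floored = dt.replace(second=0, microsecond=0)
    return floored if floored == dt else floored + timedelta(minutes=1)

start = floor_to_minute(datetime(2024, 1, 1, 9, 15, 42))  # 09:15:00
end = ceil_to_minute(datetime(2024, 1, 1, 9, 47, 5))      # 09:48:00
```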
Concurrent
let
    // Build one row per minute spanning the full date range of the data
    StartDateTime = DateTime.From(Date.From(List.Min(SampleTime[StartDate]))),
    EndDateTime = Date.AddDays(DateTime.From(Date.From(List.Max(SampleTime[EndDate]))), 1),
    Source = List.DateTimes(StartDateTime, Duration.TotalMinutes(EndDateTime - StartDateTime), #duration(0, 0, 1, 0)),
    ConvertToTable = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    RenameColumn = Table.RenameColumns(ConvertToTable, {{"Column1", "DateTime"}}),
    ConvertType = Table.TransformColumnTypes(RenameColumn, {{"DateTime", type datetime}}),
    // Buffer the event table, then count the events active at each minute
    TableBuffer = Table.Buffer(SampleTime),
    Custom = Table.AddColumn(ConvertType, "Concurrent", each Table.RowCount(Table.SelectRows(TableBuffer, (x) => x[Round_StartDate] <= [DateTime] and x[Round_EndDate] >= [DateTime])))
in
    Custom
Your Custom step is iterating through the entire TableBuffer and filtering for each of the 100k+ rows of ConvertType. Not terribly efficient.
I propose the following redefinition of Concurrent:
let
    Source = SampleTime,
    // For each event, generate one timestamp per minute it covers
    #"Added Custom" =
        Table.AddColumn(Source, "Custom", each
            List.DateTimes(
                [Round_StartDate],
                Duration.TotalMinutes([Round_EndDate] - [Round_StartDate]),
                #duration(0, 0, 1, 0))
        ),
    #"Removed Columns" = Table.SelectColumns(#"Added Custom", {"Id", "Custom"}),
    // One row per event per minute; then count distinct events in each minute
    #"Expanded Custom" = Table.ExpandListColumn(#"Removed Columns", "Custom"),
    #"Grouped Rows" = Table.Group(#"Expanded Custom", {"Custom"}, {{"IdCount", each Table.RowCount(Table.Distinct(_)), Int64.Type}})
in
    #"Grouped Rows"
This evaluates much faster.
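For anyone who wants to sanity-check the approach outside Power Query, here is a small Python sketch of the same expand-and-group idea (illustration only, not the M code): each event emits one timestamp per minute it covers, and a Counter tallies how many events touch each minute. Like `List.DateTimes` above, it covers the half-open range from the rounded start up to (but not including) the rounded end.

```python
from collections import Counter
from datetime import datetime, timedelta

def concurrency_per_minute(events):
    """events: iterable of (start, end) datetimes already rounded to whole minutes.
    Returns a Counter mapping each minute to the number of events active in it."""
    counts = Counter()
    for start, end in events:
        t = start
        while t < end:  # half-open [start, end), mirroring List.DateTimes
            counts[t] += 1
            t += timedelta(minutes=1)
    return counts

events = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 9, 2)),  # covers 09:00, 09:01
    (datetime(2024, 1, 1, 9, 1), datetime(2024, 1, 1, 9, 3)),  # covers 09:01, 09:02
]
counts = concurrency_per_minute(events)  # 09:01 is covered by both events
```

The key difference from the original query is the work done: expanding events into minutes costs roughly (number of events × average duration in minutes), whereas filtering the whole event table once per minute of the timeline costs (minutes × events), which is far larger for a long timeline.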
Thank you for your advice!
The queries are much faster now!