Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
mark_carlisle
Advocate IV
Advocate IV

Running Total Performance Improvement Tips

I was kindly supplied a query to calculate running totals for call data, original post here. This works perfectly for the data shown in the example however this is not scaling and I suspect thats because of the recursive method used.

 

Are there any more efficent ways to acheive what I want to do? The current query shown below, for clarity the data is 1,760,813 rows at the step marked in red;

 

 

let
    Source = Sql.Databases("REDACTED"),
    REDACTED = Source{[Name="REDACTED"]}[Data],
    REDACTED = REDACTED{[Schema="dbo",Item="REDACTED"]}[Data],
    #"Filtered CallDate" = Table.SelectRows(REDACTED, each Date.IsInPreviousNDays([CallDate], 365)),
    #"Filtered AccountNumber" = Table.SelectRows(#"Filtered CallDate", each [AccountNumber] <> "REDACTED" and [AccountNumber] <> "INTERNAL"),
    #"Added IHT" = Table.AddColumn(#"Filtered AccountNumber", "IHT", each if [AccountDuration] <> null then [AccountDuration] else if [IncidentDuration] <> null then [IncidentDuration] else if [ClassificationDuration] <> null then [ClassificationDuration] else 0),
    #"Removed Duplicates" = Table.Distinct(#"Added IHT", {"CallDate", "OpenAccount", "CreateStaffDetailId"}),
    #"Extracted Text After Delimiter" = Table.TransformColumns(#"Removed Duplicates", {{"RegDescription", each Text.AfterDelimiter(_, " - ", {0, RelativePosition.FromEnd}), type text}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Extracted Text After Delimiter",{{"CallDate", type date}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"IncidentdetailId", "AccountNumber", "CallDate", "IHT"}),
    fnGetAllCallsWithinSevenDays = (tbl as table, accountNumber as any, curDate as date, curIncidentdetailIds as list) as table =>
        let
            // filter out all rows having max distance of 7 days of the same AccountNumber
            FilteredTable = Table.SelectRows(
                tbl,
                each 
                    [AccountNumber] = accountNumber 
                    and Number.Abs(Duration.Days([CallDate] - curDate)) <= 7
                    and not List.Contains(curIncidentdetailIds, [IncidentdetailId])
            ),
            Result = 
                if Table.IsEmpty(FilteredTable) then
                    FilteredTable // return from recursion
                else
                    let 
                        // a list of all visited incidents
                        newIncidentdetailIds = List.Combine({curIncidentdetailIds, Table.Column(FilteredTable, "IncidentdetailId")}),
                        // recursive all incidents before minimum  CallDate of already found incidents
                        checkBeforeCurrentDate = @fnGetAllCallsWithinSevenDays(
                            tbl, 
                            accountNumber, 
                            Table.Min(FilteredTable, "CallDate")[CallDate],
                            newIncidentdetailIds
                        ),
                        // recursive all incidents after maximum  CallDate of already found incidents
                        checkAfterCurrentDate = @fnGetAllCallsWithinSevenDays(
                            tbl, 
                            accountNumber, 
                            Table.Max(FilteredTable, "CallDate")[CallDate],
                            newIncidentdetailIds
                        ),
                        // combine all tables coming from recursion
                        combinedResult = Table.Combine({checkBeforeCurrentDate, FilteredTable, checkAfterCurrentDate})
                    in
                        combinedResult
        in
            Result,

    // unique records
    fnGetUniqueCallsWithinSevenDays = (tbl as table, accountNumber as any, curDate as date) as table =>
        let  
            distinctTable = Table.Distinct(fnGetAllCallsWithinSevenDays(tbl, accountNumber, curDate, {}), "IncidentdetailId"),
            indexedTable = Table.AddIndexColumn(distinctTable, "Index", 1)
        in
            indexedTable,
    
    // unique records within seven days
    runningTotalsTable = Table.AddColumn(#"Removed Other Columns", "runningTotalsTable", each fnGetUniqueCallsWithinSevenDays(#"Changed Type", [AccountNumber], [CallDate])),

    // aggregations
    #"Added TotalCallsInRepeatPeriod" = Table.AddColumn(runningTotalsTable, "TotalCallsInRepeatPeriod", each Table.RowCount([runningTotalsTable]), Int64.Type),
    #"Added CallNumber" = Table.AddColumn(#"Added TotalCallsInRepeatPeriod", "CallNumber", (rootRecord) => Table.SelectRows(rootRecord[runningTotalsTable], (childRecord) => rootRecord[IncidentdetailId] = childRecord[IncidentdetailId]){0}[Index]),
    #"Added TotalIHTInRepeatPeriod" = Table.AddColumn(#"Added CallNumber", "TotalIHTInRepeatPeriod", each List.Sum(Table.Column([runningTotalsTable], "IHT"))),
    #"Added RunningTotalIHTInRepeatPeriod" = Table.AddColumn(#"Added TotalIHTInRepeatPeriod", "RunningTotalIHTInRepeatPeriod", (rootRecord) => List.Sum(Table.SelectRows(rootRecord[runningTotalsTable], (childRecord) => rootRecord[IncidentdetailId] >= childRecord[IncidentdetailId])[IHT])),
    #"Added FirstIncidentDetailId" = Table.AddColumn(#"Added RunningTotalIHTInRepeatPeriod", "FirstIncidentDetailId", each List.Min(Table.Column([runningTotalsTable], "IncidentdetailId")))
in
    #"Added FirstIncidentDetailId"

 

 

Current Data

 

IncidentDetailIdAccountNameAccountNumberAccountTypeCallDateCallTimeMajorCommentMinorCommentThirdTierAccountDuration
33059568Example Company98765432Customer04/12/201816:20:26PayrollRTIFPS728
33925147Example Company98765432Customer03/05/201909:50:39PayrollInstallation and ActivationInstall Program735
33928425Example Company98765432Customer03/05/201914:13:58PayrollInstallation and ActivationInstall Program828
33928851Example Company98765432Customer03/05/201914:46:08PayrollInstallation and ActivationInstall Program429
33929819Example Company98765432Customer03/05/201916:13:20PayrollInstallation and ActivationInstall Program1255
33930022Example Company98765432Customer03/05/201916:29:46PayrollInstallation and ActivationInstall Program561
33930533Example Company98765432Customer03/05/201917:40:37PayrollInstallation and ActivationInstall Program1952
33935776Example Company98765432Customer04/05/201915:52:54PayrollSetup and ManageEmployee206
33936491Example Company98765432Customer04/05/201916:58:43PayrollProcessingCorrecting a mistake3359
33942334Example Company98765432Customer05/05/201916:34:21PayrollProcessingCorrecting a mistake3767
33951852Example Company98765432Customer11/05/201913:32:33PayrollProcessingCalculations266
33935776Example Company98765432Customer10/06/201915:52:54PayrollSetup and ManageEmployee206

 

What I want to achieve in bold

 

IncidentDetailIdAccountNameAccountNumberAccountTypeCallDateCallTimeMajorCommentMinorCommentThirdTierAccountDurationTotalCallsInRepeatPeriodCallNumberTotalIHTInRepeatPeriodRunningTotalIHTInRepeatPeriodFirstIncidentDetailId
33059568Example Company98765432Customer04/12/201816:20:26PayrollRTIFPS7281172872833059568
33925147Example Company98765432Customer03/05/201909:50:39PayrollInstallation and ActivationInstall Program7351011335873533925147
33928425Example Company98765432Customer03/05/201914:13:58PayrollInstallation and ActivationInstall Program82810213358156333925147
33928851Example Company98765432Customer03/05/201914:46:08PayrollInstallation and ActivationInstall Program42910313358199233925147
33929819Example Company98765432Customer03/05/201916:13:20PayrollInstallation and ActivationInstall Program125510413358324733925147
33930022Example Company98765432Customer03/05/201916:29:46PayrollInstallation and ActivationInstall Program56110513358380833925147
33930533Example Company98765432Customer03/05/201917:40:37PayrollInstallation and ActivationInstall Program195210613358576033925147
33935776Example Company98765432Customer04/05/201915:52:54PayrollSetup and ManageEmployee20610713358596633925147
33936491Example Company98765432Customer04/05/201916:58:43PayrollProcessingCorrecting a mistake335910813358932533925147
33942334Example Company98765432Customer05/05/201916:34:21PayrollProcessingCorrecting a mistake3767109133581309233925147
33951852Example Company98765432Customer11/05/201913:32:33PayrollProcessingCalculations2661010133581335833925147
33935776Example Company98765432Customer10/06/201915:52:54PayrollSetup and ManageEmployee2061120620733935776
1 ACCEPTED SOLUTION
Nolock
Resident Rockstar
Resident Rockstar

Hi @mark_carlisle,

 

I'm still thinking how to improve the performance. I've implemented another way without recursion, check it out and please test with your big dataset.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("pdRLi9swEADgv2JyXoI0oxk9bmVpYQ+F0O1tyUGkJoT6EWynNP++kh27cjEFxweLkR/yx2hGHx87REGW2Oxedp9/+/Ja5NlrXV59dQ93rNFMCiGEr7e2q8u8CaFQewl7EDJ+JNmBcMAhPPh7UxdFiL59fwvjl8N7GDX0rz2uYTaM06+PL9FhgaTSaxy4FxQdNk6sI+HQzhxvVdv5ovDdpa4yX/3IPp26y69++vdpdmjqc+PL6EKKTjGCEclMtyfhyDUK6FmuVE6i61d/nmuG1EYuJFxJjIteQ3KDV7ET27wK7OjF1GstLHmt6X/9nJdjfkFs8kqgqR5UAkbo6/RfMAoBsAEMNuR4E5hYjl5KvUaYRS8hPu3VToV+09sSbAlGMCdg0iyWwKQ1rzuoEjA5AkdqBn7Pu9u1l371lT/ncfGwdH3PYwiCR51OdZZ5ScfKruovNd9+Mk7hTBcSdcrb9lKd45d10+QhndU581l5Can8mfcKmprKJEiLsHRoKUBUa5A0R6JyINcjNesRadPOR7HY+iQNrekkKVMlOgSH/0ulL063oUrbuM08bfMwTLpH+2wsQyn2greX4eMaZiAeJ9CAOR7/AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [IncidentId = _t, AccountName = _t, AccountNumber = _t, AccountType = _t, CallDate = _t, CallTime = _t, MajorComment = _t, MinorComment = _t, ThirdTier = _t, AccountDuration = _t, TotalCalls = _t, CallNumber = _t, TotalDuration = _t, SeqDuration = _t, FirstIncidentId = _t]),
    #"Removed Columns" = Table.RemoveColumns(Source,{"AccountName", "AccountType", "MajorComment", "MinorComment", "ThirdTier", "CallTime", "TotalCalls", "CallNumber", "TotalDuration", "SeqDuration", "FirstIncidentId"}),
    #"Changed Type with Locale" = Table.TransformColumnTypes(#"Removed Columns", {{"CallDate", type date}}, "de-DE"),
    #"Changed Type" = Table.TransformColumnTypes(#"Changed Type with Locale",{{"AccountDuration", Int64.Type}}),
    
    fnSummarizeIncidents = (tbl as table) as table =>
        let
            SortByCallDate = Table.Sort(tbl, {{"CallDate", Order.Ascending}}),
            IndexColumn = Table.AddIndexColumn(SortByCallDate, "Index", 0, 1),

            // a new serie starts if the count of days between 2 incidents of one account is longer than  days
            IncidentSerieStart = Table.AddColumn(IndexColumn, "IncidentSerieStart", each 
                if [Index] = 0 then 
                    1 
                else if Duration.Days([CallDate] - tbl[CallDate]{[Index]-1}) >= 7 then 
                    1 
                else 
                    0
            ),
            // sum all previous IncidentSerieStarts to have an ID for a serie
            IncidentSerieNumber = Table.AddColumn(IncidentSerieStart, "IncidentSerieNumber", 
                (parentRecord) => 
                    List.Sum(
                        Table.Column(
                            Table.SelectRows(
                                IncidentSerieStart, 
                                (childRecord) => parentRecord[Index] >= childRecord[Index]
                            ), 
                            "IncidentSerieStart"
                        )
                    )
            ),
            // group by series
            Groups = Table.Group(IncidentSerieNumber, {"IncidentSerieNumber"}, {{"IncidentSerie", each _}}),
            // remove incident serie number
            RemoveIncidentSerieNumber = Table.RemoveColumns(Groups,{"IncidentSerieNumber"}),

            // add an inner index for every group
            GroupInnerIndex = Table.TransformColumns(RemoveIncidentSerieNumber, {"IncidentSerie", each Table.AddIndexColumn(_, "SerieIndex", 1, 1)}),
            // calculate a running total of AccountDuration
            GroupInnerRunningTotalOfAccountDuration = Table.TransformColumns(GroupInnerIndex, {"IncidentSerie", (tbl) => Table.AddColumn(tbl, "Sequence Duration", (rec) => 
                List.Sum(Table.Column(Table.SelectRows(tbl, each [SerieIndex] <= rec[SerieIndex]), "AccountDuration"))
            )}),

            // aggregations
            TotalCalls = Table.AddColumn(GroupInnerRunningTotalOfAccountDuration, "Total Calls", each Table.RowCount([IncidentSerie]), Int64.Type),
            TotalDuration = Table.AddColumn(TotalCalls, "Total Duration", each List.Sum(Table.Column([IncidentSerie], "AccountDuration"))),
            FirstIncident = Table.AddColumn(TotalDuration, "First Incident", each List.Min(Table.Column([IncidentSerie], "IncidentId"))),
            // expand group
            ExpandTable = Table.ExpandTableColumn(
                FirstIncident, 
                "IncidentSerie", 
                {"IncidentId", "AccountNumber", "CallDate", "AccountDuration", "SerieIndex", "Sequence Duration"}
            )
        in
            ExpandTable,

    Groups = Table.Group(#"Changed Type", {"AccountNumber"}, {{"PerAccount", each _}}),
    RemoveAccountNumber = Table.RemoveColumns(Groups, "AccountNumber"),
    GroupsSummarized = Table.TransformColumns(RemoveAccountNumber, {"PerAccount", fnSummarizeIncidents}),
    ExpandTable = Table.ExpandTableColumn(GroupsSummarized, "PerAccount", {"IncidentId", "AccountNumber", "CallDate", "AccountDuration", "SerieIndex", "Sequence Duration", "Total Calls", "Total Duration", "First Incident"})

in
    ExpandTable

View solution in original post

3 REPLIES 3
Nolock
Resident Rockstar
Resident Rockstar

Hi @mark_carlisle,

I've tried some optimizations but I've appended them to the old post: https://community.powerbi.com/t5/Power-Query/Dates-in-period-for-a-dynamic-period/m-p/714354/highlig...

Nolock
Resident Rockstar
Resident Rockstar

Hi @mark_carlisle,

 

I'm still thinking how to improve the performance. I've implemented another way without recursion, check it out and please test with your big dataset.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("pdRLi9swEADgv2JyXoI0oxk9bmVpYQ+F0O1tyUGkJoT6EWynNP++kh27cjEFxweLkR/yx2hGHx87REGW2Oxedp9/+/Ja5NlrXV59dQ93rNFMCiGEr7e2q8u8CaFQewl7EDJ+JNmBcMAhPPh7UxdFiL59fwvjl8N7GDX0rz2uYTaM06+PL9FhgaTSaxy4FxQdNk6sI+HQzhxvVdv5ovDdpa4yX/3IPp26y69++vdpdmjqc+PL6EKKTjGCEclMtyfhyDUK6FmuVE6i61d/nmuG1EYuJFxJjIteQ3KDV7ET27wK7OjF1GstLHmt6X/9nJdjfkFs8kqgqR5UAkbo6/RfMAoBsAEMNuR4E5hYjl5KvUaYRS8hPu3VToV+09sSbAlGMCdg0iyWwKQ1rzuoEjA5AkdqBn7Pu9u1l371lT/ncfGwdH3PYwiCR51OdZZ5ScfKruovNd9+Mk7hTBcSdcrb9lKd45d10+QhndU581l5Can8mfcKmprKJEiLsHRoKUBUa5A0R6JyINcjNesRadPOR7HY+iQNrekkKVMlOgSH/0ulL063oUrbuM08bfMwTLpH+2wsQyn2greX4eMaZiAeJ9CAOR7/AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [IncidentId = _t, AccountName = _t, AccountNumber = _t, AccountType = _t, CallDate = _t, CallTime = _t, MajorComment = _t, MinorComment = _t, ThirdTier = _t, AccountDuration = _t, TotalCalls = _t, CallNumber = _t, TotalDuration = _t, SeqDuration = _t, FirstIncidentId = _t]),
    #"Removed Columns" = Table.RemoveColumns(Source,{"AccountName", "AccountType", "MajorComment", "MinorComment", "ThirdTier", "CallTime", "TotalCalls", "CallNumber", "TotalDuration", "SeqDuration", "FirstIncidentId"}),
    #"Changed Type with Locale" = Table.TransformColumnTypes(#"Removed Columns", {{"CallDate", type date}}, "de-DE"),
    #"Changed Type" = Table.TransformColumnTypes(#"Changed Type with Locale",{{"AccountDuration", Int64.Type}}),
    
    fnSummarizeIncidents = (tbl as table) as table =>
        let
            SortByCallDate = Table.Sort(tbl, {{"CallDate", Order.Ascending}}),
            IndexColumn = Table.AddIndexColumn(SortByCallDate, "Index", 0, 1),

            // a new serie starts if the count of days between 2 incidents of one account is longer than  days
            IncidentSerieStart = Table.AddColumn(IndexColumn, "IncidentSerieStart", each 
                if [Index] = 0 then 
                    1 
                else if Duration.Days([CallDate] - tbl[CallDate]{[Index]-1}) >= 7 then 
                    1 
                else 
                    0
            ),
            // sum all previous IncidentSerieStarts to have an ID for a serie
            IncidentSerieNumber = Table.AddColumn(IncidentSerieStart, "IncidentSerieNumber", 
                (parentRecord) => 
                    List.Sum(
                        Table.Column(
                            Table.SelectRows(
                                IncidentSerieStart, 
                                (childRecord) => parentRecord[Index] >= childRecord[Index]
                            ), 
                            "IncidentSerieStart"
                        )
                    )
            ),
            // group by series
            Groups = Table.Group(IncidentSerieNumber, {"IncidentSerieNumber"}, {{"IncidentSerie", each _}}),
            // remove incident serie number
            RemoveIncidentSerieNumber = Table.RemoveColumns(Groups,{"IncidentSerieNumber"}),

            // add an inner index for every group
            GroupInnerIndex = Table.TransformColumns(RemoveIncidentSerieNumber, {"IncidentSerie", each Table.AddIndexColumn(_, "SerieIndex", 1, 1)}),
            // calculate a running total of AccountDuration
            GroupInnerRunningTotalOfAccountDuration = Table.TransformColumns(GroupInnerIndex, {"IncidentSerie", (tbl) => Table.AddColumn(tbl, "Sequence Duration", (rec) => 
                List.Sum(Table.Column(Table.SelectRows(tbl, each [SerieIndex] <= rec[SerieIndex]), "AccountDuration"))
            )}),

            // aggregations
            TotalCalls = Table.AddColumn(GroupInnerRunningTotalOfAccountDuration, "Total Calls", each Table.RowCount([IncidentSerie]), Int64.Type),
            TotalDuration = Table.AddColumn(TotalCalls, "Total Duration", each List.Sum(Table.Column([IncidentSerie], "AccountDuration"))),
            FirstIncident = Table.AddColumn(TotalDuration, "First Incident", each List.Min(Table.Column([IncidentSerie], "IncidentId"))),
            // expand group
            ExpandTable = Table.ExpandTableColumn(
                FirstIncident, 
                "IncidentSerie", 
                {"IncidentId", "AccountNumber", "CallDate", "AccountDuration", "SerieIndex", "Sequence Duration"}
            )
        in
            ExpandTable,

    Groups = Table.Group(#"Changed Type", {"AccountNumber"}, {{"PerAccount", each _}}),
    RemoveAccountNumber = Table.RemoveColumns(Groups, "AccountNumber"),
    GroupsSummarized = Table.TransformColumns(RemoveAccountNumber, {"PerAccount", fnSummarizeIncidents}),
    ExpandTable = Table.ExpandTableColumn(GroupsSummarized, "PerAccount", {"IncidentId", "AccountNumber", "CallDate", "AccountDuration", "SerieIndex", "Sequence Duration", "Total Calls", "Total Duration", "First Incident"})

in
    ExpandTable

Thanks for all this. I've optimised by initial query to the SQL DB to take advantage of folding right up the point of the function now so that's a bit faster.

 

This version of the function, with some additional steps after it to complete the query, is managing to complete for a date range of 01/05/2019 - 17/06/2019 (approx 185,000 rows) in around 15 minutes on my laptop. The recursive version was not able to do a single day so this is a definite improvement.

 

I'm now working on add the whole thing to PBI Dataflows to automate the whole process.

 

Thanks again for the help.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors