Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I was kindly supplied a query to calculate running totals for call data, original post here. This works perfectly for the data shown in the example however this is not scaling and I suspect thats because of the recursive method used.
Are there any more efficent ways to acheive what I want to do? The current query shown below, for clarity the data is 1,760,813 rows at the step marked in red;
let Source = Sql.Databases("REDACTED"), REDACTED = Source{[Name="REDACTED"]}[Data], REDACTED = REDACTED{[Schema="dbo",Item="REDACTED"]}[Data], #"Filtered CallDate" = Table.SelectRows(REDACTED, each Date.IsInPreviousNDays([CallDate], 365)), #"Filtered AccountNumber" = Table.SelectRows(#"Filtered CallDate", each [AccountNumber] <> "REDACTED" and [AccountNumber] <> "INTERNAL"), #"Added IHT" = Table.AddColumn(#"Filtered AccountNumber", "IHT", each if [AccountDuration] <> null then [AccountDuration] else if [IncidentDuration] <> null then [IncidentDuration] else if [ClassificationDuration] <> null then [ClassificationDuration] else 0), #"Removed Duplicates" = Table.Distinct(#"Added IHT", {"CallDate", "OpenAccount", "CreateStaffDetailId"}), #"Extracted Text After Delimiter" = Table.TransformColumns(#"Removed Duplicates", {{"RegDescription", each Text.AfterDelimiter(_, " - ", {0, RelativePosition.FromEnd}), type text}}), #"Changed Type" = Table.TransformColumnTypes(#"Extracted Text After Delimiter",{{"CallDate", type date}}), #"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"IncidentdetailId", "AccountNumber", "CallDate", "IHT"}), fnGetAllCallsWithinSevenDays = (tbl as table, accountNumber as any, curDate as date, curIncidentdetailIds as list) as table => let // filter out all rows having max distance of 7 days of the same AccountNumber FilteredTable = Table.SelectRows( tbl, each [AccountNumber] = accountNumber and Number.Abs(Duration.Days([CallDate] - curDate)) <= 7 and not List.Contains(curIncidentdetailIds, [IncidentdetailId]) ), Result = if Table.IsEmpty(FilteredTable) then FilteredTable // return from recursion else let // a list of all visited incidents newIncidentdetailIds = List.Combine({curIncidentdetailIds, Table.Column(FilteredTable, "IncidentdetailId")}), // recursive all incidents before minimum CallDate of already found incidents checkBeforeCurrentDate = @fnGetAllCallsWithinSevenDays( tbl, accountNumber, Table.Min(FilteredTable, "CallDate")[CallDate], newIncidentdetailIds ), // recursive all incidents after maximum CallDate of already found incidents checkAfterCurrentDate = @fnGetAllCallsWithinSevenDays( tbl, accountNumber, Table.Max(FilteredTable, "CallDate")[CallDate], newIncidentdetailIds ), // combine all tables coming from recursion combinedResult = Table.Combine({checkBeforeCurrentDate, FilteredTable, checkAfterCurrentDate}) in combinedResult in Result, // unique records fnGetUniqueCallsWithinSevenDays = (tbl as table, accountNumber as any, curDate as date) as table => let distinctTable = Table.Distinct(fnGetAllCallsWithinSevenDays(tbl, accountNumber, curDate, {}), "IncidentdetailId"), indexedTable = Table.AddIndexColumn(distinctTable, "Index", 1) in indexedTable, // unique records within seven days runningTotalsTable = Table.AddColumn(#"Removed Other Columns", "runningTotalsTable", each fnGetUniqueCallsWithinSevenDays(#"Changed Type", [AccountNumber], [CallDate])), // aggregations #"Added TotalCallsInRepeatPeriod" = Table.AddColumn(runningTotalsTable, "TotalCallsInRepeatPeriod", each Table.RowCount([runningTotalsTable]), Int64.Type), #"Added CallNumber" = Table.AddColumn(#"Added TotalCallsInRepeatPeriod", "CallNumber", (rootRecord) => Table.SelectRows(rootRecord[runningTotalsTable], (childRecord) => rootRecord[IncidentdetailId] = childRecord[IncidentdetailId]){0}[Index]), #"Added TotalIHTInRepeatPeriod" = Table.AddColumn(#"Added CallNumber", "TotalIHTInRepeatPeriod", each List.Sum(Table.Column([runningTotalsTable], "IHT"))), #"Added RunningTotalIHTInRepeatPeriod" = Table.AddColumn(#"Added TotalIHTInRepeatPeriod", "RunningTotalIHTInRepeatPeriod", (rootRecord) => List.Sum(Table.SelectRows(rootRecord[runningTotalsTable], (childRecord) => rootRecord[IncidentdetailId] >= childRecord[IncidentdetailId])[IHT])), #"Added FirstIncidentDetailId" = Table.AddColumn(#"Added RunningTotalIHTInRepeatPeriod", "FirstIncidentDetailId", each List.Min(Table.Column([runningTotalsTable], "IncidentdetailId"))) in #"Added FirstIncidentDetailId"
Current Data
IncidentDetailId | AccountName | AccountNumber | AccountType | CallDate | CallTime | MajorComment | MinorComment | ThirdTier | AccountDuration |
33059568 | Example Company | 98765432 | Customer | 04/12/2018 | 16:20:26 | Payroll | RTI | FPS | 728 |
33925147 | Example Company | 98765432 | Customer | 03/05/2019 | 09:50:39 | Payroll | Installation and Activation | Install Program | 735 |
33928425 | Example Company | 98765432 | Customer | 03/05/2019 | 14:13:58 | Payroll | Installation and Activation | Install Program | 828 |
33928851 | Example Company | 98765432 | Customer | 03/05/2019 | 14:46:08 | Payroll | Installation and Activation | Install Program | 429 |
33929819 | Example Company | 98765432 | Customer | 03/05/2019 | 16:13:20 | Payroll | Installation and Activation | Install Program | 1255 |
33930022 | Example Company | 98765432 | Customer | 03/05/2019 | 16:29:46 | Payroll | Installation and Activation | Install Program | 561 |
33930533 | Example Company | 98765432 | Customer | 03/05/2019 | 17:40:37 | Payroll | Installation and Activation | Install Program | 1952 |
33935776 | Example Company | 98765432 | Customer | 04/05/2019 | 15:52:54 | Payroll | Setup and Manage | Employee | 206 |
33936491 | Example Company | 98765432 | Customer | 04/05/2019 | 16:58:43 | Payroll | Processing | Correcting a mistake | 3359 |
33942334 | Example Company | 98765432 | Customer | 05/05/2019 | 16:34:21 | Payroll | Processing | Correcting a mistake | 3767 |
33951852 | Example Company | 98765432 | Customer | 11/05/2019 | 13:32:33 | Payroll | Processing | Calculations | 266 |
33935776 | Example Company | 98765432 | Customer | 10/06/2019 | 15:52:54 | Payroll | Setup and Manage | Employee | 206 |
What I want to achieve in bold
IncidentDetailId | AccountName | AccountNumber | AccountType | CallDate | CallTime | MajorComment | MinorComment | ThirdTier | AccountDuration | TotalCallsInRepeatPeriod | CallNumber | TotalIHTInRepeatPeriod | RunningTotalIHTInRepeatPeriod | FirstIncidentDetailId |
33059568 | Example Company | 98765432 | Customer | 04/12/2018 | 16:20:26 | Payroll | RTI | FPS | 728 | 1 | 1 | 728 | 728 | 33059568 |
33925147 | Example Company | 98765432 | Customer | 03/05/2019 | 09:50:39 | Payroll | Installation and Activation | Install Program | 735 | 10 | 1 | 13358 | 735 | 33925147 |
33928425 | Example Company | 98765432 | Customer | 03/05/2019 | 14:13:58 | Payroll | Installation and Activation | Install Program | 828 | 10 | 2 | 13358 | 1563 | 33925147 |
33928851 | Example Company | 98765432 | Customer | 03/05/2019 | 14:46:08 | Payroll | Installation and Activation | Install Program | 429 | 10 | 3 | 13358 | 1992 | 33925147 |
33929819 | Example Company | 98765432 | Customer | 03/05/2019 | 16:13:20 | Payroll | Installation and Activation | Install Program | 1255 | 10 | 4 | 13358 | 3247 | 33925147 |
33930022 | Example Company | 98765432 | Customer | 03/05/2019 | 16:29:46 | Payroll | Installation and Activation | Install Program | 561 | 10 | 5 | 13358 | 3808 | 33925147 |
33930533 | Example Company | 98765432 | Customer | 03/05/2019 | 17:40:37 | Payroll | Installation and Activation | Install Program | 1952 | 10 | 6 | 13358 | 5760 | 33925147 |
33935776 | Example Company | 98765432 | Customer | 04/05/2019 | 15:52:54 | Payroll | Setup and Manage | Employee | 206 | 10 | 7 | 13358 | 5966 | 33925147 |
33936491 | Example Company | 98765432 | Customer | 04/05/2019 | 16:58:43 | Payroll | Processing | Correcting a mistake | 3359 | 10 | 8 | 13358 | 9325 | 33925147 |
33942334 | Example Company | 98765432 | Customer | 05/05/2019 | 16:34:21 | Payroll | Processing | Correcting a mistake | 3767 | 10 | 9 | 13358 | 13092 | 33925147 |
33951852 | Example Company | 98765432 | Customer | 11/05/2019 | 13:32:33 | Payroll | Processing | Calculations | 266 | 10 | 10 | 13358 | 13358 | 33925147 |
33935776 | Example Company | 98765432 | Customer | 10/06/2019 | 15:52:54 | Payroll | Setup and Manage | Employee | 206 | 1 | 1 | 206 | 207 | 33935776 |
Solved! Go to Solution.
Hi @mark_carlisle,
I'm still thinking how to improve the performance. I've implemented another way without recursion, check it out and please test with your big dataset.
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("pdRLi9swEADgv2JyXoI0oxk9bmVpYQ+F0O1tyUGkJoT6EWynNP++kh27cjEFxweLkR/yx2hGHx87REGW2Oxedp9/+/Ja5NlrXV59dQ93rNFMCiGEr7e2q8u8CaFQewl7EDJ+JNmBcMAhPPh7UxdFiL59fwvjl8N7GDX0rz2uYTaM06+PL9FhgaTSaxy4FxQdNk6sI+HQzhxvVdv5ovDdpa4yX/3IPp26y69++vdpdmjqc+PL6EKKTjGCEclMtyfhyDUK6FmuVE6i61d/nmuG1EYuJFxJjIteQ3KDV7ET27wK7OjF1GstLHmt6X/9nJdjfkFs8kqgqR5UAkbo6/RfMAoBsAEMNuR4E5hYjl5KvUaYRS8hPu3VToV+09sSbAlGMCdg0iyWwKQ1rzuoEjA5AkdqBn7Pu9u1l371lT/ncfGwdH3PYwiCR51OdZZ5ScfKruovNd9+Mk7hTBcSdcrb9lKd45d10+QhndU581l5Can8mfcKmprKJEiLsHRoKUBUa5A0R6JyINcjNesRadPOR7HY+iQNrekkKVMlOgSH/0ulL063oUrbuM08bfMwTLpH+2wsQyn2greX4eMaZiAeJ9CAOR7/AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [IncidentId = _t, AccountName = _t, AccountNumber = _t, AccountType = _t, CallDate = _t, CallTime = _t, MajorComment = _t, MinorComment = _t, ThirdTier = _t, AccountDuration = _t, TotalCalls = _t, CallNumber = _t, TotalDuration = _t, SeqDuration = _t, FirstIncidentId = _t]), #"Removed Columns" = Table.RemoveColumns(Source,{"AccountName", "AccountType", "MajorComment", "MinorComment", "ThirdTier", "CallTime", "TotalCalls", "CallNumber", "TotalDuration", "SeqDuration", "FirstIncidentId"}), #"Changed Type with Locale" = Table.TransformColumnTypes(#"Removed Columns", {{"CallDate", type date}}, "de-DE"), #"Changed Type" = Table.TransformColumnTypes(#"Changed Type with Locale",{{"AccountDuration", Int64.Type}}), fnSummarizeIncidents = (tbl as table) as table => let SortByCallDate = Table.Sort(tbl, {{"CallDate", Order.Ascending}}), IndexColumn = Table.AddIndexColumn(SortByCallDate, "Index", 0, 1), // a new serie starts if the count of days between 2 incidents of one account is longer than days IncidentSerieStart = Table.AddColumn(IndexColumn, "IncidentSerieStart", each if [Index] = 0 then 1 else if Duration.Days([CallDate] - tbl[CallDate]{[Index]-1}) >= 7 then 1 else 0 ), // sum all previous IncidentSerieStarts to have an ID for a serie IncidentSerieNumber = Table.AddColumn(IncidentSerieStart, "IncidentSerieNumber", (parentRecord) => List.Sum( Table.Column( Table.SelectRows( IncidentSerieStart, (childRecord) => parentRecord[Index] >= childRecord[Index] ), "IncidentSerieStart" ) ) ), // group by series Groups = Table.Group(IncidentSerieNumber, {"IncidentSerieNumber"}, {{"IncidentSerie", each _}}), // remove incident serie number RemoveIncidentSerieNumber = Table.RemoveColumns(Groups,{"IncidentSerieNumber"}), // add an inner index for every group GroupInnerIndex = Table.TransformColumns(RemoveIncidentSerieNumber, {"IncidentSerie", each Table.AddIndexColumn(_, "SerieIndex", 1, 1)}), // calculate a running total of AccountDuration GroupInnerRunningTotalOfAccountDuration = Table.TransformColumns(GroupInnerIndex, {"IncidentSerie", (tbl) => Table.AddColumn(tbl, "Sequence Duration", (rec) => List.Sum(Table.Column(Table.SelectRows(tbl, each [SerieIndex] <= rec[SerieIndex]), "AccountDuration")) )}), // aggregations TotalCalls = Table.AddColumn(GroupInnerRunningTotalOfAccountDuration, "Total Calls", each Table.RowCount([IncidentSerie]), Int64.Type), TotalDuration = Table.AddColumn(TotalCalls, "Total Duration", each List.Sum(Table.Column([IncidentSerie], "AccountDuration"))), FirstIncident = Table.AddColumn(TotalDuration, "First Incident", each List.Min(Table.Column([IncidentSerie], "IncidentId"))), // expand group ExpandTable = Table.ExpandTableColumn( FirstIncident, "IncidentSerie", {"IncidentId", "AccountNumber", "CallDate", "AccountDuration", "SerieIndex", "Sequence Duration"} ) in ExpandTable, Groups = Table.Group(#"Changed Type", {"AccountNumber"}, {{"PerAccount", each _}}), RemoveAccountNumber = Table.RemoveColumns(Groups, "AccountNumber"), GroupsSummarized = Table.TransformColumns(RemoveAccountNumber, {"PerAccount", fnSummarizeIncidents}), ExpandTable = Table.ExpandTableColumn(GroupsSummarized, "PerAccount", {"IncidentId", "AccountNumber", "CallDate", "AccountDuration", "SerieIndex", "Sequence Duration", "Total Calls", "Total Duration", "First Incident"}) in ExpandTable
Hi @mark_carlisle,
I've tried some optimizations but I've appended them to the old post: https://community.powerbi.com/t5/Power-Query/Dates-in-period-for-a-dynamic-period/m-p/714354/highlig...
Hi @mark_carlisle,
I'm still thinking how to improve the performance. I've implemented another way without recursion, check it out and please test with your big dataset.
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("pdRLi9swEADgv2JyXoI0oxk9bmVpYQ+F0O1tyUGkJoT6EWynNP++kh27cjEFxweLkR/yx2hGHx87REGW2Oxedp9/+/Ja5NlrXV59dQ93rNFMCiGEr7e2q8u8CaFQewl7EDJ+JNmBcMAhPPh7UxdFiL59fwvjl8N7GDX0rz2uYTaM06+PL9FhgaTSaxy4FxQdNk6sI+HQzhxvVdv5ovDdpa4yX/3IPp26y69++vdpdmjqc+PL6EKKTjGCEclMtyfhyDUK6FmuVE6i61d/nmuG1EYuJFxJjIteQ3KDV7ET27wK7OjF1GstLHmt6X/9nJdjfkFs8kqgqR5UAkbo6/RfMAoBsAEMNuR4E5hYjl5KvUaYRS8hPu3VToV+09sSbAlGMCdg0iyWwKQ1rzuoEjA5AkdqBn7Pu9u1l371lT/ncfGwdH3PYwiCR51OdZZ5ScfKruovNd9+Mk7hTBcSdcrb9lKd45d10+QhndU581l5Can8mfcKmprKJEiLsHRoKUBUa5A0R6JyINcjNesRadPOR7HY+iQNrekkKVMlOgSH/0ulL063oUrbuM08bfMwTLpH+2wsQyn2greX4eMaZiAeJ9CAOR7/AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [IncidentId = _t, AccountName = _t, AccountNumber = _t, AccountType = _t, CallDate = _t, CallTime = _t, MajorComment = _t, MinorComment = _t, ThirdTier = _t, AccountDuration = _t, TotalCalls = _t, CallNumber = _t, TotalDuration = _t, SeqDuration = _t, FirstIncidentId = _t]), #"Removed Columns" = Table.RemoveColumns(Source,{"AccountName", "AccountType", "MajorComment", "MinorComment", "ThirdTier", "CallTime", "TotalCalls", "CallNumber", "TotalDuration", "SeqDuration", "FirstIncidentId"}), #"Changed Type with Locale" = Table.TransformColumnTypes(#"Removed Columns", {{"CallDate", type date}}, "de-DE"), #"Changed Type" = Table.TransformColumnTypes(#"Changed Type with Locale",{{"AccountDuration", Int64.Type}}), fnSummarizeIncidents = (tbl as table) as table => let SortByCallDate = Table.Sort(tbl, {{"CallDate", Order.Ascending}}), IndexColumn = Table.AddIndexColumn(SortByCallDate, "Index", 0, 1), // a new serie starts if the count of days between 2 incidents of one account is longer than days IncidentSerieStart = Table.AddColumn(IndexColumn, "IncidentSerieStart", each if [Index] = 0 then 1 else if Duration.Days([CallDate] - tbl[CallDate]{[Index]-1}) >= 7 then 1 else 0 ), // sum all previous IncidentSerieStarts to have an ID for a serie IncidentSerieNumber = Table.AddColumn(IncidentSerieStart, "IncidentSerieNumber", (parentRecord) => List.Sum( Table.Column( Table.SelectRows( IncidentSerieStart, (childRecord) => parentRecord[Index] >= childRecord[Index] ), "IncidentSerieStart" ) ) ), // group by series Groups = Table.Group(IncidentSerieNumber, {"IncidentSerieNumber"}, {{"IncidentSerie", each _}}), // remove incident serie number RemoveIncidentSerieNumber = Table.RemoveColumns(Groups,{"IncidentSerieNumber"}), // add an inner index for every group GroupInnerIndex = Table.TransformColumns(RemoveIncidentSerieNumber, {"IncidentSerie", each Table.AddIndexColumn(_, "SerieIndex", 1, 1)}), // calculate a running total of AccountDuration GroupInnerRunningTotalOfAccountDuration = Table.TransformColumns(GroupInnerIndex, {"IncidentSerie", (tbl) => Table.AddColumn(tbl, "Sequence Duration", (rec) => List.Sum(Table.Column(Table.SelectRows(tbl, each [SerieIndex] <= rec[SerieIndex]), "AccountDuration")) )}), // aggregations TotalCalls = Table.AddColumn(GroupInnerRunningTotalOfAccountDuration, "Total Calls", each Table.RowCount([IncidentSerie]), Int64.Type), TotalDuration = Table.AddColumn(TotalCalls, "Total Duration", each List.Sum(Table.Column([IncidentSerie], "AccountDuration"))), FirstIncident = Table.AddColumn(TotalDuration, "First Incident", each List.Min(Table.Column([IncidentSerie], "IncidentId"))), // expand group ExpandTable = Table.ExpandTableColumn( FirstIncident, "IncidentSerie", {"IncidentId", "AccountNumber", "CallDate", "AccountDuration", "SerieIndex", "Sequence Duration"} ) in ExpandTable, Groups = Table.Group(#"Changed Type", {"AccountNumber"}, {{"PerAccount", each _}}), RemoveAccountNumber = Table.RemoveColumns(Groups, "AccountNumber"), GroupsSummarized = Table.TransformColumns(RemoveAccountNumber, {"PerAccount", fnSummarizeIncidents}), ExpandTable = Table.ExpandTableColumn(GroupsSummarized, "PerAccount", {"IncidentId", "AccountNumber", "CallDate", "AccountDuration", "SerieIndex", "Sequence Duration", "Total Calls", "Total Duration", "First Incident"}) in ExpandTable
Thanks for all this. I've optimised by initial query to the SQL DB to take advantage of folding right up the point of the function now so that's a bit faster.
This version of the function, with some additional steps after it to complete the query, is managing to complete for a date range of 01/05/2019 - 17/06/2019 (approx 185,000 rows) in around 15 minutes on my laptop. The recursive version was not able to do a single day so this is a definite improvement.
I'm now working on add the whole thing to PBI Dataflows to automate the whole process.
Thanks again for the help.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
101 | |
49 | |
19 | |
12 | |
11 |