cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
PBoltryk Frequent Visitor
Frequent Visitor

Local Groups in DAX / Power Query optimization

Hi all,

I am trying to calculate unique ID for each group of consecutive days that are "Absence", called absence instances. For that I am using function from M language Table.Group(,,,GroupKind.Local) but sorting of data required for that operation is very taxing and stressing my hardware - source of 700MB worth of several CSV files is querying for several hours, utilizing 16GB of RAM and SSD to 100% and CPU to 30-80%.

Is there an equivalent of local grouping from M in DAX or are there any optimization techniques I could use for the below M code?

Query that retrieves facts table (need it in data model):

let

    Source = Folder.Files("\\Gbdscwssc0271\gdrive$\MI Team\Tri Plant Reports\00-PBI Reports\F_Absence\KRONOS CSV"),
    #"Sorted Rows" = Table.Sort(Source,{{"Date modified", Order.Descending}}),
    #"Invoke Custom Function1" = Table.AddColumn(#"Sorted Rows", "Transform File from KRONOS", each #"Transform File from KRONOS"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Renamed Columns1",{"Source.Name", "Transform File from KRONOS"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns", "Transform File from KRONOS", Table.ColumnNames(#"Transform File from KRONOS"(#"Sample File"))),
    #"Removed Top Rows" = Table.Skip(#"Expanded Table Column1",15),
    #"Replaced Value" = Table.ReplaceValue(#"Removed Top Rows","Employee:","Employee",Replacer.ReplaceText,{"Column1"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","ID:","Payroll_ID",Replacer.ReplaceText,{"Column2"}),
    #"Added Custom3" = Table.AddColumn(#"Replaced Value1", "Source", each if [Column1] = "Employee" then "Source.Name" else [Source.Name]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Added Custom3", [PromoteAllScalars=true]),
    #"Removed Other Columns2" = Table.SelectColumns(#"Promoted Headers",{"Employee", "Payroll_ID", "Emp Type", "Day", "Date", "Pay Code", "Hours", "Money", "Days", "Entered By", "Datasource", "Sector", "Division", "Customer Sub Div", "Site Name", "Contract", "Department", "Job", "First Aider / Hourly or Salaried", "Schedule Group", "Job Role", "WMS User ID ", "Source.Name"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Removed Other Columns2",{{"Source.Name", type text}, {"Employee", type text}, {"Payroll_ID", type text}, {"Emp Type", type text}, {"Day", type text}, {"Date", type date}, {"Pay Code", type text}, {"Hours", type number}, {"Money", Currency.Type}, {"Days", type number}, {"Entered By", type text}, {"Datasource", type text}, {"Sector", type text}, {"Division", type text}, {"Customer Sub Div", type text}, {"Site Name", type text}, {"Contract", type text}, {"Department", type text}, {"Job", type text}, {"First Aider / Hourly or Salaried", type text}, {"Schedule Group", type text}, {"Job Role", type text}, {"WMS User ID ", type text}}),
    #"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type1", {"Payroll_ID"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Errors",{{"Source.Name", "Source"}}),
    #"Removed Blank Rows" = Table.SelectRows(#"Renamed Columns", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    #"Filtered Rows1" = Table.SelectRows(#"Removed Blank Rows", each not Text.Contains([Pay Code], "prem") and not Text.Contains([Pay Code], "Prem") and not Text.Contains([Pay Code], "bonus") and not Text.Contains([Pay Code], "Bonus") and not Text.Contains([Pay Code], "Adj") and not Text.Contains([Pay Code], "ADJ") and not Text.Contains([Pay Code], "BONUS") and not Text.Contains([Pay Code], "Attend") and not Text.Contains([Pay Code], "Pay Code")),
    #"Removed Other Columns1" = Table.SelectColumns(#"Filtered Rows1",{"Employee", "Payroll_ID", "Emp Type", "Date", "Pay Code", "Hours", "Money", "Site Name", "Job", "Job Role", "Source"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Other Columns1", each [Date] <> null and [Date] <> ""),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows", "ID", each [Payroll_ID] & Text.From(Number.From([Date])) & [Pay Code]),
    #"Remove Duplicates" = Table.Distinct(#"Added Custom", {"ID"}),
    #"Removed Columns" = Table.RemoveColumns(#"Remove Duplicates",{"ID"}),
    #"Merged Queries" = Table.NestedJoin(#"Removed Columns",{"Pay Code"},PayCodes,{"Pay Code"},"PayCodes",JoinKind.LeftOuter),
    #"Expanded PayCodes" = Table.ExpandTableColumn(#"Merged Queries", "PayCodes", {"LTS Trigger", "LTA Trigger"}, {"LTS Trigger", "LTA Trigger"}),
    #"Added Custom2" = Table.AddColumn(#"Expanded PayCodes", "ID", each [Payroll_ID] & "-" & Number.ToText([LTA Trigger])),
    #"Added Custom1" = Table.AddColumn(#"Added Custom2", "Weekday", each Date.DayOfWeek([Date], Day.Monday))

in
    #"Added Custom1"


Above Query is referenced to calculate Absence instances for every person (Payroll ID below):

 

let
    Source = KRONOS,
    Sort = Table.Sort(Source,{{"Payroll_ID", Order.Ascending}, {"Date", Order.Descending}}),
    TimeRanges = 
Table.Group(
    Sort,
    "ID", {
         {"Start Date", each List.Min([Date]), type date},
         {"End Date", each List.Max([Date]), type date},
         {"Number of Days", each List.Count([Date]), type number},
         {"Pay Code", each List.First([Pay Code])},
         {"Site", each List.First([Site Name])},
         {"Employee", each List.First([Employee])},
         {"Payroll_ID", each List.First([Payroll_ID])}
        },
    GroupKind.Local
),
    #"Merged Queries" = Table.NestedJoin(TimeRanges,{"Pay Code"},PayCodes,{"Pay Code"},"PayCodes",JoinKind.LeftOuter),
    #"Expanded PayCodes" = Table.ExpandTableColumn(#"Merged Queries", "PayCodes", {"LTS Trigger", "LTA Trigger"}, {"LTS Trigger", "LTA Trigger"}),
    #"Merged Queries1" = Table.NestedJoin(#"Expanded PayCodes",{"Site"},Sites,{"Site Name"},"Sites",JoinKind.LeftOuter),
    #"Expanded Sites" = Table.ExpandTableColumn(#"Merged Queries1", "Sites", {"ix_Locn"}, {"ix_Locn"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Sites",{"Pay Code", "Site"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([LTS Trigger] = 1 or [LTA Trigger] = 1)),
    #"Renamed Columns" = Table.RenameColumns(#"Filtered Rows",{{"ID", "AbsenceLocalGroupID"}})
in
    #"Renamed Columns"


Any help would be much appreciated!

 

2 REPLIES 2
PBoltryk Frequent Visitor
Frequent Visitor

Local Group in DAX / Power Query Optimization

Hi all,

I am trying to calculate amount of absence instances, one instance defined as a group of consecutive absence days per each person. I achieved positive results with Power Query function Table.Group(,,,GroupKind.Local), however the sorting required for this option is so taxing, it's using all of 16GB of RAM and then SSD on top of that. Query is running for 1,5hrs and I am not sure when it will end.

The source data is 700MB worth of several CSV files. Is there a DAX equivalent of local grouping or are there any significant optimizations techniques I could add to M code below?

Facts table that I need in data model below:

let

    Source = Folder.Files("\\Gbdscwssc0271\gdrive$\MI Team\Tri Plant Reports\00-PBI Reports\F_Absence\KRONOS CSV"),
    #"Sorted Rows" = Table.Sort(Source,{{"Date modified", Order.Descending}}),
    #"Invoke Custom Function1" = Table.AddColumn(#"Sorted Rows", "Transform File from KRONOS", each #"Transform File from KRONOS"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Renamed Columns1",{"Source.Name", "Transform File from KRONOS"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns", "Transform File from KRONOS", Table.ColumnNames(#"Transform File from KRONOS"(#"Sample File"))),
    #"Removed Top Rows" = Table.Skip(#"Expanded Table Column1",15),
    #"Replaced Value" = Table.ReplaceValue(#"Removed Top Rows","Employee:","Employee",Replacer.ReplaceText,{"Column1"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","ID:","Payroll_ID",Replacer.ReplaceText,{"Column2"}),
    #"Added Custom3" = Table.AddColumn(#"Replaced Value1", "Source", each if [Column1] = "Employee" then "Source.Name" else [Source.Name]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Added Custom3", [PromoteAllScalars=true]),
    #"Removed Other Columns2" = Table.SelectColumns(#"Promoted Headers",{"Employee", "Payroll_ID", "Emp Type", "Day", "Date", "Pay Code", "Hours", "Money", "Days", "Entered By", "Datasource", "Sector", "Division", "Customer Sub Div", "Site Name", "Contract", "Department", "Job", "First Aider / Hourly or Salaried", "Schedule Group", "Job Role", "WMS User ID ", "Source.Name"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Removed Other Columns2",{{"Source.Name", type text}, {"Employee", type text}, {"Payroll_ID", type text}, {"Emp Type", type text}, {"Day", type text}, {"Date", type date}, {"Pay Code", type text}, {"Hours", type number}, {"Money", Currency.Type}, {"Days", type number}, {"Entered By", type text}, {"Datasource", type text}, {"Sector", type text}, {"Division", type text}, {"Customer Sub Div", type text}, {"Site Name", type text}, {"Contract", type text}, {"Department", type text}, {"Job", type text}, {"First Aider / Hourly or Salaried", type text}, {"Schedule Group", type text}, {"Job Role", type text}, {"WMS User ID ", type text}}),
    #"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type1", {"Payroll_ID"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Errors",{{"Source.Name", "Source"}}),
    #"Removed Blank Rows" = Table.SelectRows(#"Renamed Columns", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    #"Filtered Rows1" = Table.SelectRows(#"Removed Blank Rows", each not Text.Contains([Pay Code], "prem") and not Text.Contains([Pay Code], "Prem") and not Text.Contains([Pay Code], "bonus") and not Text.Contains([Pay Code], "Bonus") and not Text.Contains([Pay Code], "Adj") and not Text.Contains([Pay Code], "ADJ") and not Text.Contains([Pay Code], "BONUS") and not Text.Contains([Pay Code], "Attend") and not Text.Contains([Pay Code], "Pay Code")),
    #"Removed Other Columns1" = Table.SelectColumns(#"Filtered Rows1",{"Employee", "Payroll_ID", "Emp Type", "Date", "Pay Code", "Hours", "Money", "Site Name", "Job", "Job Role", "Source"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Other Columns1", each [Date] <> null and [Date] <> ""),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows", "ID", each [Payroll_ID] & Text.From(Number.From([Date])) & [Pay Code]),
    #"Remove Duplicates" = Table.Distinct(#"Added Custom", {"ID"}),
    #"Removed Columns" = Table.RemoveColumns(#"Remove Duplicates",{"ID"}),
    #"Merged Queries" = Table.NestedJoin(#"Removed Columns",{"Pay Code"},PayCodes,{"Pay Code"},"PayCodes",JoinKind.LeftOuter),
    #"Expanded PayCodes" = Table.ExpandTableColumn(#"Merged Queries", "PayCodes", {"LTS Trigger", "LTA Trigger"}, {"LTS Trigger", "LTA Trigger"}),
    #"Added Custom2" = Table.AddColumn(#"Expanded PayCodes", "ID", each [Payroll_ID] & "-" & Number.ToText([LTA Trigger])),
    #"Added Custom1" = Table.AddColumn(#"Added Custom2", "Weekday", each Date.DayOfWeek([Date], Day.Monday))

in
    #"Added Custom1"


Above query is referenced when calculating Absence Instances for each person with the code below:

let
    Source = KRONOS,
    Sort = Table.Sort(Source,{{"Payroll_ID", Order.Ascending}, {"Date", Order.Descending}}),
    TimeRanges = 
Table.Group(
    Sort,
    "ID", {
         {"Start Date", each List.Min([Date]), type date},
         {"End Date", each List.Max([Date]), type date},
         {"Number of Days", each List.Count([Date]), type number},
         {"Pay Code", each List.First([Pay Code])},
         {"Site", each List.First([Site Name])},
         {"Employee", each List.First([Employee])},
         {"Payroll_ID", each List.First([Payroll_ID])}
        },
    GroupKind.Local
),
    #"Merged Queries" = Table.NestedJoin(TimeRanges,{"Pay Code"},PayCodes,{"Pay Code"},"PayCodes",JoinKind.LeftOuter),
    #"Expanded PayCodes" = Table.ExpandTableColumn(#"Merged Queries", "PayCodes", {"LTS Trigger", "LTA Trigger"}, {"LTS Trigger", "LTA Trigger"}),
    #"Merged Queries1" = Table.NestedJoin(#"Expanded PayCodes",{"Site"},Sites,{"Site Name"},"Sites",JoinKind.LeftOuter),
    #"Expanded Sites" = Table.ExpandTableColumn(#"Merged Queries1", "Sites", {"ix_Locn"}, {"ix_Locn"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Sites",{"Pay Code", "Site"}),
    #"Filtered Rows" = Table.SelectRows(#"Removed Columns", each ([LTS Trigger] = 1 or [LTA Trigger] = 1)),
    #"Renamed Columns" = Table.RenameColumns(#"Filtered Rows",{{"ID", "AbsenceLocalGroupID"}})
in
    #"Renamed Columns"


Thank you all in advance.

 

v-danhe-msft Super Contributor
Super Contributor

Re: Local Groups in DAX / Power Query optimization

Hi @PBoltryk,

Could you please offer me some sample file to have a test and post your desired result if possible?

 

Regards,

Daniel He

Community Support Team _ Daniel He
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
Community News & Announcements

Community News & Announcements

Get your latest community news and announcements.

Summit North America

Power Platform Summit North America

Register by September 5 to save $200

Virtual Launch Event

Microsoft Business Applications Virtual Launch Event

Watch the event on demand for an in-depth look at the new innovations across Dynamics 365 and the Microsoft Power Platform.

MBAS Gallery

Watch Sessions On Demand!

Continue your learning in our online communities.

Users Online
Currently online: 111 members 1,554 guests
Please welcome our newest community members: