Solved: Re: Grouping or Filtering Data of one column based...

Sachintha · ‎11-11-2021

I get a weekly .CSV file with some numbers for some jobs per work week. Each file has data for 3 work weeks, ending with current WW. They look like this.

202133.csv (= Data file received in WorkWeek 33 of 2021)

Job,WW31,WW32,WW33
JobA,4,4,4
JobB,2,6,3
JobC,2,4,7

202134.csv

Job,WW32,WW33,WW34
JobA,4,6,3
JobB,8,3,4
JobC,4,7,5

202135.csv

Job,WW33,WW34,WW35
JobA,9,3,2
JobB,3,4,1
JobC,7,8,5

I believe you understand the pattern. Data for same jobs for the past 3 weeks, and each week the oldest WW data is removed and the new one added. The caveat is that numbers for the same job for the same WW can be different (this is due to data being revised by another party). For example, in WW33 file, JobA WW33 value is 4. The same value is 6 in WW 34, and 9 in WW35.

I have these files in a folder, and I do a Read from Folder data get and unpivot column to get WW data into columns. My read data function and the data table M-Query looks like this:

fnReadData

= (Parameter1 as binary) => let
        Source = Csv.Document(Parameter1,[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
        #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true])
    in
        #"Promoted Headers"

Data

let
    Source = Folder.Files("Data Path"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"fnReadData"([Content])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", {"Name", "Transform File"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns1",{{"Name", "FileDate"}}),
    #"Replaced Value" = Table.ReplaceValue(#"Renamed Columns",".txt","",Replacer.ReplaceText,{"FileDate"}),
    AllColumnsNames = 
        Table.ColumnNames(
            Table.Combine(#"Replaced Value"[Transform File])
        ),
    #"Expanded Transform File" = Table.ExpandTableColumn(#"Replaced Value", "Transform File", AllColumnsNames, AllColumnsNames),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Transform File", {"FileDate", "Job"}, "WorkWeek", "Amount"),
    #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Amount", Int64.Type}, {"FileDate", Int64.Type}})
in
    #"Changed Type"

The resulting table looks like this.

As you can see, if we focus on JobA data for WW33, there are 3 entries. My next requirement is to filter/group this data set so I only have one row of data per job per WW. In case there are multiple entries as highlighted above, I want the chosen row to be the data from the latest file, or in other words, the latest (largest) number in the FileDate column.

How should I go about this? It seems to me that I need to do a grouped filtering, but I can't figure out how.

HotChilli · ‎11-11-2021

If you Group By on workweek and Job with 2 aggregations: Max of Filedate , All Rows.

You can then expand the 'all' column.

Then Filter the table to keep the rows where the filedate = MaxFiledate (that bit sounds complicated but you can filter from the column header of MaxFileDate -> just choose one date from the dropdown -> then replace the hardcoded date in the formula bar with the name of the filedate column)

--

let me know how it goes.

View solution in original post

HotChilli · ‎11-12-2021

The line for

#"Expanded All Rows" = Table.Expand....

is only returning the Amount from the original data. It needs to return FileDate as well.

The idea behind the algorithm is that the data at this stage is back to the original dataset but with an additional column MaxFileDate which is the max for each workweek/job combination.

--

Then the table gets filtered to retain only the rows where Maxdate = filedate. That's the last part of the algorithm in my previous post. This will be a Table.SelectRows statement. You can write it yourself or get Power Query to write it for a hardcoded value which you can then edit with the name of the column.

Sachintha · ‎11-12-2021

That worked perfectly, thank you!

HotChilli · ‎11-12-2021

What about the last part ? You have to edit the formula afterwards to replace the hardcoded value.

Post the formula here if it's not working

Sachintha · ‎11-12-2021

Posted my code below. The two issues I have:

Once I GroupBy "WorkWeek" and "Job", I no longer have the "FileDate" column in the resulting table.
Even if I did, I don't know how to replace the hardcoded date value with a column name.

let
    Source = Folder.Files("MyPath"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"fnReadData"([Content])),
    #"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", {"Name", "Transform File"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns1",{{"Name", "FileDate"}}),
    #"Replaced Value" = Table.ReplaceValue(#"Renamed Columns",".txt","",Replacer.ReplaceText,{"FileDate"}),
    AllColumnsNames = 
        Table.ColumnNames(
            Table.Combine(#"Replaced Value"[Transform File])
        ),
    #"Expanded Transform File" = Table.ExpandTableColumn(#"Replaced Value", "Transform File", AllColumnsNames, AllColumnsNames),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Transform File", {"FileDate", "Job"}, "WorkWeek", "Amount"),
    #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Amount", Int64.Type}, {"FileDate", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"WorkWeek", "Job"}, {{"Max of Filedate", each List.Max([FileDate]), type nullable number}, {"All Rows", each _, type table [FileDate=nullable number, Job=text, WorkWeek=text, Amount=nullable number]}}),
    #"Expanded All Rows" = Table.ExpandTableColumn(#"Grouped Rows", "All Rows", {"Amount"}, {"Amount"})
in
    #"Expanded All Rows"

HotChilli · ‎11-11-2021

If you Group By on workweek and Job with 2 aggregations: Max of Filedate , All Rows.

You can then expand the 'all' column.

Then Filter the table to keep the rows where the filedate = MaxFiledate (that bit sounds complicated but you can filter from the column header of MaxFileDate -> just choose one date from the dropdown -> then replace the hardcoded date in the formula bar with the name of the filedate column)

--

let me know how it goes.

Sachintha · ‎11-12-2021

I got to the part where you expand the all column, which worked fine.

But I don't understand the last para where you do the filtering. Also, I don't want to filter MaxFileDate to select only one; I still need data from all the data files (because of the rolling pattern, the older files will have data that the new files don't, and I need that data), what I want is to remove duplicate records by keeping the data from most recent FileDate and removing the rest. If I select only one MaxFileDate, that would essentially eliminate the rest, right?

Grouping or Filtering Data of one column based on another

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly