Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I get a weekly .CSV file with some numbers for some jobs per work week. Each file has data for 3 work weeks, ending with current WW. They look like this.
202133.csv (= Data file received in WorkWeek 33 of 2021)
Job,WW31,WW32,WW33
JobA,4,4,4
JobB,2,6,3
JobC,2,4,7
202134.csv
Job,WW32,WW33,WW34
JobA,4,6,3
JobB,8,3,4
JobC,4,7,5
202135.csv
Job,WW33,WW34,WW35
JobA,9,3,2
JobB,3,4,1
JobC,7,8,5
I believe you understand the pattern. Data for same jobs for the past 3 weeks, and each week the oldest WW data is removed and the new one added. The caveat is that numbers for the same job for the same WW can be different (this is due to data being revised by another party). For example, in WW33 file, JobA WW33 value is 4. The same value is 6 in WW 34, and 9 in WW35.
I have these files in a folder, and I do a Read from Folder data get and unpivot column to get WW data into columns. My read data function and the data table M-Query looks like this:
fnReadData
= (Parameter1 as binary) => let
Source = Csv.Document(Parameter1,[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true])
in
#"Promoted Headers"
Data
let
Source = Folder.Files("Data Path"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"fnReadData"([Content])),
#"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", {"Name", "Transform File"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns1",{{"Name", "FileDate"}}),
#"Replaced Value" = Table.ReplaceValue(#"Renamed Columns",".txt","",Replacer.ReplaceText,{"FileDate"}),
AllColumnsNames =
Table.ColumnNames(
Table.Combine(#"Replaced Value"[Transform File])
),
#"Expanded Transform File" = Table.ExpandTableColumn(#"Replaced Value", "Transform File", AllColumnsNames, AllColumnsNames),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Transform File", {"FileDate", "Job"}, "WorkWeek", "Amount"),
#"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Amount", Int64.Type}, {"FileDate", Int64.Type}})
in
#"Changed Type"
The resulting table looks like this.
As you can see, if we focus on JobA data for WW33, there are 3 entries. My next requirement is to filter/group this data set so I only have one row of data per job per WW. In case there are multiple entries as highlighted above, I want the chosen row to be the data from the latest file, or in other words, the latest (largest) number in the FileDate column.
How should I go about this? It seems to me that I need to do a grouped filtering, but I can't figure out how.
Solved! Go to Solution.
If you Group By on workweek and Job with 2 aggregations: Max of Filedate , All Rows.
You can then expand the 'all' column.
Then Filter the table to keep the rows where the filedate = MaxFiledate (that bit sounds complicated but you can filter from the column header of MaxFileDate -> just choose one date from the dropdown -> then replace the hardcoded date in the formula bar with the name of the filedate column)
--
let me know how it goes.
The line for
#"Expanded All Rows" = Table.Expand....
is only returning the Amount from the original data. It needs to return FileDate as well.
The idea behind the algorithm is that the data at this stage is back to the original dataset but with an additional column MaxFileDate which is the max for each workweek/job combination.
--
Then the table gets filtered to retain only the rows where Maxdate = filedate. That's the last part of the algorithm in my previous post. This will be a Table.SelectRows statement. You can write it yourself or get Power Query to write it for a hardcoded value which you can then edit with the name of the column.
That worked perfectly, thank you!
What about the last part ? You have to edit the formula afterwards to replace the hardcoded value.
Post the formula here if it's not working
Posted my code below. The two issues I have:
let
Source = Folder.Files("MyPath"),
#"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
#"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"fnReadData"([Content])),
#"Removed Other Columns1" = Table.SelectColumns(#"Invoke Custom Function1", {"Name", "Transform File"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns1",{{"Name", "FileDate"}}),
#"Replaced Value" = Table.ReplaceValue(#"Renamed Columns",".txt","",Replacer.ReplaceText,{"FileDate"}),
AllColumnsNames =
Table.ColumnNames(
Table.Combine(#"Replaced Value"[Transform File])
),
#"Expanded Transform File" = Table.ExpandTableColumn(#"Replaced Value", "Transform File", AllColumnsNames, AllColumnsNames),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Expanded Transform File", {"FileDate", "Job"}, "WorkWeek", "Amount"),
#"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Amount", Int64.Type}, {"FileDate", Int64.Type}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"WorkWeek", "Job"}, {{"Max of Filedate", each List.Max([FileDate]), type nullable number}, {"All Rows", each _, type table [FileDate=nullable number, Job=text, WorkWeek=text, Amount=nullable number]}}),
#"Expanded All Rows" = Table.ExpandTableColumn(#"Grouped Rows", "All Rows", {"Amount"}, {"Amount"})
in
#"Expanded All Rows"
If you Group By on workweek and Job with 2 aggregations: Max of Filedate , All Rows.
You can then expand the 'all' column.
Then Filter the table to keep the rows where the filedate = MaxFiledate (that bit sounds complicated but you can filter from the column header of MaxFileDate -> just choose one date from the dropdown -> then replace the hardcoded date in the formula bar with the name of the filedate column)
--
let me know how it goes.
I got to the part where you expand the all column, which worked fine.
But I don't understand the last para where you do the filtering. Also, I don't want to filter MaxFileDate to select only one; I still need data from all the data files (because of the rolling pattern, the older files will have data that the new files don't, and I need that data), what I want is to remove duplicate records by keeping the data from most recent FileDate and removing the rest. If I select only one MaxFileDate, that would essentially eliminate the rest, right?
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
113 | |
97 | |
79 | |
73 | |
56 |
User | Count |
---|---|
145 | |
105 | |
104 | |
90 | |
63 |