Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
To whom it may concern:
I am using power query to combine several different files, all the columns are the same, so it's straight forward in that context. Each row in every file is unique on an ID number, however, file over file there may be duplicates as the row may have been updated between dumps. What I would like is to be able to remove the row from the older file.
The data looks something like this:
File 1 - January 1
ID | Application Received Date | Application Assessed Date | Funding Amount |
1 | December 20, 2020 | December 21, 2020 | $400 |
2 | December 22, 2020 | December 22, 2020 | $500 |
3 | December 22, 2020 | ||
4 | October 2, 2020 | October 4, 2020 | $1000 |
File 2 - January 30
ID | Application Received Date | Application Assessed Date | Funding Amount |
3 | December 22, 2020 | December 24, 2020 | $300 |
4 | November 1, 2020 | November 4, 2020 | $300 |
5 | November 3, 2020 | NOvember 5, 2020 | $500 |
6 | November 9, 2020 | November 11, 2020 | $600 |
All columns can change information except the ID, so I would not be able to group on anything. In the end, I would like to keep the ID files that are common to be from the more recent file, i.e. in the example above, I would like to keep ID 1 and 2 from file 1 and thne 3, 4, 5, and 6 from the second one.
I think the first step is to add in the date of the file as a column, and then group on the ID and date the file was created, but I don't know what to do from there.
Any help would be appreciated.
Thanks in advance,
Van
Solved! Go to Solution.
Hello @Anonymous
what you can do is to take the older file, join it with the newer file using a LeftAnti-join (IDs from the newer file will be removed) and then combining with the newer file. If you have a lot of files this could be some challengen. Another option could be to integrate the creation date of the file or another information that gives you somehow a information of time and then group by id, then taking only the date with max value. Here an example on what I mean
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJJTU7NTUotUjAy0AFiIwMUMUO4mIqJAYgC8gz1DUBIKVYnWskIRbURFhMQYiqm2EwwxqFaAYrRlJsAhfyTS/LBquGKYSImCMsMDUixDSGGZIQxignGCAf45ZdBVCOCBy5ESL8psmJjhH5/qJAprvCC6jdD1m+Jab8hUpSZYRgQCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Application Received Date" = _t, #"Application Assessed Date" = _t, #"Funding Amount" = _t, #"Creation date" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Application Received Date", type date}, {"Application Assessed Date", type date}, {"Funding Amount", type text}, {"Creation date", type date}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"MaxCreationDate", each Table.Max(_,"Creation date")}}),
#"Expanded MaxCreationDate" = Table.ExpandRecordColumn(#"Grouped Rows", "MaxCreationDate", {"Application Received Date", "Application Assessed Date", "Funding Amount", "Creation date"}, {"Application Received Date", "Application Assessed Date", "Funding Amount", "Creation date"})
in
#"Expanded MaxCreationDate"
transforms this
into this
Copy paste this code to the advanced editor in a new blank query to see how the solution works.
If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too
Have fun
Jimmy
Hi @Anonymous ,
What you're looking for is perfectly explained in this article https://radacad.com/grouping-in-power-query-getting-the-last-item-in-each-group. There is no much to add 🙂
Good luck!
P.S. I've used it by myself just few days ago and it worked exactly as expected! Radacad articles are always easy to follow.
Hello @Anonymous
what you can do is to take the older file, join it with the newer file using a LeftAnti-join (IDs from the newer file will be removed) and then combining with the newer file. If you have a lot of files this could be some challengen. Another option could be to integrate the creation date of the file or another information that gives you somehow a information of time and then group by id, then taking only the date with max value. Here an example on what I mean
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJJTU7NTUotUjAy0AFiIwMUMUO4mIqJAYgC8gz1DUBIKVYnWskIRbURFhMQYiqm2EwwxqFaAYrRlJsAhfyTS/LBquGKYSImCMsMDUixDSGGZIQxignGCAf45ZdBVCOCBy5ESL8psmJjhH5/qJAprvCC6jdD1m+Jab8hUpSZYRgQCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Application Received Date" = _t, #"Application Assessed Date" = _t, #"Funding Amount" = _t, #"Creation date" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Application Received Date", type date}, {"Application Assessed Date", type date}, {"Funding Amount", type text}, {"Creation date", type date}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"MaxCreationDate", each Table.Max(_,"Creation date")}}),
#"Expanded MaxCreationDate" = Table.ExpandRecordColumn(#"Grouped Rows", "MaxCreationDate", {"Application Received Date", "Application Assessed Date", "Funding Amount", "Creation date"}, {"Application Received Date", "Application Assessed Date", "Funding Amount", "Creation date"})
in
#"Expanded MaxCreationDate"
transforms this
into this
Copy paste this code to the advanced editor in a new blank query to see how the solution works.
If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too
Have fun
Jimmy
Are the IDs incrementing over time, always? If so, you could do a group by all other columns (the ones the same) and then do a max of the ID column.
@Anonymous , of coz grouping all combined records by ID is a most straightforward and efficient solution to your issue.
First, combine all files in a CHRONOLOGICAL order (important!);
Secondly, index the combined records;
Thirdly, group the records by ID and keep the record with the max index.
let
File1 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUXJJTU7NTUotUjAy0AFiIwMUMUO4mIqJgYFSrE60khGKAiMsmhBiKqZQTcY4FCiAMUiFCZDln1ySD1YAl4eJmCCMNDQAmRkLAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Application Received Date" = _t, #"Application Assessed Date" = _t, #"Funding Amount" = _t]),
File2 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlbSUXJJTU7NTUotUjAy0lEwMjAyQBEzgYupGBsYKMXqRCuZADl++WUQBYZwebgQFi2myPLGCC3+UCFThBZTqBYzZC2WmLYYImxWMQPpiQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Application Received Date" = _t, #"Application Assessed Date" = _t, #"Funding Amount" = _t]),
#"Combined Files" = File1 & File2,
Columns = Table.ColumnNames(#"Combined Files"),
#"Added Index" = Table.AddIndexColumn(#"Combined Files", "Index", 1, 1, Int64.Type),
#"Grouped Rows" = Table.Group(#"Added Index", {"ID"}, {{"ar", each _, type table [ID=nullable text, Application Received Date=nullable text, Application Assessed Date=nullable text, Funding Amount=nullable text, Index=number]}}),
#"Removed Columns" = Table.TransformColumns(Table.RemoveColumns(#"Grouped Rows",{"ID"}), {{"ar", each Table.Last(Table.Sort(_, {"Index", Order.Ascending}))}}),
#"Expanded ar" = Table.ExpandRecordColumn(#"Removed Columns", "ar", Columns, Columns)
in
#"Expanded ar"
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
99 | |
50 | |
19 | |
12 | |
11 |