Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi All,
I am looking for a way to filter or trim a data set in Power Query that has dates from multiple years for over 1k unique IDs. I need to keep the rows with the most recent recorded date for each year, for each unique ID.
Thanks!
Below are a data sample and expected result:
Sample | |||||
Index | ID | SD | Band | Recorded Date | Year |
16 | ABD3100170220.AL35 | 1 | Good | 20/08/2019 | 2019 |
2094 | ABD3100170220.AL35 | 0.8 | Good | 23/06/2020 | 2020 |
3073 | ABD3100170220.AL35 | 0.8 | Good | 10/11/2020 | 2020 |
6126 | ABD3100170220.AL35 | 0.8 | Good | 20/07/2021 | 2021 |
7118 | ABD3100170220.AL35 | 0.8 | Good | 19/10/2021 | 2021 |
7992 | ABD3100170220.AL35 | null | No Data | 20/09/2022 | 2022 |
21 | ABD31001711001320.AL35 | 1.2 | Good | 06/11/2018 | 2018 |
2099 | ABD31001711001320.AL35 | 1.2 | Good | 23/06/2020 | 2020 |
3078 | ABD31001711001320.AL35 | 1.2 | Good | 10/11/2020 | 2020 |
6131 | ABD31001711001320.AL35 | 1.2 | Good | 20/07/2021 | 2021 |
7123 | ABD31001711001320.AL35 | 1.1 | Good | 19/10/2021 | 2021 |
7997 | ABD31001711001320.AL35 | 2.9 | Satisfactory | 20/09/2022 | 2022 |
22 | ABD31001713201540.AL35 | 0.9 | Good | 06/11/2018 | 2018 |
2100 | ABD31001713201540.AL35 | 0.9 | Good | 23/06/2020 | 2020 |
3079 | ABD31001713201540.AL35 | 0.9 | Good | 10/11/2020 | 2020 |
6132 | ABD31001713201540.AL35 | 0.8 | Good | 20/07/2021 | 2021 |
7124 | ABD31001713201540.AL35 | 0.8 | Good | 19/10/2021 | 2021 |
7998 | ABD31001713201540.AL35 | 1.1 | Good | 20/09/2022 | 2022 |
23 | ABD31001715400.AL35 | 1.4 | Good | 06/11/2018 | 2018 |
2101 | ABD31001715400.AL35 | 1.9 | Good | 23/06/2020 | 2020 |
3080 | ABD31001715400.AL35 | 1.9 | Good | 10/11/2020 | 2020 |
6133 | ABD31001715400.AL35 | 2.1 | Good | 20/07/2021 | 2021 |
7125 | ABD31001715400.AL35 | 1.7 | Good | 19/10/2021 | 2021 |
7999 | ABD31001715400.AL35 | 1.1 | Good | 20/09/2022 | 2022 |
17 | ABD310017220440.AL35 | 1.2 | Good | 19/02/2019 | 2019 |
2095 | ABD310017220440.AL35 | 0.7 | Good | 23/06/2020 | 2020 |
3074 | ABD310017220440.AL35 | 0.7 | Good | 10/11/2020 | 2020 |
6127 | ABD310017220440.AL35 | 0.7 | Good | 20/07/2021 | 2021 |
7119 | ABD310017220440.AL35 | 1.3 | Good | 19/10/2021 | 2021 |
7993 | ABD310017220440.AL35 | null | No Data | 20/09/2022 | 2022 |
Solved! Go to Solution.
Hi,
Thanks for your response. However, I don't have duplicate rows in my data - those have already been removed. I need the most recent recording date per year (2018, 2019, 2020, 2021 2022) for each unique ID. Each unique ID should be left with the most recent recording date for every year which would result in 4-5 rows per unique ID.
Hi,
I think your best approach would be using the Table.Buffer function.
First sort the column newest to oldest, then create a custom step using Table.Buffer, then remove duplicates while selecting the year and ID columns together.
Similar example below
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], #"Changed Type" = Table.TransformColumnTypes(Source,{{"Employee Number", Int64.Type}, {"Employee Name", type text}, {"Date Of Joining"", type datetime}, {"Annual Salary", Int64.Type}, {"Additional Comments", type text}}), #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Date Of Joining", Order.Descending}}), #"Buffered" = Table.Buffer(#"Sorted Rows"), #"Removed Duplicates" = Table.Distinct(#"Buffered", {"Employee Number"}) in #"Removed Duplicates"
Hi,
Thanks for your response. However, I don't have duplicate rows in my data - those have already been removed. I need the most recent recording date per year (2018, 2019, 2020, 2021 2022) for each unique ID. Each unique ID should be left with the most recent recording date for every year which would result in 4-5 rows per unique ID.
Yes, by highlighting ID and Year and removing duplicates, it will only remove rows where the Year and ID are the same. By using sort, then table buffer before doing this then it will keep the latest date for each ID in each year.
removing duplicates does not necessarily remove duplicate rows. It removes duplicate instances of the column(s) selected. While your dataset doesn't have duplicate rows, it does have rows where both the year and the ID are the same.
Many thanks for the additional explanation. The solution worked well.
Thank you!
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
110 | |
94 | |
82 | |
66 | |
58 |
User | Count |
---|---|
151 | |
121 | |
104 | |
87 | |
67 |