Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
I'm trying to figure out how to remove duplicate rows based on certain criteria. If one of the rows matches the criteria, I keep it and delete the duplicates. I can do this in VBA, but I really want to learn how to accomplish the same in M. Goal is to feed my spreadsheet into PowerBI and let it do all the work. But I'm new to M and having trouble finding examples I can learn from.
Here's a snippet of the excel sheet with the 3 columns the script checks.
Column tc_and_lineitem identifies duplicates. The sheet is sorted so the duplicate rows naturally end up next to each other. af2 and af3 are the criteria.
Here's the VBA code that loops through and deletes duplicate rows (please no comments )
Essentially loops through the sheet, when duplicates are found it checks af2 (fund), and in some circumstances af3 (orgn), then deletes the unwanted row.
rowNum = 2 Do While rowNum < lastRowNum startTCLineItem = thisSheet.Range("O" & rowNum) startFund = thisSheet.Range("CT" & rowNum) startOrgn = thisSheet.Range("CU" & rowNum) rowNum = rowNum + 1 thisTCLineItem = thisSheet.Range("O" & rowNum) ' // check if TCLineItems match. When they stop matching while loop returns to outer loop and thisTCLineItem becomes the new startTCLineItem. Do While startTCLineItem = thisTCLineItem thisFund = thisSheet.Range("CT" & rowNum) thisOrgn = thisSheet.Range("CU" & rowNum) ' // Keep fund 16700 and remove the duplicate row, ' // or if fund is 30330 then keep orgn 5770 and remove the duplicate row. If startFund = "16700" Then thisSheet.Rows(rowNum).Delete rowNum = rowNum - 1 lastRowNum = lastRowNum - 1 ElseIf thisFund = "16700" Then z = rowNum - 1 thisSheet.Rows(z).Delete rowNum = rowNum - 1 lastRowNum = lastRowNum - 1 ElseIf startFund = "30330" Then If startOrgn = "5770" Then thisSheet.Rows(rowNum).Delete rowNum = rowNum - 1 lastRowNum = lastRowNum - 1 ElseIf thisOrgn = "5770" Then z = rowNum - 1 thisSheet.Rows(z).Delete rowNum = rowNum - 1 lastRowNum = lastRowNum - 1 End If End If rowNum = rowNum + 1 thisTCLineItem = thisSheet.Range("O" & rowNum) Loop Loop
If somebody could point me in the right direction regarding how to accomplish the same in M, that would be great! I would love to do this better and with less human interaction. Power Query and M seem the way to go.
Thanks
James
Hi @dudeyates ,
You could refer to Remove Duplicates function in query editor:
Result:
You could also download the pbix file to have a view.
Regards,
Daniel He
@v-danhe-msft Thanks for taking the time took look at this. I think that works because coincidentally the sort puts af2 16700 above the duplicate to be removed. But what if the first duplicate had af2 of 12345, and the af2 of 16700 I want to keep ends up being the 2nd duplicate, does the af2 16700 get removed?
For the vast majority of my data, the straight forward Remove Duplicates works. But there are instances when the line I want to keep is the 2nd of the duplicates, and I actually want to remove the first duplicate. And maybe I'm just missing something in your example (I have downloaded and viewed)
Thanks,
James
In query editor; create the criteria column, sort, and then Remove Rows/Delete Duplicates. Like I've done here:
Thanks for the response. I'm not sure of the steps to take to get to the Asset Key you created. How does Removing Duplicates know that I want to keep tc_and_lineitem: "1083 1" with a af2 = "16700", but remove tc_and_lineitem: "1083 1" with a different af2?
Thanks,
James
My column was a simple concatenate basically = [Column1]&[Column3]&[Column5]
yours will likely be more complex, don't really know with the data your showing. Instead of Remove Duplicate you may need to create some sort of key that you could then filter duplicate rows out that way.
User | Count |
---|---|
140 | |
113 | |
104 | |
77 | |
65 |
User | Count |
---|---|
136 | |
118 | |
101 | |
71 | |
61 |