Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
dudeyates
Frequent Visitor

Using Power Query and M to Delete Duplicates Based on Certain Criteria

I'm trying to figure out how to remove duplicate rows based on certain criteria. If one of the rows matches the criteria, I keep it and delete the duplicates. I can do this in VBA, but I really want to learn how to accomplish the same in M. Goal is to feed my spreadsheet into PowerBI and let it do all the work. But I'm new to M and having trouble finding examples I can learn from.

 

Here's a snippet of the excel sheet with the 3 columns the script checks.

Column tc_and_lineitem identifies duplicates. The sheet is sorted so the duplicate rows naturally end up next to each other. af2 and af3 are the criteria.

 

2019-03-05_9-23-10.png

 

Here's the VBA code that loops through and deletes duplicate rows (please no comments Smiley Wink)

Essentially loops through the sheet, when duplicates are found it checks af2 (fund), and in some circumstances af3 (orgn), then deletes the unwanted row.

 

rowNum = 2

Do While rowNum < lastRowNum
    startTCLineItem = thisSheet.Range("O" & rowNum)
    startFund = thisSheet.Range("CT" & rowNum)
    startOrgn = thisSheet.Range("CU" & rowNum)
    
    rowNum = rowNum + 1
    thisTCLineItem = thisSheet.Range("O" & rowNum)
    ' // check if TCLineItems match. When they stop matching while loop returns to outer loop and thisTCLineItem becomes the new startTCLineItem.
    Do While startTCLineItem = thisTCLineItem
        thisFund = thisSheet.Range("CT" & rowNum)
        thisOrgn = thisSheet.Range("CU" & rowNum)
        
        ' // Keep fund 16700 and remove the duplicate row,
        ' //    or if fund is 30330 then keep orgn 5770 and remove the duplicate row.
        If startFund = "16700" Then
            thisSheet.Rows(rowNum).Delete
            rowNum = rowNum - 1
            lastRowNum = lastRowNum - 1
            
        ElseIf thisFund = "16700" Then
            z = rowNum - 1
            thisSheet.Rows(z).Delete
            rowNum = rowNum - 1
            lastRowNum = lastRowNum - 1
            
        ElseIf startFund = "30330" Then
            
            If startOrgn = "5770" Then
                thisSheet.Rows(rowNum).Delete
                rowNum = rowNum - 1
                lastRowNum = lastRowNum - 1
                
            ElseIf thisOrgn = "5770" Then
                z = rowNum - 1
                thisSheet.Rows(z).Delete
                rowNum = rowNum - 1
                lastRowNum = lastRowNum - 1
                
            End If
                        
        End If
                
        rowNum = rowNum + 1
        thisTCLineItem = thisSheet.Range("O" & rowNum)
    Loop
       
Loop

If somebody could point me in the right direction regarding how to accomplish the same in M, that would be great! I would love to do this better and with less human interaction. Power Query and M seem the way to go.

 

Thanks Smiley Happy

James

5 REPLIES 5
v-danhe-msft
Employee
Employee

Hi @dudeyates ,

You could refer to Remove Duplicates function in query editor:

1.PNG

Result:

1.PNG

You could also download the pbix file to have a view.

 

Regards,

Daniel He

Community Support Team _ Daniel He
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

@v-danhe-msft Thanks for taking the time took look at this. I think that works because coincidentally the sort puts af2 16700 above the duplicate to be removed. But what if the first duplicate had af2 of 12345, and the af2 of 16700 I want to keep ends up being the 2nd duplicate, does the af2 16700 get removed? 

 

For the vast majority of my data, the straight forward Remove Duplicates works. But there are instances when the line I want to keep is the 2nd of the duplicates, and I actually want to remove the first duplicate.  And maybe I'm just missing something in your example (I have downloaded and viewed)

 

Thanks,

James

Aron_Moore
Solution Specialist
Solution Specialist

In query editor; create the criteria column, sort, and then Remove Rows/Delete Duplicates. Like I've done here:

 

Capture.PNG

Thanks for the response. I'm not sure of the steps to take to get to the Asset Key you created. How does Removing Duplicates know that I want to keep tc_and_lineitem: "1083 1" with a af2 = "16700", but remove tc_and_lineitem: "1083 1" with a different af2?

 

Thanks, 

James

My column was a simple concatenate basically = [Column1]&[Column3]&[Column5]

 

yours will likely be more complex, don't really know with the data your showing. Instead of Remove Duplicate you may need to create some sort of key that you could then filter duplicate rows out that way.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.