Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hi,
Is there a way to pick which row gets removed when you remove duplicates? I have a list of opportunities, some of which are duplicates. They can have a state of Active or Inactive. I want to remove all duplicate opportunities, but if any of the duplicates have a state of Active then I want to keep that one. It doesn't matter if there are multiple opportunities with state Active, just keep one of those if there is one. If there are duplicates and all the duplicates are Inactive, then any of those records that are kept are fine.
Does that make sense? Basically I want to say look at the column of opportunity IDs, if you see duplicates, check the state and if one state is Active and another is Inactive, keep the Active one.
If I knew how it decides which one it keeps when it removes duplicates, that would help.
Thanks
Hey, did you ever get a solution to this? because I have the exact same problem statement
Hello, I have the same need. Could you please tell us how to do it?
The concept is simple: remove duplicates based on a condition (value from another column).
Please, it shouldn't be so complex. Could someone help us?
Thanks a lot. Best regards
hi @Anonymous
I would suggest you do a filter before "Remove duplicates" step.
Regards,
Lin
@Anonymous , You can use remove duplicate in Edit query/data transformation
https://radacad.com/remove-duplicate-doesnt-work-in-power-query-for-power-bi-here-is-the-solution
Another option is that you create a new table using summarize like
summarize(Table,Table[ID],"Status",max(Table[Status]), "Value",min(Table[Value]))
Add remove columns as per need.
Hi @amitchandak
That article doesn't mention anything about removing select duplicates based on values in another column.
Summarize wouldn't work, I need to do this in Power Query because there are a lot more data transformations to do afterwards.
Could you not do it in two steps - before you remove duplicates, make some sort of calculated column that counts the instances of that row's value in the column you're doing the dupe check on, then make a conditional column that sees if the new column's more than 1 and your status is inactive, and use that to delete your inactive dupes first? Not particularly efficient, but it should work
edit - probably need to add in some condition to consider the situation where there's two rows and they're both inactive, but the general principle is the same
User | Count |
---|---|
125 | |
106 | |
99 | |
63 | |
62 |
User | Count |
---|---|
135 | |
116 | |
101 | |
71 | |
61 |