TeeTreeThree
Helper II

"Remove Duplicate" doesn't remove all duplicate

Dear all,

 

I have a table with just one column, and I tried to remove the duplicates in that column via Power Query. However, once I loaded it into the dashboard, Count and Count (Distinct) give me different numbers, although I expected them to be the same.

 

 

Best regards,

Eric

15 REPLIES

Hey @TeeTreeThree,

 

I already wrote a post in February about how differently Power Query and Power Pivot understand duplicates here, but @ImkeF's idea of using the Comparer.OrdinalIgnoreCase property is great and simple.

 

 

Thanks a lot 🙂

ImkeF
Super User

This will happen when the terms differ only in letter case. Please check whether this article helps: http://www.thebiccountant.com/2015/08/17/create-a-dimension-table-with-power-query-avoid-the-bug/

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Well, if your table just consists of one column, you can actually use this formula:

 

Table.ExpandListColumn(#table({"ColumnName"}, {{List.Distinct(Source[ColumnName], Comparer.OrdinalIgnoreCase)}}), "ColumnName")

 

It's a bit of a bugger, because the only way I found to use Comparer.OrdinalIgnoreCase (which makes the comparison ignore case) was to use it on a list. So if anyone has an idea how to make this a bit smarter, you're more than welcome 🙂
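
To make the list-based approach above easier to try out, here is a minimal sketch as a full query. The sample table and the step names (Source, DistinctValues, Result) are assumptions for illustration only; Table.FromColumns is used here as one possible way to turn the list back into a table, in place of the #table / Table.ExpandListColumn combination.

```powerquery
let
    // Hypothetical sample column with values that differ only by case
    Source = #table({"ColumnName"}, {{"one"}, {"ONe"}, {"two"}}),
    // De-duplicate the column's values as a list, ignoring case
    DistinctValues = List.Distinct(Source[ColumnName], Comparer.OrdinalIgnoreCase),
    // Rebuild a one-column table from the de-duplicated list
    Result = Table.FromColumns({DistinctValues}, {"ColumnName"})
in
    Result
```

With this sample, "one" and "ONe" should be treated as the same value, so only one of them remains alongside "two".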

 

http://www.thebiccountant.com/2016/10/27/tame-case-sensitivity-power-query-powerbi/

Imke Feldmann (The BIccountant)


So if you want a distinct of all columns in the table, it's pretty easy:

 

Table.Distinct(Table, Comparer.OrdinalIgnoreCase)

 

 

Still need to figure out how to handle column-selection in it.

Imke Feldmann (The BIccountant)


KHorseman
Community Champion

@ImkeF try:

 

Table.Distinct(Table, {"ColumnName", Comparer.OrdinalIgnoreCase})




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




@KHorseman

Totally awesome!! Thank you so much!

 

And this is how it looks with multiple columns:

 

= Table.Distinct(Table.FromRecords({[A="one", B=1, C=2], [A="ONe", B=1, C=3]}), {{"A", Comparer.OrdinalIgnoreCase}, {"B", Comparer.OrdinalIgnoreCase}} )

Imke Feldmann (The BIccountant)


KHorseman
Community Champion

@ImkeF that's cool, so you could potentially mix-and-match case sensitivity? Like Column1 ignores case, Column2 doesn't? I didn't test far enough to try anything like that. I just noticed that the second argument in Table.Distinct is a list by default if you let the query editor generate the code for you, so I tried adding Comparer.OrdinalIgnoreCase to the list.









@KHorseman Haven't even thought of that, but computer says "yes"  🙂

 

Table.Distinct(Table.FromRecords({[A="one", B=1, C=2], [A="ONe", B=1, C=3]}), {{"A", Comparer.Ordinal}, {"B", Comparer.OrdinalIgnoreCase}} )

Imke Feldmann (The BIccountant)


KHorseman
Community Champion

@ImkeF nice. Thanks for sharing this. I never would have even noticed this comparer function otherwise.









Greg_Deckler
Super User

How many rows do you have? I have seen one other user reporting this and that user had millions of rows. I would report this as an Issue. https://ideas.powerbi.com/forums/360879-issues

 

Any chance you can post a link to the data so that this issue can be recreated?



Hi smoupre,

 

Yes, I have millions of rows in the database. My apologies, I cannot post the data.

 

I have posted this issue at the link you mentioned. Hopefully they come up with something more convenient.

 

@KHorseman and @ImkeF my data is not case sensitive, yet this still happens. I tried your code just in case, but the results are the same.

 

 

@TeeTreeThree your data source may not be case sensitive, but if the columns in question contain letters then Power BI will be case sensitive about them. But I do also like that non-printable character idea @ImkeF.









@TeeTreeThree another thing you can try is to trim and clean before the remove-duplicates step. Maybe there are some issues with non-printable characters or something similar:

 

(Screenshot: PBI_TrimClean.png)
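
Since the screenshot itself is not reproduced here, the following is a rough sketch of what a trim-and-clean step followed by Remove Duplicates could look like in M. The sample table, the column name and the step names are assumptions for illustration; Text.Trim, Text.Clean, Table.TransformColumns and Table.Distinct are the standard library functions.

```powerquery
let
    // Hypothetical sample: values that differ only by surrounding whitespace or a control character
    Source = #table({"ColumnName"}, {{"abc"}, {" abc "}, {"abc#(tab)"}}),
    // Remove non-printable characters, then trim leading and trailing whitespace
    Cleaned = Table.TransformColumns(Source, {{"ColumnName", each Text.Trim(Text.Clean(_)), type text}}),
    // Remove duplicates on the cleaned column
    Result = Table.Distinct(Cleaned, {"ColumnName"})
in
    Result
```

The three sample values are intended to collapse into a single row once the stray whitespace and the tab character are gone.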

Imke Feldmann (The BIccountant)


@ImkeF and @KHorseman my apologies for the late reply.

 

I tried @ImkeF's method and it still doesn't work. However, I tried the "Group By" function in the "Transform" tab and it works.

 

It's just an extra step.
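
For reference, a Group By step used this way might look roughly like the sketch below. The sample data, the "Count" column and the step names are assumptions; the thread does not show the exact code that was generated.

```powerquery
let
    // Hypothetical one-column sample with a repeated value
    Source = #table({"ColumnName"}, {{"A1"}, {"A1"}, {"B2"}}),
    // Group on the column; the row count per group only shows how many duplicates were merged
    Grouped = Table.Group(Source, {"ColumnName"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
    // Keep only the key column, leaving one row per distinct value
    Result = Table.SelectColumns(Grouped, {"ColumnName"})
in
    Result
```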

Anonymous

The problem is that Power BI handles data in two different ways in two different situations.

 

1. Remove duplicates in Query Editor - it IS case sensitive, e.g. "EMPLOYER" and "employer" are two different strings (they are not duplicates)

2. Building a relationship - it IS NOT case sensitive, e.g. "EMPLOYER" and "employer" are the same string (they are duplicates), therefore I can't build a relationship

 

Microsoft, please fix this "feature", it is really annoying. The application should handle data in one consistent way.
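
The difference described in the two points above can be sketched in the Query Editor with a small query. The sample values and step names are assumptions for illustration; the per-column comparer syntax is the one shown earlier in this thread.

```powerquery
let
    // Hypothetical sample mirroring the "EMPLOYER" / "employer" example
    Source = #table({"ColumnName"}, {{"EMPLOYER"}, {"employer"}, {"Employee"}}),
    // Default Remove Duplicates compares case-sensitively, so both spellings survive
    CaseSensitive = Table.Distinct(Source, {"ColumnName"}),
    // Ignoring case treats both spellings as one value, closer to how relationships compare keys
    CaseInsensitive = Table.Distinct(Source, {{"ColumnName", Comparer.OrdinalIgnoreCase}}),
    // Return both row counts side by side for comparison
    Result = [
        CaseSensitiveRows = Table.RowCount(CaseSensitive),
        CaseInsensitiveRows = Table.RowCount(CaseInsensitive)
    ]
in
    Result
```

Comparing the two row counts on your own key column is a quick way to check whether case differences are what blocks the relationship.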
