cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Anonymous
Not applicable

Remove duplicates not working.. Can't create relationship

Hello,

 

I have a dataset with customer and a discount structure for that customer per part. I created a concatenate in the edit query screen of the customer, and there are a couple duplicates. Highlighted the Concatenate column, and clicked remove duplicates.

 

Tried creating a relationship and it didn't work. Created a summary table of the concatenate and COUNTROWS, and lo and behold there are a handful that have a count of two. Why is this happening?

 

Thanks

1 ACCEPTED SOLUTION
Anonymous
Not applicable

@v-sihou-msft

 

Yes, I double and triple checked. Everything was done the same way as in my example. Must be an error due to such a large data set.

View solution in original post

21 REPLIES 21
Frequent Visitor

For those who require a different solution, try using an R script to remove the duplicates.

 

I used the following script to solve the same problem:

 

output<-dataset[!duplicated(dataset$myColumn),]

Advocate II
Advocate II

Another thing to look for - 

 

if you have "null" and blank values, it seems to treat them the same when you "remove duplicates".. but then it considers them not unique in the relationship view...

 

 

what i did that worked, was to click "remove duplicates"  - then inthe "Home" tab in the ribbon, click on the drop-down in "Remove rows", and click on "Remove Blank Rows".

 

after removing those blank rows, everything worked great for me!

 

 

Regular Visitor

Hello,

 

If you are removing duplicates from a text column (costumer) verify if you dont have upper and lower case letters on the same field.


I.e: "Customer1" and "customer1". If you remove duplicates, power query will consider them as unique references and wont remove any of the them. But while creating relationships you will get an error of duplicate values.

 

BR

Regular Visitor

I found a work around to this...

1. Duplicate the query with "Duplicate values"
2. Remove all columns except the one you want to be unique.
3. Group by.
4. Do a "Left Outer" Merge with the original Query

Regular Visitor

I just had the same issue the error having duplicate values on the key I was using to connect two tables.  I had removed duplicate values based on the key column and I was still not able to join the two tables together.

 

It turns out the problem was a null value in the key column.  I did a row filter and removed the null value and the two tables joined together fine.  The error message was misleading.  

Regular Visitor

I just had the same problem with joining two tables after I made sure to remove duplicates on the key column in one of the tables and getting the error message about duplicate values. 

 

The issue actually was a NULL value in a single row in the table in my key column.  Power BI was complaining about duplicates but in reality it was the null value that was preventing it from joining the two tables.  I did a row filter and removed the NULL value and the two tables connected together just fine.

Super User IV
Super User IV

Would have to see some sample data where the problem could be reproduced.


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Anonymous
Not applicable

Sample

 

The same exact steps in the edit query in this situation worked. They do not with my large data set (10 Million lines & about 20 duplicates).

I would report this as an Issue: https://ideas.powerbi.com/forums/360879-issues

 


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Anonymous
Not applicable

Bummer.. Well I just ended up making another table to mirror the first, linked the summary table, and filtered all the ones. it worked, but with this many lines I'd rather have it be more efficient. Thanks for your help @Greg_Deckler

@Anonymous

 

As I tested, the relationship can be created properly between two tables in your sample.

 

66.PNG

Have you get any error when creating relationships? Did you select correct columns?

 

Regards,

Anonymous
Not applicable

@v-sihou-msft

 

Yes, I double and triple checked. Everything was done the same way as in my example. Must be an error due to such a large data set.

View solution in original post

I'm having the same issue but the data set is only around 2000 rows.

 

Note that there seems to be an inconsisencty in how case sensetivity is handled: the Remove Duplicates feature is case sensetive but the uniquemenss check done when created creating realationshiops between queries is not. To work around this I just format the key column in the query to UPPERCASE. But this is an aside, I'm getting the issue even though cases are the same

 

 

Thank you! This solution worked for me to transform the values to UPPERCASE.

Had the same problem. Saw the post about case sensitivity with duplicate removing. Transformed column to all UPPERCASE. Removed duplicates (3rd time)... and then the relationship worked. Make sure everyone reports it so we can get this fixed! 

Thank you so much for mentioning this issue with cases in remove duplicates vs joins! I had been struggling with this all day!

I am getting this issue with the latest version of Desktop as well. My dataset is 4,700 rows of data. Remove Duplicates is applied in the query stage, but when the data is loaded into the data model there are duplicates in the data.

 

I cannot manually search for the duplicate values in the query because it says 'Limit of 1000 values reached'.

@TobiasOwen

 

Create a table and then export data to excel and do a duplicate check there.

I don't need to check duplicates, I need the query to filter them out properly so I can create a many-to-one relationship that doesn't return an error and mess up the scheduled refresh every time another duplicate is entered on the list.

The solution that has worked for me:

 

in the query editor, select your column that you want to be unique and then:

 

1) Click on the "Transform" tab in the ribbon, click on the "Format" button, and then select "UPPERCASE" or "lowercase"

2) make sure the column is still selected, then click on the "Home" Tab, and then click on the "Remove rows" button, then select "Remove Blank Rows"

3) make sure the column is still selected, then click on the "Home" Tab, and then click on the "Remove rows" button, then select "Remove Errors"

4) Finally, click on Remove Duplicates button.

 

 

So far, these steps have always worked for me!

Helpful resources

Announcements
secondImage

Congratulations!

We are excited to announce the Power BI Super Users!

Wave Release 2

Check out the updates in Power BI.

Overview of Power BI 2020 release wave 2!

Microsoft Ignite

Microsoft Ignite

Join digitally, March 2–4, 2021 to explore new tech that's ready to implement. Experience the keynote in mixed reality through AltspaceVR!

secondImage

The largest Power BI virtual conference

100+ sessions, 100+ speakers, Product managers, MVPs, and experts. All about Power BI. Attend online or watch the recordings.

Top Solution Authors
Top Kudoed Authors