cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
jjmauras
Advocate I
Advocate I

Remove or Hide Duplicate Values

Is there a way to Remove or Ignore duplicate values when they occur in a column?

 

I'm pulling data from our SQL Server. I have 2 related tables. Every once in a while a duplicate value will occur in my One to Many relationship which causes failure. I'm looking for a way to remove/ignore a duplicate value so I can avoid this. Thanks.

1 ACCEPTED SOLUTION
Greg_Deckler
Super User IV
Super User IV

There is a Remove Duplicates step when editing your query, it is in the Reduce Rows section of the ribbon, the M code is:

#"Removed Duplicates" = Table.Distinct(#"[Previous Step]", {"[Column Name]"})


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




View solution in original post

12 REPLIES 12
Anonymous
Not applicable

I am having the same issue where I am pulling information from a SQL table but can not force the query to remove duplicates based on newest or most recent. I am attempting to remove duplicates out of an account number column, and wanting to keep only the most recent record for each account number. 

 

I connect to the SQL table. 

Edit Queires,

Sort the Table by Entry Date

Then remove duplicates from Account Number

 

When i check the output, I am still seeing records that are not the most recent for specific account numbers. 

 

I was told it is because itr was not sorted at the source. I do not have access to the source to change the sort order. 

 

What can I add in the language to force the sort in the imported data?

 

let
Source = Sql.Databases("123esco.cloudapp.net"),
123_Esco = Source{[Name="123_Esco"]}[Data],
dbo_Meters = CLOUDDATA{[Schema="dbo",Item="Meters"]}[Data],
#"Sorted Rows" = Table.Sort(dbo_Meters,{{"EntryDate", Order.Descending}}),
#"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"UtilityAccountNumber"})
in
#"Removed Duplicates"

 

@Anonymous, I ran into this issue in Power Query as well. I followed the problem in this article by Ken Puls, and it fixed the problem on the first shot.

Anonymous
Not applicable

Thanks I will take a look

 

@Anonymous If you have access to query the table, you could write a query to only retrieve the latest records. Here are a number of solutions that can be used as a guide for you if you aren't very familiar with SQL. You can write the SQL Query in the Advanced Options when you are connecting to the database.


Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips the site I co-own with Mike Carlo. Also, if you are near SE WI? Join our PUG Milwaukee Brew City PUG
Greg_Deckler
Super User IV
Super User IV

There is a Remove Duplicates step when editing your query, it is in the Reduce Rows section of the ribbon, the M code is:

#"Removed Duplicates" = Table.Distinct(#"[Previous Step]", {"[Column Name]"})


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




View solution in original post

Using Table.Distinct, which rows are kept (not removed)?

 

For instance, if I try to remove duplicates based on a "user_id" column, the first row with each user_id will be kept, and the following rows with that user_id will be removed? (wondering if I need to sort before to make sure I keep the rows I want to keep)

I have done this on a table but then when I try to join to another table on that field I get the error that I can't create the relationship as one of the columns must have unique values. Isn't applying that step getting me to where it's unique or am I missing a step?

Thanks everyone. @Greg_Deckler I think I'm going to go with this solution for the moment as it worked right away. My question is will it stick? For example is it a one time thing or will it remove duplicates every time the SQL data is refreshed?

Yes, query steps stick. When you run a query, it is really just executing a series of steps. You can see the steps recorded on the righthand side. of the query editing window.

 

Handling this in the source is great, but not always possible. As an end-user focused tool, it is not reasonable to expect that users will have access to the backend to make changes, write stored procedures, expert knowledge to write SQL, etc.


---------------------------------------

@ me in replies or I'll lose your thread!!!

I have a NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




PowerBIGuy
Responsive Resident
Responsive Resident

@jjmauras Best practice would be to handle this at the source. Per @Seth_C_Bauer solution.

Business Intelligence Architect / Consultant

@jjmauras You can handle this in SQL with a CTE or subquery using the row_number() function. This will handle everything automatically.

 


Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips the site I co-own with Mike Carlo. Also, if you are near SE WI? Join our PUG Milwaukee Brew City PUG
Fabiola_K
Advocate IV
Advocate IV

I'm not an expert, but i've done in these cases is to create a third table with the
distinct values, and use it as a bridge between both tables.

Helpful resources

Announcements
PBI User Groups

Welcome to the User Group Public Preview

Check out new user group experience and if you are a leader please create your group!

MBAS on Demand

Microsoft Business Applications Summit sessions

On-demand access to all the great content presented by the product teams and community members! #MSBizAppsSummit #CommunityRocks

MBAS Attendee Badge

Claim Your Badge & Digital Swag!

Check out how to claim yours today!

Top Solution Authors
Top Kudoed Authors