Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Is there a way to Remove or Ignore duplicate values when they occur in a column?
I'm pulling data from our SQL Server. I have 2 related tables. Every once in a while a duplicate value will occur in my One to Many relationship which causes failure. I'm looking for a way to remove/ignore a duplicate value so I can avoid this. Thanks.
Solved! Go to Solution.
There is a Remove Duplicates step when editing your query, it is in the Reduce Rows section of the ribbon, the M code is:
#"Removed Duplicates" = Table.Distinct(#"[Previous Step]", {"[Column Name]"})
I am having the same issue where I am pulling information from a SQL table but can not force the query to remove duplicates based on newest or most recent. I am attempting to remove duplicates out of an account number column, and wanting to keep only the most recent record for each account number.
I connect to the SQL table.
Edit Queires,
Sort the Table by Entry Date
Then remove duplicates from Account Number
When i check the output, I am still seeing records that are not the most recent for specific account numbers.
I was told it is because itr was not sorted at the source. I do not have access to the source to change the sort order.
What can I add in the language to force the sort in the imported data?
let
Source = Sql.Databases("123esco.cloudapp.net"),
123_Esco = Source{[Name="123_Esco"]}[Data],
dbo_Meters = CLOUDDATA{[Schema="dbo",Item="Meters"]}[Data],
#"Sorted Rows" = Table.Sort(dbo_Meters,{{"EntryDate", Order.Descending}}),
#"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"UtilityAccountNumber"})
in
#"Removed Duplicates"
@Anonymous, I ran into this issue in Power Query as well. I followed the problem in this article by Ken Puls, and it fixed the problem on the first shot.
Thanks I will take a look
@Anonymous If you have access to query the table, you could write a query to only retrieve the latest records. Here are a number of solutions that can be used as a guide for you if you aren't very familiar with SQL. You can write the SQL Query in the Advanced Options when you are connecting to the database.
There is a Remove Duplicates step when editing your query, it is in the Reduce Rows section of the ribbon, the M code is:
#"Removed Duplicates" = Table.Distinct(#"[Previous Step]", {"[Column Name]"})
Using Table.Distinct, which rows are kept (not removed)?
For instance, if I try to remove duplicates based on a "user_id" column, the first row with each user_id will be kept, and the following rows with that user_id will be removed? (wondering if I need to sort before to make sure I keep the rows I want to keep)
I have done this on a table but then when I try to join to another table on that field I get the error that I can't create the relationship as one of the columns must have unique values. Isn't applying that step getting me to where it's unique or am I missing a step?
Thanks everyone. @Greg_Deckler I think I'm going to go with this solution for the moment as it worked right away. My question is will it stick? For example is it a one time thing or will it remove duplicates every time the SQL data is refreshed?
Yes, query steps stick. When you run a query, it is really just executing a series of steps. You can see the steps recorded on the righthand side. of the query editing window.
Handling this in the source is great, but not always possible. As an end-user focused tool, it is not reasonable to expect that users will have access to the backend to make changes, write stored procedures, expert knowledge to write SQL, etc.
@jjmauras Best practice would be to handle this at the source. Per @Seth_C_Bauer solution.
@jjmauras You can handle this in SQL with a CTE or subquery using the row_number() function. This will handle everything automatically.
I'm not an expert, but i've done in these cases is to create a third table with the
distinct values, and use it as a bridge between both tables.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
112 | |
99 | |
82 | |
70 | |
63 |
User | Count |
---|---|
145 | |
111 | |
104 | |
84 | |
64 |