Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
CTan42
Helper II
Helper II

Remove Duplicate Datasets

I have 1 table which contains 4 columns - ID, Date, Name, Item Description, Cash Amount. ID is not a unique key. There are some repeat things in the table. Is there anyways to remove the duplicates? Thanks.

 

Regards,

Chung.

1 ACCEPTED SOLUTION
dilumd
Solution Supplier
Solution Supplier

Please check below,

 

Capture.JPG

View solution in original post

7 REPLIES 7
TheOckieMofo
Resolver II
Resolver II

The two way to remove duplicates would be:

 

1. There is a "remove duplicates" operation in the user interface of the "Edit Queries" Section (this is Power Query). Under the Transform tab, there is a "Remove Rows" icon with a small downward triangle next to it. Click that triangle to bring up a list. On that list is an option to remove duplicates. Also, you can Right Click on a column header and that displays a list that also gives you the option to remove duplicates.

 

2. Depending on the shape of your data, duplicates may not be a bad thing. For example, if your "ID" column is, say, an employee ID and your table is showing transactions in a day, the employee ID could be duplicated intentionally if your database is set up for creating a new record for each transaction. Anyway, the point is that you only want to use Option 1 if the duplicate entries are truly duplicates, meaning they provide no extra information. If, on the other hand, the entries are not "true duplicates" then you will want to use the "Group By" functionality. If you have any Microsoft Access (or SQL) experience, this is basically created an aggregate, or totals, query. So this will allow you to group all the duplicate instances of the ID column yet still retain the important details in aggregate. This would mean you would essentially sum up all the cash amount columns for each ID and create 1 row for each ID.

 

I hope that makes sense. Good luck. 

dilumd
Solution Supplier
Solution Supplier

Hi,

 

In the power query right click on the column and click on "Remove Duplicates".

@dilumd @TheOckieMofo

thanks for the help. 

What i am looking for is i wanna remove those repeat rows which are having same ID, name , date, item description, cash amount. It is wrong when you just right click ID column and remove duplicates. Because in my data, you could have same ID but different name and others details. 

@TheOckieMofo@dilumd

for examples:

 

ID          Name          Date          Item Description          Cash Amount

1           A                  1/1            Apple                            $5

1           A                  1/1            Apple                            $5

1           A                  1/2            Apple                            $5

 

Above is the sample of my dataset. So i wanna make sure i remove those repeat rows (for e.g the 1st and 2nd row) which are having exact same ID, Name, Date, Item description and Cash Amount.  I had tried to concenate those columns however it doesnt work as the data type are different (some are text, some are number). 

 

 

The Group by solution will work for you. You can group by all 4 columns and this will essentially remove the duplicates.

dilumd
Solution Supplier
Solution Supplier

Please check below,

 

Capture.JPG

how to remove duplicates from 2 different data sources

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.