Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Remove ALL Duplicates Entirerly

Hi, I'm a bit new to PowerQuery. I created an excel sheet that pulls in data from two other sheets. I want to combine them and remove ALL duplicate rows based on an ID# column. That and the original rows that were duplicated. The "remove duplicates" function leaves a set of the original data. I want it to leave only rows (ID#s) that had NO duplication. e.g. sheet 1 has 300,000 rows, sheet two has the same 300,000 rows PLUS 50,000 new ones. All have a unique ID#. I want to combine the sheets and load only the 50,000 new items.

 

Thanks for any help!

1 ACCEPTED SOLUTION
PradipMCT
Resolver II
Resolver II

Hi @Anonymous 

 

Let me rephrase what i understood, and then provide the solution. 

You have two data sets A and B. A has 300000 rows, B also has 300000 rows and a new set of data i.e. 50000 rows. You would like to filter the newly added records from table B. 

 

If my understanding is correct then below is the solution.

 

Import both A and B in separate queries. Now that you would like to see the remaining rows in Table B or new records in B, please select Query for table B and then follow the steps:

 

Select B Query --> Home --> Merge Queries / Merge Queries as New (if you want the data in a new table then select the second option) --> Select table A from the dropdown list --> Highlight the Unique ID# column in both the tables --> in the JOIN KIND select LEFT ANTI --> Click OK. 

 

You're sorted.

 

If you think this is the solution then please do mark it as a solution

 

Pradip Pardeshi

Microsoft Certified Trainer

Pradip's Youtube Channel

View solution in original post

8 REPLIES 8
PradipMCT
Resolver II
Resolver II

Hi @Anonymous 

 

Let me rephrase what i understood, and then provide the solution. 

You have two data sets A and B. A has 300000 rows, B also has 300000 rows and a new set of data i.e. 50000 rows. You would like to filter the newly added records from table B. 

 

If my understanding is correct then below is the solution.

 

Import both A and B in separate queries. Now that you would like to see the remaining rows in Table B or new records in B, please select Query for table B and then follow the steps:

 

Select B Query --> Home --> Merge Queries / Merge Queries as New (if you want the data in a new table then select the second option) --> Select table A from the dropdown list --> Highlight the Unique ID# column in both the tables --> in the JOIN KIND select LEFT ANTI --> Click OK. 

 

You're sorted.

 

If you think this is the solution then please do mark it as a solution

 

Pradip Pardeshi

Microsoft Certified Trainer

Pradip's Youtube Channel

Anonymous
Not applicable

This was great. How do I get A&B not to load? I only want C (the new merged query) to be visible in the workbook. 

For the dataset you don't want to load into your Power BI model, from Power Query you can -

>right-click the dataset

>uncheck 'Enable load'

>then close & apply

Hi @Servus 

 

This will make the query non-functional.

 

Regards,

Pradip Pardeshi

Pradip's Youtube Channel

Hi @PradipMCT 

 

I believe the connection query will still be valid, however, the dataset will not be included in the report. 

 

If "enable load" is unchecked, the dataset will be included in subsequent refreshes but it won't be loaded into the report. Only the merged query will be loaded. The helps to improve performance. 

@Anonymous 

 

When you import table A and Table B, before merging go to HOME --> Close & Load To --> on the new screen you will find a radio button that says "Connection only", select that --> if you don't want to add the table to data model then uncheck the box --> Add to data model --> OK

 

This will create a connection only query. 

 

Again go back to Power Query Editor --> Merge Query as New --> This creates a new merged table C --> Close & Load To --> Select Table --> Uncheck Add to Data model (If needed) --> Ok.

 

Hope this helps. If this is the solution please do mark this as solution 🙂

 

Pradip Pardeshi

Microsoft Certified Trainer

Pradip's Youtube Channel

v-juanli-msft
Community Support
Community Support

Hi @Anonymous 

If two tables have the same columns and column names, 

First, "append queries", to append two tables,

Then select "Id" column, then select "remove duplicates"

 

Best Regards
Maggie

 

Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Servus
Frequent Visitor

@Anonymous You can try connecting to both sheets separately using Power Query. Then apply 'remove duplicates' on ID column for each dataset or sheet. Finally, run the 'append' operation to join both sheets as a new dataset according to the defined column names.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors