cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
CodeRed2004 Frequent Visitor
Frequent Visitor

Remove ALL Duplicates Entirerly

Hi, I'm a bit new to PowerQuery. I created an excel sheet that pulls in data from two other sheets. I want to combine them and remove ALL duplicate rows based on an ID# column. That and the original rows that were duplicated. The "remove duplicates" function leaves a set of the original data. I want it to leave only rows (ID#s) that had NO duplication. e.g. sheet 1 has 300,000 rows, sheet two has the same 300,000 rows PLUS 50,000 new ones. All have a unique ID#. I want to combine the sheets and load only the 50,000 new items.

 

Thanks for any help!

1 ACCEPTED SOLUTION

Accepted Solutions
PradipMCT Regular Visitor
Regular Visitor

Re: Remove ALL Duplicates Entirerly

Hi @CodeRed2004 

 

Let me rephrase what i understood, and then provide the solution. 

You have two data sets A and B. A has 300000 rows, B also has 300000 rows and a new set of data i.e. 50000 rows. You would like to filter the newly added records from table B. 

 

If my understanding is correct then below is the solution.

 

Import both A and B in separate queries. Now that you would like to see the remaining rows in Table B or new records in B, please select Query for table B and then follow the steps:

 

Select B Query --> Home --> Merge Queries / Merge Queries as New (if you want the data in a new table then select the second option) --> Select table A from the dropdown list --> Highlight the Unique ID# column in both the tables --> in the JOIN KIND select LEFT ANTI --> Click OK. 

 

You're sorted.

 

If you think this is the solution then please do mark it as a solution

 

Pradip Pardeshi

Microsoft Certified Trainer

Pradip's Youtube Channel

View solution in original post

8 REPLIES 8
Servus Frequent Visitor
Frequent Visitor

Re: Remove ALL Duplicates Entirerly

@CodeRed2004 You can try connecting to both sheets separately using Power Query. Then apply 'remove duplicates' on ID column for each dataset or sheet. Finally, run the 'append' operation to join both sheets as a new dataset according to the defined column names.

Community Support Team
Community Support Team

Re: Remove ALL Duplicates Entirerly

Hi @CodeRed2004 

If two tables have the same columns and column names, 

First, "append queries", to append two tables,

Then select "Id" column, then select "remove duplicates"

 

Best Regards
Maggie

 

Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

PradipMCT Regular Visitor
Regular Visitor

Re: Remove ALL Duplicates Entirerly

Hi @CodeRed2004 

 

Let me rephrase what i understood, and then provide the solution. 

You have two data sets A and B. A has 300000 rows, B also has 300000 rows and a new set of data i.e. 50000 rows. You would like to filter the newly added records from table B. 

 

If my understanding is correct then below is the solution.

 

Import both A and B in separate queries. Now that you would like to see the remaining rows in Table B or new records in B, please select Query for table B and then follow the steps:

 

Select B Query --> Home --> Merge Queries / Merge Queries as New (if you want the data in a new table then select the second option) --> Select table A from the dropdown list --> Highlight the Unique ID# column in both the tables --> in the JOIN KIND select LEFT ANTI --> Click OK. 

 

You're sorted.

 

If you think this is the solution then please do mark it as a solution

 

Pradip Pardeshi

Microsoft Certified Trainer

Pradip's Youtube Channel

View solution in original post

CodeRed2004 Frequent Visitor
Frequent Visitor

Re: Remove ALL Duplicates Entirerly

This was great. How do I get A&B not to load? I only want C (the new merged query) to be visible in the workbook. 

PradipMCT Regular Visitor
Regular Visitor

Re: Remove ALL Duplicates Entirerly

@CodeRed2004 

 

When you import table A and Table B, before merging go to HOME --> Close & Load To --> on the new screen you will find a radio button that says "Connection only", select that --> if you don't want to add the table to data model then uncheck the box --> Add to data model --> OK

 

This will create a connection only query. 

 

Again go back to Power Query Editor --> Merge Query as New --> This creates a new merged table C --> Close & Load To --> Select Table --> Uncheck Add to Data model (If needed) --> Ok.

 

Hope this helps. If this is the solution please do mark this as solution 🙂

 

Pradip Pardeshi

Microsoft Certified Trainer

Pradip's Youtube Channel

Servus Frequent Visitor
Frequent Visitor

Re: Remove ALL Duplicates Entirerly

For the dataset you don't want to load into your Power BI model, from Power Query you can -

>right-click the dataset

>uncheck 'Enable load'

>then close & apply

PradipMCT Regular Visitor
Regular Visitor

Re: Remove ALL Duplicates Entirerly

Hi @Servus 

 

This will make the query non-functional.

 

Regards,

Pradip Pardeshi

Pradip's Youtube Channel

Servus Frequent Visitor
Frequent Visitor

Re: Remove ALL Duplicates Entirerly

Hi @PradipMCT 

 

I believe the connection query will still be valid, however, the dataset will not be included in the report. 

 

If "enable load" is unchecked, the dataset will be included in subsequent refreshes but it won't be loaded into the report. Only the merged query will be loaded. The helps to improve performance. 

Helpful resources

Announcements
Exclusive LIVE Community Event #3 – Ask Arun Anything

Exclusive LIVE Community Event #3 – Ask Arun Anything

Join us in the third Triple A event!

Meet the 2020 Season 1 Power BI Super Users!

Meet the 2020 Season 1 Power BI Super Users!

It’s the start of a new Super User season! Learn all about the new Super Users and brand-new tiered recognition system.

Super User Challenge: Can You Solve These?

Super User Challenge: Can You Solve These?

We're celebrating the start of the New Super User season with our first ever Super User 'Can You Solve These?' challenge.

Power BI Desktop Update - February 2020

Power BI Desktop Update - February 2020

We are super excited for our update this month, as we are releasing two of our top community requests!

Power Platform Online Conference

Power Platform Online Conference

Join us for the first ever Power Platform Online Conference!

Top Solution Authors
Top Kudoed Authors