Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi, I'm a bit new to PowerQuery. I created an excel sheet that pulls in data from two other sheets. I want to combine them and remove ALL duplicate rows based on an ID# column. That and the original rows that were duplicated. The "remove duplicates" function leaves a set of the original data. I want it to leave only rows (ID#s) that had NO duplication. e.g. sheet 1 has 300,000 rows, sheet two has the same 300,000 rows PLUS 50,000 new ones. All have a unique ID#. I want to combine the sheets and load only the 50,000 new items.
Thanks for any help!
Solved! Go to Solution.
Hi @Anonymous
Let me rephrase what i understood, and then provide the solution.
You have two data sets A and B. A has 300000 rows, B also has 300000 rows and a new set of data i.e. 50000 rows. You would like to filter the newly added records from table B.
If my understanding is correct then below is the solution.
Import both A and B in separate queries. Now that you would like to see the remaining rows in Table B or new records in B, please select Query for table B and then follow the steps:
Select B Query --> Home --> Merge Queries / Merge Queries as New (if you want the data in a new table then select the second option) --> Select table A from the dropdown list --> Highlight the Unique ID# column in both the tables --> in the JOIN KIND select LEFT ANTI --> Click OK.
You're sorted.
If you think this is the solution then please do mark it as a solution
Pradip Pardeshi
Microsoft Certified Trainer
Hi @Anonymous
Let me rephrase what i understood, and then provide the solution.
You have two data sets A and B. A has 300000 rows, B also has 300000 rows and a new set of data i.e. 50000 rows. You would like to filter the newly added records from table B.
If my understanding is correct then below is the solution.
Import both A and B in separate queries. Now that you would like to see the remaining rows in Table B or new records in B, please select Query for table B and then follow the steps:
Select B Query --> Home --> Merge Queries / Merge Queries as New (if you want the data in a new table then select the second option) --> Select table A from the dropdown list --> Highlight the Unique ID# column in both the tables --> in the JOIN KIND select LEFT ANTI --> Click OK.
You're sorted.
If you think this is the solution then please do mark it as a solution
Pradip Pardeshi
Microsoft Certified Trainer
This was great. How do I get A&B not to load? I only want C (the new merged query) to be visible in the workbook.
For the dataset you don't want to load into your Power BI model, from Power Query you can -
>right-click the dataset
>uncheck 'Enable load'
>then close & apply
Hi @PradipMCT
I believe the connection query will still be valid, however, the dataset will not be included in the report.
If "enable load" is unchecked, the dataset will be included in subsequent refreshes but it won't be loaded into the report. Only the merged query will be loaded. The helps to improve performance.
@Anonymous
When you import table A and Table B, before merging go to HOME --> Close & Load To --> on the new screen you will find a radio button that says "Connection only", select that --> if you don't want to add the table to data model then uncheck the box --> Add to data model --> OK
This will create a connection only query.
Again go back to Power Query Editor --> Merge Query as New --> This creates a new merged table C --> Close & Load To --> Select Table --> Uncheck Add to Data model (If needed) --> Ok.
Hope this helps. If this is the solution please do mark this as solution 🙂
Pradip Pardeshi
Microsoft Certified Trainer
Hi @Anonymous
If two tables have the same columns and column names,
First, "append queries", to append two tables,
Then select "Id" column, then select "remove duplicates"
Best Regards
Maggie
Community Support Team _ Maggie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
@Anonymous You can try connecting to both sheets separately using Power Query. Then apply 'remove duplicates' on ID column for each dataset or sheet. Finally, run the 'append' operation to join both sheets as a new dataset according to the defined column names.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
101 | |
49 | |
19 | |
12 | |
11 |