cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
blytonpereira Regular Visitor
Regular Visitor

Alternative to Merge very large tables

I have 2 tables A and B each with about 500,000 rows of data.

 

I would like to merge the two Tables A and B based on the column called ID.

Since each of the datasets have duplicate values, I created a Bridge table that contains unique values of ID from both Table A and B and appended them.

 

I then connect this Bridge table to the Table A and to Table B via a one to many filter.

However in my Table A there are some ID that are not present in Table B and ....there are some ID present in Table B and not in Table A.

Hence when I drag the ID column from my Bridge Table to my workspace in a Table Visualization, there is usually a row that is blank as its related to these missing ID from either Table A or B.

 

I tried to merge the two tables A and B to only keep ID that are present in both Tables, however this is painfully slow. Can anyone suggest an alternative ?

I would like to somehow only keep ID in Table A and B that are present in both table and filter out all other ones.

 

Thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
tex628 New Contributor
New Contributor

Re: Alternative to Merge very large tables

Alright, then you do the following:

* Make a duplicate of your Table 1

* Remove all columns apart from your ID column

* Remove duplicates from that column

* Merge the column with the ID column in Table 2.

* Remove any rows that are blank due to the merge not finding a match. 

* Repeat this but merge Table 2 into Table 1 instead.

This should result in you having only ID's that exist in both of the tables.

Hope it works!

View solution in original post

7 REPLIES 7
tex628 New Contributor
New Contributor

Re: Alternative to Merge very large tables

You need to create a unique key in either table A and B which you can use for the merge. If this isn't possible you can place a report filter on both table A and B to remove blank rows.

blytonpereira Regular Visitor
Regular Visitor

Re: Alternative to Merge very large tables

@tex628When you say create a unique key, I am not clear - do you mean like a key such as that each row in the dataset is unique and can be identified and then do the same for the Table B ....and then perform the merge on the Key ?

tex628 New Contributor
New Contributor

Re: Alternative to Merge very large tables

Yes, a key that becomes unique in atleast one of the tables. 


Example:

DocumentID                   ProductID                        Key
HTYD2457                       Screw-013                      HTYD2457Screw-013


You can see if the column is distinct by comparing rows with distinct rows in the table view:

image.pngimage.pngBottom left
You of course need to have identical keys in both tables for this to work. If all rows are distinct you have a unique column.

blytonpereira Regular Visitor
Regular Visitor

Re: Alternative to Merge very large tables

@tex628ok thank you unfortunately this is not possible in my tables because of the way the data is present.

 

I need to find another solution to somehow only keep the rows in each of the tables when the ID is present in both of the tables. I tried also creating a query of all of the unique ID values from Table A and then another query of the unique ID values from Table B. I then do a merge back to my original tables to delete the missing IDS, but this is extremely slow since the dataset is very big 500K rows

tex628 New Contributor
New Contributor

Re: Alternative to Merge very large tables

Alright, then you do the following:

* Make a duplicate of your Table 1

* Remove all columns apart from your ID column

* Remove duplicates from that column

* Merge the column with the ID column in Table 2.

* Remove any rows that are blank due to the merge not finding a match. 

* Repeat this but merge Table 2 into Table 1 instead.

This should result in you having only ID's that exist in both of the tables.

Hope it works!

View solution in original post

blytonpereira Regular Visitor
Regular Visitor

Re: Alternative to Merge very large tables

Instead of duplication is there any difference of taking only the requiored column and ADD AS NEW QUERY....or also REFERENCE...

 

Is there any performance differences ? In relating the original tables will be replaced with new data so I think the Duplicate query will need to be rerun manually again when this does happen ?

tex628 New Contributor
New Contributor

Re: Alternative to Merge very large tables

If you duplicate the correct query any changes to the original query will also affect the duplicate so it should work, but i think your solution is accually better performance wise. Add the column as a new query instead!

Helpful resources

Announcements
Can You Solve These Challenge

Challenge: Can You Solve These?

Find out how to participate in the first Power BI 'Can You Solve These?' challenge.

Community News & Announcements

Community News & Announcements

Get your latest community news and announcements.

Virtual Launch Event

Microsoft Business Applications October Virtual Launch Event

Join us for an in-depth look at the new innovations across Dynamics 365 and the Microsoft Power Platform.

Community Kudopalooza

Win Power BI Swag with Community Kudopalooza!

Each week, complete activities and be qualified in the drawing for cool Power BI Swag.

Users Online
Currently online: 382 members 3,156 guests
Please welcome our newest community members: