Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
I have 2 tables A and B each with about 500,000 rows of data.
I would like to merge the two Tables A and B based on the column called ID.
Since each of the datasets have duplicate values, I created a Bridge table that contains unique values of ID from both Table A and B and appended them.
I then connect this Bridge table to the Table A and to Table B via a one to many filter.
However in my Table A there are some ID that are not present in Table B and ....there are some ID present in Table B and not in Table A.
Hence when I drag the ID column from my Bridge Table to my workspace in a Table Visualization, there is usually a row that is blank as its related to these missing ID from either Table A or B.
I tried to merge the two tables A and B to only keep ID that are present in both Tables, however this is painfully slow. Can anyone suggest an alternative ?
I would like to somehow only keep ID in Table A and B that are present in both table and filter out all other ones.
Thanks
Solved! Go to Solution.
Alright, then you do the following:
* Make a duplicate of your Table 1
* Remove all columns apart from your ID column
* Remove duplicates from that column
* Merge the column with the ID column in Table 2.
* Remove any rows that are blank due to the merge not finding a match.
* Repeat this but merge Table 2 into Table 1 instead.
This should result in you having only ID's that exist in both of the tables.
Hope it works!
You need to create a unique key in either table A and B which you can use for the merge. If this isn't possible you can place a report filter on both table A and B to remove blank rows.
@tex628When you say create a unique key, I am not clear - do you mean like a key such as that each row in the dataset is unique and can be identified and then do the same for the Table B ....and then perform the merge on the Key ?
Yes, a key that becomes unique in atleast one of the tables.
Example:
DocumentID ProductID Key
HTYD2457 Screw-013 HTYD2457Screw-013
You can see if the column is distinct by comparing rows with distinct rows in the table view:
You of course need to have identical keys in both tables for this to work. If all rows are distinct you have a unique column.
@tex628ok thank you unfortunately this is not possible in my tables because of the way the data is present.
I need to find another solution to somehow only keep the rows in each of the tables when the ID is present in both of the tables. I tried also creating a query of all of the unique ID values from Table A and then another query of the unique ID values from Table B. I then do a merge back to my original tables to delete the missing IDS, but this is extremely slow since the dataset is very big 500K rows
Alright, then you do the following:
* Make a duplicate of your Table 1
* Remove all columns apart from your ID column
* Remove duplicates from that column
* Merge the column with the ID column in Table 2.
* Remove any rows that are blank due to the merge not finding a match.
* Repeat this but merge Table 2 into Table 1 instead.
This should result in you having only ID's that exist in both of the tables.
Hope it works!
Instead of duplication is there any difference of taking only the requiored column and ADD AS NEW QUERY....or also REFERENCE...
Is there any performance differences ? In relating the original tables will be replaced with new data so I think the Duplicate query will need to be rerun manually again when this does happen ?
If you duplicate the correct query any changes to the original query will also affect the duplicate so it should work, but i think your solution is accually better performance wise. Add the column as a new query instead!
User | Count |
---|---|
124 | |
106 | |
99 | |
63 | |
59 |
User | Count |
---|---|
135 | |
116 | |
101 | |
71 | |
61 |