Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
DataGuy2016
Advocate II
Advocate II

Table Merge creates duplicate records

Hi Folks,

 

I am stumped.

 

I am trying to merge two tables into a new table using a LEFT JOIN.  Both tables have unique records on each row.  When I join the tables, BI creates duplicate rows on some records for no apparent reason.  The duplicates are identical in every way.

 

Can anybody explain this?  Is this a bug?

1 ACCEPTED SOLUTION
DataGuy2016
Advocate II
Advocate II

The problem lay with the Data.

 

The LEFT JOIN functionality of PowerBI works perfectly.

View solution in original post

46 REPLIES 46
gpfox
Frequent Visitor

I had same problem. I was joining Table A with Table B on 2 columns, that were numbers but marked as Text type.  I changed the type of these columns I was joining on in both tables to Whole number type. Then the Left Join worked and the duplicates went away!

Anonymous
Not applicable

I had the same problem and found my solution so thought I'd share it, for what its worth.

 

I had a very large table of data and another table of account details that I was merging to my first data set, but was getting duplicate rows.

 

There was a unique key to link between the 2 tables but hadn't created a relationship on the model window. As it turns out it was a many to many relationship and this is what caused the dulpicates. Since the account table was only there to extract additional info to my first data set, in the account table in the Power Query editor window I simply clicked on my key column and selected remove duplicates to force a many to one relationship and this removed the duplication on merge.

I don't really get why the people commenting after my comment are still looking for an answer!!!!

 

my comment was litterally a valid solution, will I actually didn't really understand how merge works back then when I looked for this post, people are still missing merge with the regular vlookup function.

 

vlookup looks only for the first value in a column.

merge brings all the instances of this value no matter how many there are, and it duplicates the value in the first table to match that count, it works this way and there is no problem in this, the logic is different (it simply doesn't know which instance you wish to have so it brings you all of them).

 

all you have to do is to remove duplicates, you have to do it in the second table you're merging with (before you do the merge), or you have to do it the way I described in my last "solution" comment (you may look here in the previous comments).

 

please mark my comment as a solution so people don't keep getting lost everytime they enter this thread.

 

Regards for all of you.

Th steer here spot on - thanks - duplicate records in the right hand linked table caused duplicate rows from left hand  linked table. Unlike Vlookup which only returns first match, the Access query returns a record for each match in the right table. Removing duplicates in right hand table solves problem.

baronraghu
Helper III
Helper III

Hi All,

 

I having similar issue, Below are the tables and the output after using full outer join

 

Table1 

image.png

 

Table 2

image.png

 

Merging query screenshot

image.png

 

Merged table with duplicate entries- actually it copies table 2 for each entry of table 1

image.png

 

Any solution

ksatpati
New Member

Had the same issue, where I was merging queries.

The issue in my instance was that the base query was expanded which resulted in duplicates in the base query.

Once the expansion was removed in the base query the merged query was fine. 

So if data is okay, the issue may be caused by duplicates in the base query.

Anonymous
Not applicable

Can you explain "base query was expanded"?

he meant the other query that you are merging your data with

so if you are in Table A and trying to merge data with Table B, if Table B has duplicates it will result in a duplicates in your Table A after you expand the results, so it's better to have Table B cleaned from duplicates first.

othmanalazzam
Advocate I
Advocate I

Hi people.

 

I've been looking for this issue with no luck as well until I founded a valid work around.

 

the work around goes like this ::

 

-Merge your sheets as you like (you don't have to merge on more than one column, one is fine).

 

-Add index column

 

-Expand your merged column (this will create duplicate values, the main issue we're looking into).

 

-since the values in the "index column" were purely unique, now Remove duplicates using index column.

 

-and now your data is back to original as it was inteded to be.

hopefully you'll reply but when I do this ...it does indeed eliminate the duplicates in the 2nd table added 

 

but after removing the dups in the index, now the original table has dups 

 

here is sample data ..trying to merge "ycr" and "integrated"and end up with "goal" 

 

https://docs.google.com/spreadsheets/d/1Bv56ggXMHQbWbro05kPliGdtll_u9C_W/edit?usp=drivesdk&ouid=1160...

I work with very extensive datas sheets and as soon as I tried this solution (that works pretty well) the uptade started to take hours to be conclude, thats the only issue Im having right now... Any clues about that?

Having the same issue. Performance severely degrades when I use this method. It works in theory, but in some cases will take forever. 

hello there,

the slowness is not coming from adding an index column or removing the duplicate, Power BI and Excel Power Queries are always having a problem when using the "Merge" method, this is happening because "merge" includes a "reference" step inside of it, this specific step "reference" is a performance killer, it really slows everything down in a horrible way, the only two solutions I found to this problem is either you find a workaround to this and try to achieve your result without using "merge" or "reference" (a workaround that does the same effect and I have done that sometimes), the other solution is to not use it all together, there is really no way to optimize the "merge" itself.

 

Regards,

Thanks. This solved my issue.

 

Thanks! This solution works like a charm
Don't know why Microsoft don want to fix this simple issue....

Anonymous
Not applicable

this approach should be accepted as solution. It perfectly resolved the issue.


@othmanalazzam wrote:

Hi people.

 

I've been looking for this issue with no luck as well until I founded a valid work around.

 

the work around goes like this ::

 

-Merge your sheets as you like (you don't have to merge on more than one column, one is fine).

 

-Add index column

 

-Expand your merged column (this will create duplicate values, the main issue we're looking into).

 

-since the values in the "index column" were purely unique, now Remove duplicates using index column.

 

-and now your data is back to original as it was inteded to be.


My duplicate issue is solved by applying this method. Thank you very much.

I've been working with a sales table and trying to create dimension tables from the table with all of the fields.  I've been frustrated, because I could not tell where the duplicate rows of data were coming from; however, could tell they happened after one of the two merge query steps.

 

I can affirm that having a unique index prior to the merge, and then removing duplicates from this field works.

 

Does anyone know why this occurs?

Anonymous
Not applicable

I'm facing the same trouble here. Not sure why merging 2 tables create duplicates. Have anyone got a work around this? 

 

Update 9/10 : Problem Solved. My original first table already have got unique rows hence I should have merged it as aggregated values otherwise it will further expand the table, causing duplicates. Also, it helps to use and select more than one field as the key to look up if you need that level of granularity. 

GenericUsername
Frequent Visitor

I just ran into the same problem and fixed it by joining the two tables on multiple fields.  For example, let's say a transaction number is the unique field you are joining on, expand the join to include other fields if possible.  So your joins would be on transaction number, time, and amount.  

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.