Solved: Re: Data Duplicating rows after merge

StephenK2022 · ‎07-05-2022

HI,

I hope someone can help with this.

After I merge 2 queries, some of my data duplicates into new rows in my data table. I am possibly using the wrong type of merge (left outer) but I was hoping someone can help stop this from happeneing. If I use the remove duplicates feature then it will remove some good data as well.

stefani_vileva · ‎07-05-2022

Hello @StephenK2022 ,

Thanks for the explanation, I have another question right now, what is the key of the new table because for example let we have the following table

if the key of my tables are columns: colA and colB, when I use the following selection and right click

I get the following result,

meaning that the key columns of my new tables are colA and colB so as long this combination of values is not reappearing in the dataset, the others will be removed. As you can see the second row with the duplication of 123, x was removed in the second appearing.

I hope this helps a little bit.

Kind regards,

Stefani Vileva

View solution in original post

stefani_vileva · ‎07-05-2022

Hello,

Can you please explain what is the goal of this merge?

When you are using left join, then all the rows of the left table are trying to find their match in the second table, so for example if we have the following tables:

Table A

1, xxx, 123

2, yyy, 134

3, zzz, 145

Table B

123, hello

123, bye

145, goodbye

Then, when you try to make a left join of A with B, using the last col of A and the first of B as a matching key, the result is the following

1, xxx, 123, hello

1, xxx, 123, bye

2, yyy, 134, null

3, zzz, 145, goodbye

In order to remove the duplicates, I think it is best to select the key columns of your result table and then to remove douplicates as a group directly through the power query.

If you have any questions, please let me know.

Kind regards,

Stefani

StephenK2022 · ‎07-05-2022

Hi Stefani,

Thanks for your reply.

The goal of the merge is to combine two data sets so that the "Number up.No up" column matches the "Job code" column.

The number up column is held in a different file and I want to match it to job code so that I can use it for a calculation.

The problem with removing duplicates through power query is that some rows have similar data.

There might be 2 separate entries that have the same values but at different times.

Also, some timestamps are the same as well but have different data:

stefani_vileva · ‎07-05-2022

Hello @StephenK2022 ,

Thanks for the explanation, I have another question right now, what is the key of the new table because for example let we have the following table

if the key of my tables are columns: colA and colB, when I use the following selection and right click

I get the following result,

meaning that the key columns of my new tables are colA and colB so as long this combination of values is not reappearing in the dataset, the others will be removed. As you can see the second row with the duplication of 123, x was removed in the second appearing.

I hope this helps a little bit.

Kind regards,

Stefani Vileva

Data Duplicating rows after merge

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly