Getting Random Duplicates after Merge and Expand

_Emil_ · ‎12-20-2016

I'm using Merge (left outer) on two fields with Expand column, my right side is unique (Check with SQL including case sensitive collation).

Before I expand I don't get duplicates, after I expand I get duplicates, however the duplicates rows are identical (using all columns from right side) and appear to be random as after refreshing the table I get different duplicates each time.

I have also sorted data in Power Query as final step, and when I look at the duplicated ID, it actually isn't duplicated, but is duplicated in Power BI Report.

Applying remove duplicates solves the problem, but I'm trying to understand why merge with expand would produce duplicates. The table is 100k+ rows.

Any ideas?

Regards

Emil

vpechersky · ‎04-10-2017

Where you able to find a solution to this? I'm having this problem when merging a large CSV and a large XLS.

v-huizhn-msft · ‎12-21-2016

Hi @_Emil_,

The date in left side have duplicate, right? If it is, I try to reproduce the scenario, please review the following.

The first table is left side table, the second is right side table which is unique.

I merge(left outer) on two tickets, and result as the screenshot below.

From the result, I get duplicates after expand. This is because there are duplicates, the value in right site must matches them.

If you have any other issue, please feel free to ask.

Best Regards,
Angelia

_Emil_ · ‎01-19-2017

Hi Angelia,

Let me clarify, using your example you have 5 rows on left side, and you get 5 rows as final result, in my case I get 5+ rows, even when right side is unique.

So my output is like this:

1. D, D

2. D, D

3. C, C

4. E, E

5. E, E <-- duplicated

6. F, NULL

and if I refresh data my duplication would be different like this:

1. D, D

2. D, D

3. C, C

4. C, C <-- duplicated

5. E, E

6. F, NULL

I have this beheviour on several different merges now, in all cases right side is unique, so I don't expect to get more rows on left side, but I do.

If I take my duplicated row and filter my left side (1 unique row) than I no longer get the duplication, and if I was doing something wrong (which is possible) I would expect it to be there duplicated.

I'll try to play with it a bit more to see if I can narrow down the issue.

Regards

Emil

v-huizhn-msft · ‎01-22-2017

Hi @_Emil_，

For your requriement, it should change dynamically. We are unable to achieve your expected result. Thanks for your understanding.

Best Regards,
Angelia

_Emil_ · ‎01-25-2017

Thanks, it seems to happen only with larger volume of data, and affects only very small % of rows, but appears with almost every merge I use that uses different logic, columns (no issues with SQL Left join).

I'll try to narrow it down and see if it is something on my side that I'm misunderstanding.

Regards

Emil

_Emil_ · ‎01-25-2017

Hi

I have managed to confirm it's an issue.

I have exported my "item" dimension to a CSV and "swapped" Power BI source. The issue goes away.

I put back SQL and issue returns and is still random.

On 10 runs (refreshes with same the same data) I get:

1) Left side duplicates + blanks

2) no issue

3) no issue

4) Item dimension contains duplicates (which it doesn't)

5) no issue

6) no issue

7) Left side duplicates + blanks

😎 Item dimension contains duplicates (which it doesn't)

9) No issue

10) No issue

* (blanks between 670+ to 50k out of 135k unique rows, duplicates between 5k to 30k depending on the run)

The only difference I see is different collation left side Latin1_General_CI_AS right side (item dim) Latin1_General_CI_AS

I have considered collation when I started having the issue but I don't understand what it is random.

I've tried using collate to match the other side (tried on both sides), but it didn't work.

Is there a collation setting in Power BI when SQL Server is used as a source?

Regards

Emil

freder1ck · ‎10-19-2017

Yes, I see the same thing using SQL Server as the source. It's also possible to connect to SQL Server via OLE DB (I haven't checked to see if this resolves the issue). I am just removing duplicates twice, as you said in your first post.

Getting Random Duplicates after Merge and Expand

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly