Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
_Emil_
Frequent Visitor

Getting Random Duplicates after Merge and Expand

I'm using Merge (left outer) on two fields with Expand column, my right side is unique (Check with SQL including case sensitive collation).

 

Before I expand I don't get duplicates, after I expand I get duplicates, however the duplicates rows are identical (using all columns from right side) and appear to be random as after refreshing the table I get different duplicates each time.

 

I have also sorted data in Power Query as final step, and when I look at the duplicated ID, it actually isn't duplicated, but is duplicated in Power BI Report.

 

Applying remove duplicates solves the problem, but I'm trying to understand why merge with expand would produce duplicates. The table is 100k+ rows.

 

Any ideas? 

 

Regards

Emil

 

7 REPLIES 7
vpechersky
New Member

Where you able to find a solution to this? I'm having this problem when merging a large CSV and a large XLS.

v-huizhn-msft
Employee
Employee

Hi @_Emil_,

The date in left side have duplicate, right? If it is, I try to reproduce the scenario, please review the following.

The first table is left side table, the second is right side table which is unique.

1.png2.png

I merge(left outer) on two tickets, and result as the screenshot below.

3.png4.png

From the result, I get duplicates after expand. This is because there are duplicates, the value in right site must matches them.

If you have any other issue, please feel free to ask.


Best Regards,
Angelia

Hi Angelia,

 

Let me clarify, using your example you have 5 rows on left side, and you get 5 rows as final result, in my case I get 5+ rows, even when right side is unique.

 

So my output is like this:

1. D, D

2. D, D

3. C, C

4. E, E

5. E, E <-- duplicated

6. F, NULL

 

and if I refresh data my duplication would be different like this:

1. D, D

2. D, D

3. C, C

4. C, C <-- duplicated

5. E, E 

6. F, NULL

 

I have this beheviour on several different merges now, in all cases right side is unique, so I don't expect to get more rows on left side, but I do. 

 

 

If I take my duplicated row and filter my left side (1 unique row) than I no longer get the duplication, and if I was doing something wrong (which is possible) I would expect it to be there duplicated.

 

I'll try to play with it a bit more to see if I can narrow down the issue.

 

Regards

Emil

Hi @_Emil_

For your requriement, it should change dynamically. We are unable to achieve your expected result. Thanks for your understanding.

 

Best Regards,
Angelia

Thanks, it seems to happen only with larger volume of data, and affects only very small % of rows, but appears with almost every merge I use that uses different logic, columns (no issues with SQL Left join).

 

I'll try to narrow it down and see if it is something on my side that I'm misunderstanding. 

 

Regards

Emil

 

_Emil_
Frequent Visitor

Hi 

 

I have managed to confirm it's an issue.

 

I have exported my "item" dimension to a CSV and "swapped" Power BI source. The issue goes away.

 

I put back SQL and issue returns and is still random.

 

On 10 runs (refreshes with same the same data) I get:

1) Left side duplicates + blanks

2) no issue

3) no issue

4) Item dimension contains duplicates (which it doesn't)

5) no issue

6) no issue

7) Left side duplicates + blanks

😎 Item dimension contains duplicates (which it doesn't)

9) No issue

10) No issue

 

* (blanks between 670+ to 50k out of 135k unique rows, duplicates between 5k to 30k depending on the run)

 

The only difference I see is different collation left side Latin1_General_CI_AS right side (item dim) Latin1_General_CI_AS

 

I have considered collation when I started having the issue but I don't understand what it is random.

 

I've tried using collate to match the other side (tried on both sides), but it didn't work.

 

Is there a collation setting in Power BI when SQL Server is used as a source?

 

Regards

Emil

Yes, I see the same thing using SQL Server as the source. It's also possible to connect to SQL Server via OLE DB (I haven't checked to see if this resolves the issue). I am just removing duplicates twice, as you said in your first post.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors