Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Inconsistent Data After Refresh - Merge Query Issue (?)

Hi all,

 

I encounter a very strange issue/behavior in PBI, which I suspect has something to do with the way merge transformation is used in PowerQuery.

 

Basically, I extract data from SQL Server database to PBI tables/queries. There are 5 queries in total (1 for Date, 3 "dimensions", and 1 "fact").

The source tables do not have obvious key field in them, I concatenate several fields to make up a key field.

Also, I added Index to the table/query in PowerQuery (via Add Column > Index Column).

I did this for the 3 "dimension" tables.

 

Then, for the "fact" table, I performed Merged Queries transformations (LEFT Outer with no fuzzy matching) with the 3 "dimension" tables based on the combination of fields which form the key fields.

From this transformation, I can then obtain the Index field (act like a surrogate key now) from each of the "dimension" tables, so that it replicates a proper star-schema / fact-dimension data model, which I believe is the best practice.

 

It was all working fine, until I performed data refresh.

Every time I refresh the data, even though the souce has not changed, the data in Power BI desktop keeps changing.

Looking at the Data tab, it seems that the Index keys in the fact table are out of whack, i.e. they dont correspond to the correct Index field value in the dimension tables.

This behaviour is also observed when I refresh data in Power BI Report Server and Power BI Service.

 

One thing that seems to "fix" it is to disable parallel loading in Power BI Desktop (File >Options and Settings > Options > Current File > Data Load > Parallel loading of tables > untick Enable parallel loading of tables).

It makes the data refresh/loading slow, but it seems that the resulting data does not get out of whack after it completes.

This only applies on Power BI Desktop though, i.e. even if I disable parallel loading in PBI Desktop, once it is published to PBIRS or PBI Service, the issue is still there.

 

Has anyone experienced this before? Any ideas on what to do or try here?

Is this a (known) bug? Or, perhaps I "misuse" Merge Queries transformation (if it is indeed the cause) ?

 

Thanks in advance.

 

 

 

 

 

13 REPLIES 13
Anonymous
Not applicable

Check data types of key fields used during merge step. 

rargyle
Advocate II
Advocate II

think  I have found a solution by the power of SQL query. Instead of adding an index column in PowerQuery, after getting the data, I'm using ROW_NUMBER in my database SQL:

 

SELECT ROW_NUMBER() OVER( ORDER BY Column1, Column2) id_Num, Column1, Column2 [...] 

 

So far my indexes have behaved on refreshing in the PowerBI Service. Hopefully it stays that way...!

Just to update, I've been using this approach (SQL ROW_NUMBER to add an index) since September and it is still working very well, without any problems, and is fast.

ImkeF
Super User
Super User

Hi @Anonymous ,

is my understanding correct that you're extracting the index-fields for the fact tables from the merge withe the Dim-Tables?
Then please add a buffer-step as the last step in your Dim-Tables:

 

Table.Buffer(YourLastStepNameSoFar)

 

This should solve the problem.

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous
Not applicable

Thanks @ImkeF .

Yes, I am extracting the index fields from the dimension tables via Merge Queries transformation in the fact table, so that the fact table has the index fields from the dimension tables and act like surrogate keys in the fact table.

Is there anything wrong with this approach?

 

I added the Table.Buffer as you suggested. However, it seems to only work in Power BI Desktop, i.e. I can re-enable the parallel loading again and after refresh in PBI Desktop, the data does not get out of whack.

However, when I republish to PBI Service and refresh the data there, the data still gets out of whack after the refresh 😞

 

What does Table.Buffer actually do by the way? And why do you think I need to add it in the PowerQuery in this case?

Hi @Anonymous ,

this sounds like a bug to me. I'd recommend to report it as such: https://community.powerbi.com/t5/Issues/idb-p/Issues

 

Table.Buffer should make the whole table immutable: https://bengribaudo.com/blog/2019/12/10/4778/power-query-m-primer-part12-tables-table-think-i

Although adding an index-column should do so as well, so it's strange already, that a buffer is needed for the dim table.

What you can try is to buffer the fact-table as well - just as an idea to solve the current problem.

But in theory it should work without...

 

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous
Not applicable

@ImkeF , I did some brief testing by adding Table.Buffer on the fact table and remove the Table.Buffer in the dimension tables.

In Power BI Desktop, it seems to work fine, i.e. data does not get out of whack after refresh there.

However, the issue still persists when I publish to PBI Service and refresh there.

 

Basically, what I am trying to do is following this approach of modelling.

https://radacad.com/build-your-first-star-schema-model-in-action-power-bi-modeling-basics

 

Since the issue seems to be caused by PowerBI/PowerQuery incorrectly retrieves the Index field values after Merged Queries transformation in the fact table, I removed all the relationships which use these key/index fields.

Instead, I now use the concatenation of multiple fields to form the keys on each tables and join them in the Relationship.

By doing this, no issue with data getting out of whack - PBI Desktop and PBI Service data refresh seem fine.

 

Really disappointing something basics like this cause lots of issues and wasted lots of time, especially because I know some PBI experts like Reza (as above) and others promote this approach of data modelling..

Does your dataset contain terms that differ only by capiolization, as this can cause an issue when folding with SQL.

Anonymous
Not applicable

Hi @artemus ,

 

No, I believe they are consistent. 

Also, the fact that turn off parallel loading in PBI Desktop seems to resolve this issue suggests that the cause lies somewhere else.

Hi @artemus ,

that's an interesting aspect.
Just for my understanding: Wondering if any trouble from that side could be prevented. if one would buffer the table before adding the index column then?

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Hi @Anonymous ,

I can understand your frustration.
I'm pretty sure this is a bug, would have appreciated if you would have reported it.

 

Don't understand why you removed the buffer from the Dim-Tables. I would strongly recommed to keep them in, so having a buffer on both tables: Fact and Dim.

 

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous
Not applicable

Hi bidax, any news from Microsoft about that bug? Did you find some alternative solution to avoid inconsistencies in surrogate keys?

I'm experimenting the same issue here, just to have some fresh reference on this issue.

Thank you.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors