Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Scope of a golden dataset

Our data warehouse Sales mart has several fact tables and dimension tables, so in building the Power BI dataset(s) to support Sales reporting we need to make a choice about the scope of what to include. This seems to boil down to options on a spectrum between:

  • Single golden dataset with multiple fact tables (e.g. sales calls, sales opportunities, closed sales)
  • Minimal (ideally one) fact table per dataset

 

In theory the first option is most flexible in supporting the widest range of reports without having to duplicate logic, and allows reports to contain content from different facts (essential for calculating things like conversion rates). However, it's more fragile than a more modular approach because if one table breaks in the data refresh, all the reports fail.

 

This seems to be a trade-off of efficiency and flexibility on one hand vs fragility on the other.

 

Is this the right way to see it and are there approaches that can get the best of both - i.e. avoid the trade-off.

1 ACCEPTED SOLUTION

Understanding and optimizing dataflows refresh - Power BI | Microsoft Docs

 

It's slightly unclear as to whether this is just incremental refreshes or all refreshes but I think combined with the above confirms it:

 

"In any of these refresh scenarios, if a refresh fails the data is not updated, which means that your data might be stale until the latest refresh completes, or you refresh it manually and it completes without error. Refresh occurs at a partition or entity, so if an incremental refresh fails, or an entity has an error, then the entire refresh transaction will not occur. Said another way, if a partition (incremental refresh policy) or entity fails for a particular dataflow, the entire refresh operation is set to fail, and no data gets updated."



Ben Dobbs

LinkedIn | Twitter | Blog

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!

View solution in original post

7 REPLIES 7
bcdobbs
Super User
Super User

If you move all your power query logic into modular data flows then they can refresh separately.

 

Your golden dataset then has minimal power query for each table to reference the relevant dataflows.

 

In the event that a data flow fails to refresh your dataset will still refresh but be left with old data from the failed data flow but the rest will be fine.



Ben Dobbs

LinkedIn | Twitter | Blog

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!
Anonymous
Not applicable

Thanks @bcdobbs 

 

Do you have a link to any documentation on how dataflows handle a failed refresh in terms of the dataset they contain?

Now worried I have that behaviour wrong! I'll do some googling and set up a test.



Ben Dobbs

LinkedIn | Twitter | Blog

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!

Still looking for some documentation however test below appears to confirm my original suggestion. My understanding is that under the covers data flows are persisted as text files in a data lake so assuming old test files are only over written after a full refresh. (I beleive if you hook up your own data lake to it you can see the files).

Test setup:

 

1) Two dataflows each referencing a different table stored in SQL Server (each just has one entity but they could be modular):

bcdobbs_0-1641934293353.png

2) They both have a a step that does a distinct on a named column eg:

Table.Distinct(Navigation, {"TableOneLabel"})
Therefore will break if underlying columns are renamed.
 
3) Dataflows are imported into a dataset in power bi and visualised with table visual showing all data:
bcdobbs_1-1641934469343.png

4) "TableOneLabel" is renamed in SQL database to "TableOneNewLabel"

5) Both data flows are refreshed and Table One gets a refresh error:

bcdobbs_2-1641934616872.png

6) Dataset is refreshed. This completes with no errors and original data is present:

bcdobbs_3-1641934669864.png

 



Ben Dobbs

LinkedIn | Twitter | Blog

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!

Understanding and optimizing dataflows refresh - Power BI | Microsoft Docs

 

It's slightly unclear as to whether this is just incremental refreshes or all refreshes but I think combined with the above confirms it:

 

"In any of these refresh scenarios, if a refresh fails the data is not updated, which means that your data might be stale until the latest refresh completes, or you refresh it manually and it completes without error. Refresh occurs at a partition or entity, so if an incremental refresh fails, or an entity has an error, then the entire refresh transaction will not occur. Said another way, if a partition (incremental refresh policy) or entity fails for a particular dataflow, the entire refresh operation is set to fail, and no data gets updated."



Ben Dobbs

LinkedIn | Twitter | Blog

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!
Greg_Deckler
Super User
Super User

@Anonymous You've effectively hit the nail on the head for the most part. The new DirectQuery for Live datasets capability has the potential to provide the best of both worlds so I would check that out if you have not already.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Mastering Power BI 2nd Edition

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

Thanks @Greg_Deckler - appreciate you taking the time to respond and help me understand my options better

 

 

 

 

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.