cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
charliedata
Responsive Resident
Responsive Resident

Scope of a golden dataset

Our data warehouse Sales mart has several fact tables and dimension tables, so in building the Power BI dataset(s) to support Sales reporting we need to make a choice about the scope of what to include. This seems to boil down to options on a spectrum between:

  • Single golden dataset with multiple fact tables (e.g. sales calls, sales opportunities, closed sales)
  • Minimal (ideally one) fact table per dataset

 

In theory the first option is most flexible in supporting the widest range of reports without having to duplicate logic, and allows reports to contain content from different facts (essential for calculating things like conversion rates). However, it's more fragile than a more modular approach because if one table breaks in the data refresh, all the reports fail.

 

This seems to be a trade-off of efficiency and flexibility on one hand vs fragility on the other.

 

Is this the right way to see it and are there approaches that can get the best of both - i.e. avoid the trade-off.

1 ACCEPTED SOLUTION

Understanding and optimizing dataflows refresh - Power BI | Microsoft Docs

 

It's slightly unclear as to whether this is just incremental refreshes or all refreshes but I think combined with the above confirms it:

 

"In any of these refresh scenarios, if a refresh fails the data is not updated, which means that your data might be stale until the latest refresh completes, or you refresh it manually and it completes without error. Refresh occurs at a partition or entity, so if an incremental refresh fails, or an entity has an error, then the entire refresh transaction will not occur. Said another way, if a partition (incremental refresh policy) or entity fails for a particular dataflow, the entire refresh operation is set to fail, and no data gets updated."



Ben Dobbs

LinkedIn | Twitter | Blog

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!

View solution in original post

7 REPLIES 7
bcdobbs
Super User
Super User

If you move all your power query logic into modular data flows then they can refresh separately.

 

Your golden dataset then has minimal power query for each table to reference the relevant dataflows.

 

In the event that a data flow fails to refresh your dataset will still refresh but be left with old data from the failed data flow but the rest will be fine.



Ben Dobbs

LinkedIn | Twitter | Blog

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!
charliedata
Responsive Resident
Responsive Resident

Thanks @bcdobbs 

 

Do you have a link to any documentation on how dataflows handle a failed refresh in terms of the dataset they contain?

Now worried I have that behaviour wrong! I'll do some googling and set up a test.



Ben Dobbs

LinkedIn | Twitter | Blog

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!

Still looking for some documentation however test below appears to confirm my original suggestion. My understanding is that under the covers data flows are persisted as text files in a data lake so assuming old test files are only over written after a full refresh. (I beleive if you hook up your own data lake to it you can see the files).

Test setup:

 

1) Two dataflows each referencing a different table stored in SQL Server (each just has one entity but they could be modular):

bcdobbs_0-1641934293353.png

2) They both have a a step that does a distinct on a named column eg:

Table.Distinct(Navigation, {"TableOneLabel"})
Therefore will break if underlying columns are renamed.
 
3) Dataflows are imported into a dataset in power bi and visualised with table visual showing all data:
bcdobbs_1-1641934469343.png

4) "TableOneLabel" is renamed in SQL database to "TableOneNewLabel"

5) Both data flows are refreshed and Table One gets a refresh error:

bcdobbs_2-1641934616872.png

6) Dataset is refreshed. This completes with no errors and original data is present:

bcdobbs_3-1641934669864.png

 



Ben Dobbs

LinkedIn | Twitter | Blog

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!

Understanding and optimizing dataflows refresh - Power BI | Microsoft Docs

 

It's slightly unclear as to whether this is just incremental refreshes or all refreshes but I think combined with the above confirms it:

 

"In any of these refresh scenarios, if a refresh fails the data is not updated, which means that your data might be stale until the latest refresh completes, or you refresh it manually and it completes without error. Refresh occurs at a partition or entity, so if an incremental refresh fails, or an entity has an error, then the entire refresh transaction will not occur. Said another way, if a partition (incremental refresh policy) or entity fails for a particular dataflow, the entire refresh operation is set to fail, and no data gets updated."



Ben Dobbs

LinkedIn | Twitter | Blog

Did I answer your question? Mark my post as a solution! This will help others on the forum!
Appreciate your Kudos!!
Greg_Deckler
Super User
Super User

@charliedata You've effectively hit the nail on the head for the most part. The new DirectQuery for Live datasets capability has the potential to provide the best of both worlds so I would check that out if you have not already.


@ me in replies or I'll lose your thread!!!
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Learn Power BI 2nd Edition

Thanks @Greg_Deckler - appreciate you taking the time to respond and help me understand my options better

 

 

 

 

Helpful resources

Announcements
August 2022 update 768x460.jpg

Check it Out!

Click here to learn more about the August 2022 updates!

August 1 episode 9_no_dates 768x460.jpg

The Power BI Community Show

Watch the playback when Priya Sathy and Charles Webb discuss Datamarts! Kelly also shares Power BI Community updates.

Power Platform Conf 2022 768x460.jpg

Join us for Microsoft Power Platform Conference

The first Microsoft-sponsored Power Platform Conference is coming in September. 100+ speakers, 150+ sessions, and what's new and next for Power Platform.

Top Solution Authors
Top Kudoed Authors