
dskrobowski
Helper I

When to Use a Dataflow vs a Golden Dataset

Hello, I'm struggling to understand the utility of Power BI dataflows over using a "golden" dataset (promoted/shared for use in multiple reports). I haven't yet found a post that pits these two against each other in this way.

 

Assume that I have an average-quality data-warehouse function in my org. Assume also that I don't care about sharing prepared data with other parts of the Power Platform; I'm only interested in optimal Power BI architecture and implementation.

 

Here are my thoughts. Please feel free to correct me where I'm wrong.

  1. Both Power BI dataflows and Power BI datasets contain the data. Despite some calling datasets merely the "semantic model on top of the data," my understanding is that, yes, the dataset contains this semantic-model layer, but it also holds (in the case of import mode) the VertiPaq-compressed data itself. This is why datasets consume space and have size limitations in your Power BI account.
  2. The problem of reducing queries against the original data sources is solved by both dataflows and "golden" datasets. For the latter (as long as you're using import mode), promoting, sharing, and reusing the dataset is the solution; hence the "golden" name. The only savings I can see from using dataflows for query reduction is when you start to have multiple "golden" datasets for different business solutions, but I feel I'd need a lot of these for query reduction to matter. Obviously, if your golden datasets are DirectQuery datasets, then querying dataflows will reduce the load on the original source.
  3. Assuming I'm using import mode, why would I want to mirror my data twice: once in the dataflow and again in the compressed dataset (see the Power Query sketch after this list)?
  4. If you're already using import mode, then implementing dataflows will either needlessly mirror your data or (if your report uses DirectQuery against the dataflow) unnecessarily prevent you from using many DAX functions in the dataset layer, due to DAX limitations for DirectQuery sources.
  5. Given the above, and given that I want to leverage the speed and flexibility of compressed columnar data and full DAX, I can't see why I would consider using dataflows.
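
To make the "mirror twice" concern in point 3 concrete, here is a minimal Power Query (M) sketch of how an import-mode dataset query could pull an entity out of a dataflow via the built-in dataflows connector. The workspace and dataflow IDs and the entity name are placeholders, and the exact navigation record fields can vary with the connector version, so treat this as an illustration rather than exact syntax.

let
    // Connect to Power BI dataflows with the signed-in user's credentials
    Source = PowerBI.Dataflows(null),
    // Placeholder identifiers - substitute your own workspace, dataflow,
    // and entity (table) names
    Workspace = Source{[workspaceId = "00000000-0000-0000-0000-000000000000"]}[Data],
    Dataflow  = Workspace{[dataflowId = "11111111-1111-1111-1111-111111111111"]}[Data],
    Sales     = Dataflow{[entity = "Sales"]}[Data]
in
    // On dataset refresh this table is re-imported into VertiPaq,
    // even though the dataflow already stores its own copy of the data
    Sales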

 

All guidance is greatly appreciated.

1 ACCEPTED SOLUTION
lbendlin
Super User

Here's my biased opinion: Dataflows are glorified CSV/Parquet files. They can be useful when you have a slow, badly performing data source and you want to shield your developers from that. The dataflow data can be spooled efficiently into a dataset.

 

A golden dataset is not only a bunch of tables, it also represents the data model (although that can be circumvented). If well maintained (with some incremental refresh and selective partition refresh sprinkled in), it will be a great starting point for a unified, company-wide data model. Compared to professional data stewardship tools it is more of a toy, though.
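
As a rough illustration of the incremental refresh piece: the usual pattern is to filter the source query on the reserved RangeStart and RangeEnd datetime parameters so the service can partition the table and refresh only the recent partitions. Only the RangeStart/RangeEnd parameter names are the real convention here; the server, database, table, and column names below are made up for the example.

let
    // Hypothetical warehouse source - replace with your own
    Source = Sql.Database("sql-warehouse-01", "EnterpriseDW"),
    FactSales = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Incremental refresh requires a filter on the reserved datetime
    // parameters RangeStart and RangeEnd; the service substitutes the
    // boundaries of each partition at refresh time
    Recent = Table.SelectRows(
        FactSales,
        each [OrderDateTime] >= RangeStart and [OrderDateTime] < RangeEnd
    )
in
    Recent

With that filter in place, the refresh policy (how much history to store, how much to refresh) is configured on the table, and on Premium capacities individual partitions can later be refreshed selectively through the XMLA endpoint.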

 

Golden datasets come with their own baggage: specifically, all the access issues around dataset chaining, but also design limitations in composite data models.

 

Dataflows do nothing for Power BI report consumers. To create a pleasant experience for report users you should favor (import mode) datasets over dataflows, and accept the added cost for developers.

 

Recently someone asked if they should use an incremental dataset on top of an incremental dataflow. That's on the same level as DirectQuery against dataflows.



Thanks for sharing your perspective!
