TL;DR:
Big dataflow entity times out when trying to add to a data model.
Goal:
I have a large dataflow entity (approximately 2.4 billion rows) that I want to connect to in Power BI Desktop.
Problem:
Whenever I try to add the dataflow entity to the data model, it tries to download the whole entity. Due to the size of the data in question, the connection times out after approximately two hours. How do I connect to large dataflows in PBI Desktop (or even Service)?
Two notes:
1. Have you tried accessing the dataflow via DirectQuery? This is now possible, though I don't know if it helps in your scenario.
2. Any chance of normalizing the source data / splitting it into separate dataflows?
I believe the 10 GB limit is either already lifted or about to be lifted. The idea is that a dataset will use up the available memory of a SKU and then swap data in and out. That allows working with datasets that are larger than half the memory size of the SKU (people tend to forget that each dataset always occupies twice the RAM).
@lbendlin thanks for the reply.
I've tried the DirectQuery preview feature. Unfortunately, it keeps timing out at the visualization level.
The 2.4 billion rows is the normalised data. I've already split out the historical data into separate tables, so this is only the most recent month's data.
The news about the dataset restrictions being lifted is great to hear! If I cannot get the dataflow to work, I will look at this option. The big downside of using a dataset is that the moment there's a new fact table or dimension, we have to add it in Desktop and then re-upload to the Service. If this happens well after the initial dataset upload, all the fact tables will need to be manually updated. (We split the fact tables into history, which remains static, and current, which can be refreshed incrementally; this lets us load all the data without the server timing out.)
One more constraint that I forgot to mention: due to the size of the different fact tables, we expect to run into issues incrementally refreshing a dataset that contains all of these tables, as they would all refresh at the same time.
With dataflows, we're able to stagger each refresh throughout the day.
Incremental refresh is done by data source, not by dataset.
OK, when I say "data source" I mean "a Power Query query resulting in a Power BI table". Surely you have separate queries for them, right?
Incremental refresh can be configured independently for each of these Power BI tables. Granted, they all have to share the RangeStart and RangeEnd parameters, so this may be a moot point.
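For illustration, here is a minimal sketch of what one such per-table query might look like. This assumes the standard Power Platform Dataflows connector; the GUIDs, the entity name, and the LoadDate column are all placeholders — only RangeStart and RangeEnd are the actual parameter names required by incremental refresh:

```
// Hypothetical per-table query. RangeStart and RangeEnd are the datetime
// parameters that Power BI's incremental refresh policy fills in per partition.
let
    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    // The GUIDs and names below are placeholders for illustration only
    Workspace = Workspaces{[workspaceId = "00000000-0000-0000-0000-000000000000"]}[Data],
    Dataflow = Workspace{[dataflowId = "00000000-0000-0000-0000-000000000000"]}[Data],
    FactCurrent = Dataflow{[entity = "FactCurrent", version = ""]}[Data],
    // Filtering on the shared parameters is what lets each table
    // refresh only its own date range
    Filtered = Table.SelectRows(FactCurrent,
        each [LoadDate] >= RangeStart and [LoadDate] < RangeEnd)
in
    Filtered
```

Each fact table would get its own copy of this query (pointing at its own entity), all filtering on the same shared RangeStart/RangeEnd parameters.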
@lbendlin thanks for your assistance with this topic. Really appreciated.
Based on our discussion so far, I take it that there's no way to get data from an extremely large dataflow entity directly into PBI Desktop?
I also just came across the deployment pipelines feature. I haven't used this feature before. Can I change the dataflow that a dataset points to in the different stages (Dev, Test, and Prod)? If so, I can build a dataflow that uses a small subset of the original full dataflow, use that for Dev/Test, and then just change the data source to the full dataflow in Prod.
If all else fails, I'll take the dataset approach.
Yes, deployment pipelines allow you to parameterize the data sources. It is preferable to have data sources that support query folding.
Another option for you would be to evaluate mixed storage mode with aggregations.
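A sketch of how that parameterization might look, assuming a hypothetical text parameter named DataflowId that a deployment pipeline rule overrides per stage (Dev/Test pointing at the small subset dataflow, Prod at the full one):

```
// Hypothetical parameter, overridden per stage by a deployment pipeline rule
DataflowId = "00000000-0000-0000-0000-000000000000"
    meta [IsParameterQuery = true, Type = "Text", IsParameterQueryRequired = true]

// In the dataset's source query, navigate via the parameter
// instead of a hard-coded ID:
// Dataflow = Workspace{[dataflowId = DataflowId]}[Data]
```

The dataset definition stays identical across stages; only the parameter value changes.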
I'll try the deployment pipeline to solve this problem. Thanks again for your assistance.
You may want to look at ALM Toolkit, which will allow you to make structural changes (within reason) without requiring a full data reload. (Disclaimer: this requires XMLA read/write to be enabled on your SKU, and may be incompatible with other preview options.)