Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Loading large data sets

We are doing a POC on Power BI Embedded to be used as a 3rd party BI tool. We were able to navigate through most of the features, except for two.

  1. Loading large datasets: We intend to load data in the scale of 70-100 GB (as measured in Amazon Redshift, which has its own compression scales and metrics) into possibly a single dataset, which will be used to power up 7-10 reports. Since DirectQuery doesn't work with Power BI Embedded (and it gets sluggish anyway), we have to go with the Import mode. As a start, we purchased the A4 SKU (for 20 GB of Redshift data) in Power BI Embedded and associated a workspace in Power BI server with this node. The issue arises in the next step, how do we pull in the 20 GB data and create a report on power BI desktop? The pbix file inherently doesn't support such a high data size. How are datasets and reports supposed to be created when the data size is of this scale?
  2. As an alternative, I tried creating a dataset and a report on a smaller data size (500 MB). The idea was to make a pbix file, publish it into a workspace, change the source of the data on the power BI server and then refresh again. Somehow, the change source option is disabled both in the power BI server as well as the Power BI desktop. This issue persists when the source of the data is Amazon Redshift or Dataflows, however, the option gets activated again when we use a CSV or excel file.

We have been looking for solutions through online forums and Microsoft consultants, but no one has been able to provide a definite solution to this problem.

5 REPLIES 5
ibarrau
Super User
Super User

Hi. You have a complicated scenario. Let me explain some things. First of all, if you pay for Capacity in Power Bi (like Power Bi Embed from azure) you can activate that for a workspace. That workspace will support models with more than 1gb pbix size. If you data is 100gb, power bi will comprese it so it will size less. I don't think you will have problem from that part.

 

The problem will be redshift. I test this product and it won't scale with power bi as import mode. If you can load all data in power bi desktop because you have a ram limit in your pc or laptop (because that is the limit in desktop), i think you can create views o procedures in redshift with limit 1000 to load in Power Bi Desktop. Publish to service and then change the limit of the queries from redshift before refreshing power bi service. 

 

In my experience you should try azure datawarehouse. Becase with redshift I couldn't load 50 millons rows from redshift to power bi with import mode. It takes like forever to refresh. AWS and Azure don't like each other haha. I am using Power Bi Aggregations for redshift that is like an hybrid solution (composite model with direct query and import mode).

 

Regards,


If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Happy to help!

LaDataWeb Blog

Anonymous
Not applicable

Hi ibarrau,

Does that mean that the ram limit does not apply in case when we load data from Azure dataware house? If so, why?

Also, we were somehow able to pull data from "dataflows" to Power BI desktop, but somehow, we were unable to change the data source for that pbix file. Is this immutability an inherent flaw?

 

Thanks for your help!

No. Let me try to be more clear. What I was trying to say is that power bi desktop doesn't have a limitation. The limitation is on service depending on the licence you have. Power Bi Pro will have a limit of 1gb for each power bi file. If you have millons of rows from redshift in import mode you will probably have a pbix bigger than 1gb.

 

The other problem is that when I tried to import millons of data from redshift, it didn't finish. I cancel the import when I was like 4 hours importing. That is a problem in comunications between redshift and power bi. That was why i am recomending azure data warehouse, because it has a better comunication but it won't take the limitations of power bi licence out.

 

Can you explain more about the second problem about dataflows. I have power bi desktop connected to dataflows and I can change the datasources of my pbix file.

 

Regards, hope this helps


If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Happy to help!

LaDataWeb Blog

Anonymous
Not applicable

Hi,
We wanted to avoid moving data from Redshift to Azure data warehouse and that is when we came across dataflows. Using dataflows we were able to dump data from Redshift to Dataflows with ease. At the same time, we were also able to pull that data into power BI desktop (despite the 20 gB size). The step where we couldn't move forward was changing the source of the data. I've attached a screenshot;  the change source option is 'greyed' here. For that matter, the same problem persisted with Redshift as well. The option stood activated when the source of the data was csv/excel

 

Is there any way to diagnose this problem?

Note: All data was pulled in Import mode, nothing via directquery

Screenshot 2019-09-30 at 4.59.31 PM.png

Ok. Great! nice idea with dataflows.

If you want to change this dataflows source to another one you can change "source" step in each table from edit queries. If you want to change datasource between two different dataflows just clear the permissions so Power Bi ask you to login again and select the new dataflow.

 

Hope this help, with import mode you can modify each source.

Regards,


If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Happy to help!

LaDataWeb Blog

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors