Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
AGo
Post Patron
Post Patron

Large amount of data refresh best practice

Hi everybody,

 

I've got Pro license. I've got a folder collecting CSVs from today to the past 12 month, 1 month is approx. 1,5GB. I could use directly folder source, but I'm using it through a dataflow. The refresh is very slow, in the next months I will may be over the 10GBs limit and I'm occupying Microsoft's and my bandwidth in a way that is nonsense, because this data could be subjected to incremental refresh day by day (production machines log). Premium capacity has no reason to be considered in relation to the smallness of the project and this only one necessity.

Does anyone have a best practice to extract and transfer data in csv in a compressed way? Do I need to make a program to inject these CSVs in a database that acts as a more compressed datasource?

Any advice?

 

Thanks very much in advance

1 ACCEPTED SOLUTION

Thanks to you reply I accepted that I could upload the data on drive/sharepoint and process them directly in the cloud. I managed to open multiple csvs in multiple subfolders using Sharepoint folder connection method. Now the refresh takes a half hour to complete, maybe in the future it will be unsustainable with the data increasing and I'll have to pass to a datalake method, but for now it's working for this test phase. The develop is now made with a parameter limiting the number of rows to 1M to avoid bandwidth saturation and crashes when on desktop and then no limit when online. Thanks again.

View solution in original post

7 REPLIES 7

@AGo You could spin up an Azure Analysis Services instance and build the model in there for cheaper. Then you could do a live connection from Power BI to that model without incurring the larger overhead. This would allow you to scale as well at a lower price point.


Looking for more Power BI tips, tricks & tools? Check out PowerBI.tips the site I co-own with Mike Carlo. Also, if you are near SE WI? Join our PUG Milwaukee Brew City PUG

I wonder if you could ZIP them and read them into a dataflow from a OneDrive/SharePoint folder:

https://community.powerbi.com/t5/Desktop/PowerQuery-read-from-folder-with-many-zip-file/td-p/57334

https://github.com/hohlick/Power-Query-Excel-Formats/blob/master/UnZip.pq

 

Not sure of performance on files that size especially with a custom M function.

I read about this zip topic but it would be my last choice for a workaround because csvs increasing everyday must be automatically zipped in a multi-level hierarchical path in the single zip file.
But the 10gb size limit refers to the pbix that is compressed, maybe there's no problem in my 10Tb limit on sharepoint/drive. I will test it because now these csvs are on a linux machine, and I have to set up a sync with drive.
If it works I hope there is a method to open multiple csvs files on sharepoint/drive as for Folder connection.
If only incremental refresh were available for Pro users...

10 GB limit refers to in-memory uncompressed...and not including any memory overhead needed.

 

I've stuggled getting .pbix's greater than 3 GB to publish to Premium.

Explain your first paragraph please.
Anyway I wrote compressed because .pbix if renamed .zip is a compressed container.

Thanks

I was thinking of published data model size limitation...I see now you probably meant workspace storage limitation.

Thanks to you reply I accepted that I could upload the data on drive/sharepoint and process them directly in the cloud. I managed to open multiple csvs in multiple subfolders using Sharepoint folder connection method. Now the refresh takes a half hour to complete, maybe in the future it will be unsustainable with the data increasing and I'll have to pass to a datalake method, but for now it's working for this test phase. The develop is now made with a parameter limiting the number of rows to 1M to avoid bandwidth saturation and crashes when on desktop and then no limit when online. Thanks again.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors