AGo (Member)

Large amount of data refresh best practice

Hi everybody,

 

I have a Pro license. I have a folder collecting CSVs from the past 12 months up to today; one month is approximately 1.5 GB. I could use the Folder source directly, but I'm going through a dataflow. The refresh is very slow, in the coming months I may exceed the 10 GB limit, and I'm wasting both Microsoft's bandwidth and my own in a way that makes no sense, because this data (production machine logs) lends itself to day-by-day incremental refresh. Premium capacity isn't worth considering given how small the project is and that this is the only thing I would need it for.
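
For context, a minimal sketch of the local Folder-source pattern described above, assuming comma-delimited UTF-8 CSVs with identical headers; the folder path is a placeholder:

let
    // Placeholder path: point this at the folder that collects the daily CSV logs
    Source = Folder.Files("C:\Logs\Production"),
    // Keep only .csv files in case the folder contains other artifacts
    CsvFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".csv"),
    // Parse each file; assumes comma-delimited, UTF-8 files with identical headers
    Parsed = Table.AddColumn(CsvFiles, "Data",
        each Table.PromoteHeaders(Csv.Document([Content], [Delimiter = ",", Encoding = 65001]))),
    // Stack every file into one table
    Combined = Table.Combine(Parsed[Data])
in
    Combined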

Does anyone have a best practice for extracting and transferring CSV data in a compressed form? Do I need to write a program that loads these CSVs into a database, so that it acts as a more compact data source?

Any advice?

 

Thanks very much in advance

1 ACCEPTED SOLUTION

AGo (Member)

Re: Large amount of data refresh best practice

Thanks to your reply, I realized I could upload the data to OneDrive/SharePoint and process it directly in the cloud. I managed to open multiple CSVs across multiple subfolders using the SharePoint folder connection method. The refresh now takes half an hour to complete; as the data grows that may become unsustainable and I'll have to move to a data lake approach, but for now it works for this test phase. Development is done with a parameter that limits the number of rows to 1M, to avoid bandwidth saturation and crashes on Desktop, and the limit is removed once published online. Thanks again.
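
For reference, a minimal sketch of the pattern described above, assuming the files live on a SharePoint site (hypothetical URL), are comma-delimited UTF-8 with identical headers, and that a numeric parameter named RowLimit has been defined in the model:

let
    // Hypothetical site URL; SharePoint.Files lists files from every subfolder of the site
    Source = SharePoint.Files("https://contoso.sharepoint.com/sites/MachineLogs", [ApiVersion = 15]),
    CsvFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".csv"),
    // Assumes comma-delimited, UTF-8 files with identical headers
    Parsed = Table.AddColumn(CsvFiles, "Data",
        each Table.PromoteHeaders(Csv.Document([Content], [Delimiter = ",", Encoding = 65001]))),
    Combined = Table.Combine(Parsed[Data]),
    // RowLimit is assumed to be a numeric parameter defined in the model (e.g. 1000000 on Desktop);
    // set it to null before publishing so the Service loads everything
    Limited = if RowLimit = null then Combined else Table.FirstN(Combined, RowLimit)
in
    Limited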


7 REPLIES
Super User

Re: Large amount of data refresh best practice

@AGo You could spin up an Azure Analysis Services instance and build the model there more cheaply. Then you could use a live connection from Power BI to that model without incurring the larger overhead. This would also let you scale at a lower price point.



Re: Large amount of data refresh best practice

I wonder if you could ZIP them and read them into a dataflow from a OneDrive/SharePoint folder:

https://community.powerbi.com/t5/Desktop/PowerQuery-read-from-folder-with-many-zip-file/td-p/57334

https://github.com/hohlick/Power-Query-Excel-Formats/blob/master/UnZip.pq

 

Not sure about performance on files of that size, especially with a custom M function.
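
In case it helps, a rough sketch of how that might be wired up, assuming the UnZip.pq function from the second link is saved as a query named UnZip and that it returns one row per archived file with FileName/Content columns (both of those are assumptions, as are the site URL and CSV format):

let
    // Hypothetical OneDrive/SharePoint site holding the zipped CSV batches
    Source = SharePoint.Files("https://contoso.sharepoint.com/sites/MachineLogs", [ApiVersion = 15]),
    ZipFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".zip"),
    // UnZip is assumed to be the linked custom function, saved as a query named UnZip;
    // the column names expanded below are assumptions, so check its actual output first
    Unzipped = Table.AddColumn(ZipFiles, "Archive", each UnZip([Content])),
    Expanded = Table.ExpandTableColumn(Unzipped, "Archive", {"FileName", "Content"}, {"InnerName", "InnerContent"}),
    // Parse each extracted CSV; assumes comma-delimited, UTF-8 files with identical headers
    Parsed = Table.AddColumn(Expanded, "Data",
        each Table.PromoteHeaders(Csv.Document([InnerContent], [Delimiter = ",", Encoding = 65001]))),
    Combined = Table.Combine(Parsed[Data])
in
    Combined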

AGo (Member)

Re: Large amount of data refresh best practice

I've read about the ZIP approach, but it would be my last-choice workaround, because the CSVs arriving every day would have to be automatically zipped into a multi-level hierarchical path inside a single zip file.
But the 10 GB size limit refers to the .pbix, which is compressed, so maybe there's no problem with the 10 TB limit on my SharePoint/OneDrive. I'll test it, because these CSVs currently live on a Linux machine and I have to set up a sync to the cloud drive.
If it works, I hope there's a way to open multiple CSV files on SharePoint/OneDrive like the Folder connection does.
If only incremental refresh were available for Pro users...

Re: Large amount of data refresh best practice

The 10 GB limit refers to the in-memory, uncompressed size...and that's not including any memory overhead that's needed.

 

I've struggled to get .pbix files larger than 3 GB to publish to Premium.

AGo (Member)

Re: Large amount of data refresh best practice

Could you explain your first paragraph, please?
Anyway, I wrote "compressed" because a .pbix, if renamed to .zip, is a compressed container.

Thanks

Re: Large amount of data refresh best practice

I was thinking of the published data model size limitation...I see now that you probably meant the workspace storage limitation.
