Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
doaks
New Member

Partitionig of large data sets (no DB, just CSVs)

Hi there!

 

So for some stupid reasons that can't really be changed for now we have reporting data exclusively in thousands of CSV files that date back all the way to 2021. In total we are talking about ~53k files for the last 4 years. I'm not sure about how many rows of data we are talking, but I'm expecting it to be ~1-2 million, not more. So nothing Power BI can't handle easily.

 

However, automated updates have failed to work a long time ago, due to the sheer amount of files being pulled each time, which obviously (at least to me?) calls for some partitioning and incremental updates.

 

I implemented a very basic partitioning setup, archiving anything that is older than 4 weeks from today. Whether that makes alot of sense or not - the initial partitioning fails again for the same reasons: time out bc too much to handle. At least that's what I'm getting.

 

My question: What are good resources for dealing with that problem? MS documentation is giving me a headache, YouTube is being way to general about it (at least from what I found after hours of digging) and the cases in the community forum here are either too specific or starting at a more expert level than I am currently (or both).

 

If anyone could share some good how-tos or courses about partitioning and maybe even about my problem described above I'd really appreciate it. I would haved jumped to Fivrr and just have someone else deal with it, but where's the learning in that? I mean, I will probably end up hiring someone anyways to get it up and running, but I still wanna mess with it in our Sandbox to become a better PowerBI person.

 

Thanks in advance!

2 ACCEPTED SOLUTIONS
audreygerred
Super User
Super User

For incremental refresh, your source needs to handle query folding (CSVs do not). However, you can create a datamart to take care of this: Incremental Refresh for Non-Query Folding Sources in Power BI Desktop (mssqltips.com)




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!





View solution in original post

v-yohua-msft
Community Support
Community Support

Hi, @doaks 

Thanks for the reply from @audreygerred , please allow me to provide addition:

Before you can partition or incrementally update your CSV file, you need to effectively merge your CSV file into Power BI. You can use Power Query to merge files to optimize this process, especially if the files have a similar structure. For guidance on this initial step, see the documentation on using CSV files in Power BI:

Get data from comma separated value (CSV) files - Power BI | Microsoft Learn

 

Once the data is in Power BI, the next step is to set up incremental refresh to query and update only the data that has changed or been added since the last refresh. This is critical for working with large datasets and avoiding timeouts. Power BI's incremental refresh feature can help you do this. Here's a detailed guide on how to configure an incremental refresh policy:

Configure incremental refresh and real-time data for Power BI semantic models - Power BI | Microsoft...

 

How to Get Your Question Answered Quickly 

Best Regards

Yongkang Hua

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

2 REPLIES 2
v-yohua-msft
Community Support
Community Support

Hi, @doaks 

Thanks for the reply from @audreygerred , please allow me to provide addition:

Before you can partition or incrementally update your CSV file, you need to effectively merge your CSV file into Power BI. You can use Power Query to merge files to optimize this process, especially if the files have a similar structure. For guidance on this initial step, see the documentation on using CSV files in Power BI:

Get data from comma separated value (CSV) files - Power BI | Microsoft Learn

 

Once the data is in Power BI, the next step is to set up incremental refresh to query and update only the data that has changed or been added since the last refresh. This is critical for working with large datasets and avoiding timeouts. Power BI's incremental refresh feature can help you do this. Here's a detailed guide on how to configure an incremental refresh policy:

Configure incremental refresh and real-time data for Power BI semantic models - Power BI | Microsoft...

 

How to Get Your Question Answered Quickly 

Best Regards

Yongkang Hua

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

audreygerred
Super User
Super User

For incremental refresh, your source needs to handle query folding (CSVs do not). However, you can create a datamart to take care of this: Incremental Refresh for Non-Query Folding Sources in Power BI Desktop (mssqltips.com)




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!





Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.