Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Help with large datasets - is there a more efficient way to do this?

I have a very large data set that i am managing in PowerBi.  The source data is all stored as CSV extract from an Amazon Redshift db.

 

2017 data - 40 gb

2018 data - 50 gb

2019 data - 55 gb

2020 data - 30 gb

 

I use powerbi to consolidate all of the data into one massive dataset/table.  2020 data constantly changes given new results (new month end data).

 

My problem is every month when i run the process (with new 2020 data) PowerBi refreshes the 2017 - 2019 tables even though nothing has changed at all. 

 

Is there anyway around this?  It makes this process take so much longer than it needs to.

 

Appreciate your help!

6 REPLIES 6
v-yingjl
Community Support
Community Support

Hi @Anonymous ,

1. Since your dataset is so large, reduce to combine them as a single dataset and try to disable 'Include in report refresh ' in query editor for the datasets you don't want to refresh

report refresh.png

 

2. If you real want to combine them, try to configure incremental refresh and publish to power bi service. Please refer this document: Incremental refresh in Power BI 

 

Best Regards,
Yingjie Li

If this post helps then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous
Not applicable

 HI there,

I have tried removing them from including them in report refresh but when I merge the queries via M it still has to scan through the tables.  It takes forever.

Anonymous
Not applicable

Any help on this?  My dataset is only getting bigger and i need to be able to refresh this faster.

 

So to be clear i have 4 queries each between 30-40 gbs.  I create a consolidated table using merge query in M.  But each time i have a new month added to the dataset, it has to reload each query into the consolidated table. So even if i add only one month it has to reload 4 years each time.  Any help or suggestions?

Anonymous
Not applicable

Hi there, just following up, is there any solution to this?  I can't imagine i'm the only one to run into this.

Hi @Anonymous ,

 

load the data as an annual fact table and combine the values using dimensions and measures.

Then you can update a single table if necessary.

Look at this.

https://www.youtube.com/watch?v=cnpndrrrD8I

 

Did I answer your question?
Please mark my post as solution, this will also help others.
Please give Kudos for support.

Marcus Wegener works as Full Stack Power BI Engineer at BI or DIE.
His mission is clear: "Get the most out of data, with Power BI."
twitter - LinkedIn - YouTube - website - podcast


Anonymous
Not applicable

Sorry this doesn't work, my files are csv files it has nothing to do with fact or dimension and they are too large to merge with UNION.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors