cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
Helper IV
Helper IV

Help with large datasets - is there a more efficient way to do this?

I have a very large data set that i am managing in PowerBi.  The source data is all stored as CSV extract from an Amazon Redshift db.

 

2017 data - 40 gb

2018 data - 50 gb

2019 data - 55 gb

2020 data - 30 gb

 

I use powerbi to consolidate all of the data into one massive dataset/table.  2020 data constantly changes given new results (new month end data).

 

My problem is every month when i run the process (with new 2020 data) PowerBi refreshes the 2017 - 2019 tables even though nothing has changed at all. 

 

Is there anyway around this?  It makes this process take so much longer than it needs to.

 

Appreciate your help!

6 REPLIES 6
Highlighted
Super User IV
Super User IV

Hi @GeraldZ ,

 

load the data as an annual fact table and combine the values using dimensions and measures.

Then you can update a single table if necessary.

Look at this.

https://www.youtube.com/watch?v=cnpndrrrD8I

 

Did I answer your question?
Please mark my post as solution, this will also help others.
Please give Kudos for support.

I work at KUMAVISION AG , one of the world's largest
implementation partners for Microsoft Dynamics. #BetterTogether
"Get the most out of data, with Power BI."
twitter - LinkedIn - YouTube - website

Highlighted
Community Support
Community Support

Hi @GeraldZ ,

1. Since your dataset is so large, reduce to combine them as a single dataset and try to disable 'Include in report refresh ' in query editor for the datasets you don't want to refresh

report refresh.png

 

2. If you real want to combine them, try to configure incremental refresh and publish to power bi service. Please refer this document: Incremental refresh in Power BI 

 

Best Regards,
Yingjie Li

If this post helps then please consider Accept it as the solution to help the other members find it more quickly.

Highlighted

Sorry this doesn't work, my files are csv files it has nothing to do with fact or dimension and they are too large to merge with UNION.

Highlighted

 HI there,

I have tried removing them from including them in report refresh but when I merge the queries via M it still has to scan through the tables.  It takes forever.

Highlighted

Any help on this?  My dataset is only getting bigger and i need to be able to refresh this faster.

 

So to be clear i have 4 queries each between 30-40 gbs.  I create a consolidated table using merge query in M.  But each time i have a new month added to the dataset, it has to reload each query into the consolidated table. So even if i add only one month it has to reload 4 years each time.  Any help or suggestions?

Highlighted

Hi there, just following up, is there any solution to this?  I can't imagine i'm the only one to run into this.

Helpful resources

Announcements
Community Conference

Power Platform Community Conference

Check out the on demand sessions that are available now!

Community Conference

Microsoft Power Platform Communities

Check out the Winners!

secondImage

Create an end-to-end data and analytics solution

Learn how Power BI works with the latest Azure data and analytics innovations at the digital event with Microsoft CEO Satya Nadella.

Top Solution Authors
Top Kudoed Authors