cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
frano72
Helper IV
Helper IV

Pattern for handling large fact tables in PBI Desktop to Service - not working

Hi,

 

I have 6 months of fact data in a dataflow and to make it a bit easier i filtered that DF down to the last five days and developed the report in pbi desktop.

 

i followed matthew roches guidance here : Quick Tip: Developing with large dataflows – BI Polar (ssbipolar.com)

 

In summary, after publishing the report to service,  I took out the filter on the dataflows and refreshed.  I then went to the dataset and refreshed.  However viewing the report in the service - it was still only the last five days ??

 

I went back - refreshed the data in pbi desktop (waited until the couple Gb got pulled down) published that and refreshed the dataset and its working - but am left with a pbix file chunkier than the local bakeries chunky steak pie.

 

anyone know what is going on here ?

5 REPLIES 5
lbendlin
Super User
Super User

what's your rationale for not using a dataset with incremental refresh?

Hi @lbendlin - good point - I'll have another look at incremental refresh in datatsets, but I thought that doesn't actually change the ultimate size of the dataset in the service or the size of the pbix file - it just speeds up the time it takes to refresh in the service.  (might be wrong on this though...)

What are you more concerned about?  dataset size or customer experience?  You could do a full direct query against your on-prem dataset (and save a ton of space in the Power BI Service) but it might not be a nice experience for your users.

i'm not so concerned about dataset size in the service.  even without incremental dataset refresh it only takes 20 seconds each day to refresh.

 

don't want to do direct query - the reporting is only on last days completed shift.  so the query (dataflow) only needs to run once per day to update the last days shifts data - so would be unnecessary load on sql to be running direct query against the sql for as many times the report is run during the day.

 

the concern is the performance of developing in pbi desktop having to have all the fact data loaded.  the pbix file is already 300 mb on 6 months of data and I really need to do 12 months.  

 

really looking for a generalised pattern for the next time when its five years of data.

 

technically - what i originally outlined should have worked - curious as to why it wasn't.

 

another option is using the recently released dataflow connector which supports direct query against a dataflow with enhanced engine turned on - so i think i could move the filter into the pq steps in pbi desktop rather than in pq query in the service - using the old DF connector  doesn't achieve this as it still has to download all the fact data before the filter step is applied.

 

(haven't looked at this yet as there are issues with the sep/oct release of pbi desktop that i've been impacted by so running aug version.)

Direct query against dataflows is an atrocity. Basically a shadow Azure SQL server sitting between a perfectly fine binary blob and your dataset, adding to the complexity.

 

When you have large incremental refresh datasets you can use Power BI Desktop to set RangeStart and RangeEnd to manageable levels (just enough for UI design) and then use ALM Toolkit to separate the meta data refresh from the data refresh, and your favorite scheduler to initiate individual partition refresh.

 

 

Helpful resources

Announcements
2022 Release Wave 1 760x460.png

2022 Release Wave 1 Plan

Power Platform release plan for the 2022 release wave 1 describes all new features releasing from April 2022 through September 2022.

Power BI December 2021 Update_carousel 768x460.jpg

Check it Out!

Click here to read more about the December 2021 Updates!

Jan 2022 Dev Camp 768x460 copy.png

Power BI Dev Camp- January 27th, 2022

Mark your calendars and join us for our next Power BI Dev Camp!

Top Solution Authors