uberdube
Advocate IV

How to use Incremental Refresh in Dataflows?

Hi community,

 

More silly questions: I'm still trying to understand the fundamentals here, but I'm just not quite getting it.

I have a Workspace for 'Ingestion' entities, pulling data from an on-premise datasource (SQL Server).

 

2019-12-03_7-29-11.png

 

This ingestion Dataflow is running on Incremental Refresh - it took around 3 hours to do the initial load, but now consistently refreshes on schedule in around 4 - 6 minutes. Great so far.

 

2019-12-03_7-39-47.png

 

I then have a 2nd Workspace for staging, i.e. linked entities, with Calculated entities referencing them. (I'm still unsure if it's better to do this all in the 'Ingestion' workspace, or to do staging / ETL in a separate workspace like this. That's another issue, but any suggestions are welcome 😀)

 

2019-12-03_7-36-02.png

 

The 'Staged' Dataflow, however, is taking far longer to refresh than the 'Ingestion' dataflow, around 1 hour and 45 minutes:

 

2019-12-03_7-45-55.png

 

So I'm thinking: 'This time should be at least comparable to the previous incremental refresh time. Do I need to set up Incremental Refresh on my Calculated Entities in this Dataflow?'

However, this Microsoft article seems to indicate that Computed entities behave the same way as Linked entities, which don't require incremental refresh (as Linked entities are simply a pointer), and that Calculated entities simply perform queries over the existing stored data rather than storing the data again themselves. So putting incremental refresh on Calculated entities doesn't seem to be the correct method:

 

2019-12-03_7-49-02.png

So.. my 3 big questions are:

 

  1. Does anyone know how this should be correctly configured and why my 'staging' refresh would be taking so much longer than the 'ingestion' incremental refresh?
  2. Do we only use Incremental Refresh on the initial set of Ingestion datasource entities, or do you need to apply it at every step of the way when working across multiple dataflows and/or multiple workspaces?
  3. If linked entities (in a separate workspace) are effectively already pointing to previous dataflows/workspaces, and calculated entities simply reference those linked entities, then why do we need to refresh these subsequent dataflows at all? (I am aware from this Microsoft article that dataflows in previous workspaces are treated as an 'external datasource' and therefore apparently need refreshing. But how does this work when it appears that computed entities are in reality just referencing linked entities, which in turn are referencing the 'source' entities? I visualize this as if they are all still just 'pointers' in a series, so why is a refresh even required on calculated entities?)

2019-12-03_8-16-45.png

 

Thanks for sticking with me... any help greatly appreciated!

1 ACCEPTED SOLUTION
v-shex-msft
Community Support

Hi @uberdube,

#1. I think you only need to configure a normal refresh on the linked entities.

#2. As the documentation says, it seems that incremental refresh works on the original dataflow, so if you have already set up incremental refresh there, you do not need to configure it on the linked entities.

#3. I think this refresh syncs the latest data from the original dataflow and runs the query calculation steps on the new records.
From your description, I think the refresh time is being spent on the query operations in the computed entity, particularly if it contains advanced or complex query steps (merging or combining queries, or referencing other steps).
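To make the idea in #3 concrete, here is a rough Python sketch. It is a toy model only, not how the Power BI engine actually works, and the names `ingest_incremental` and `refresh_computed`, the 7-day window, and the sample rows are all made up for illustration. It assumes the computed entity re-runs its merge over every stored row, while the incremental ingestion only reloads rows inside the refresh window.

```python
from datetime import date, timedelta


def ingest_incremental(store, source_rows, today, refresh_days):
    """Reload only the rows whose date falls inside the refresh window."""
    cutoff = today - timedelta(days=refresh_days)
    kept = [r for r in store if r["date"] < cutoff]          # older rows stay as stored
    fresh = [r for r in source_rows if r["date"] >= cutoff]  # only these are re-read
    return kept + fresh, len(fresh)


def refresh_computed(store, lookup):
    """ASSUMPTION: the computed step re-runs its merge over every stored row."""
    merged = [{**r, "region": lookup.get(r["customer"], "unknown")} for r in store]
    return merged, len(store)


today = date(2019, 12, 3)
# Hypothetical source table: one row per day, going back 1000 days.
source = [
    {"date": today - timedelta(days=d), "customer": d % 3, "amount": d}
    for d in range(1000)
]
store = list(source)  # state after the initial full load

store, rows_ingested = ingest_incremental(store, source, today, refresh_days=7)
staged, rows_merged = refresh_computed(store, {0: "north", 1: "south"})

print(rows_ingested, rows_merged)  # prints "8 1000"
```

In this toy model the ingestion step touches 8 rows while the merge touches all 1000, so heavy merge or combine steps in the staging dataflow are the first place worth checking if the real behaviour is similar.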

In addition, you can submit a support ticket to get further help from the Power BI team.

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accepting it as the solution to help other members find it more quickly.

