uberdube Regular Visitor

How to use Incremental Refresh in Dataflows?

Hi community,

 

More silly questions: I'm still trying to understand the fundamentals here, but I'm just not quite getting it.

I have a Workspace for 'Ingestion' entities, pulling data from an on-premises data source (SQL Server).

 

2019-12-03_7-29-11.png

 

This ingestion Dataflow is running on Incremental Refresh - it took around 3 hours to do the initial load, but now consistently refreshes on schedule in around 4 - 6 minutes. Great so far.

 

2019-12-03_7-39-47.png

 

I then have a 2nd Workspace for staging, i.e. linked entities, with Calculated entities referencing them. (I'm still unsure if it's better to do this all in the 'Ingestion' workspace, or to do staging / ETL in a separate workspace like this. That's another issue, but any suggestions welcomed 😀)

 

2019-12-03_7-36-02.png

 

The 'Staged' Dataflow is, however, taking far longer to refresh than the 'Ingestion' dataflow, around 1 hour and 45 minutes:

 

2019-12-03_7-45-55.png

 

So I'm thinking: 'This time should be at least comparable to the previous incremental refresh time. Do I need to set up Incremental Refresh on my Calculated Entities in this Dataflow?'

However, this Microsoft article seems to indicate that Computed entities behave the same way as Linked entities, which don't require incremental refresh (as Linked entities are simply a pointer), and that Calculated entities are simply performing queries over the existing stored data, not 'storing' the data again within themselves. So putting incremental refresh on Calculated entities doesn't seem to be the correct method:

 

2019-12-03_7-49-02.png

So.. my 3 big questions are:

 

  1. Does anyone know how this should be correctly configured and why my 'staging' refresh would be taking so much longer than the 'ingestion' incremental refresh?
  2. Do we only use Incremental Refresh on the initial set of Ingestion datasource entities, or do you need to apply it on every 'step' of the way when working across multiple dataflows and/or multiple workspaces?
  3. If linked entities (in a separate workspace) are effectively already pointing to previous dataflows/workspaces, and calculated entities simply reference those linked entities, then why do we need to refresh these subsequent dataflows at all? (I am aware from this Microsoft article that dataflows in previous workspaces are treated as an 'external datasource' and therefore apparently need refreshing. But how does this work when it appears that computed entities are in reality just referencing linked entities, which in turn are referencing the 'source' entities? I visualize this as a series of 'pointers', so why is a refresh even required on calculated entities?)

2019-12-03_8-16-45.png

 

Thanks for sticking with me... any help greatly appreciated!


Accepted Solutions
Community Support Team

Re: How to use Incremental Refresh in Dataflows?

Hi @uberdube,

#1. I think you only need to configure a normal refresh on the linked entities.

#2. As the documentation says, incremental refresh appears to work on the original dataflow, so if you have already set up incremental refresh there, you do not need to configure it on the linked entities.

#3. I think this refresh syncs the latest data from the original dataflow and executes the query calculation steps on the new records.
Based on your description, I think the refresh time is being spent on the query operations in that computed entity, if it contains any advanced or complex query formulas (merging or combining queries, or referencing other steps).
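
To illustrate why the two refresh times might differ so much, here is a rough conceptual sketch in Python. It is not Power BI or Power Query code, and every name in it (`incremental_refresh`, `full_recompute`, the 1,000 rows per day, the 3-day window) is made up for illustration. It also assumes that a computed entity without incremental refresh re-runs its query over all of the stored rows on every refresh, which is an assumption and has not been confirmed in this thread.

```python
from datetime import date, timedelta


def days_back(today, n):
    """Return the last n calendar days ending at `today` (inclusive)."""
    return [today - timedelta(days=i) for i in range(n)]


def incremental_refresh(store, source_rows_for_day, today, refresh_days):
    """Reload only the daily partitions inside the refresh window.

    Returns the number of rows read from the (hypothetical) source.
    """
    rows_read = 0
    for day in days_back(today, refresh_days):
        store[day] = source_rows_for_day(day)
        rows_read += len(store[day])
    return rows_read


def full_recompute(store, transform):
    """Re-run a transformation over every stored partition.

    Models the ASSUMED behaviour of a computed entity that has no
    incremental refresh. Returns the number of rows processed.
    """
    rows_processed = 0
    for rows in store.values():
        _ = [transform(row) for row in rows]
        rows_processed += len(rows)
    return rows_processed


def demo():
    today = date(2019, 12, 3)
    rows_per_day = 1000  # illustrative volume, not taken from the thread

    def source(day):
        return [(day, i) for i in range(rows_per_day)]

    store = {}
    # Initial load: a whole year of partitions.
    initial = incremental_refresh(store, source, today, 365)
    # Scheduled refresh: only the last 3 days are reloaded.
    scheduled = incremental_refresh(store, source, today, 3)
    # Downstream step: every stored row is transformed again.
    staged = full_recompute(store, lambda row: (row[0], row[1] * 2))
    return initial, scheduled, staged


if __name__ == "__main__":
    initial, scheduled, staged = demo()
    print("initial load rows:     ", initial)
    print("scheduled refresh rows:", scheduled)
    print("staged recompute rows: ", staged)
```

Under these assumptions, the scheduled ingestion refresh touches only a small slice of the data, while the downstream step touches roughly as many rows as the initial load did. If that is what is happening in your case, it would explain a staging refresh that is much slower than the incremental ingestion refresh.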

In addition, you can submit a support ticket to get further support from the Power BI team.

Regards,

Xiaoxin Sheng


