jeffshieldsdev
Impactful Individual

When to split dataflows in ETL chain?

I have multiple dataflows as part of an ETL chain.

I'm following this great pattern from @MatthewRoche here: https://ssbipolar.com/2019/10/07/quick-tip-factoring-your-dataflow-entities/

For each entity, I have at least 3 dataflows:

  • 1-Ingest
  • 2-Cleanse
  • 3-Final

This is a great setup because it gives me injection points if I need to add or change data midstream.

At this point, Ingest simply ingests. I have a few Ingest dataflows with Incremental Refresh enabled.

Cleanse converts data types: my data source stores some numeric IDs as strings instead of integers, so I convert them here.

Final (at this point) simply renames the columns to business-friendly names.
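To make the division of labour concrete, here is a minimal sketch of the three stages as plain Python functions. It is a hypothetical model, not Power Query M: the column names `cust_id`/`cust_nm`, the sample rows, and the `FRIENDLY_NAMES` mapping are invented for illustration.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical raw rows as the source might deliver them: IDs stored as strings.
RAW_CUSTOMERS: List[Row] = [
    {"cust_id": "101", "cust_nm": "Contoso"},
    {"cust_id": "102", "cust_nm": "Fabrikam"},
]

# Hypothetical mapping from source column names to business-friendly names.
FRIENDLY_NAMES = {"cust_id": "Customer ID", "cust_nm": "Customer Name"}


def ingest(source: List[Row]) -> List[Row]:
    """Stage 1 (Ingest): copy rows unchanged, no transformations."""
    return [dict(row) for row in source]


def cleanse(rows: List[Row]) -> List[Row]:
    """Stage 2 (Cleanse): cast the string ID column to an integer."""
    return [{**row, "cust_id": int(row["cust_id"])} for row in rows]


def final(rows: List[Row]) -> List[Row]:
    """Stage 3 (Final): rename columns to business-friendly names."""
    return [{FRIENDLY_NAMES.get(k, k): v for k, v in row.items()} for row in rows]


def single_step(source: List[Row]) -> List[Row]:
    """Alternative: cast and rename in one pass, with no intermediate stages."""
    out: List[Row] = []
    for row in source:
        cast = {**row, "cust_id": int(row["cust_id"])}
        out.append({FRIENDLY_NAMES.get(k, k): v for k, v in cast.items()})
    return out
```

Both `final(cleanse(ingest(RAW_CUSTOMERS)))` and `single_step(RAW_CUSTOMERS)` return the same rows. In this toy model the only difference is the intermediate lists each stage produces, loosely analogous to each dataflow storing its own entity.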

This is a great pattern, but I wonder whether my minimal transformations really need this many steps. Am I sacrificing performance by generating three separate computed entities in this chain?

Should Ingest always just ingest, or, if I'm only casting and renaming columns, should I use a single dataflow? Does anyone have recommendations here? Thanks.

EDIT:

I think I answered my question about Ingest with Incremental Refresh. When I enable Incremental Refresh, additional queries and steps are added ("_Canary", "RangeStart", and "RangeEnd"), along with a Table.SelectRows() step in my main query. This step is added last, so any other transformations are performed before it. If those transformations break query folding, the filter cannot fold either, and all records have to be downloaded before they can be filtered.
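The ordering problem can be illustrated with a toy model. This is not how the Power Query engine works internally: `ToySource`, its `fetch()` method, and the `modified` column are hypothetical, and "folding" is reduced to the source applying a predicate before it sends rows.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


@dataclass
class ToySource:
    """Hypothetical data source that counts how many rows it sends downstream."""
    rows: List[Row]
    rows_transferred: int = 0

    def fetch(self, predicate: Optional[Callable[[Row], bool]] = None) -> List[Row]:
        # If a predicate is passed, the source applies it before sending rows:
        # this stands in for a filter that folds back to the source.
        result = [dict(r) for r in self.rows if predicate is None or predicate(r)]
        self.rows_transferred += len(result)
        return result


def in_window(row: Row, start: date, end: date) -> bool:
    return start <= row["modified"] < end


def filter_first(source: ToySource, start: date, end: date) -> List[Row]:
    # The filter is the first step, so it can be handed to the source.
    return source.fetch(lambda r: in_window(r, start, end))


def filter_last(source: ToySource, start: date, end: date,
                local_step: Callable[[Row], Row]) -> List[Row]:
    # A step the source cannot run comes first, so every row is fetched
    # and the window filter is applied locally afterwards.
    rows = [local_step(r) for r in source.fetch()]
    return [r for r in rows if in_window(r, start, end)]


def sample_rows() -> List[Row]:
    # Ten hypothetical rows modified on 1-10 January 2023, IDs stored as strings.
    return [{"id": str(i), "modified": date(2023, 1, i + 1)} for i in range(10)]
```

With a window from 9 January (inclusive) to 11 January (exclusive), both functions return the same two rows, but `filter_first` transfers 2 rows from the source while `filter_last` transfers all 10.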

EDIT2:

Alternatively, I could have two queries in my dataflow, Customers_Ingest and Customers_Cleanse, where _Ingest is untransformed and has incremental refresh enabled, and _Cleanse is linked to it and holds the transformations. Since these transformations would happen within the same dataflow, though, I assume I wouldn't get the benefit of the enhanced compute engine.
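The two-query variant can be modelled in the same hedged way. The function names mirror Customers_Ingest and Customers_Cleanse but are Python stand-ins; the sample rows, the `modified` column, and the partition boundaries are invented. The sketch checks one property of the split: with half-open windows (`start <= modified < end`), a row on a boundary lands in exactly one partition, and cleansing the stored partitions gives the same result as cleansing everything at once. It says nothing about the enhanced compute engine.

```python
from datetime import date
from typing import Dict, List

Row = Dict[str, object]


def customers_ingest(source: List[Row], range_start: date, range_end: date) -> List[Row]:
    """Stand-in for Customers_Ingest: untransformed rows in [range_start, range_end)."""
    # In Power Query, keeping this query to a bare filter gives it the best
    # chance to fold; here it is simply a list filter.
    return [dict(r) for r in source if range_start <= r["modified"] < range_end]


def customers_cleanse(ingested: List[Row]) -> List[Row]:
    """Stand-in for Customers_Cleanse: reads the ingest output and casts IDs."""
    return [{**r, "id": int(r["id"])} for r in ingested]


def refresh_by_partition(source: List[Row], boundaries: List[date]) -> List[Row]:
    """Ingest each consecutive window separately (like refresh partitions), then cleanse."""
    stored: List[Row] = []
    for start, end in zip(boundaries, boundaries[1:]):
        stored.extend(customers_ingest(source, start, end))
    return customers_cleanse(stored)


# Ten hypothetical rows modified on 1-10 January 2023, IDs stored as strings.
SOURCE: List[Row] = [{"id": str(i), "modified": date(2023, 1, i + 1)} for i in range(10)]
```

With boundaries at 1, 4, 8 and 11 January, the ten sample rows land in partitions of 3, 4 and 3 rows, and the combined cleansed output matches cleansing all ten rows in one pass.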

v-xuding-msft
Community Support

Hi @jeffshieldsdev ,

If you need timely help, you could create a support ticket to get dedicated support from Microsoft; the blog post on how to create one is a useful reference. I don't have much experience with ETL, so I'm sorry I couldn't help more.

Best Regards,

Xue Ding

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly. Kudos are nice too.
