jeffshieldsdev
Solution Sage

When to split dataflows in ETL chain?

I have multiple dataflows, as a part of an ETL chain.

 

I'm following this great pattern from @MatthewRoche here: https://ssbipolar.com/2019/10/07/quick-tip-factoring-your-dataflow-entities/

 

For each entity, I have at least 3 dataflows:

 

  • 1-Ingest
  • 2-Cleanse
  • 3-Final

 

This is a great setup, because it gives me injection points if I need to add new or change data midstream.

 

At this point, Ingest simply ingests. I have a few Ingest dataflows with Incremental Refresh enabled.

 

Cleanse converts data types: my data source stores some numeric IDs as strings instead of integers, so I convert those here.

 

Final (at this point) simply renames the columns to business-friendly names.
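
For illustration, here is roughly what the Cleanse and Final transformations amount to if written as a single M query. This is only a sketch: the entity name Customers_Ingest and the column names (customer_id, cust_nm) are hypothetical placeholders, not my real schema.

```powerquery
let
    // Hypothetical reference to the upstream Ingest entity
    Source = Customers_Ingest,
    // Cleanse: cast a numeric ID that the source stores as text to a whole number
    Typed = Table.TransformColumnTypes(Source, {{"customer_id", Int64.Type}}),
    // Final: rename columns to business-friendly names
    Renamed = Table.RenameColumns(
        Typed,
        {{"customer_id", "Customer ID"}, {"cust_nm", "Customer Name"}}
    )
in
    Renamed
```

Written this way, the whole Cleanse + Final chain is two steps, which is why I'm questioning whether it needs two separate dataflows.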

 

This is a great pattern--but I wonder if my minimal transformations require this many steps--or am I sacrificing performance by generating 3 different computed entities in this chain?

 

Should Ingest always just ingest, or if I'm simply casting and renaming columns--should I just use one dataflow? Does anyone have any recommendations in this space?  Thanks.

 

EDIT:

I think I answered my own question about Ingest with Incremental Refresh.  When I enable Incremental Refresh, additional steps and queries are added ("_Canary", "RangeStart", and "RangeEnd"), and a Table.SelectRows() filter step is added to my main query.  This step is added last, so any other transformations have to be performed first, meaning the filter may not fold, in which case all records have to be downloaded before they can be filtered.
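
To make the concern concrete, here is a sketch of an Ingest query where the date-range filter is the only step applied to the source table. It is not the exact code that Incremental Refresh generates; the server, database, table, and column names (myserver, mydb, Customers, ModifiedDate) are assumptions for illustration, while RangeStart and RangeEnd are the parameters mentioned above.

```powerquery
let
    // Hypothetical SQL source; substitute your own connector and names
    Source = Sql.Database("myserver", "mydb"),
    Customers = Source{[Schema = "dbo", Item = "Customers"]}[Data],
    // Date-range filter applied directly to the source table, with no
    // transformations in front of it that could prevent folding
    Filtered = Table.SelectRows(
        Customers,
        each [ModifiedDate] >= RangeStart and [ModifiedDate] < RangeEnd
    )
in
    Filtered
```

With nothing between the source and the filter, the connector has the best chance of pushing the date range down to the source. Whether it actually folds still depends on the connector, so it is worth checking the query sent to the source.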

 

EDIT2:

That said, I could have two queries in my dataflow: Customers_Ingest and Customers_Cleanse, where _Ingest is untransformed and has incremental refresh enabled, and _Cleanse is linked to it and holds the transformations.  Since these transformations happen within the same dataflow, though, I assume I wouldn't get the benefit of the enhanced compute engine.

1 REPLY
v-xuding-msft
Community Support

Hi @jeffshieldsdev ,

If you need to get timely help, I think you could create a support ticket to get the dedicated support from Microsoft. You could reference the blog about how to create it.  I don't have much experience in ETL.  Sorry that I have not helped you.


Best Regards,

Xue Ding

If this post helps, then please consider accepting it as the solution to help other members find it more quickly. Kudos are nice too.
