I need help understanding best practices for incremental loading with Azure and loading into Data Factory.
I believe there are two methods behind my use case, but I want to understand which route I should take and explore my options.
Use case: I'm in a smaller organization (around 400 people) and we want to build a central location for a reporting platform solution. We use Salesforce, and QAD is our ERP. QAD is a little old, but we are still able to connect to Azure through ODBC.
What principles should I follow for loading data from our ERP environment on an incremental cadence? Based on my understanding, should I (1) use the Copy Data pipeline with functions/activities to update the table? How would I schedule the job, and how would this account for a heavy-volume table? Or (2) use notebooks throughout the process and set a datetime column as a watermark value to help with updates? How would I go about this method?
I would appreciate any information on how to handle this and on best practices. I do not have a SQL Server available, which I know would help for the control-table feature. Any suggestions? Thanks.
Hi @Fisayo99
The Copy Data pipeline in Azure Data Factory (ADF) is a robust choice for incremental loading. It lets you move data efficiently from various sources into a centralized data store, which fits your goal of a reporting platform solution.

Use the watermark method for incremental loads: track the last successfully loaded timestamp (or an equivalent monotonically increasing identifier) and use it to load only rows created or updated since that value.

ADF has built-in support for scheduling pipelines; use a Trigger to schedule your incremental load jobs. For heavy-volume tables, consider partitioning the data and parallelizing the copy activity to improve performance.

For your reference:
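Since you mentioned you don't have a SQL Server for a control table, the watermark can also live in a small state file (for example, in blob or lake storage). Here is a minimal sketch of the pattern in plain Python, assuming the source table has a change-tracking column such as `modified_at` in ISO-8601 format; the file name and column name are illustrative, not part of ADF:

```python
import json
import os

WATERMARK_FILE = "watermark.json"  # hypothetical file-based control store


def read_watermark(path=WATERMARK_FILE):
    # Return the last successfully loaded timestamp,
    # or a floor value on the very first run (full load).
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["last_loaded"]
    return "1900-01-01T00:00:00"


def write_watermark(value, path=WATERMARK_FILE):
    # Persist the new watermark only after the load succeeds,
    # so a failed run is simply retried from the old watermark.
    with open(path, "w") as f:
        json.dump({"last_loaded": value}, f)


def incremental_load(rows, watermark):
    # Keep only rows modified strictly after the watermark.
    # ISO-8601 strings of the same format compare correctly as strings.
    new_rows = [r for r in rows if r["modified_at"] > watermark]
    if new_rows:
        # Advance the watermark to the max timestamp actually loaded.
        watermark = max(r["modified_at"] for r in new_rows)
    return new_rows, watermark
```

The same logic applies whether the filter runs as an ODBC source query in a copy activity (`WHERE modified_at > @watermark`) or inside a notebook; writing the watermark last makes the run safely re-runnable.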
Incrementally load data from Data Warehouse to Lakehouse - Microsoft Fabric | Microsoft Learn
Pattern to incrementally amass data with Dataflow Gen2 - Microsoft Fabric | Microsoft Learn
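To make the heavy-volume advice above concrete: one common way to parallelize a copy is to split a numeric key range into slices and run one copy activity (or one notebook task) per slice. A rough sketch, assuming an integer surrogate key; the function name is hypothetical:

```python
def partition_ranges(min_id, max_id, partitions):
    # Split an inclusive numeric key range into roughly equal
    # (lo, hi) slices, one per parallel copy activity.
    step = (max_id - min_id + 1) // partitions
    ranges = []
    lo = min_id
    for i in range(partitions):
        # Last slice absorbs any remainder from integer division.
        hi = max_id if i == partitions - 1 else lo + step - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges
```

Each (lo, hi) pair would drive a `WHERE id BETWEEN lo AND hi` source query; ADF's copy activity also offers built-in parallel partitioning options for some sources, which achieve the same effect without hand-rolled ranges.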
Best Regards
Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.