I need help understanding best practices for incremental loading with Azure and loading into Data Factory.
I believe there are two methods behind my use case, but I want to understand which route I should take and explore my options.
Use case: I'm in a smaller organization (around 400 people) and we want to build a central location for a reporting platform solution. We use Salesforce, and QAD is our ERP. QAD is a little old, but we are still able to connect to Azure through ODBC.
What principles should I follow for loading data from our ERP environment on an incremental cadence? Based on my understanding, should I (1) use the Copy Data pipeline with functions/activities to update the table? How would I schedule the job, and how would this account for a heavy-volume table? Or (2) use notebooks throughout the process and set a datetime column as a watermark value to help with updates? How would I go about this method?
I would appreciate any information on how to handle this and on best practices. I do not have a SQL Server available, which I know would help for the control-table feature. Any suggestions? Thanks.
Hi @Fisayo99
The Copy Data pipeline in Azure Data Factory (ADF) is a robust choice for incremental loading. It lets you move data efficiently from various sources into a centralized data store, which fits your goal of a reporting platform solution.

Use the watermark method for incremental loads: track the last successfully loaded timestamp (or an equivalent monotonically increasing identifier) and use it to load only rows created or updated since that value.

ADF has built-in support for scheduling pipelines; use a Trigger to schedule your incremental load jobs. For heavy-volume tables, consider partitioning the data and parallelizing the copy activity to improve performance.

For your reference:
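Since you mentioned you don't have a SQL Server for a control table, the watermark can also live in a small state file (for example, in blob or lake storage). Here is a minimal sketch of the pattern in plain Python, assuming the source table has a change-tracking column such as `modified_at` in ISO-8601 format; the file name and column name are illustrative, not part of ADF:

```python
import json
import os

WATERMARK_FILE = "watermark.json"  # hypothetical file-based control store


def read_watermark(path=WATERMARK_FILE):
    # Return the last successfully loaded timestamp,
    # or a floor value on the very first run (full load).
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)["last_loaded"]
    return "1900-01-01T00:00:00"


def write_watermark(value, path=WATERMARK_FILE):
    # Persist the new watermark only after the load succeeds,
    # so a failed run is simply retried from the old watermark.
    with open(path, "w") as f:
        json.dump({"last_loaded": value}, f)


def incremental_load(rows, watermark):
    # Keep only rows modified strictly after the watermark.
    # ISO-8601 strings of the same format compare correctly as strings.
    new_rows = [r for r in rows if r["modified_at"] > watermark]
    if new_rows:
        # Advance the watermark to the max timestamp actually loaded.
        watermark = max(r["modified_at"] for r in new_rows)
    return new_rows, watermark
```

The same logic applies whether the filter runs as an ODBC source query in a copy activity (`WHERE modified_at > @watermark`) or inside a notebook; writing the watermark last makes the run safely re-runnable.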
Incrementally load data from Data Warehouse to Lakehouse - Microsoft Fabric | Microsoft Learn
Pattern to incrementally amass data with Dataflow Gen2 - Microsoft Fabric | Microsoft Learn
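To make the heavy-volume advice above concrete: one common way to parallelize a copy is to split a numeric key range into slices and run one copy activity (or one notebook task) per slice. A rough sketch, assuming an integer surrogate key; the function name is hypothetical:

```python
def partition_ranges(min_id, max_id, partitions):
    # Split an inclusive numeric key range into roughly equal
    # (lo, hi) slices, one per parallel copy activity.
    step = (max_id - min_id + 1) // partitions
    ranges = []
    lo = min_id
    for i in range(partitions):
        # Last slice absorbs any remainder from integer division.
        hi = max_id if i == partitions - 1 else lo + step - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges
```

Each (lo, hi) pair would drive a `WHERE id BETWEEN lo AND hi` source query; ADF's copy activity also offers built-in parallel partitioning options for some sources, which achieve the same effect without hand-rolled ranges.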
Best Regards
Zhengdong Xu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.