Tanayraj
Frequent Visitor

How can I load data into a datamart from a data warehouse?

Hello!

 

I have built a Medallion architecture and have completed the bronze and silver layers. The silver layer is a data warehouse. For one specific report I do not want to use all of the data in the silver layer, so I have a SQL query (around 500 lines, with more than 10 joins) that pulls only the data needed for that report. I want the result of this query in the gold layer. My idea is to create a dataflow from silver to gold and store the result in the gold layer, inside a datamart. However, when I run that SQL script against the silver layer, it takes around 5 to 6 hours to return results. What should I do? I tried creating a view from that SQL script in the silver layer and loading the view into the gold layer, but the load fails with "evaluation canceled"; from what I found online, this can happen when execution takes too long. I then tried saving the result of the SQL query as a table in the silver layer and loading that table into the gold layer, but that also took a very long time. I am not sure this approach will hold up, because it is slow and I also want to refresh the data every 2 hours.


Any kind of information will be helpful. Which tool/feature should I use? I have Fabric capacity.

1 ACCEPTED SOLUTION
v-zhengdxu-msft
Community Support

Hi @Tanayraj 

 

Instead of loading the entire dataset every time, consider implementing an incremental load strategy. This involves loading only the data that has changed since the last load. You can achieve this by using a watermark column (such as a last modified timestamp or an auto-incrementing ID) to identify new or updated records. Microsoft Fabric's Data Factory supports incremental loading, which can significantly reduce the volume of data transferred and the time required for each load.

Incrementally load data from Data Warehouse to Lakehouse - Microsoft Fabric | Microsoft Learn
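For example, a minimal sketch of the watermark pattern in T-SQL, assuming a hypothetical watermark table and illustrative silver/gold object names (none of these objects come from your environment):

```sql
-- Hedged sketch of a watermark-based incremental load; all object names are illustrative.
-- 1) Read the watermark recorded by the previous run.
DECLARE @LastWatermark datetime2 =
    (SELECT MAX(WatermarkValue) FROM dbo.LoadWatermark WHERE TableName = 'FactSales');

-- 2) Copy only the rows that changed since the last load from silver to gold.
INSERT INTO gold.FactSales_Report (SaleId, CustomerId, Amount, ModifiedDate)
SELECT s.SaleId, s.CustomerId, s.Amount, s.ModifiedDate
FROM   silver.FactSales AS s
WHERE  s.ModifiedDate > @LastWatermark;

-- 3) Record the new high-water mark for the next scheduled run (e.g., every 2 hours).
UPDATE dbo.LoadWatermark
SET    WatermarkValue = (SELECT MAX(ModifiedDate) FROM silver.FactSales)
WHERE  TableName = 'FactSales';
```
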
Given the complexity of your SQL query, it's worth reviewing and optimizing it to reduce execution time. This could involve indexing key columns used in joins and where clauses, reducing the number of joins if possible, or breaking the query into smaller, more manageable parts.
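As an illustration of breaking the query apart (a sketch only; the staged tables and columns below are placeholders), you can materialize the heaviest joins or aggregations into intermediate tables with CTAS and build the report from those smaller tables:

```sql
-- Hedged sketch: pre-stage the heaviest parts of the large query, then join the smaller results.
CREATE TABLE stage.CustomerOrders AS
SELECT o.CustomerId, COUNT(*) AS OrderCount, SUM(o.Amount) AS TotalAmount
FROM   silver.Orders AS o
GROUP BY o.CustomerId;

CREATE TABLE stage.ActiveCustomers AS
SELECT c.CustomerId, c.Region, c.Segment
FROM   silver.Customers AS c
WHERE  c.IsActive = 1;

-- The final report query now joins two small staged tables
-- instead of repeating the full multi-join logic in one statement.
SELECT a.CustomerId, a.Region, a.Segment, o.OrderCount, o.TotalAmount
FROM   stage.ActiveCustomers AS a
JOIN   stage.CustomerOrders  AS o ON o.CustomerId = a.CustomerId;
```
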

 

Alternatively, you can use Data Factory pipelines to perform the data transformation and loading. Data Factory allows you to create data-driven workflows for orchestrating and automating data movement and data transformation. You can create a pipeline that executes your SQL script to transform the data as needed and then loads the result into your datamart. This approach offers more control over the execution and can be optimized for performance.

What is Data Factory - Microsoft Fabric | Microsoft Learn
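For instance, the pipeline could run a stored procedure on a 2-hour schedule that rebuilds the gold table; a minimal sketch, assuming a hypothetical procedure and hypothetical table names, might look like this:

```sql
-- Hedged sketch of a refresh procedure a pipeline activity could execute every 2 hours.
-- All object names are illustrative.
CREATE OR ALTER PROCEDURE gold.usp_RefreshReportTable
AS
BEGIN
    -- Replace the report table with a freshly built result set.
    DROP TABLE IF EXISTS gold.ReportResult;

    CREATE TABLE gold.ReportResult AS
    SELECT r.ReportKey, r.Metric   -- in practice, the (reduced) report query goes here
    FROM   silver.StagedReportInput AS r;
END;
```

Combining this with the incremental or staged approach above keeps each scheduled run small enough to finish well within the 2-hour refresh window.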

 

Best Regards

Zhengdong Xu
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

