Solved: Re: Bug or am I missing something? Procedure in Da...

AlexanderPowBI · ‎03-15-2024

Hello,

I'm experiencing something very unusual and can't find the issue. I'm modifying table names and columns here for privacy reasons.

My flow is as such;

1. Notebook 1: Data is ingested from API to a "Bronze Lakehouse" into a managed table, call it tableBronze.

Metadata column added: bronze_inserted_timestmap

2. Notebook 2 reads tableBronze and filters it on bronze_inserted_timestamp > [Date[pipeline trigger timestamp]]. Then transformations are made within a DF and data is "upserted" into Silver Lakehouse, tableSilver.

Meaning if df.transaction_id = tableSilver.transaction_id exists its updated in the tableSilver, and if not, its added. Here also a "silver_transformed_timestamp" is added.

So far so good. The flow works and tested manually and inside the data pipeline and it works.

3. Now comes the problem.

I have created a Gold Data Warehouse and table "tableGold"In this data warehouse, I have created a stored procedure.

The intention is to:

3.1. Read the tableSilver from Silver Lakehouse and filter it based on silver_transformed_timestamp > [Date[pipeline trigger timestamp]]. Again intention is to only include data that is just transformed.

3.2. Update existing records in tableGold based on [SilverLakehouse].[Dbo].[tableSilver].transaction_id = tableGold.transaction_id

3.3 Insert new records if they do not exists.

As DW has alot of limitations, I came to use the procedure described below.

This has worked well during my testing with manual runs.

Yesterday, my first pipeline test run was schedulued to 2024-03-14 03:02.

Everything seemd to have gone well, but upon a closer look, the UPDATE part of the procedure had updated 0 rows (should have been about 30k rows). but insert part had worked properly.

I tried to dig into it, couldn't see any issues. I ran the procdure manually through "exec procedure param = 2024-03-14" in the warehouse and then, procedure worked as intended and i began to be very confused..

Theories and tests:

* I thought, maybe it has something to do with the timestamp? but I reviewed the procedure and my thinking is that "@SilveringestionDat DATE" would convert the timestamp parameter to date, so my silverTable should be filtered on date that is >= 2024-03-14 00:00. Another thing is that the "insert" part use exactly same filtering and it worked as expected. To be sure, I made sure that the pipeline sends in a date without time to the procedure.

* I ran the pipeline again in schedule 2024-03-15 03:03 or so. same issue arises.

* I ran the procedure again manually, no issue.

Anyone has any theories?

Can it be that there is a delay from the silverNotebook run updatig the tableSilver? So when i run the procedure and it reads the tableSilver for the update, it hasnt been updated yet, but when it comes to insert it has?

Any other theories?

CREATE PROCEDURE usp_load_silver_to_gold

@SilveringestionDat DATE

AS

BEGIN

----update records---

UPDATE t

SET

t.transaction_id = s.transaction_id,

..........Other columns..........,

t.silver_transformation_timestamp = s.silver_transformation_timestamp,

t.gold_general_transformation_timestamp = getDate()

FROM

[LAKEHOUSE_Silver].[dbo].[tableSilver] s

JOIN gold_table t ON s.transaction_id = t.transaction_id

WHERE s.silver_transformation_timestamp >= @SilveringestionDate;

-----add new data--

INSERT INTO gold_table(

transaction_id,

....other columns...,

silver_transformation_timestamp,

gold_general_transformation_timestamp)

SELECT S.* FROM

(SELECT transaction_id,

. .......other columns........,

silver_transformation_timestamp,

getDate() as gold_general_transformation_timestamp

FROM [LAKEHOUSE_Silver].[dbo].[tableSilver]

WHERE silver_transformation_timestamp >= @SilveringestionDate) S

left join gold_table t on S.transaction_id = t.transaction_id where t.transaction_id is null;

end;

AlexanderPowBI · ‎03-21-2024

Hi,

Well depends on how we view it. After adding the "one minute delay" in my pipeline:

....

notebook updates table_in_lakehouse (on success) >

wait (1 minute) (on success) >

Run stored procedures in data warehouse

.....

The problem has been solved. However this for me suggest some kind of synch issues, where the data warehouse stored procedure was querying a "old" version of the table_in_lakehouse, eventhough my believe is that the pipeline should handle this without the wait activity?

View solution in original post

AlexanderPowBI · ‎03-18-2024

Hello again,

So just for information, I added a one minute "wait" in my pipeline between notebook run and stored procedures. This has now run twice succesfully on schedule, suggesting that the issue was/is due to some kind of synch / timing issues, but I can't be sure.

v-nikhilan-msft · ‎03-19-2024

Hi @AlexanderPowBI
Are you still facing this issue or it got resolved?
Please let me know if you need any further help.

AlexanderPowBI · ‎03-21-2024

Hi,

Well depends on how we view it. After adding the "one minute delay" in my pipeline:

....

notebook updates table_in_lakehouse (on success) >

wait (1 minute) (on success) >

Run stored procedures in data warehouse

.....

The problem has been solved. However this for me suggest some kind of synch issues, where the data warehouse stored procedure was querying a "old" version of the table_in_lakehouse, eventhough my believe is that the pipeline should handle this without the wait activity?

v-nikhilan-msft · ‎03-21-2024

Hi @AlexanderPowBI

Thanks for the details. I have passed on the information to our internal team. I will keep you updated.

Thanks.

AlexanderPowBI · ‎04-02-2024

I have now experienced the same issue again in another strored procedure which is for my audit table. I can quite clearly conclude that the SQL warehouse is quering an old verision of my lakehouse delta tables.

I found this: https://learn.microsoft.com/en-us/fabric/get-started/known-issues/known-issue-618-using-inactive-sql...

Is my issue due to this perhaps? is it known to happen when running pipelines? What is Microsoft doing to solve this?

v-nikhilan-msft · ‎03-16-2024

Hi @AlexanderPowBI
Thanks for using Fabric Community.
At this time, we are reaching out to the internal team to get some help on this. We will update you once we hear back from them.
Thanks