arpost
Advocate V

Anyone know how to fix external data type error with Lakehouse files?

Greetings, all. I'm trying to copy data from files loaded into a Lakehouse into a Data Warehouse table and am running into some sort of data type error. The files were loaded into the Lakehouse as CSVs (DelimitedText).

 

Here's a "cleaned up" version of the error I get with my Copy Activity:

 

Column 'TotalItems' of type 'INT' is not compatible with external data type 'Parquet physical type: BYTE_ARRAY, logical type: UTF8'

I was rather surprised, as I didn't know/expect that files in a Lakehouse would be typed. The column referenced is a column of integers that can range from 1 to 100+.

 

Anyone know how to resolve this? I'm not specifying any data types when loading into the Lakehouse, and I'd REALLY like to get rid of this error.

19 REPLIES
v-cboorla-msft
Community Support

Hi @arpost 

 

Welcome to Microsoft Fabric Community and thanks for posting your question here.

 

As I understand it, you are trying to copy data from files loaded into a Lakehouse into a Data Warehouse table and are running into a data type error.

 

When I tried to repro this scenario using CSV files, I didn't get any error; I was able to load the file into a table in the warehouse. I tried the repro with multiple CSV files and still saw no error.

vcboorlamsft_0-1697802834643.png

 

Can you please share the screenshot of the copy activity which you are using?

 

Thanks.

 

 

Hi, @v-cboorla-msft. I am wondering if the issue is that the Parquet type is being interpreted by Fabric as a BYTE_ARRAY for some files because those files only contain a value of 1 in the TotalItems column. So while File A's TotalItems column may have values of 1, 2, 5, 10, 25, etc., File B's TotalItems column only has 1.

 

Here's the Copy Activity Source/Destination that loads the files into the Lakehouse:

 

arpost_1-1697808098502.png

 

And here's the Copy Activity that's failing when trying to load into Data Warehouse:

 

arpost_3-1697808393232.png

 

Hi @arpost 

 

Apologies for the delay in response.

Thank you for the information that you have provided.

 

I can see in your screenshot that you're loading the data into an existing warehouse table. If the source schema doesn't match the existing table, we can encounter the same error. I attempted to replicate the same situation with my data and encountered a similar issue.

 

vcboorlamsft_0-1698324110776.png

 

It is recommended to always validate the schema of the source before using the 'auto create table' option, to ensure that the table is created with the correct data types. Try loading the data with the 'auto create table' option; if it succeeds, that helps confirm whether the issue is related to the schema.
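
If it helps, one quick way to validate what types are being inferred from the source CSVs is to read them in a Fabric notebook and print the schema. This is only a minimal sketch under assumptions; the Files path below is a placeholder for your own folder.

```python
# Minimal sketch: check the schema Spark infers for the source CSVs
# before relying on 'auto create table'. The path is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already available as `spark` in a Fabric notebook

df = (spark.read
      .option("header", True)
      .option("inferSchema", True)
      .csv("Files/landing/*.csv"))
df.printSchema()  # confirm TotalItems is inferred as an integer type, not a string
```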

 

When I tried using the exact file data with an existing table, it copied successfully for me.

 

vcboorlamsft_0-1698324591831.png

 

I hope this information is helpful. Please do let us know if you have any questions.

Any thoughts, @v-cboorla-msft?

Hi @arpost 

 

Apologies for the delay in response.

Sorry for the inconvenience that you are facing.

I have reached out to the internal team for help on this. I will update you once I hear from them.
Appreciate your patience.

Hi @arpost 

 

When you say "When loaded into the Lakehouse as a file, FileA.Value is being typed as an Integer while FileB.Value is being typed as a Byte Array", where are you verifying that? In the Lakehouse?

 

Could you please provide the details so that I can understand the issue better?

 

Thanks.

That's what the error message in ADF states (see the image you posted above). It notes that the external data type (i.e., the data in the Lakehouse) is of a particular Parquet physical type (i.e., BYTE_ARRAY).

 

Column 'TotalItems' of type 'INT' is not compatible with external data type 'Parquet physical type: BYTE_ARRAY, logical type: UTF8'

 

That typing wasn't something I ever set up, and there isn't a way to set that up when copying files into the Lakehouse initially before trying to load to the DW.

Hi @arpost 

 

Here is an update from the internal team:

 

"There's unique mapping for the combination of Parquet physical type & logical type to .NET type.

 

Check out this online tool to read Parquet file content and inspect schema & metadata with an intuitive user experience." 
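
If a browser tool isn't convenient, a small pyarrow snippet can show the same physical and logical types the error message refers to. This is just a sketch; the file name is a placeholder for a Parquet file you've pulled down locally.

```python
# Minimal sketch: inspect a Parquet file's schema with pyarrow.
# "sample.parquet" is a placeholder file name.
import pyarrow.parquet as pq

pf = pq.ParquetFile("sample.parquet")
print(pf.schema_arrow)  # Arrow-level view (e.g. string vs int64)
print(pf.schema)        # Parquet-level view, including physical types such as BYTE_ARRAY
```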

 

Please let us know if you need any help.

 

Thanks.

Could you clarify this a bit further, @v-cboorla-msft? Is this purely informational? The issue is that Fabric is storing the files in the Lakehouse with types that can't be converted for use in a DW, and there's nothing I as the user can do to solve for that (to my knowledge), because the Parquet typing is all done by Fabric under the hood.

Hi @arpost 

 

Can you please provide a sample Parquet file for us to repro the issue? We would like to understand why there is an issue.

You can send us this information by email to AzCommunity[at]Microsoft[dot]com with the details below:

Subject of the email: ATTN: PRADEEP - Thread title

 

Thanks.

Any update on this, @v-cboorla-msft?

@v-cboorla-msft, I emailed the address you provided (AzCommunity@Microsoft.com) and got the following back:

 

Good day!

 

I am one of the moderators for Azure Community Connect DL. Post reviewing your email, we couldn’t find active thread details. Please confirm if there is any active thread posted in Microsoft Q&A platform : https://learn.microsoft.com/en-us/answers/index.html ?

 

If yes, share us the URL. Otherwise, request you to open a new thread on Q&A platform for your query/issue so that one of the community experts will reach you shortly for further assistance.

Hi @arpost 

 

We haven't heard from you since the last response and were just checking back to see if you have a resolution yet.
If you do, please share it with the community, as it can be helpful to others.

Thanks

@v-cboorla-msft, any update on this?

I just emailed you some sample files. Both CSVs get uploaded to a Lakehouse as files. We then need to use a Data Pipeline to copy them into a Data Warehouse table that has an INT column. ColumnB in the CSV files should be mapped to the INT column in the DW.

arpost_1-1701709281048.png

 

 

@v-cboorla-msft, appreciate the reply. Glad to see you were able to replicate the issue. The destination table in the DW has the desired schema; the issue is that Fabric is apparently identifying the Parquet data type automatically behind the scenes, which makes it impossible to load files from a Lakehouse into a DW: files that "should" have the same data types are being read into the Lakehouse as different Parquet types.

 

So if File A looks like this:

Date        Value
1/1/2023    25
1/2/2023    50

 

But File B looks like this:

Date        Value
2/1/2023    1
2/2/2023    1

 

When loaded into the Lakehouse as a file, FileA.Value is being typed as an Integer while FileB.Value is being typed as a Byte Array. But they are the same "schema" and should be treated as such by Fabric.

 

Is there a solution to fix this behavior? Otherwise, it really doesn't make sense to load files into the Lakehouse.
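
One possible way to sidestep the per-file inference would be to land the CSVs with an explicit schema from a notebook instead of relying on the pipeline's detection. A minimal sketch, assuming a Fabric notebook; the path, column names, and table name are placeholders:

```python
# Minimal sketch: read both CSVs with an explicit schema so Value is always an integer,
# then write them to a Lakehouse table with stable types. Names and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.getOrCreate()  # built-in as `spark` in a Fabric notebook

schema = StructType([
    StructField("Date", StringType()),    # convert to a date type later if needed
    StructField("Value", IntegerType()),
])

df = spark.read.option("header", True).schema(schema).csv("Files/landing/*.csv")
df.write.mode("append").saveAsTable("staging_values")  # Delta table with consistent types
```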

Hello, old post but I'm facing a similar issue. Did you get a fix?

@AlexanderPowBI, nope. And I've reached out to @v-cboorla-msft for an update but haven't heard back. I also haven't done any recent testing as I moved away from this solution due to this bug.

 

One POSSIBLE solution might be to use a staging table in your Data Warehouse where all the columns have the datatype VARCHAR. This should mitigate any data type mismatches between lakehouse and warehouse. The bug still needs to be fixed, but this is a workaround.

Thank you for your reply. I have found multiple services fail due to a similar issue: ADF, Dataflow Gen2, etc. The only workaround I have found so far is to load into a PySpark DataFrame and retype the columns before saving.

 

I have tried to load files into a table with VARCHAR in the column where the issue occurs, but no luck unfortunately.  
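
For reference, the retyping workaround described above might look roughly like this. A minimal sketch, assuming a Fabric notebook; the path, column name, and table name are placeholders.

```python
# Minimal sketch of "load into a PySpark DataFrame and retype the columns before saving".
# The path, column name, and table name are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()  # built-in as `spark` in a Fabric notebook

df = spark.read.option("header", True).csv("Files/landing/FileB.csv")
df = df.withColumn("Value", col("Value").cast("int"))     # force an integer type
df.write.mode("overwrite").saveAsTable("staging_fileb")   # write with the corrected schema
```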
