Solved: Load partitioned parquet table in ADLSg2 into Lakehouse

amaaiia · ‎05-02-2024

How can I copy or clone a Parquet table stored in ADLSg2 into a managed table in a Lakehouse while keeping its partitions? I've created a shortcut under Files in my lakehouse, where I can see my tables in Parquet format, partitioned into folders. I click ··· > Load to tables > New Table > Including Subfolder and then create the new table under the Tables folder. The issue is that my table under Files is partitioned, but the new table under Tables isn't: all the data ends up in a single, unpartitioned Parquet file. Is there any way to keep the partitions in the new Delta table?

I guess I can use a notebook to read the table in Files and write it as Delta in Tables, but I'd like to know if there is a simpler way to achieve this using Fabric features. Or at least, is there a Python script or similar that reads each table one by one and converts it to Delta under the Tables folder, keeping the partitions?

Thanks.


v-gchenna-msft · ‎05-02-2024

Hi @amaaiia ,

Thanks for using Fabric Community.
As I understand it, you want to create a Delta table that keeps the partition structure of the original Parquet table.

We are reaching out to the internal team for help on this and will update you once we hear back from them.

v-gchenna-msft · ‎05-09-2024

Hi @amaaiia ,

There isn't a direct way in Fabric to clone a partitioned Parquet table to a managed Delta table in the lakehouse while preserving partitions. Instead, you can read the files in a notebook and write them back with the partitionBy option.
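A minimal sketch of that read-then-write approach, for illustration only. It assumes a Fabric notebook with the lakehouse attached as default (so the built-in spark session and relative Files/ paths are available) and Hive-style partition folders (column=value), which Spark reads back as ordinary columns. The function name, the shortcut path, the table name and the column names are hypothetical placeholders, not something taken from this thread.

```python
def copy_parquet_to_delta(spark, source_path, table_name, partition_cols):
    """Read a partitioned Parquet folder and save it as a partitioned Delta table."""
    # Spark discovers Hive-style partition folders (column=value) while reading
    # and exposes them as ordinary columns of the DataFrame.
    df = spark.read.parquet(source_path)

    (
        df.write.format("delta")
        .mode("overwrite")             # replaces the table if it already exists
        .partitionBy(*partition_cols)  # write with the same partition columns
        .saveAsTable(table_name)       # managed table under Tables
    )
    return df


# Hypothetical usage inside a notebook, where `spark` is already defined:
# copy_parquet_to_delta(spark, "Files/my_shortcut/sales", "sales", ["year", "month"])
```

If the source folders are not named column=value, Spark will not recover the partition columns on read, so they would have to be derived some other way before calling partitionBy.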

Reference:
PySpark partitionBy() - Write to Disk Example - Spark By {Examples} (sparkbyexamples.com)

Hope this is helpful.

amaaiia · ‎05-09-2024

So, I guess I have two options:

- Read the table from ADLSg2 with a notebook and write it partitioned into the Fabric lakehouse

- Read the table with a Data Pipeline Copy activity over the ADLSg2 connection and partition it in the Destination (Fabric Lakehouse) using the advanced options
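For the notebook option applied to several tables one by one, a rough sketch could look like the following. It assumes Hive-style partition folders (column=value), guesses the partition columns from the file paths that Spark reports through DataFrame.inputFiles(), and has not been tested against a real Fabric workspace. The helper names and the sources mapping are hypothetical.

```python
def infer_partition_columns(file_paths):
    """Return partition column names, in folder order, from Hive-style paths."""
    columns = []
    for path in file_paths:
        # Look at folder segments only; the last segment is the file name.
        for segment in path.split("/")[:-1]:
            if "=" in segment:
                name = segment.split("=", 1)[0]
                if name not in columns:
                    columns.append(name)
    return columns


def copy_tables(spark, sources):
    """Copy each Parquet folder in `sources` to a Delta table of the same name.

    `sources` maps a target table name to the Parquet folder to read.
    """
    for table_name, source_path in sources.items():
        df = spark.read.parquet(source_path)
        partition_cols = infer_partition_columns(df.inputFiles())

        writer = df.write.format("delta").mode("overwrite")
        if partition_cols:
            # Only partition when partition folders were actually found.
            writer = writer.partitionBy(*partition_cols)
        writer.saveAsTable(table_name)


# Hypothetical usage inside a notebook, where `spark` is already defined:
# copy_tables(spark, {"sales": "Files/my_shortcut/sales",
#                     "orders": "Files/my_shortcut/orders"})
```

The path-based guess would also pick up any folder above the table root whose name contains "=", so passing the partition columns explicitly is the safer choice when they are known.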

v-gchenna-msft · ‎05-09-2024

Hi @amaaiia ,

Yes, you are right.
