Housden996
Helper I

Ingesting 1.5TB of data

Hi 

 

Not necessarily a pipeline question, but what would be the best way to ingest a 1.5TB database into Fabric?

I feel like a notebook into a KQL DB would be the most optimised approach. I am going to run through tutorials such as the taxi data one, but I am wondering if anyone has had any experience importing data at this scale into Fabric.

 

Thanks

1 ACCEPTED SOLUTION
v-nikhilan-msft
Community Support

Hi @Housden996 
Thanks for using Fabric Community.
Ingesting large-scale data into Microsoft Fabric can be achieved through several methods, each with its own strengths depending on your specific use case.

Here are some options:
COPY (Transact-SQL): This method offers flexible, high-throughput data ingestion from an external Azure storage account. You can use the COPY statement as part of your existing ETL/ELT logic in Transact-SQL code.

Data Pipelines: Pipelines offer a code-free or low-code experience for data ingestion. Using pipelines, you can orchestrate robust workflows for a full Extract, Transform, Load (ETL) experience that includes activities to help prepare the destination environment, run custom Transact-SQL statements, perform lookups, or copy data from a source to a destination.

Dataflows: An alternative to pipelines, dataflows enable easy data preparation, cleaning, and transformation using a code-free experience.

Cross-Warehouse Ingestion: Data ingestion from workspace sources is also possible. This scenario might be required when you need to create a new table with a subset of a different table, or as a result of joining different tables in the warehouse and the lakehouse.

For large-scale data ingestion, you might want to consider using data pipelines or the COPY statement for the highest data ingestion throughput possible. You can also use Dataflow Gen2. Dataflows provide a low-code interface for ingesting data from hundreds of data sources and transforming it with 300+ transformations. A rough sketch of the COPY route is shown below.
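As a rough illustration only (not something the linked docs prescribe), driving the COPY statement from Python with pyodbc might look like the sketch below. The warehouse endpoint, database, table name, storage path and SAS token are all placeholders, and it assumes the data has first been staged as Parquet files in an external Azure storage account.

```python
# Minimal sketch: run the COPY (Transact-SQL) statement against a Fabric Warehouse
# from Python via pyodbc. Every name, endpoint and secret below is a placeholder, and
# it assumes the source data has already been staged as Parquet in Azure storage.
import pyodbc

# The SQL connection string can be copied from the warehouse settings in the Fabric portal.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<warehouse-sql-endpoint>;"
    "Database=<warehouse-name>;"
    "Authentication=ActiveDirectoryInteractive;"
)

copy_sql = """
COPY INTO dbo.big_table
FROM 'https://<storageaccount>.blob.core.windows.net/<container>/big_table/*.parquet'
WITH (
    FILE_TYPE = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = '<sas-token>')
);
"""

cursor = conn.cursor()
cursor.execute(copy_sql)  # the COPY statement does the heavy lifting inside the warehouse
conn.commit()
cursor.close()
conn.close()
```

Splitting the staged data into many files generally lets COPY parallelise the load, which matters at the 1.5TB scale.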

For more information refer to these links:
https://www.youtube.com/watch?v=sXZkrFtN5oc
Ingesting data into the warehouse - Microsoft Fabric | Microsoft Learn
https://www.linkedin.com/pulse/mastering-data-loading-microsoft-fabric-comprehensive-machado-ph-d-/
https://community.fabric.microsoft.com/t5/Desktop/Need-help-on-handling-the-large-volume-of-data/td-...
https://www.linkedin.com/pulse/data-ingestion-patterns-microsoft-fabric-sankha-chakraborty-zei4c/

Hope this helps. Please let me know if you have any further questions.


6 REPLIES
jwinchell40
Advocate I

@Housden996 

  1. Where is the database hosted and what type is it? 
  2. How many tables would you be ingesting?
  3. Would you be looking to sunset this DB or are you just trying to clone it?

If it is in Azure, the new Mirroring integration may simplify things for you:  Mirroring - Microsoft Fabric | Microsoft Learn

1. This is a MySQL server hosted on the client side.

2. Unsure how many tables; they didn't let me know.

3. Clone for the time being; it could be sunsetted once migrated over to MongoDB.

 

Unfortunately, I don't think I can mirror this type of DB.
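For context, the notebook route I was picturing looks roughly like the sketch below: read each MySQL table over JDBC from a Fabric Spark notebook and land it as a Delta table in a Lakehouse (just one possible destination; a KQL database would use a different ingestion path). The host, credentials, table name and partition column are placeholders, and it assumes the MySQL JDBC driver is available to the Spark session.

```python
# Rough sketch: pull one MySQL table over JDBC from a Fabric Spark notebook and land it
# as a Delta table in the attached Lakehouse. Host, credentials, table and partition
# column are placeholders; assumes the MySQL JDBC driver is available to the session.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # already provided as `spark` in a Fabric notebook

df = (
    spark.read.format("jdbc")
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("url", "jdbc:mysql://<client-host>:3306/<database>")
    .option("dbtable", "orders")                # hypothetical table name
    .option("user", "<user>")
    .option("password", "<password>")
    # Parallelise the read so 1.5TB is not pulled through a single connection.
    .option("partitionColumn", "id")            # assumes a numeric, evenly distributed key
    .option("lowerBound", "1")
    .option("upperBound", "500000000")
    .option("numPartitions", "64")
    .load()
)

# Write to the default Lakehouse as a Delta table.
df.write.mode("overwrite").format("delta").saveAsTable("orders")
```

Looping over the table list (e.g. from information_schema) would extend this to the whole database.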

v-nikhilan-msft
Community Support


Hi @Housden996 
We haven't heard back from you since the last response and were just checking to see if your query has been resolved. If not, please reply with more details and we will try to help.
Thanks


Thanks, and apologies for the delay.

 

I will be going down the cloning route. How the potential client currently uses Tableau Server is a bit unusual: they don't really use it for reporting, more just for working with the data, so notebooks and the ML side of Fabric are more useful to them.

 

I have looked into the COPY feature and this seems more cost-effective for the client!
