Vinodh247
Regular Visitor

Fabric shortcuts

With shortcuts, where is the processing or loading of the file from the source done in Fabric? Does it use Fabric compute every time to copy and load the data from the file? If so, network latency might be a bottleneck. Can anyone shed some light on the internal workings of Fabric shortcuts? There doesn't seem to be proper documentation explaining this.

1 ACCEPTED SOLUTION

Hi @Vinodh247 ,

We have an update from the internal team:

We are working on documentation focused primarily on performance and other considerations related to shortcuts. It should be out in a month or so.

 

Let's discuss the goal of shortcuts before discussing costs. Shortcuts are designed to be a fast, secure, no-code alternative to building complex pipelines for sourcing data from external storage, and to offset the development and operational costs of those pipelines. Think of it this way: once a shortcut is created, you can quickly access the data and share it with other users and workspaces. Because a shortcut has the properties of a symbolic link, it always reflects the current state of the data as it exists on external storage. Shortcuts are not designed to replace pipelines; there will be scenarios where pipelines are still required.

 

Where is the processing or loading of the file from the source done in Fabric?

If you are reading data from a shortcut, the IO sits with the external storage, be it ADLS Gen2, S3, or GCS. Compute (CUs) is dictated by the type of operation you are performing in Fabric, and it sits with the Fabric engine you use to interact with the shortcut.

 

A shortcut itself does not use any compute, and it does not copy data. It performs in-place reads and writes on remote storage. As called out in our documentation, you can think of shortcuts as symbolic links, which means a shortcut relies on the IO performance of the remote/external storage the data is read from.
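To make the symbolic-link analogy concrete, here is a small local sketch in plain Python. It is not Fabric code and does not call any Fabric API; the file names (`external_data.csv`, `shortcut.csv`, `copied.csv`) are hypothetical stand-ins for remote storage, a shortcut, and a pipeline copy. It only shows the general behavior of a link versus a copy: the link reflects the current state of the source, while the copy goes stale. It assumes an OS where creating symlinks is permitted (for example Linux or macOS).

```python
import os
import shutil
import tempfile


def compare_link_and_copy() -> tuple[str, str]:
    """Return (text read via symlink, text read via copy) after the source changes."""
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical stand-ins: "remote" source file, a link to it, and a physical copy.
        source = os.path.join(workdir, "external_data.csv")
        link = os.path.join(workdir, "shortcut.csv")
        copy = os.path.join(workdir, "copied.csv")

        with open(source, "w") as f:
            f.write("v1")

        os.symlink(source, link)        # points at the source; no data is duplicated
        shutil.copyfile(source, copy)   # physically duplicates the data

        # The source is updated after the link and the copy were created.
        with open(source, "w") as f:
            f.write("v2")

        with open(link) as f:
            via_link = f.read()
        with open(copy) as f:
            via_copy = f.read()
        return via_link, via_copy


if __name__ == "__main__":
    via_link, via_copy = compare_link_and_copy()
    print("read through link:", via_link)
    print("read through copy:", via_copy)
```

Reading through the link returns the updated content, while the copy still holds the old content until something copies it again. That is the trade-off described above, in miniature.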

 

In terms of compute cost, when you read a data file from, say, an ADLS Gen2 shortcut using Fabric Spark, the CU cost is for whatever analysis you are doing with the data. The IO is handled by the remote storage. The alternative in this scenario would be to physically store the data in OneLake, which incurs CUs (for compute and for storage transactions) plus raw storage cost (to physically store the data in OneLake).

 

Does it use Fabric compute every time to copy and load the data from the file?

No, a shortcut does not physically copy or move data to OneLake. A shortcut enables Fabric to perform in-place IO on files wherever they reside.

 

As for the comment that network latency might be a bottleneck: can you clarify in what sense? If the concern is latency, then yes, the laws of physics play a role. However, if you are reading a file from ADLS Gen2 in the same region as your Fabric capacity, the latency should be similar to reading the same file stored in OneLake. If you are reading a file through a shortcut to storage on the other side of the world, there will be added latency, because the shortcut carries out in-place read/write operations against that storage. In some cases customers are willing to trade a little performance for ease of use, avoiding pipeline costs, and so on. Also, based on our work with customers, performance is subjective: what is "fast" for one customer may be "slow" for another. If you are shortcutting files in the same region as your Fabric capacity, you shouldn't need to worry about latency. For cross-region scenarios, dig into the specifics and understand what latency targets the customer is trying to achieve; from there, one can go deeper into strategies for shortcutting data across regions.
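For a rough feel of why region matters, here is a back-of-the-envelope sketch. It is a deliberately simplistic model, not how Fabric or any storage service computes latency: it assumes read time is roughly the number of sequential storage requests times the round-trip time, plus the data size divided by throughput. The function name and every number below are hypothetical illustrations, not measurements, and the model ignores parallel reads, caching, and throttling.

```python
def estimated_read_seconds(size_mb: float, requests: int,
                           rtt_ms: float, throughput_mb_s: float) -> float:
    """Crude estimate: per-request round trips plus transfer time.

    All inputs are assumptions supplied by the caller; nothing here is
    measured from Fabric, ADLS Gen2, S3, or GCS.
    """
    round_trip_seconds = requests * rtt_ms / 1000.0
    transfer_seconds = size_mb / throughput_mb_s
    return round_trip_seconds + transfer_seconds


if __name__ == "__main__":
    # Hypothetical workload: 1024 MB read in 100 sequential requests at 100 MB/s.
    same_region = estimated_read_seconds(1024, 100, rtt_ms=2, throughput_mb_s=100)
    cross_region = estimated_read_seconds(1024, 100, rtt_ms=200, throughput_mb_s=100)
    print(f"same region (assumed 2 ms RTT):    {same_region:.2f} s")
    print(f"cross region (assumed 200 ms RTT): {cross_region:.2f} s")
```

Under this model, the penalty for distance grows with the number of round trips rather than with data volume alone, so many small reads across regions suffer more than a few large ones. Plugging in your own measured round-trip time and throughput is one way to check whether a cross-region shortcut can meet a given latency target.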

 

Continuing with the performance theme: ultimately you will be consuming a shortcut with one of the Fabric engines. There are performance-enhancing strategies in place, such as intelligent cache, that can reduce latency. See "Intelligent cache in Microsoft Fabric" on Microsoft Learn.




Hope this is helpful. Please let me know in case of further queries.

4 REPLIES
v-gchenna-msft
Community Support

Hello @Vinodh247 ,

Thanks for using Fabric Community.
At this time, we are reaching out to the internal team to get some help on this.
We will update you once we hear back from them.

Thanks for the quick response. These internals should be part of the official documentation so that one can be clear and confident with clients when proposing Fabric as a solution. A lot of questions are coming up about Fabric, including its basic workings, and there is no point in wasting time going back and forth with the community or Fabric support over rudimentary details. I hope MS understands the gravity of the questions users raise and face, instead of focusing only on sales of the product.

Hi @Vinodh247 ,

Glad to know that your query was answered. The team will take this as feedback. Please continue using Fabric Community for any further queries.
