Solved: fabric shortcuts

Vinodh247 · 02-23-2024

With shortcuts, where in Fabric is the processing or loading of the file from the source done? Does it use Fabric compute every time to copy and load the data from the file? If so, network latency might be a bottleneck. Can anyone shed some light on the internal workings of Fabric shortcuts? There doesn't seem to be proper documentation explaining this.

v-gchenna-msft · 02-23-2024

Hi @Vinodh247 ,

We have an update from the internal team:

We are working on documentation focusing primarily on performance and other considerations related to shortcuts. This should be out in a month or so.

Let's discuss the goal of shortcuts before discussing costs. Shortcuts are designed to be a fast, secure, no-code alternative to building complex pipelines for sourcing data from external storage. They are meant to offset the development and operational costs of data pipelines. Think of it this way: once a shortcut is created, you can quickly access the data and also share it with other users and workspaces. Since a shortcut has the properties of a symbolic link, it always reflects the current state of the data (as it is on external storage). Shortcuts are not designed to replace pipelines, as there will be scenarios where pipelines are required.

Where is the processing or loading of the file from the source done in Fabric?

If you are reading data from a shortcut, the IO sits with the external storage, be it ADLS Gen2, S3 or GCS. Compute (CUs) is dictated by the type of operation you are doing in Fabric, and it sits with the Fabric engine you are using to interact with the shortcut.

The shortcut itself does not use any compute. It also does not copy data. It performs in-place reads and writes on remote storage. As called out in our documentation, shortcuts can be thought of as symbolic links, which means they rely on the IO performance of the remote (external) storage the data is being read from.
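To make the symbolic-link analogy concrete, here is a minimal sketch using an ordinary filesystem symlink in Python. This is only an analogy, not how Fabric implements shortcuts: the file names `external_storage.csv` and `shortcut.csv` are hypothetical, and the local temp directory stands in for remote storage. It shows that nothing is copied and that a read through the link always sees the current state of the source.

```python
import os
import tempfile
from typing import Tuple


def read_through_link(link_path: str) -> str:
    """Read a file via its link; the bytes come from the link target."""
    with open(link_path, "r", encoding="utf-8") as handle:
        return handle.read()


def demo() -> Tuple[str, str, bool]:
    """Return (first read, second read, whether the 'shortcut' is a link)."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical names: the "source" plays the role of external storage.
        source = os.path.join(root, "external_storage.csv")
        link = os.path.join(root, "shortcut.csv")

        with open(source, "w", encoding="utf-8") as handle:
            handle.write("id,value\n1,10\n")

        # Creating the link stores a pointer only; no data is copied.
        os.symlink(source, link)
        first = read_through_link(link)

        # Change the source; the link needs no refresh to see the new row.
        with open(source, "a", encoding="utf-8") as handle:
            handle.write("2,20\n")
        second = read_through_link(link)

        return first, second, os.path.islink(link)


if __name__ == "__main__":
    before, after, is_link = demo()
    print(before)
    print(after)
    print("is a link:", is_link)
```

On Windows, `os.symlink` may need elevated privileges or Developer Mode; on Linux and macOS it runs as written.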

In terms of compute cost, when you read a data file from, say, an ADLS Gen2 shortcut using Fabric Spark, the CU cost is for whatever analysis you are doing with the data. The IO required is handled by the remote storage. The alternative in this scenario would be to physically store the data on OneLake, which involves CUs (for compute and for storage transactions) plus the raw storage cost of keeping the data on OneLake.

Does it use Fabric compute every time to copy and load the data from the file?

No, a shortcut does not physically copy or move data to OneLake. A shortcut enables Fabric to perform in-place IO on the file(s) wherever they reside.

As for the last comment that network latency might be a bottleneck: can you please clarify, a bottleneck in what sense? If you are concerned about latency, then yes, the laws of physics play a role. However, if you are reading a file from ADLS Gen2 in the same region as the Fabric capacity, the latency should be similar to that of reading the same file stored on OneLake. If you are reading a file via a shortcut on the other side of the world, then yes, there will be latency, because the shortcut carries out in-place read/write operations. In some cases, customers are OK with trading some performance for ease of use, avoiding pipeline costs, and so on. Lastly, based on our work with customers, performance is subjective: what is "fast" for one customer might be considered "slow" by another. If you are shortcutting files in the same region as the Fabric capacity, you shouldn't be concerned about latency. For cross-region scenarios, dig into the details and understand what latency targets the customer is trying to achieve; then one can go deeper and discuss strategies for shortcutting data across regions.
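For a rough feel of the same-region versus cross-region point above, here is a back-of-the-envelope sketch in Python. It is not a Fabric API and not a measured result: the function `estimated_read_seconds`, the simple model (per-request round trips plus transfer time), and every number in the usage lines are assumptions chosen purely for illustration.

```python
def estimated_read_seconds(
    size_mb: float,
    round_trip_ms: float,
    throughput_mb_per_s: float,
    requests: int = 1,
) -> float:
    """Crude estimate: time spent on round trips plus time spent transferring.

    Assumes requests are issued one after another and that throughput is
    constant; real engines parallelize and cache, so treat this as an
    upper-bound intuition rather than a prediction.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput_mb_per_s must be positive")
    if requests < 1:
        raise ValueError("requests must be at least 1")
    round_trip_seconds = requests * round_trip_ms / 1000.0
    transfer_seconds = size_mb / throughput_mb_per_s
    return round_trip_seconds + transfer_seconds


if __name__ == "__main__":
    # Hypothetical inputs: a 100 MB file fetched in 10 sequential requests.
    same_region = estimated_read_seconds(100, round_trip_ms=2,
                                         throughput_mb_per_s=200, requests=10)
    cross_region = estimated_read_seconds(100, round_trip_ms=250,
                                          throughput_mb_per_s=200, requests=10)
    print(f"same region:  {same_region:.2f} s")
    print(f"cross region: {cross_region:.2f} s")
```

The takeaway is only qualitative: with identical throughput, the cross-region estimate grows with the number of round trips, which is why the latency target of the workload matters more than the shortcut mechanism itself.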

Continuing with the performance theme: at the end of the day, you'll be consuming a shortcut with one of the Fabric engines. There are various performance-enhancing strategies in place, such as Intelligent Cache, which can reduce latency. See "Intelligent cache in Microsoft Fabric" on Microsoft Learn.

Hope this is helpful. Please let me know in case of further queries.


v-gchenna-msft · 02-23-2024

Hello @Vinodh247 ,

Thanks for using Fabric Community.
At this time, we are reaching out to the internal team to get some help on this.
We will update you once we hear back from them.

Vinodh247 · 02-24-2024

Thanks for the quick response. These internals should be part of the official documentation so one can be clear and confident with the client when proposing Fabric as a solution. A lot of questions are coming up regarding Fabric, including its basic workings, and there is no point in wasting time going back and forth with the community or Fabric support for rudimentary details. Hope MS understands the gravity of the questions users raise and face, instead of focusing only on sales of the product.

v-gchenna-msft · 02-24-2024

Hi @Vinodh247 ,

Glad to know that your query was answered. The team will take this as feedback. Please continue using Fabric Community for your further queries.
