Solved: Re: Direct Query with Snowflake sequential handlin...

Remo_ · ‎09-20-2022

Hi folks,

For our customer we have build a Snowflake on Azure datawarehouse. Since the customer has a strict data policy we would like to use the Snowflake dynamic data masking based on user account (https://docs.snowflake.com/en/user-guide/security-column-ddm-use.html). Therefore we have build our PBI datasets with Direct Query and SSO applied to ensure that a user will only retrieve the data that he/she is allowed to see.

This all works fine, but performance of the reports is still an issue. We already tried to play with the following settings in our dataset (as described here https://learn.microsoft.com/en-us/power-bi/create-reports/desktop-evaluation-configuration😞

- Increase maximum number of simultanous evaluations (under parallel loading of tables)

- Increase maximum number of concurrent jobs

- Set active number of data source connections to 30

- Put the dataset in an premium workspace to make sure we can use 30 data source connections

However, when testing queries in our report, we still see that a visual that sends out multiple queries is being handles sequentially instead of parallel. In the below example we have a visual that sends out 13 queries (regardless of whether it is a good idea to have this large number of queries sends to the source by 1 visual).

Dax Studio trace on Dataset in PPU workspace

We see that handling of queries is being done sequentially:

I would suspect that having all the correct settings, part of these queries would be handled in parallel. Does anyone know why this is not the case?

Snowflake

Looking in Snowlfake we see the same pattern (take notice! this is not the exact same run, so the numbers in Snowflake are not 1 on 1 the same as in DAX studio). Queries do not take to long to anser by Snowflake, but since they are all evaluated sequentially, it takes a long time to get an answer for the visual.

Horizontal fusion

As of yesterday, microsoft announced a new preview feature: Horizontal Fusion. Which should be able to merge queries and sent out fewer queries to the source. (https://powerbi.microsoft.com/en-us/blog/announcing-horizontal-fusion-a-query-performance-optimizati...). Obviously, this would be great as it would not send out 13 queries (in this example), but maybe only 3 or 4. However, this does not change the fact that these queries shall be handled sequentially. Is there anything we can do to make this happen? Or can the formula engine which compiles all the answers from the underlying queries to 1 output only handle one query at the time?

All thoughts are greatly appreciated.

Regards,

Remo

oscarrudnas · ‎10-24-2022

This article by SQLBI actually describes this issue.

The Formula engine does not support parallelism and will send the queries sequentially. I am not sure why this is the case, though but at least I can start to focus on other problems since this is really dax motor-behaviour.

https://www.sqlbi.com/articles/formula-engine-and-storage-engine-in-dax/

Because the formula engine is single-threaded, any operation executed in the formula engine uses just one thread and one core, no matter how many cores are available. The formula engine sends requests to the storage engine sequentially, one query at a time. A certain degree of parallelism is available only within each request to the storage engine, which has a different architecture and can take advantage of the multiple cores available.

View solution in original post

oscarrudnas · ‎10-24-2022

This article by SQLBI actually describes this issue.

The Formula engine does not support parallelism and will send the queries sequentially. I am not sure why this is the case, though but at least I can start to focus on other problems since this is really dax motor-behaviour.

https://www.sqlbi.com/articles/formula-engine-and-storage-engine-in-dax/

Because the formula engine is single-threaded, any operation executed in the formula engine uses just one thread and one core, no matter how many cores are available. The formula engine sends requests to the storage engine sequentially, one query at a time. A certain degree of parallelism is available only within each request to the storage engine, which has a different architecture and can take advantage of the multiple cores available.

oscarrudnas · ‎10-21-2022

I have exactly the same issue with Power BI Direct Query against Azure Databricks.

I have contacted Microsoft support, but have not yet got any solution to this. Do you have any update to this topic that could help with the issue?

Remo_ · ‎10-21-2022

No I haven't received any support from Microsoft other than the comment below.

v-cgao-msft · ‎09-21-2022

Hi @Remo_ ,

Regardless of the improvements, the performance of the data source should always be the primary consideration when using DirectQuery. If the data source is slow, using DirectQuery on that data source will still not be feasible.

'Horizontal Fusion' is a great feature, according to the blog, you need to enable preview on desktop and republish to service for it to work, did you do that? If it still doesn't work, it is recommended to comment below the blog to ask PM if the feature supports Snowflake data source. Finally, it's a preview feature, and I recommend that users consider using the preview feature carefully in a production environment.

Behavior:

Using DirectQuery in Power BI

Premium capacity enables query caching to speed up report opening:
Query caching in Power BI Premium

Best Regards,
Gao
Community Support Team

If there is any post helps, then please consider Accept it as the solution to help the other members find it more quickly. If I misunderstand your needs or you still have problems on it, please feel free to let us know. Thanks a lot!

How to get your questions answered quickly -- How to provide sample data

Remo_ · ‎09-21-2022

Hi Gao @v-cgao-msft ,

Thanks for you reaction, however this is unfortunately not the answer I was looking for.

1. Regardless of whether the data source is fast or slow, I would want to know if there is a way to parrellize the handling of queries by the Power BI engine, because according to the microsoft docs that I refer to above, changing the settings as I did should help.

2. I did change the preview setting to have HorizontalFusion, and published it to a premium workspace, but no luck so far.

3. Regarding query caching, this is only valid for import datasets. My question is about DirectQuery mode.

I am interested to know if this feature supports Snowflake. How do I notify the PM about this thread?

Regards,

Remo

Direct Query with Snowflake sequential handling of queries

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024