Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
cbailiss
Advocate II
Advocate II

Does Power BI fully support Azure SQL Data Warehouse?

Hello

 

Power BI seems to work in a way that means it cannot get the best out of Azure SQL Data Warehouse - more specifically Power BI seems likely to generate unnecessary data movement between the compute nodes.  I want to outline my use case, check my understanding and see if anyone else has any comments/thoughts…

 

Example:  I have the following (subset of) tables containing web analytics data from an e-Commerce solution:

Session

PageView

PageEvent

(One session has multiple page views, and each page view has multiple events).

 

Session joins to PageView on SessionID

PageView joings to PageEvent on PageViewID

 

In Azure SQL DW, all three tables contain the SessionID and all three tables are Hash distributed on this field.  

This means all the data for a particular session resides within one compute node in the Azure SQL DWH.

This means hand-written analytical queries typically involve no data movement between compute nodes (other than to return the final results) as they are written roughly as follows:

SELECT ...

FROM Session s

INNER JOIN PageView pv

ON s.SessionID = pv.SessionID

INNER JOIN PageEvent pe

ON pv.PageViewID = pe.PageViewID and

pv.SessionID = pe.SessionID

 

When Power BI is connected to Azure DWH via Direct Query, it presumably won't generate the last line in the query above (the additional join criteria on SessionID), since it only supports one field joins.

This looks like it will lead to additional data movement between the compute nodes of an Azure SQL DWH instance.

I have investigated this by running a query (from Management Studio) similar to the above with and without the last line, and reviewing the steps in the query execution from sys.dm_pdw_request_steps.

 

With the “pv.SessionID = pe.SessionID” line:

 

step_index     operation_type           distribution_type      location_type

0              OnOperation              Unspecified            Control

1              PartitionMoveOperation   Unspecified            DMS

2              ReturnOperation          Unspecified            Control

3              OnOperation              Unspecified            Control

 

Without the “pv.SessionID = pe.SessionID” line:

 

step_index     operation_type           distribution_type      location_type

0              RandomIDOperation        Unspecified            Control

1              OnOperation              AllComputeNodes        Compute

2              BroadcastMoveOperation   Unspecified            DMS

3              OnOperation              Unspecified            Control

4              PartitionMoveOperation   Unspecified            DMS

5              OnOperation              AllComputeNodes        Compute

6              ReturnOperation          Unspecified            Control

7              OnOperation              Unspecified            Control

 

This additional join criteria is important since it enforces that the scope of the query is within each control node and so no data movement between nodes is needed until returning results.

 

This seems to make Power BI less useful as a client to generally browse an Azure SQL Data Warehouse.  Of course, it is still possible to write specific queries using the additional join criteria, but then that isn’t using Power BI as a general DWH client.

 

 

4 REPLIES 4
cbailiss
Advocate II
Advocate II

Just in case anyone else is encoutering the problem above, the recently added COMBINEVALUES() function helps in the above scenario.  Details already blogged at:

https://www.sqlbi.com/articles/using-combinevalues-to-optimize-directquery-performance

v-qiuyu-msft
Community Support
Community Support

Hi @cbailiss,

 

What do you mean about "When Power BI is connected to Azure DWH via Direct Query, it presumably won't generate the last line in the query above (the additional join criteria on SessionID), since it only supports one field joins."? Do you mean you write the specific query when connect to the Azure SQL data warehouse? Or you just connect to those three tables, and merge queries in Query Editor?

 

When you write specific query in DirectQuery, you can use SQL Profile to trace query execution. Also you can try to connect to Azure SQL data warehouse in Import mode, then write the query to see if the results is the same as in SSMS.

 

If you just connect to three tables, then you can open Query Editor, use Merge Queries to merge PageView and PageEvent twice. For more information, please refer this article: Shape and combine data in Power BI Desktop.

 

 

Best Regards,
Qiuyun Yu

Community Support Team _ Qiuyun Yu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hello @v-qiuyu-msft

 

Thank you for the reply.

 

>> Do you mean you write the specific query when connect to

>> the Azure SQL data warehouse?

No, not writing a specific query.

 

>> Or you just connect to those three tables, and merge

>> queries in Query Editor?

No, not joining three tables in the query editor.

 

I connect to the Azure DWH via a direct query connection.

Then I select those three tables.

I don't join the tables in the query editor.

The three tables come through to Power BI as three tables in the Power BI Model.

In the diagram view in Power BI I create the relationships between the tables.  (These relationships can only be between one field in each table).

 

>> If you just connect to three tables, then you can open

>> Query Editor, use Merge Queries to merge PageView

>> and PageEvent twice. For more information, please

>> refer this article: Shape and combine data in Power BI Desktop.

 

I don't want to merge the tables in the query editor, because this limits the analysis that can be done in Power BI.

This scenario is about creating a general Power BI model that reflects the way the data is structured in the DWH.

I.e. the Power BI model is acting as a general client to the DWH.

But when used in this way, Power BI desktop cannot generate the additional join criteria highlighted in my first post (since it supports only one field joins) - this means unnecessary data movement steps appear in the query plan in Azure DWH.

 

Merging tables in the Query Editor is OK when performing a very specific analysis (but even then, I think the join is only on one one field if done via the UI, so the same problem happens).  

But merging tables in the query stage doesn't work when wanting to use Power BI as a general DWH client.

 

Also, importing instead of DirectQuery doesn't work at the scale of data in Azure DWH.  I.e. it is far bigger than fits in a Power BI Online model.

 

Thanks

 

Chris

@cbailiss,
My organization is currently trying to experiment if we can use the Azure Dw for power bi with Direct query because of the cheaper price tag. If you can share your insights, that will be great!!

Thanks in Advance!



Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors