tikka
Frequent Visitor

Reading dataset from Azure Databricks

Hi,

 

We have a Databricks (Premium) environment set up in Azure, deployed in a custom Azure VNet. We are reading prepared datasets from Power BI using the Databricks cluster's JDBC/ODBC endpoint, following this article:

 

https://docs.azuredatabricks.net/user-guide/bi/power-bi.html

 

The first concern is that performance seems to be extremely slow. The second concern is that publishing the report to Power BI Premium forces a gateway instead of connecting to the cluster directly.

 

Any thoughts on the above issues?

 

BR,

Tuomas

8 REPLIES
nagau
New Member

Did you find a solution for this? I am facing the same issue: it is forcing me to use a gateway.

tikka
Frequent Visitor

Hi,

 

Unfortunately we have not found a solution yet. Both problems above relate to the fact that the Spark connector seems to iterate through all databases and tables in Databricks, and ultimately times out:

 

- the dataset takes a very long time to refresh in Power BI Desktop

- the gateway cannot be created in the Power BI Service

 

For the latter, there is now an option to skip the connection test, which allows you to create the gateway but does not resolve the performance issue.

 

As a workaround, we are using Azure SQL Server to store the processed results from Databricks. Connecting to Azure SQL Server from Power BI works as expected, although this adds an extra step.
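For anyone wanting to try the same workaround, here is a minimal sketch of how the write from a Databricks notebook might look, using Spark's built-in JDBC data source. The helper names, the server/database/table values, and the connection-string settings are assumptions for illustration, not the exact setup described in this post.

```python
def azure_sql_jdbc_url(server: str, database: str) -> str:
    """Build a JDBC URL for an Azure SQL logical server (name without domain)."""
    return (
        f"jdbc:sqlserver://{server}.database.windows.net:1433;"
        f"database={database};encrypt=true;"
        "trustServerCertificate=false;loginTimeout=30"
    )


def jdbc_write_options(server: str, database: str, table: str,
                       user: str, password: str) -> dict:
    """Collect the options passed to Spark's JDBC writer (hypothetical helper)."""
    return {
        "url": azure_sql_jdbc_url(server, database),
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }


def save_results(df, options: dict, mode: str = "overwrite") -> None:
    """Write a Spark DataFrame to the target table through the JDBC data source."""
    df.write.format("jdbc").options(**options).mode(mode).save()
```

In a notebook this would be called as something like `save_results(result_df, jdbc_write_options("myserver", "reports", "dbo.sales_summary", user, password))`, with the credentials read from a secret store rather than typed inline. Power BI then reads `dbo.sales_summary` through its regular SQL Server connector.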

 

BR,

Tuomas

We are also observing that the DirectQuery connection mode to Databricks is very slow. It is probably not a good solution at the moment unless performance improves drastically.

v-cherch-msft
Employee

Hi @tikka 

You may use DAX Studio to check the performance. Power BI and Databricks are integrated, everything is in the cloud, and you don't need gateways. For further details, please refer to this document. Below are some articles for your reference.

https://azure.microsoft.com/es-es/blog/structured-streaming-with-databricks-into-power-bi-cosmos-db/

https://medium.com/@mauridb/powerbi-and-azure-databricks-193e3dc567a

Regards,

Community Support Team _ Cherie Chen
If this post helps, please consider accepting it as the solution to help other members find it more quickly.

Hi,

 

Thank you for your response.

 

Unfortunately, we haven't found a solution to the original problem. The connection doesn't fail; it just takes a very long time to get a response, on the order of many hours. I suspect the problem relates to running Databricks inside a VNet. As a comparison, I created another Databricks workspace, this time without the VNet, and added a few sample tables. There the connection works, although it is still quite slow: for example, it takes a few minutes to load a table with only 4 columns and 2 rows. Otherwise, the network conditions between us and Azure seem fine, with no problems uploading to or downloading from storage.
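One rough way to separate plain network latency from connector or cluster overhead is to time TCP connections to the endpoint from the machine running Power BI Desktop. The sketch below uses only the Python standard library. The function names are hypothetical, and it measures connection setup only, not Spark query time, so a fast result here does not rule out the VNet as a factor.

```python
import socket
import statistics
import time


def tcp_connect_seconds(host: str, port: int = 443, timeout: float = 10.0) -> float:
    """Time a single TCP connection to host:port, in seconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it immediately
    return time.perf_counter() - start


def median_connect_seconds(host: str, port: int = 443,
                           attempts: int = 5, timeout: float = 10.0) -> float:
    """Median connect time over several attempts, to smooth out outliers."""
    samples = [tcp_connect_seconds(host, port, timeout) for _ in range(attempts)]
    return statistics.median(samples)
```

For the workspace in this thread the call would be `median_connect_seconds("westeurope.azuredatabricks.net")`. If that returns within a fraction of a second while a two-row table still takes minutes to load, the delay is more likely in the connector or the cluster than in the network path.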

 

We are also still facing the issue when we publish to Power BI. It seems to force a gateway on Power BI Premium for no apparent reason, and the gateway usage cannot be removed in the dataset settings. When we try to refresh the dataset, we see the following error:

 

Refresh failed due to gateway configuration issues.

This dataset requires a properly configured gateway in order to refresh.

If you're using personal mode, please make sure your gateway is online.

If you're using enterprise mode, please make sure you've added the following data sources in the Gateway Management Portal

Extension { extensionDataSourceKind : "Spark" , extensionDataSourcePath : "https://westeurope.azuredatabricks.net:443/sql/protocolv1/o/***/***" }

Please try again later or contact support. If you contact support, please provide these details.

To my understanding, this kind of connection should not need any gateway.

tikka
Frequent Visitor

Another thing we noticed is that publishing to Power BI Premium forces a scheduled refresh request exactly one hour after the time of publishing.

tikka
Frequent Visitor

Also, here is the data source from Advanced Editor (ids masked with "*"):

 

let
    Source = ApacheSpark.Tables("https://westeurope.azuredatabricks.net:443/sql/protocolv1/o/***/***", 2, [BatchSize=null]),
    default_test_csv = Source{[Schema="default",Item="test_csv"]}[Data]
in
    default_test_csv
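The server string passed to ApacheSpark.Tables above combines the host, port and HTTP path that ODBC/JDBC clients usually take as separate settings. As a small sketch (the function name is hypothetical), the pieces can be split out with the Python standard library, which may help when comparing the value against the extensionDataSourcePath in the gateway error or when configuring another client against the same cluster:

```python
from urllib.parse import urlsplit


def split_spark_server_url(server_url: str) -> dict:
    """Split a Power BI Spark server URL into host, port and HTTP path."""
    parts = urlsplit(server_url)
    return {
        "host": parts.hostname,
        "port": parts.port or 443,  # fall back to HTTPS default if no port given
        "http_path": parts.path.lstrip("/"),
    }


# The masked URL from the post is used as-is; the "*" placeholders are kept.
info = split_spark_server_url(
    "https://westeurope.azuredatabricks.net:443/sql/protocolv1/o/***/***"
)
print(info["host"], info["port"], info["http_path"])
```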

Hi @tikka 

If you have a Pro account, you could open a support ticket, which is free for Pro users. Go to https://support.powerbi.com, scroll down, and click "CREATE SUPPORT TICKET".

Regards,

Community Support Team _ Cherie Chen
