I recently got access to a Databricks cluster. In PBI Desktop I use the Azure Databricks connection type to access this data. Is it possible that Databricks is not primarily intended for importing large tables, but rather for working through DirectQuery? Compared to Oracle, ODBC or Impala, importing from Databricks is really very slow.
Does anyone know what could be the reason for this?
Would it be helpful to access the Databricks cluster via ODBC?
Using an enterprise gateway and a corresponding dataflow, the refresh is faster than in PBI Desktop.
The answer is, as usual: it depends. It's no secret that the best and fastest data source for PBI is a relational database. But the speed depends on many factors, two of them being the throughput of the network and the quality of the driver. I have no experience with Databricks as a source for PBI, so I can't really comment on this any more than I have. I can just add that getting data from a plain CSV file is much faster than from other data sources (apart from the relational database mentioned above). It might also be faster to get data from a Parquet file if the data is BIG.
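Since so much depends on the setup, it may help to measure rather than guess. Below is a rough sketch, not a definitive benchmark, of how one could time a data load outside of PBI and express it as rows per second. It only uses the Python standard library and a generated CSV file; the names `time_loader`, `read_csv_rows` and `write_sample_csv` are made up for this example. The assumption is that you would swap `read_csv_rows` for whatever loader you want to compare (for instance rows fetched through an ODBC connection), which is not shown here.

```python
import csv
import os
import tempfile
import time


def time_loader(load, repeats=3):
    """Call load() several times; return (row_count, best_seconds).

    load is any zero-argument callable that returns an iterable of rows.
    """
    best = float("inf")
    rows = 0
    for _ in range(repeats):
        start = time.perf_counter()
        rows = sum(1 for _ in load())  # consume every row, keep only the count
        best = min(best, time.perf_counter() - start)
    return rows, best


def read_csv_rows(path):
    """Yield the data rows of a CSV file, skipping the header line."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip header
        yield from reader


def write_sample_csv(path, n_rows):
    """Write a small two-column test file (hypothetical sample data)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "value"])
        writer.writerows((i, i * 2) for i in range(n_rows))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.csv")
        write_sample_csv(sample, 100_000)
        rows, seconds = time_loader(lambda: read_csv_rows(sample))
        print(f"{rows} rows in {seconds:.3f}s ({rows / seconds:,.0f} rows/s)")
```

Running the same `time_loader` against two sources on the same machine and network gives a like-for-like number, which should make it easier to tell whether the driver or the network is the slow part.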