Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

With incremental refresh policy, why does Power BI "SELECT TOP 1000" thousands of times?

I have a large dataset, and many of the tables have incremental refresh policies. The data source is Amazon Redshift, and I'm using the Amazon Redshift connector.

 

I just modified the dataset, and then published a new version to a workspace in my Premium Capacity. Then I refreshed the dataset, which triggered the initial "full refresh." During this full refresh, I monitored the queries that were being executed against my Amazon Redshift data source.

 

I noticed something very strange: For each table X that has an incremental refresh, many times Power BI executes a simple query like "SELECT TOP 1000 ... FROM X" (with no WHERE clause).

 

For some tables, that identical query was executed a few dozen times. For other tables, that identical query was executed nearly 200 times! Often, several such identical queries are executed simultaneously, many times in a row, within milliseconds of each other and with no other queries in between. In total, during this one "full refresh," a few thousand such queries were executed. This all seems wasteful, and I don't see any purpose.

 

Meanwhile, my "full refresh" takes several hours. So I am wondering if this "SELECT TOP 1000" thing is a bug that might be fixed? Because if it stopped happening, that would be thousands fewer queries that have to be performed, which might significantly reduce my "full refresh" time.

 

Why does this happen? If it is not necessary, can it please be stopped?

 

Many thanks in advance for your help!

Status: New
Comments
v-qiuyu-msft
Community Support

Hi @kevhav

 

I would suggest you create a support ticket to get help. 

 

Support Ticket.gif

 

Best Regards,
Qiuyun Yu

sudhirshintu
Regular Visitor

Hi Kevhav,

 

Did you manage to fix this issue? we are facing same issue with incremental refresh with SAP HANA database. we are working with microsoft for fix but there is no solution yet.

 

Thanks

Sudhir

kevhav
Continued Contributor

No. I did work a bit with Support about performance issues in general, but never any more insight into to this specific (and strange!) issue.

sudhirshintu
Regular Visitor

Thanks for answer..

 

yes, it is strange and really impact the performance of our SAP systems as Select TOP 1000 is without where clause and that causes issues in live system.

let's see what support team does for this.

 

Thanks

Sudhir

dfunk_ntr
New Member

Did anyone ever get a response on this from Microsoft? We are facing similar issues with SAP HANA as the backend database and dozens of "SELECT TOP 1000" queries hitting the database.

foging
Frequent Visitor

Any news here? I have the same issue, a dataset with incremental refresh configured. Every time I hit refresh or a scheduled refresh runs it triggers around 10 "Top 1000" on the DB view I'm trying to incrementally refresh. I have not written a query like the one it is executing. It is by far the most resource demanding query on that database server.

Power BI Desktop 2.100.1182.0 64-bit (December 2021)

Power BI Service with Premium capacity

Microsoft SQL Server Standard 15.0.2080.9

kevhav
Continued Contributor

I hadn't checked this in a long time. At a glance, right now, it seems I still see some of these queries, but they don't seem excessive anymore. With Amazon Redshift as the source. That's good -- I would call this "fixed."

 

I don't really understand it, but I can imagine these queries serve some purpose for Power BI. Like, maybe Power BI wants a "preview" of the data to ensure it conforms, before starting to download many more rows.

 

And I don't think that a small number of them (10 such queries during a refresh) would cause any noticeable issues. In my opinion, that is reasonable. (My problem was when it would execute the exact same "SELECT TOP 1000" query to Redshift dozens—or hundreds—of times. In a very short time span, for a single Power BI refresh operation. That seemed wasteful, and I was concerned it was degrading my Redshift performance.)