twilliams98103
Frequent Visitor

Understanding DirectQuery row limit

Trying to understand the Direct Query million row limit.

 

I create a Direct Query Connection to ContosoRetailDW and load the FactOnlineSales table, which has 12M rows.

I drop OnlineSalesKey, which is a unique ID (so well over 1 million distinct values), onto a Table viz. No problem.

I sort the column ascending and descending. No problem.

I create a measure [KeyPlus] = SUMX(FactOnlineSales, FactOnlineSales[OnlineSalesKey] + 1) and drop it on my table. Boom, I get the million row limit error.

 

Why is it able to display the 12 million rows as a column in a table and correctly execute a sort over all 12 million rows without the error? What's happening differently when I create the measure that hits the limit?

 

Thanks

1 ACCEPTED SOLUTION
d_gosbell
Super User


@twilliams98103 wrote:

Why is it able to display the 12 million rows as a column in a table and correctly execute a sort over all 12 million rows without the error? What's happening differently when I create the measure that hits the limit?

The simple answer here is that it's not actually displaying 12 million rows. If you trace the DAX queries sent to the tabular engine, you will see something like "EVALUATE TOPN( 500, ...", and if you trace the generated SQL queries you will see an equivalent "SELECT TOP 500 ..." (the number might not be exactly 500, but it will use this general pattern).

 

There are two things going on here.

 

1. The first is that the UI tries to minimize the amount of detailed data it requests. Ultimately your report is going to display in a browser, so it can't deal with tens of millions of records anyway. If you scroll down, you will find additional queries generated in the form of "SELECT TOP 500 ... FROM table WHERE OnlineSalesKey > 501", as sketched below.
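
As a rough sketch using your FactOnlineSales example (the exact row counts, column lists, and ordering vary by Power BI version), the traced pagination pattern looks something like:

-- first page shown by the table visual
SELECT TOP 500 OnlineSalesKey FROM FactOnlineSales ORDER BY OnlineSalesKey

-- next page requested as you scroll
SELECT TOP 500 OnlineSalesKey FROM FactOnlineSales WHERE OnlineSalesKey > 501 ORDER BY OnlineSalesKey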

 

2. The second is where the 1 million row limit kicks in. The tabular engine evaluating the DAX queries tries to minimize the number of SQL queries it generates and to keep the total number of rows coming back under 1 million. This works well with simple aggregations, so if you do a sum of sales amount by category it can translate into SQL like the following:

 

SELECT Category, SUM(SalesAmount) FROM SalesTable GROUP BY Category

 

And you only get one row back from SQL per category. However, as your measure expressions become more complicated, there are operations which the engine cannot fold down into aggregate queries in SQL, so it has to request more detailed information from SQL and then perform the calculations in the tabular engine.
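
In your SUMX example the row-level +1 appears to be one of those operations, so instead of an aggregate query the engine has to ask for the detail rows, something like this sketch:

-- nothing can be folded into an aggregate, so every detail row is requested
SELECT OnlineSalesKey FROM FactOnlineSales

On a 12 million row table that request is well past the 1 million row cap, which is the error you saw.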

 

This is why the team chose the 1 million row limit. At that size the SQL engine can usually return the rows in a reasonable amount of time, they can be transferred over the network at a reasonable speed, and the tabular engine can process any expressions over that amount of data quickly enough not to keep the user waiting too long.

 

This is why the general recommendation is to use import models as the default approach; if your data is too large to fit in memory, or if it's being updated too fast, then you should look at a direct query model. But you should then try to keep your structures and calculations as simple as possible in a direct query model to avoid issues.

 

I'm a bit surprised that just doing a +1 inside a SUMX breaks the query folding, but it's probably also not something people would do very often in a production model. You'd probably find that if you did the +1 in a view over your table and just did a simple sum in your measure, it would work fine.
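
A minimal sketch of that workaround (the view name vFactOnlineSales and the column KeyPlusOne are hypothetical):

-- push the +1 into SQL so the measure can stay a plain SUM
CREATE VIEW vFactOnlineSales AS
SELECT OnlineSalesKey, OnlineSalesKey + 1 AS KeyPlusOne
FROM FactOnlineSales;

A measure defined as a simple SUM over KeyPlusOne should then fold back to a single-row SELECT SUM(KeyPlusOne) FROM vFactOnlineSales.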

