Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
PJChip
Frequent Visitor

SELECT TOP 1000 [$Table] - Causing poor performance for Incremental Load

I have a table in Power BI I want to incrementally load. Query folding is happening and from the time PBI issues the folded query for a daily partition takes 2 minutes to load from a view on a base table with billions of records. That is awesome. However, Power BI seems to insist on doing a little sample query for every partition that is being processed where it does a 'SELECT TOP 1000 [$Table]....' . There is NO where clause on that query so the view performs the SELECT TOP 1000 across all the billions of records and it takes about 15 minutes per partition just to do the 'SELECT TOP 1000'.  Is there any way to stop Power BI from doing that 'SELECT top 1000'? 

The view summarizes from a 15 minute increment to a DAY level. Since query folding provides good performance when incrementally loading the last 10 days we would like to avoid having to instantiate a DAY level fact table just to avoid this unnecessary query that  Power BI is doing. Any suggestions?

I am able to see all those queries that are happening by running Profiler against the PBI service and SQL Server so I can see exactly what is taking all the time.

3 REPLIES 3
Tomiasp
New Member

If anyone else running into this problem, it is the data source privacy level configuration that causes this TOP 1000 to happen. 

 

Change the data source privacy level to anything else.

Tomiasp_0-1679586916886.png

 

Guy in a cube video also explaining this: Reduce the queries being sent with Incremental Refresh in Power BI! - YouTube

v-easonf-msft
Community Support
Community Support

Hi, @PJChip 

To avoid using' SELECT TOP 1000 [$Table]....' by default, you can write your own SQL code to get the data you want. 

29.png

Import data from a database using native database query 

Query Folding And Writing Your Own SQL Queries In Power Query/Power BI/Excel Get & Transform 

Query folding for native SQL in Power BI 

 

Best Regards,
Community Support Team _ Eason

Anonymous
Not applicable

@PJChip did you get any help with this? I am facing the same issue when querying delta tables through Power BI. I have a view in Synapse that is based on three Delta tables. When I give the query a where clause it correctly filters the data from deltatables. However Power BI service sends these top 1000 queries before sending the actual where clause containing query:
For each partition in Power BI Tabular model it seems to first send this Top 1000 query

select top 1000
    [$Table].[AccountEntryID] as [AccountEntryID],
    [$Table].[ACCOUNTINGPERIOD] as [ACCOUNTINGPERIOD],
    [$Table].[ACCOUNTINGYEAR] as [ACCOUNTINGYEAR],
    [$Table].[ACCOUNTINGMONTH] as [ACCOUNTINGMONTH],
    [$Table].[ACCOUNTINGDATE] as [ACCOUNTINGDATE],
    [$Table].[CREATEDDATETIME] as [CREATEDDATETIME],
    [$Table].[JOURNALNUMBER] as [JOURNALNUMBER],
    [$Table].[SUBLEDGERVOUCHER] as [SUBLEDGERVOUCHER],
    [$Table].[POSTINGLAYER] as [POSTINGLAYER],
    [$Table].[ACCOUNTINGCURRENCYAMOUNT] as [ACCOUNTINGCURRENCYAMOUNT],
    [$Table].[TRANSACTIONCURRENCYCODE] as [TRANSACTIONCURRENCYCODE],
    [$Table].[REPORTINGCURRENCYAMOUNT] as [REPORTINGCURRENCYAMOUNT],
    [$Table].[POSTINGTYPE] as [POSTINGTYPE],
    [$Table].[LEDGERDIMENSION] as [LEDGERDIMENSION],
    [$Table].[LEDGERACCOUNT] as [LEDGERACCOUNT],
    [$Table].[TEXT] as [TEXT],
    [$Table].[ISCREDIT] as [ISCREDIT],
    [$Table].[Company] as [Company],
    [$Table].[MainAccount_ID] as [MainAccount_ID],
    [$Table].[BusinessUnit_ID] as [BusinessUnit_ID],
    [$Table].[Department_ID] as [Department_ID],
    [$Table].[CostCenter_ID] as [CostCenter_ID],
    [$Table].[ItemGroup_ID] as [ItemGroup_ID],
    [$Table].[Project_ID] as [Project_ID],
    [$Table].[Property_ID] as [Property_ID],
    [$Table].[Customer_ID] as [Customer_ID],
    [$Table].[ACCOUNTINGDATEKEY] as [ACCOUNTINGDATEKEY]
from [f_GeneralLedger_DELTA] as [$Table]

After that it sends the query with where clause:

select top 1000
    [_].[AccountEntryID],
    [_].[ACCOUNTINGPERIOD],
    [_].[ACCOUNTINGYEAR],
    [_].[ACCOUNTINGMONTH],
    [_].[ACCOUNTINGDATE],
    [_].[CREATEDDATETIME],
    [_].[JOURNALNUMBER],
    [_].[SUBLEDGERVOUCHER],
    [_].[POSTINGLAYER],
    [_].[ACCOUNTINGCURRENCYAMOUNT],
    [_].[TRANSACTIONCURRENCYCODE],
    [_].[REPORTINGCURRENCYAMOUNT],
    [_].[POSTINGTYPE],
    [_].[LEDGERDIMENSION],
    [_].[LEDGERACCOUNT],
    [_].[TEXT],
    [_].[ISCREDIT],
    [_].[Company],
    [_].[MainAccount_ID],
    [_].[BusinessUnit_ID],
    [_].[Department_ID],
    [_].[CostCenter_ID],
    [_].[ItemGroup_ID],
    [_].[Project_ID],
    [_].[Property_ID],
    [_].[Customer_ID],
    [_].[ACCOUNTINGDATEKEY]
from [f_GeneralLedger_DELTA] as [_]
where [_].[CREATEDDATETIME] >= convert(datetime2, '2015-01-01 00:00:00') and [_].[CREATEDDATETIME] <= convert(datetime2, '2016-01-01 00:00:00')

 The top 1000 query forces Delta tables Parquet files to be loaded in full to Synapse on-demand cluster. Where clauses only loads required data from the parquet files. This pretty much destroys one point of using the incremental dataset, which is of course not spending so much resources in Synapse on-demand....

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors