cancel
Showing results for 
Search instead for 
Did you mean: 

DirectQuery BigQuery: conversion to TIMESTAMP prevents the use of partitioning

When PBI (desktop or service) performs its query-folding when sending a query to BigQuery, it converts date values (DATE in BigQuery and detected as such in PBI) to TIMESTAMP.

 

Because of the conversion, BigQuery cannot perform data pruning, that is, only scan the relevant date partitions.

 

See below when using a date slicer in PBI on budat:

 

  • Query folded (translated to BQ dialect) by PowerBI):
    select `matkl`,
    sum(`C1`) as `C1`
    from 
    (
    select `budat`,
    `matkl`,
    `wrbtr` as `C1`
    from `hs-airflow-1`.`final`.`a2`
    where cast(`budat` as TIMESTAMP) < TIMESTAMP('2015-12-04 00:00:00') 
        and cast(`budat` as TIMESTAMP) >= TIMESTAMP('2014-04-30 00:00:00')
    ) as `ITBL`
    group by `matkl`
    LIMIT 1000001 OFFSET 0
    Stats from BigQuery:
    Bytes Processed 174 MB
    Bytes Billed 175 MB
    Slot Time (ms) 51.3 K


  • Same query edited in BigQuery UI to remove the casting to timestamp
    select `matkl`,
    sum(`C1`) as `C1`
    from
    (
    select `budat`,
    `matkl`,
    `wrbtr` as `C1`
    from `hs-airflow-1`.`final`.`a2`
    where budat < '2015-12-04' and budat >= '2014-04-30'
    -- same result with "where budat < DATE('2015-12-04') and budat >= DATE('2014-04-30')"
    ) as `ITBL`
    group by `matkl`
    LIMIT 1000001 OFFSET 0
    Stats from BigQuery:
    Bytes Processed 79.2 MB
    Bytes Billed 80.0 MB
    Slot Time (ms) 19.2 K


As you can see from the bytes processed, the second query *did* use the data partitions whereas the first, built by PowerBI, *did not*.

The table scan here is limited because this is just a subset to illustrate but the performance hit can be considerable on large datasets.

 

PBI should *not* cast dates to timestamps - especially in WHERE predicates - when there is no need to.

 

E.g. the following works perfectly fine for limiting date-based partitions in BigQuery: 
WHERE CAST( budat AS DATE) < DATE('2015-12-04') and budat >= DATE('2014-04-30')

 

On the bigger picture, it would be very useful if query-folding logic could be configurable or somehow editable (e.g dictionary file). I understand MSFT cannot keep up with all SQL-based source changes.

Status: Needs Info
Comments
v-qiuyu-msft
Community Support

Hi @yan,

 

Please update your Power BI desktop to the latest version 2.63.3272.40461 then test again. 

 

If issue persists, as I don't have such Google BigQuery data to test on my side, please record a video to clarify the issue. I can try to report this issue internally. 

 

Best Regards,
Qiuyun Yu 

v-qiuyu-msft
Community Support
Status changed to: Needs Info
 
alicialcaide
Frequent Visitor

Hi, I'm having a similar problem. I have configured the Incremental Refresh for a table connected to a Big Query source and when the query is generated is changing the datatype to TIMESTAMP(RangeStart) and TIMESTAMP(RangeEnd) so the SQL is not valid due to different data types. 

 

Any idea about how avoid Power BI change the data type automatically in query folding? 

DavidPROVOST
Advocate II

I have a model stored on a premium workspace and based on Bigquery datasources, that refreshes incrementally 5 times a day. Date and INT filter were correctly folded to Bigquery until feb 18 2022, 11am CET.

This informration is from the BQ logs. Power BI used to generate a query that included the following : 

 

... where ((`site` = '12022' and `site` is not null) and `requestDate` >= TIMESTAMP('2022-02-18 00:00:00')) and `requestDate` <= TIMESTAMP('2022-02-19 00:00:00')

 

which is OK for me as requestDate really IS a timestamp in BQ.

Then on this feb 18th, it suddenly stopped sending date, INT and text filters to BQ. I'm using native BQ connection.