Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
When PBI (desktop or service) performs its query-folding when sending a query to BigQuery, it converts date values (DATE in BigQuery and detected as such in PBI) to TIMESTAMP.
Because of the conversion, BigQuery cannot perform data pruning, that is, only scan the relevant date partitions.
See below when using a date slicer in PBI on budat:
select `matkl`, sum(`C1`) as `C1` from ( select `budat`, `matkl`, `wrbtr` as `C1` from `hs-airflow-1`.`final`.`a2` where cast(`budat` as TIMESTAMP) < TIMESTAMP('2015-12-04 00:00:00') and cast(`budat` as TIMESTAMP) >= TIMESTAMP('2014-04-30 00:00:00') ) as `ITBL` group by `matkl` LIMIT 1000001 OFFSET 0Stats from BigQuery:
select `matkl`,Stats from BigQuery:
sum(`C1`) as `C1`
from
(
select `budat`,
`matkl`,
`wrbtr` as `C1`
from `hs-airflow-1`.`final`.`a2`
where budat < '2015-12-04' and budat >= '2014-04-30'
-- same result with "where budat < DATE('2015-12-04') and budat >= DATE('2014-04-30')"
) as `ITBL`
group by `matkl`
LIMIT 1000001 OFFSET 0
As you can see from the bytes processed, the second query *did* use the data partitions whereas the first, built by PowerBI, *did not*.
The table scan here is limited because this is just a subset to illustrate but the performance hit can be considerable on large datasets.
PBI should *not* cast dates to timestamps - especially in WHERE predicates - when there is no need to.
E.g. the following works perfectly fine for limiting date-based partitions in BigQuery:
WHERE CAST( budat AS DATE) < DATE('2015-12-04') and budat >= DATE('2014-04-30')
On the bigger picture, it would be very useful if query-folding logic could be configurable or somehow editable (e.g dictionary file). I understand MSFT cannot keep up with all SQL-based source changes.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.