ODBC and Large Data Sets

jsperson · ‎02-20-2018

Hey folks,

I'm having an issue with Power Query (both Power BI and Excel) and am casting a wide net looking for options.

Here's my requirement - the data source has about 300M records. I have the option of using three different database engines - AWS Athena, AWS Redshift, or Presto. The problem that I'm having is with Power Query SQL generation using the ODBC driver. Let's say I select two attribute columns and two measure columns. If I group by both attribute columns, Power Query pushes down the group by SQL to the database and returns only a few records with teh measures aggregated appropriately. In my analysis I noticed that one of the measure columns had a bunch of null values that I want to exclude. So I placed a filter on that column. When I hit go Power Query proceeds to try to bring back the entire dataset (filtered, but that only removes 10% of the records) before doing the group by. This seems like a signficant deficiencey in the Power Query engine.

I've tested with Tableau and the same driver. Tableau creates a SQL statmenet like SELECT foo, SUM(foofact) FROM bar WHERE foofact IS NOT NULL GROUP BY foo. That's exactly what I would expect from Power BI.

Any suggestions appreciated...

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly