Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
jsperson
Frequent Visitor

ODBC and Large Data Sets

Hey folks,

I'm having an issue with Power Query (both Power BI and Excel) and am casting a wide net looking for options.

 

Here's my requirement - the data source has about 300M records. I have the option of using three different database engines - AWS Athena, AWS Redshift, or Presto. The problem that I'm having is with Power Query SQL generation using the ODBC driver. Let's say I select two attribute columns and two measure columns. If I group by both attribute columns, Power Query pushes down the group by SQL to the database and returns only a few records with teh measures aggregated appropriately. In my analysis I noticed that one of the measure columns had a bunch of null values that I want to exclude. So I placed a filter on that column. When I hit go Power Query proceeds to try to bring back the entire dataset (filtered, but that only removes 10% of the records) before doing the group by. This seems like a signficant deficiencey in the Power Query engine.

 

I've tested with Tableau and the same driver. Tableau creates a SQL statmenet like SELECT foo, SUM(foofact) FROM bar WHERE foofact IS NOT NULL GROUP BY foo. That's exactly what I would expect from Power BI.

 

Any suggestions appreciated...

10 REPLIES 10

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.