cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
robarivas Member
Member

Large Table performance

I'm connecting to an IBM DB2 database (z/OS) via ODBC (cause I could never get the built-in connector to work). Most tables are viable. However, I cannot find a way to get the transactions table data to load to the data model in a reasonable timeframe. The table in its entirety is over 800 million rows and represents at least 5 years of transactions. So I typically want to pull out just a subset. I can sometimes successfully filter it in Power Query in a reasonable time frame (by apparently leveraging indexed fields) but when I hit Close and Apply it loads to the data model at an extremely unusuably slow rate. So slow that I've never gotten even just 3 months of transactions to completely load. I've tried the following but none have made a difference:

 

1. Limit to the smallest number of columns needed

2. Exclude and/or break up high-cardinality columns

3. Pre-select columns (and rows) using SQL syntax

4. Remove unnecessary string columns

5. Load no other tables at all

 

 

 

7 REPLIES 7
robarivas Member
Member

Re: Large Table performance

Ok, maybe I'll get a response if I ask the question a little differently:

 

Why would Power Query stop Query Folding upon the simple act of filtering a column?

 

let
     Source = Odbc.DataSource("dsn=ABCD", [HierarchicalNavigation=true]),
     ABC_Schema = Source{[Name="ABC",Kind="Schema"]}[Data],
     ABC1234_View = ABC_Schema{[Name="ABC1234",Kind="View"]}[Data],
     #"Removed Other Columns" = Table.SelectColumns(ABC1234_View,{"TX_DATE_POST", "ORDER_ID", "CORP_ID", "ORDER_SITE", "PRODUCT_ID", "ORDER_TYPE", "TX_ID", "TX_AMOUNT", "TX_QTY"}),
     #"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each [CORP_ID = "54"),

Highlighted
Microsoft Phil_Seamark
Microsoft

Re: Large Table performance

Hi @robarivas,

 

Get rid of any columns that have unique ID's and if you have a column that is Date & Time, split that into two columns in Power Query (or just drop time altogether).

 

Do any of those help?


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

robarivas Member
Member

Query Folding

Why would Power Query stop Query Folding upon the simple act of filtering a column?

 

let
     Source = Odbc.DataSource("dsn=ABCD", [HierarchicalNavigation=true]),
     ABC_Schema = Source{[Name="ABC",Kind="Schema"]}[Data],
     ABC1234_View = ABC_Schema{[Name="ABC1234",Kind="View"]}[Data],
     #"Removed Other Columns" = Table.SelectColumns(ABC1234_View,{"TX_DATE_POST", "ORDER_ID", "CORP_ID", "ORDER_SITE", "PRODUCT_ID", "ORDER_TYPE", "TX_ID", "TX_AMOUNT", "TX_QTY"}),
     #"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each [CORP_ID] = "54"),

 

I'm querying an IBM DB2 (z/OS) database via ODBC (could never get the built-in connector to work).  The table (or view) I'm querying is an extremely large transactions table (nearly 900 million rows), I just want to pull a subset but cannot because the filtering takes forever. I know folding is not working because when I right click on the filtering step the "View Native Query" option is dithered out. The steps prior, however, are folding because on those steps the "View Native Query" option is selectable.

robarivas Member
Member

Re: Large Table performance

Hi @Phil_Seamark

 

I don't have those kinds of columns there because one of my first steps is to remove unneeded columns. The first step in which I perform a filter seems to cause query folding to stop happening.

yan Frequent Visitor
Frequent Visitor

Re: Large Table performance

Same issue here on a simple group by - no query folding so PD/PQ loads it all (20M rcds) before aggregating...

I use Teradata through ODBC because it is the only option (still) to authenticate through LDAP.

 

I bet it is due to the use of ODBC: PB/PQ doesn't know what DBMS it needs to translate the SQL too.

 

(De)selecting columns though does translate to native SQL, probably because it is pretty standard. Anything invoking a WHERE or GROUP BY is not.

 

The alternative is to copy/write your own "hardcoded" custom SQL in the data source step. Not ideal if you want business users to share a baseline and create their own steps (unless folding after that is not a must).

robarivas Member
Member

Re: Large Table performance

Finally got a "proper" connection to the DB2 database...no more ODBC. I'm using the Microsoft driver. However, still no Query Folding!!! Smiley Frustrated

 

I wonder why. Anyone have a theory? Again, all I'm doing is a simple dates between filter on a date column (as my test of Query Folding).

robarivas Member
Member

Re: Large Table performance

Would still love a response (Microsoft maybe??). The inability to leverage my transactions table seriously hampers my ability to fully adopt Power BI. One step or no step (i.e., Scheduled) Refresh is an impossibility because of this.

 

And why would query folding happen for certain columns but not for others, such as (crucially) my Posting Date column???

Helpful resources

Announcements
Coming Soon: T-Shirt Design Contest

Coming Soon: T-Shirt Design Contest

Keep your eyes open for our upcoming T-shirt design contest!

Meet the 2020 Season 1 Power BI Super Users!

Meet the 2020 Season 1 Power BI Super Users!

It’s the start of a new Super User season! Learn all about the new Super Users and brand-new tiered recognition system.

Super User Challenge: Can You Solve These?

Super User Challenge: Can You Solve These?

We're celebrating the start of the New Super User season with our first ever Super User 'Can You Solve These?' challenge.

Power Platform Online Conference

Power Platform Online Conference

Join us for the first ever Power Platform Online Conference!

Top Solution Authors
Top Kudoed Authors