cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
robarivas Member
Member

Large Table performance

I'm connecting to an IBM DB2 database (z/OS) via ODBC (cause I could never get the built-in connector to work). Most tables are viable. However, I cannot find a way to get the transactions table data to load to the data model in a reasonable timeframe. The table in its entirety is over 800 million rows and represents at least 5 years of transactions. So I typically want to pull out just a subset. I can sometimes successfully filter it in Power Query in a reasonable time frame (by apparently leveraging indexed fields) but when I hit Close and Apply it loads to the data model at an extremely unusuably slow rate. So slow that I've never gotten even just 3 months of transactions to completely load. I've tried the following but none have made a difference:

 

1. Limit to the smallest number of columns needed

2. Exclude and/or break up high-cardinality columns

3. Pre-select columns (and rows) using SQL syntax

4. Remove unnecessary string columns

5. Load no other tables at all

 

 

 

7 REPLIES 7
robarivas Member
Member

Re: Large Table performance

Ok, maybe I'll get a response if I ask the question a little differently:

 

Why would Power Query stop Query Folding upon the simple act of filtering a column?

 

let
     Source = Odbc.DataSource("dsn=ABCD", [HierarchicalNavigation=true]),
     ABC_Schema = Source{[Name="ABC",Kind="Schema"]}[Data],
     ABC1234_View = ABC_Schema{[Name="ABC1234",Kind="View"]}[Data],
     #"Removed Other Columns" = Table.SelectColumns(ABC1234_View,{"TX_DATE_POST", "ORDER_ID", "CORP_ID", "ORDER_SITE", "PRODUCT_ID", "ORDER_TYPE", "TX_ID", "TX_AMOUNT", "TX_QTY"}),
     #"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each [CORP_ID = "54"),

Highlighted
Microsoft Phil_Seamark
Microsoft

Re: Large Table performance

Hi @robarivas,

 

Get rid of any columns that have unique ID's and if you have a column that is Date & Time, split that into two columns in Power Query (or just drop time altogether).

 

Do any of those help?


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

robarivas Member
Member

Query Folding

Why would Power Query stop Query Folding upon the simple act of filtering a column?

 

let
     Source = Odbc.DataSource("dsn=ABCD", [HierarchicalNavigation=true]),
     ABC_Schema = Source{[Name="ABC",Kind="Schema"]}[Data],
     ABC1234_View = ABC_Schema{[Name="ABC1234",Kind="View"]}[Data],
     #"Removed Other Columns" = Table.SelectColumns(ABC1234_View,{"TX_DATE_POST", "ORDER_ID", "CORP_ID", "ORDER_SITE", "PRODUCT_ID", "ORDER_TYPE", "TX_ID", "TX_AMOUNT", "TX_QTY"}),
     #"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each [CORP_ID] = "54"),

 

I'm querying an IBM DB2 (z/OS) database via ODBC (could never get the built-in connector to work).  The table (or view) I'm querying is an extremely large transactions table (nearly 900 million rows), I just want to pull a subset but cannot because the filtering takes forever. I know folding is not working because when I right click on the filtering step the "View Native Query" option is dithered out. The steps prior, however, are folding because on those steps the "View Native Query" option is selectable.

robarivas Member
Member

Re: Large Table performance

Hi @Phil_Seamark

 

I don't have those kinds of columns there because one of my first steps is to remove unneeded columns. The first step in which I perform a filter seems to cause query folding to stop happening.

yan Frequent Visitor
Frequent Visitor

Re: Large Table performance

Same issue here on a simple group by - no query folding so PD/PQ loads it all (20M rcds) before aggregating...

I use Teradata through ODBC because it is the only option (still) to authenticate through LDAP.

 

I bet it is due to the use of ODBC: PB/PQ doesn't know what DBMS it needs to translate the SQL too.

 

(De)selecting columns though does translate to native SQL, probably because it is pretty standard. Anything invoking a WHERE or GROUP BY is not.

 

The alternative is to copy/write your own "hardcoded" custom SQL in the data source step. Not ideal if you want business users to share a baseline and create their own steps (unless folding after that is not a must).

robarivas Member
Member

Re: Large Table performance

Finally got a "proper" connection to the DB2 database...no more ODBC. I'm using the Microsoft driver. However, still no Query Folding!!! Smiley Frustrated

 

I wonder why. Anyone have a theory? Again, all I'm doing is a simple dates between filter on a date column (as my test of Query Folding).

robarivas Member
Member

Re: Large Table performance

Would still love a response (Microsoft maybe??). The inability to leverage my transactions table seriously hampers my ability to fully adopt Power BI. One step or no step (i.e., Scheduled) Refresh is an impossibility because of this.

 

And why would query folding happen for certain columns but not for others, such as (crucially) my Posting Date column???

Helpful resources

Announcements
Exclusive LIVE Community Event #3 – Ask Arun Anything

Exclusive LIVE Community Event #3 – Ask Arun Anything

Join us in the third Triple A event!

Meet the 2020 Season 1 Power BI Super Users!

Meet the 2020 Season 1 Power BI Super Users!

It’s the start of a new Super User season! Learn all about the new Super Users and brand-new tiered recognition system.

January 2020 Community Highlights

January 2020 Community Highlights

Make sure you didn't miss any of the things that happened in the community in January!

Difinity Conference

Difinity Conference

The largest Power BI, Power Platform, and Data conference in New Zealand

Top Solution Authors
Top Kudoed Authors