Re: Large Table performance

robarivas · ‎03-07-2017

I'm connecting to an IBM DB2 database (z/OS) via ODBC (cause I could never get the built-in connector to work). Most tables are viable. However, I cannot find a way to get the transactions table data to load to the data model in a reasonable timeframe. The table in its entirety is over 800 million rows and represents at least 5 years of transactions. So I typically want to pull out just a subset. I can sometimes successfully filter it in Power Query in a reasonable time frame (by apparently leveraging indexed fields) but when I hit Close and Apply it loads to the data model at an extremely unusuably slow rate. So slow that I've never gotten even just 3 months of transactions to completely load. I've tried the following but none have made a difference:

1. Limit to the smallest number of columns needed

2. Exclude and/or break up high-cardinality columns

3. Pre-select columns (and rows) using SQL syntax

4. Remove unnecessary string columns

5. Load no other tables at all

Phil_Seamark · ‎03-08-2017

Hi @robarivas,

Get rid of any columns that have unique ID's and if you have a column that is Date & Time, split that into two columns in Power Query (or just drop time altogether).

Do any of those help?

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

robarivas · ‎03-08-2017

Hi @Phil_Seamark

I don't have those kinds of columns there because one of my first steps is to remove unneeded columns. The first step in which I perform a filter seems to cause query folding to stop happening.

yan · ‎03-21-2017

Same issue here on a simple group by - no query folding so PD/PQ loads it all (20M rcds) before aggregating...

I use Teradata through ODBC because it is the only option (still) to authenticate through LDAP.

I bet it is due to the use of ODBC: PB/PQ doesn't know what DBMS it needs to translate the SQL too.

(De)selecting columns though does translate to native SQL, probably because it is pretty standard. Anything invoking a WHERE or GROUP BY is not.

The alternative is to copy/write your own "hardcoded" custom SQL in the data source step. Not ideal if you want business users to share a baseline and create their own steps (unless folding after that is not a must).

robarivas · ‎05-30-2017

Finally got a "proper" connection to the DB2 database...no more ODBC. I'm using the Microsoft driver. However, still no Query Folding!!!

I wonder why. Anyone have a theory? Again, all I'm doing is a simple dates between filter on a date column (as my test of Query Folding).

robarivas · ‎07-24-2017

Would still love a response (Microsoft maybe??). The inability to leverage my transactions table seriously hampers my ability to fully adopt Power BI. One step or no step (i.e., Scheduled) Refresh is an impossibility because of this.

And why would query folding happen for certain columns but not for others, such as (crucially) my Posting Date column???

robarivas · ‎03-08-2017

Why would Power Query stop Query Folding upon the simple act of filtering a column?

let
Source = Odbc.DataSource("dsn=ABCD", [HierarchicalNavigation=true]),
ABC_Schema = Source{[Name="ABC",Kind="Schema"]}[Data],
ABC1234_View = ABC_Schema{[Name="ABC1234",Kind="View"]}[Data],
#"Removed Other Columns" = Table.SelectColumns(ABC1234_View,{"TX_DATE_POST", "ORDER_ID", "CORP_ID", "ORDER_SITE", "PRODUCT_ID", "ORDER_TYPE", "TX_ID", "TX_AMOUNT", "TX_QTY"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each [CORP_ID] = "54"),

I'm querying an IBM DB2 (z/OS) database via ODBC (could never get the built-in connector to work). The table (or view) I'm querying is an extremely large transactions table (nearly 900 million rows), I just want to pull a subset but cannot because the filtering takes forever. I know folding is not working because when I right click on the filtering step the "View Native Query" option is dithered out. The steps prior, however, are folding because on those steps the "View Native Query" option is selectable.

robarivas · ‎03-08-2017

Ok, maybe I'll get a response if I ask the question a little differently:

Why would Power Query stop Query Folding upon the simple act of filtering a column?

let
Source = Odbc.DataSource("dsn=ABCD", [HierarchicalNavigation=true]),
ABC_Schema = Source{[Name="ABC",Kind="Schema"]}[Data],
ABC1234_View = ABC_Schema{[Name="ABC1234",Kind="View"]}[Data],
#"Removed Other Columns" = Table.SelectColumns(ABC1234_View,{"TX_DATE_POST", "ORDER_ID", "CORP_ID", "ORDER_SITE", "PRODUCT_ID", "ORDER_TYPE", "TX_ID", "TX_AMOUNT", "TX_QTY"}),
#"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each [CORP_ID = "54"),

Large Table performance

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly