Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
robarivas
Post Patron
Post Patron

Large Table performance

I'm connecting to an IBM DB2 database (z/OS) via ODBC (cause I could never get the built-in connector to work). Most tables are viable. However, I cannot find a way to get the transactions table data to load to the data model in a reasonable timeframe. The table in its entirety is over 800 million rows and represents at least 5 years of transactions. So I typically want to pull out just a subset. I can sometimes successfully filter it in Power Query in a reasonable time frame (by apparently leveraging indexed fields) but when I hit Close and Apply it loads to the data model at an extremely unusuably slow rate. So slow that I've never gotten even just 3 months of transactions to completely load. I've tried the following but none have made a difference:

 

1. Limit to the smallest number of columns needed

2. Exclude and/or break up high-cardinality columns

3. Pre-select columns (and rows) using SQL syntax

4. Remove unnecessary string columns

5. Load no other tables at all

 

 

 

7 REPLIES 7
Phil_Seamark
Employee
Employee

Hi @robarivas,

 

Get rid of any columns that have unique ID's and if you have a column that is Date & Time, split that into two columns in Power Query (or just drop time altogether).

 

Do any of those help?


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Hi @Phil_Seamark

 

I don't have those kinds of columns there because one of my first steps is to remove unneeded columns. The first step in which I perform a filter seems to cause query folding to stop happening.

Same issue here on a simple group by - no query folding so PD/PQ loads it all (20M rcds) before aggregating...

I use Teradata through ODBC because it is the only option (still) to authenticate through LDAP.

 

I bet it is due to the use of ODBC: PB/PQ doesn't know what DBMS it needs to translate the SQL too.

 

(De)selecting columns though does translate to native SQL, probably because it is pretty standard. Anything invoking a WHERE or GROUP BY is not.

 

The alternative is to copy/write your own "hardcoded" custom SQL in the data source step. Not ideal if you want business users to share a baseline and create their own steps (unless folding after that is not a must).

Finally got a "proper" connection to the DB2 database...no more ODBC. I'm using the Microsoft driver. However, still no Query Folding!!! Smiley Frustrated

 

I wonder why. Anyone have a theory? Again, all I'm doing is a simple dates between filter on a date column (as my test of Query Folding).

Would still love a response (Microsoft maybe??). The inability to leverage my transactions table seriously hampers my ability to fully adopt Power BI. One step or no step (i.e., Scheduled) Refresh is an impossibility because of this.

 

And why would query folding happen for certain columns but not for others, such as (crucially) my Posting Date column???

Why would Power Query stop Query Folding upon the simple act of filtering a column?

 

let
     Source = Odbc.DataSource("dsn=ABCD", [HierarchicalNavigation=true]),
     ABC_Schema = Source{[Name="ABC",Kind="Schema"]}[Data],
     ABC1234_View = ABC_Schema{[Name="ABC1234",Kind="View"]}[Data],
     #"Removed Other Columns" = Table.SelectColumns(ABC1234_View,{"TX_DATE_POST", "ORDER_ID", "CORP_ID", "ORDER_SITE", "PRODUCT_ID", "ORDER_TYPE", "TX_ID", "TX_AMOUNT", "TX_QTY"}),
     #"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each [CORP_ID] = "54"),

 

I'm querying an IBM DB2 (z/OS) database via ODBC (could never get the built-in connector to work).  The table (or view) I'm querying is an extremely large transactions table (nearly 900 million rows), I just want to pull a subset but cannot because the filtering takes forever. I know folding is not working because when I right click on the filtering step the "View Native Query" option is dithered out. The steps prior, however, are folding because on those steps the "View Native Query" option is selectable.

robarivas
Post Patron
Post Patron

Ok, maybe I'll get a response if I ask the question a little differently:

 

Why would Power Query stop Query Folding upon the simple act of filtering a column?

 

let
     Source = Odbc.DataSource("dsn=ABCD", [HierarchicalNavigation=true]),
     ABC_Schema = Source{[Name="ABC",Kind="Schema"]}[Data],
     ABC1234_View = ABC_Schema{[Name="ABC1234",Kind="View"]}[Data],
     #"Removed Other Columns" = Table.SelectColumns(ABC1234_View,{"TX_DATE_POST", "ORDER_ID", "CORP_ID", "ORDER_SITE", "PRODUCT_ID", "ORDER_TYPE", "TX_ID", "TX_AMOUNT", "TX_QTY"}),
     #"Filtered Rows" = Table.SelectRows(#"Removed Other Columns", each [CORP_ID = "54"),

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.