Solved: Re: Understanding why Table.Buffer makes a differe...

JeffWeir · ‎07-23-2019

I understand from Google that using Table.Buffer is often a good idea when multiple steps within a query will be referencing the same Table. But it seems from my experimentations that it's also a good idea when there are other downstream queries in the query dependency chain.

As per the question I posted here, I have a bunch of queries in a chain that are somewhat recursive in nature: I am finding matching between two data sources where the date doesn't always match, so I progressively match the tables on larger values of a 'tolerance' factor X using this condition:

Table1.Date = Table2.Date +/- X

And I'm effectiviely running this in a loop for values of X between 1 to 7, and removing successful matches on each pass leaving just unsuccessful ones. This lets me progressively increase the 'mismatch' tolerence on those dates, and remove matches at each pass leaving just the unmatched rows to do increasingly desperate matches on.

My query performs the following steps:

Loads data from Table1 and Table2, and does an inner join on unique ID and Date
Effectively removes these matches from Table1 and Table2 via an anti-join against each referenced table.
Does an inner join on the remaining records, but this time offsets the dates in the Port table by +/- 1 day. This finds some more matches.
Performs steps 2 and 3 over and over, increasing the date offset by 1 each time until I have captured all date mismatches up to a week.

The query dependency tree that results looks like so:

I've found that using Table.Buffer on both Tables whenever I do a join dramatically ~~increases~~ (whoops) decreases execution time.

In this file, with no Table.Buffer, it takes about 17 seconds to run through the chain:

Match to closest day_20190722 No Buffer.xlsx

In this file, with a Table.Buffer on every Table that gets joined on every step where a join occurs, it takes just 6 seconds:

Match to closest day_20190722 buffer3 Revised.xlsx

Can anyone shed light on why the Buffer works? Perhaps another interesting question for @ImkeF

ImkeF · ‎07-23-2019

This thread contains a lot of information around how Table.Buffer works: https://social.technet.microsoft.com/Forums/en-US/34e454b5-3a18-4eef-b920-40703c93f390/tablebuffer-f...

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

ImkeF · ‎07-23-2019

This thread contains a lot of information around how Table.Buffer works: https://social.technet.microsoft.com/Forums/en-US/34e454b5-3a18-4eef-b920-40703c93f390/tablebuffer-f...

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

JeffWeir · ‎07-23-2019

Wow, that's quite some thread @ImkeF ! I'll read through it several times today, and try to comprehend it all 🙂

JeffWeir · ‎07-23-2019

Wow....putting Table.Buffer around any steps that referenced previous queries reduced my load time from potentially hours to one and a half minutes. (I say potentially hours, because I'd never dared to load all 30k rows of data for both Tables at once, nor attempted to make as many as 7 recursive steps. Even just loading one tenth of the data for 5 steps was taking a good 10 minutes or more).

I take it that I want to buffer each input just once, as soon as it appears in the current query that is referencing previous ones?

JeffWeir · ‎07-23-2019

This thread is also very enlightening: https://social.technet.microsoft.com/Forums/en-US/8d5ee632-fdff-4ba2-b150-bb3591f955fb/queries-evalu...

Say you have a query Q4 that references Q2 and Q3. And say Q2 and Q3 both reference Q1. And say Q1 is pulling from a flat file. According to Erhen at Microsoft, if you're pulling from a flat file - and because File.Contents results aren't cached - the flat file will be read 5 times! It gets read each time Q1 is directly or indirectly referenced: twice in Q4, once in Q3, Q2, and Q1. And I believe he’s saying that this is the case for either Excel or PBI.

So…when doing any kind of complicated chain, the code produced from the UI just flat out sucks! And that is an epic fail for a tool that is designed to be used by non-experts via the UI.

I think a very low number of people facing performance issues because of this behaviour will be able to form a hypothesis about what's going wrong, find a thread on the web that explains the issue in plain English, then go wrap the first reference in any one query to other queries with Table.Buffer. And if you don’t diagnose the issue, you just think that PQ is a slow dog, and switch to using SQL.

Anyone know if there a UserVoice request that addresses this specific behaviour? And whether this behaviour unaddressable now, given the architecture choices MS has hade?

Element115 · ‎01-12-2023

Amen to that as I have hit that wall as well just recently.

Understanding why Table.Buffer makes a difference in dependency chain

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024