cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
campelliann
Helper V
Helper V

When to use Table.Buffer

Hi there, 

I would like to know in each steps or before each steps it is recommended to use Table.Buffer to improve performance.

Many thanks

1 ACCEPTED SOLUTION
BA_Pete
Super User
Super User

Hi @campelliann ,

 

This is an enormous question with many, many variables. However, very broadly, once you understand what Table.Buffer and List.Buffer do, this will often inform you of when is a good time to use them.

At their core, these functions load the entire target into memory. Therefore, one key use is to use them when Power Query would otherwise make multiple scans of the target.

For example:

If you want to do a conditional join in PQ (in this example to merge values from an SCD table), you may find yourself creating a column something like this to achieve it:

= Table.AddColumn(
  previousStep,
  "columnName",
  (OT) => Table.SelectRows(
    Table2,  // The 'Target'
    each OT[Field] = [Field]
      and OT[dateField] >= [table2StartDate]
      and OT[matchField2] <= [table2EndDate]
    ){0}[valueToMerge],
  type number
)

 

In this instance, PQ will need to keep rescanning Table2 for the conditions on each row. Therefore, if we buffer Table2 in the Table1 query (T2Buffer = Table.Buffer(Table2) ), then swap the target from Table2 to T2Buffer, PQ can just hit the memory for what it needs each time.

 

List.Buffer example:

If you want to filter a table by the values in another table, you may find yourself righting code like this:

Table.SelectRows(
  TableToFilter,
  each List.Contains(
    FilterTable[ColumnWithFilterValues]),  // The 'Target'
    TableToFilter[ColumnName]
  )
)

 

In this instance, PQ will scan FilterTable in order to check the values against each row value in TableToFilter.

Therefore, if you buffer the FilterTable values first, PQ can just hit the memory for these values. You can do this inline (we created a separate step for this previously), like this:

Table.SelectRows(
  TableToFilter,
  each List.Contains(
    List.Buffer(FilterTable[ColumnWithFilterValues])),  // The 'Target'
    TableToFilter[ColumnName]
  )
)

 

This example is particularly important when applying this technique on foldable sources, as doing it this way means you can stream non-foldable lists to foldable queries and maintain folding 👍

 

Hope this helps?

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




View solution in original post

3 REPLIES 3
campelliann
Helper V
Helper V

@BA_Pete your posts are always insightful :). 

But for instance creating a Table.Buffer on an excel Sharepoint File, after the "call" to sharepoint. Or let's say I retrieve data from an API and in the step imediatly after the call, could that improve performance (which is affected by many variables as you say).

 

Generally, I'd say no, it wouldn't improve performance significantly, if at all. But, again, there's many variables.

If you bring in a 200 million row dataset with 50 columns you could potentially hurt performance by trying to buffer it as you may get close to, or exceed, your memory limit. This could slow down all of the mashup container processes, cause the query to fail/crash, or may even limit the functioning of the processing machine itself.

Furthermore, I don't think buffering prevents PQ making calls to the source if it needs to, it just helps to limit/prevent multiple table scans internally.

As a rule of thumb, I'd keep buffering limited to relatively small utility tables and lists, such as in my examples. As above, used too generously or incorrectly, buffering data has the potential to cause more harm than good.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




BA_Pete
Super User
Super User

Hi @campelliann ,

 

This is an enormous question with many, many variables. However, very broadly, once you understand what Table.Buffer and List.Buffer do, this will often inform you of when is a good time to use them.

At their core, these functions load the entire target into memory. Therefore, one key use is to use them when Power Query would otherwise make multiple scans of the target.

For example:

If you want to do a conditional join in PQ (in this example to merge values from an SCD table), you may find yourself creating a column something like this to achieve it:

= Table.AddColumn(
  previousStep,
  "columnName",
  (OT) => Table.SelectRows(
    Table2,  // The 'Target'
    each OT[Field] = [Field]
      and OT[dateField] >= [table2StartDate]
      and OT[matchField2] <= [table2EndDate]
    ){0}[valueToMerge],
  type number
)

 

In this instance, PQ will need to keep rescanning Table2 for the conditions on each row. Therefore, if we buffer Table2 in the Table1 query (T2Buffer = Table.Buffer(Table2) ), then swap the target from Table2 to T2Buffer, PQ can just hit the memory for what it needs each time.

 

List.Buffer example:

If you want to filter a table by the values in another table, you may find yourself righting code like this:

Table.SelectRows(
  TableToFilter,
  each List.Contains(
    FilterTable[ColumnWithFilterValues]),  // The 'Target'
    TableToFilter[ColumnName]
  )
)

 

In this instance, PQ will scan FilterTable in order to check the values against each row value in TableToFilter.

Therefore, if you buffer the FilterTable values first, PQ can just hit the memory for these values. You can do this inline (we created a separate step for this previously), like this:

Table.SelectRows(
  TableToFilter,
  each List.Contains(
    List.Buffer(FilterTable[ColumnWithFilterValues])),  // The 'Target'
    TableToFilter[ColumnName]
  )
)

 

This example is particularly important when applying this technique on foldable sources, as doing it this way means you can stream non-foldable lists to foldable queries and maintain folding 👍

 

Hope this helps?

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Helpful resources

Announcements
Carousel_PBI_Wave1

2023 Release Wave 1 Plans

Power BI release plans for 2023 release wave 1 describes all new features releasing from April 2023 through September 2023.

Power BI Summit Carousel 2

Global Power BI Training

Make sure you register today for the Power BI Summit 2023. Don't miss all of the great sessions and speakers!

BizApps LATAM 2023

Business Application LATAM Summit 2023

Join the biggest FREE Business Applications Event in LATAM this February.

Power Platform Bootcamp

Global Power Platform Bootcamp

In this bootcamp we will deep-dive into Microsoft’s Power Platform stack with hands-on sessions and labs, delivered to you by experts and community leaders.

Top Solution Authors
Top Kudoed Authors