Hi everyone,
I have a large table of customers and I want to identify, using DAX, whether any customer in that table appears in a separate keyword list of distinct customers. I received support from the forums on a custom M code solution, but performance is extremely slow for 200,000+ records.
Alternatively, I have used Power Query's fuzzy matching feature, which is significantly faster than the custom M code but not as accurate at identifying matches.
I understand that buffering the list with List.Buffer() can significantly improve performance, but I am not sure I am using it correctly.
let
Source = Table.Buffer(SharePoint.Files("https://rbcom.sharepoint.com/sites/CustomerScope", [ApiVersion = 15])),
#"PBI_Customer_Master_Tables csv_https://rbcom sharepoint com/sites/CustomerScope/Shared Documents/1 Customer Master Cleansing/3 PowerBI/0 Power BI Master Files/0 Test Data/" =
Source{[Name="PBI_Customer_Master_Tables.csv",#"Folder Path"="https://rbcom.sharepoint.com/sites/CustomerScope/Shared Documents/1. Customer Master Cleansing/3. PowerBI/0. Power BI Master Files/0. Test Data/"]}[Content],
#"Imported CSV" = Csv.Document(#"PBI_Customer_Master_Tables csv_https://rbcom sharepoint com/sites/CustomerScope/Shared Documents/1 Customer Master Cleansing/3 PowerBI/0 Power BI Master Files/0 Test Data/",[Delimiter=",", Columns=17, Encoding=65001, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Customer", type text}, {"Data Source", type text}, {"Customer Text", type text}, {"Data Origin", type text}, {"IAG", type text}, {"IAG Text", type text}, {"IBG", type text}, {"IBG Text", type text}, {"Trade Type Text", type text}, {"Trade Type", type text}, {"Parent Customer Code", type text}, {"Group Company Code", type text}, {"Record Timestamp", type date}}),
#"Appended Query" = Table.Combine({#"Changed Type", DB_CustomerMasterData_History}),
#"Removed Other Columns" = Table.SelectColumns(#"Appended Query",{"Customer", "Data Source", "Customer Text", "Data Origin", "IAG", "IAG Text", "IBG", "IBG Text", "Trade Type Text", "Trade Type", "Parent Customer Code", "Group Company Code", "Record Timestamp"}),
#"Changed Type3" = Table.TransformColumnTypes(#"Removed Other Columns",{{"Trade Type", type text}}),
#"Inserted Text After Delimiter" = Table.AddColumn(#"Changed Type3", "Text After Delimiter", each Text.AfterDelimiter([Customer], "/"), type text),
#"Added Custom" = Table.AddColumn(#"Inserted Text After Delimiter", "eComm Keyword", each List.Accumulate
(
List.Buffer(#"Tbl_eComm"[Keyword]),
"",
(state, current) =>
if List.Contains(Text.Split([Customer Text], " "), current, Comparer.OrdinalIgnoreCase) and state = ""
then state & " " & current
else state
))
in
#"Added Custom"
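(Editor's note: for readers less familiar with M, the List.Accumulate step above is roughly equivalent to the following Python sketch. The function name and sample strings are illustrative only, not part of the query.)

```python
def first_keyword_match(customer_text: str, keywords: list[str]) -> str:
    """Rough analogue of the List.Accumulate step: return the first keyword
    (in list order) that appears as a whole word in customer_text,
    case-insensitively. The M code prefixes the match with a space via
    `state & " " & current`, so the result carries a leading space."""
    words = {w.lower() for w in customer_text.split(" ")}
    state = ""
    for current in keywords:
        # `state == ""` ensures only the first matching keyword is kept,
        # mirroring the `and state = ""` condition in the M code.
        if current.lower() in words and state == "":
            state = state + " " + current
    return state
```

Note that every keyword is still visited even after a match is found, exactly as List.Accumulate does; the accumulator simply stops changing once `state` is non-empty.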
Best regards,
Ami
Try buffering the list before List.Accumulate
let
Source = Table.Buffer(SharePoint.Files("https://rbcom.sharepoint.com/sites/CustomerScope", [ApiVersion = 15])),
#"PBI_Customer_Master_Tables csv_https://rbcom sharepoint com/sites/CustomerScope/Shared Documents/1 Customer Master Cleansing/3 PowerBI/0 Power BI Master Files/0 Test Data/" =
Source{[Name="PBI_Customer_Master_Tables.csv",#"Folder Path"="https://rbcom.sharepoint.com/sites/CustomerScope/Shared Documents/1. Customer Master Cleansing/3. PowerBI/0. Power BI Master Files/0. Test Data/"]}[Content],
#"Imported CSV" = Csv.Document(#"PBI_Customer_Master_Tables csv_https://rbcom sharepoint com/sites/CustomerScope/Shared Documents/1 Customer Master Cleansing/3 PowerBI/0 Power BI Master Files/0 Test Data/",[Delimiter=",", Columns=17, Encoding=65001, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Customer", type text}, {"Data Source", type text}, {"Customer Text", type text}, {"Data Origin", type text}, {"IAG", type text}, {"IAG Text", type text}, {"IBG", type text}, {"IBG Text", type text}, {"Trade Type Text", type text}, {"Trade Type", type text}, {"Parent Customer Code", type text}, {"Group Company Code", type text}, {"Record Timestamp", type date}}),
#"Appended Query" = Table.Combine({#"Changed Type", DB_CustomerMasterData_History}),
#"Removed Other Columns" = Table.SelectColumns(#"Appended Query",{"Customer", "Data Source", "Customer Text", "Data Origin", "IAG", "IAG Text", "IBG", "IBG Text", "Trade Type Text", "Trade Type", "Parent Customer Code", "Group Company Code", "Record Timestamp"}),
#"Changed Type3" = Table.TransformColumnTypes(#"Removed Other Columns",{{"Trade Type", type text}}),
#"Inserted Text After Delimiter" = Table.AddColumn(#"Changed Type3", "Text After Delimiter", each Text.AfterDelimiter([Customer], "/"), type text),
BufferedList = List.Buffer(#"Tbl_eComm"[Keyword]),
#"Added Custom" = Table.AddColumn(#"Inserted Text After Delimiter", "eComm Keyword", each List.Accumulate
(
BufferedList,
"",
(state, current) =>
if List.Contains(Text.Split([Customer Text], " "), current, Comparer.OrdinalIgnoreCase) and state = ""
then state & " " & current
else state
))
in
#"Added Custom"
regards
Phil
Proud to be a Super User!
I agree you should buffer outside the List.Accumulate, but re-reading your original post, I believe this would perform faster in DAX. If you can provide some example data for the two tables/columns, I can suggest a column or measure expression.
Regards,
Pat
To learn more about Power BI, follow me on Twitter or subscribe on YouTube.
@imranamikhan Thanks for the sample data. This DAX column expression returned all the same results as the M approach. I called your tables Data and Keyword, so you'll need to change the expression to use your actual table names. There is no relationship between the tables. It runs fast on this small dataset, but please try it on your actual data.
NewColumn =
VAR thiscustomer = Customer[Customer Text]
VAR matchtable =
    FILTER (
        VALUES ( Keyword[Keyword] ),
        SEARCH ( Keyword[Keyword], thiscustomer, 1, 0 ) > 0
    )
VAR matchkeyword =
    MAXX ( matchtable, Keyword[Keyword] )
RETURN
    IF ( ISBLANK ( matchkeyword ), "#No Match", matchkeyword )
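(Editor's note: the column's logic can be sketched in Python as follows; this analogue is illustrative and not part of the thread.)

```python
def dax_style_match(customer_text: str, keywords: list[str]) -> str:
    """Rough analogue of the DAX column: SEARCH performs a case-insensitive
    substring test, MAXX returns the alphabetically greatest matching
    keyword, and IF/ISBLANK falls back to '#No Match'."""
    matches = [k for k in keywords if k.lower() in customer_text.lower()]
    return max(matches) if matches else "#No Match"
```

Two behavioral details worth noting: SEARCH matches substrings, whereas the M version split on spaces and matched whole words only; and when several keywords match, MAXX keeps the alphabetically greatest one rather than the first in list order.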
Regards,
Pat
Thanks @mahoneypat
This is a super impressive formula, and I am getting the exact same number of matches for 200k+ records as with the M query. Refresh performance is equal to the M query version, so I now have a choice between both options.
Thanks again all for the support.
Hello @mahoneypat, I have a question. Instead of returning "matchkeyword" as the result, I would like to return the values of a "result" column, which correspond to the segment of that data.

Keyword[Keyword]   Keyword[result]    Customer[Customer Text]
factura            Consultor Junior   Duplicidad de factura
Cobro              Consultor Senior   Cobro no aplicado
Devoluciones       Consultor Medio    Cambio de devoluciones anual no a aplicado
Good to hear from you, and thanks @PhilipTreacy - this is significantly faster (down from 40 minutes to 15). I believe the rest of the time is spent loading the other queries I have running.
Hi @mahoneypat - I have added some sample data to OneDrive: Onedrive Link
@lbendlin - thank you for the explanation.
Hi @lbendlin - yes, I only need to run it once per Power Query refresh. Whenever new data comes in, I refresh Power Query to see the latest results.
I do not understand what you meant by re-use; an explanation would be appreciated.
List.Buffer() and Table.Buffer() are pointless unless you plan to re-use their data in the same query multiple times. All they do is pre-spool the data into memory, so that you can reference it later at better speed. If you don't reference the list or the table later in the query then you are wasting memory.
You only get a performance boost if you re-use your buffer (table or list). If you declare it and then don't use it at least twice, or, worse, as in your case, recreate the buffer inside the row-level function so it is rebuilt for every row's List.Accumulate call, then performance will be worse, not better.
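(Editor's note: the buffering principle described above can be illustrated in Python; the function names and sample data below are hypothetical.)

```python
def slow(rows, fetch_keywords):
    """Anti-pattern: re-fetch/rebuild the keyword list once per row,
    like evaluating List.Buffer(...) inside the row-level function."""
    return [any(k in r for k in fetch_keywords()) for r in rows]

def fast(rows, fetch_keywords):
    """Buffering done right: materialize the list once, before the
    per-row loop, then reuse it for every row."""
    keywords = list(fetch_keywords())  # the List.Buffer analogue
    return [any(k in r for k in keywords) for r in rows]
```

Both functions return identical results; the only difference is how many times the keyword source is evaluated, which is exactly why moving List.Buffer outside the Table.AddColumn step pays off.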