cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
rudisoft
Frequent Visitor

List.Contains() is slow 2.0

Hi all,

 

I'm having a power query on a medium sized table (15k rows, ~30 columns) where I need to iterate through  (working on atlassian jira rest api) like so:

 

result = Table.ReplaceValue(buffered_table, // yes, this is buffered
             each [Epic Link],
             each if [Epic Link]<>null
                     then
                             if List.Contains(buffered_list,[Epic Link])
                             then [Epic Link]
                             else ""
                     else null,
              Replacer.ReplaceText,{"Epic Link"}
         ),

The buffered table has internal self-references (epics are issues that are referenced by other issues) that are problematic on the DAX side.

 

I've read through https://community.powerbi.com/t5/Desktop/List-Contains-is-slow/m-p/950128#M455289, read through  @ImkeF performance blogposts and checked @Greg_Deckler's and @chriswebb18  sites as well and following (I believe) most of the hints I could find. 

 

I'm however getting read rates (During "Loading data into model ...") of about 10 rows/sec with the above code inside the power query. Without the code above (monitoring using fiddler) it's at least 100 times faster. I'd love to perform the code in power query since the dataset's consumers (as well as myself) have trouble filtering for this using DAX.

 

Any further hints to improve performance would be highly appreceated. Maybe there's a way to have List.Contains use an index?

1 ACCEPTED SOLUTION
AlexisOlson
Super User
Super User

Instead of using Table.Replace, try using Table.AddColumn to define the same thing as a custom column (and then delete the original [Epic Link] column and rename the new custom column to [Epic Link]).

View solution in original post

5 REPLIES 5
rudisoft
Frequent Visitor

Thanks for the help @AlexisOlson and @watkinnc , the hints combined got the job done:

 

// this is for performance boosting the following list.contains function call
buffered_table = Table.Buffer(renamed),
buffered_list = List.Buffer(List.Distinct(buffered_table[issue key])),

result_a = Table.ReplaceValue(buffered_table,
each
[Epic Link],
each
if [Epic Link]<>null
then
if List.Contains(buffered_list,[Epic Link])
then [Epic Link]
else ""
else null,
Replacer.ReplaceText,{"Epic Link"}
),

result_b = Table.AddColumn(buffered_table,
"Epic Link (unlinked)",
each if [Epic Link]<>null
then
if List.Contains(buffered_list, [Epic Link])
then [Epic Link]
else ""
else null)

 

Result_b is about 75 times faster than result_a, but I'm puzzled that:

- adding a column and putting (new) values in there is faster than replacing a value in another column

- List.Buffer(List.Distinct(buffered_table[issue key])) should be faster than List.Distinct(List.Buffer(buffered_table[issue key]))

Both of those seem intuitive to me. Adding a column is often faster than modifying an existing one.

 

For your second point, buffered_table[issue key] is already in memory as part of buffered_table, so buffering it into memory separately as a list probably(?) doesn't help that much. Since you are using buffered_list in List.Contains for each row, you want it to be as fast as possible and using a distinct version is more efficient. Having the distinct version already computed and explicitly buffered in that form is even better.

watkinnc
Super User
Super User

You could also try marking the list as distinct, which almost always makes a difference:

 

 if List.Contains(List.Distinct(buffered_list)...

 

--Nate


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

Probably even better to use List.Distinct before List.Buffer if possible.

AlexisOlson
Super User
Super User

Instead of using Table.Replace, try using Table.AddColumn to define the same thing as a custom column (and then delete the original [Epic Link] column and rename the new custom column to [Epic Link]).

Helpful resources

Announcements
Power Platform Conf 2022 768x460.jpg

Join us for Microsoft Power Platform Conference

The first Microsoft-sponsored Power Platform Conference is coming in September. 100+ speakers, 150+ sessions, and what's new and next for Power Platform.

Power BI Dev Camp Session 23 768x460.jpg

Check it Out!

Mark your calendars and join us on Thursday, June 30 at 11a PDT for a great session with Ted Pattison!

June 20 episode 7 with aka link 768x460.jpg

The Power BI Community Show

Join us on June 20 at 11 am PDT when Kim Manis shares the latest on Azure Synapse analytics, the Microsoft Intelligent Data Platform, and notable Power BI Updates from Microsoft Build 2022.

Top Solution Authors