Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi all,
I'm having a power query on a medium sized table (15k rows, ~30 columns) where I need to iterate through (working on atlassian jira rest api) like so:
result = Table.ReplaceValue(buffered_table, // yes, this is buffered
each [Epic Link],
each if [Epic Link]<>null
then
if List.Contains(buffered_list,[Epic Link])
then [Epic Link]
else ""
else null,
Replacer.ReplaceText,{"Epic Link"}
),
The buffered table has internal self-references (epics are issues that are referenced by other issues) that are problematic on the DAX side.
I've read through https://community.powerbi.com/t5/Desktop/List-Contains-is-slow/m-p/950128#M455289, read through @ImkeF performance blogposts and checked @Greg_Deckler's and @Anonymous sites as well and following (I believe) most of the hints I could find.
I'm however getting read rates (During "Loading data into model ...") of about 10 rows/sec with the above code inside the power query. Without the code above (monitoring using fiddler) it's at least 100 times faster. I'd love to perform the code in power query since the dataset's consumers (as well as myself) have trouble filtering for this using DAX.
Any further hints to improve performance would be highly appreceated. Maybe there's a way to have List.Contains use an index?
Solved! Go to Solution.
Instead of using Table.Replace, try using Table.AddColumn to define the same thing as a custom column (and then delete the original [Epic Link] column and rename the new custom column to [Epic Link]).
Both of those seem intuitive to me. Adding a column is often faster than modifying an existing one.
For your second point, buffered_table[issue key] is already in memory as part of buffered_table, so buffering it into memory separately as a list probably(?) doesn't help that much. Since you are using buffered_list in List.Contains for each row, you want it to be as fast as possible and using a distinct version is more efficient. Having the distinct version already computed and explicitly buffered in that form is even better.
Thanks for the help @AlexisOlson and @watkinnc , the hints combined got the job done:
// this is for performance boosting the following list.contains function call
buffered_table = Table.Buffer(renamed),
buffered_list = List.Buffer(List.Distinct(buffered_table[issue key])),
result_a = Table.ReplaceValue(buffered_table,
each
[Epic Link],
each
if [Epic Link]<>null
then
if List.Contains(buffered_list,[Epic Link])
then [Epic Link]
else ""
else null,
Replacer.ReplaceText,{"Epic Link"}
),
result_b = Table.AddColumn(buffered_table,
"Epic Link (unlinked)",
each if [Epic Link]<>null
then
if List.Contains(buffered_list, [Epic Link])
then [Epic Link]
else ""
else null)
Result_b is about 75 times faster than result_a, but I'm puzzled that:
- adding a column and putting (new) values in there is faster than replacing a value in another column
- List.Buffer(List.Distinct(buffered_table[issue key])) should be faster than List.Distinct(List.Buffer(buffered_table[issue key]))
Both of those seem intuitive to me. Adding a column is often faster than modifying an existing one.
For your second point, buffered_table[issue key] is already in memory as part of buffered_table, so buffering it into memory separately as a list probably(?) doesn't help that much. Since you are using buffered_list in List.Contains for each row, you want it to be as fast as possible and using a distinct version is more efficient. Having the distinct version already computed and explicitly buffered in that form is even better.
Got it solved as described above. The column was distinct already.
You could also try marking the list as distinct, which almost always makes a difference:
if List.Contains(List.Distinct(buffered_list)...
--Nate
Probably even better to use List.Distinct before List.Buffer if possible.
Instead of using Table.Replace, try using Table.AddColumn to define the same thing as a custom column (and then delete the original [Epic Link] column and rename the new custom column to [Epic Link]).
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
102 | |
53 | |
21 | |
12 | |
12 |