Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hi All,
i am facing terrible performance in a power query table after a couple of joins and transformation.
Let me give you a broad overview over the steps i do.
It would be great to find out which option i have to apply to improve the performance.
My steps:
1) I load a table from Excel (12k rows and 20 columns):
2) I add around 38 calculated columns with some transformations with power query
So far, it takes nearly 50 secs to load and create the Table A1 with all the added columns.
When I try to join Table A1 with another table B, I see that it does all the calculations of all steps again.
Is there anyway to use a copy of my final table (something like copy values in a new table) in order to use it for the join ?
I tried the duplicate and reference options of my final table A1, but It does the recalculation every time when I do the join with My duplicated/referred table A1 and the table B ....
Thanks in advance
Kind regards
Solved! Go to Solution.
No!
You have to use a separate expression for
Table.SelectColumns(Table.SelectRows(Table1, (sel)=> sel[#"filter_item"] = 1)),{"key_1", "field_1", "field_2", "field_3", "field_4", "field_5"})
for instance:
TABBUFF=Table.Buffer(Table.SelectColumns(Table.SelectRows(Table1, (sel)=> sel[#"filter_item"] = 1)),{"key_1", "field_1", "field_2", "field_3", "field_4", "field_5"}),
then the rest (check the sintax: ican't do)
#"join_sum_T1_T2_" = let #"GroupedLines" = Table.Group(TABBUFF, {"key", "filter_item"}, {{"sum_field_1", each List.Sum([#"field_1"]), type nullable number}, {"sum_field_2", each List.Sum([field_32), type nullable number}, {"Ssum_field_3", each List.Sum([#"field_3"]), type nullable number}, {"sum_field_4", each List.Sum([#"field_4"]), type nullable number}}), #"MergedQueries" = Table.NestedJoin(#"Table2_source", {"key_2"},
#"GroupedLines", {"key_1"}, "Table1_sum", JoinKind.LeftOuter),
#"Tableexpanded" = Table.ExpandTableColumn(#"MergedQueries", "Table1_sum", {"sum_field_1", "sum_field_2", "sum_field_3", "sum_field_4"}, {"sum_field_1", "sum_field_2", "sum_field_3", "sum_field_4"})
in #"Tableexpanded",
@Jimmy801 @Anonymous
I tried the Table.buffer but it takes the same time to evaluate all the queries.
I will try to refactor my previous code to optimize its performances
Thanks anyway
Kind regards
Saam
But before using table.buffer did you strip this piece of code from inside the table.group function?
Table.SelectColumns(Table.SelectRows(Table1, (sel)=> sel[#"filter_item"] = 1),{"key_1", "field_1", "field_2", "field_3", "field_4", "field_5"})
Can you show the code you used?
The code I used is:
#"join_sum_T1_T2_" = let #"GroupedLines" = Table.Group(Table.Buffer(Table.SelectColumns(Table.SelectRows(Table1, (sel)=> sel[#"filter_item"] = 1)),{"key_1", "field_1", "field_2", "field_3", "field_4", "field_5"}), {"key", "filter_item"}, {{"sum_field_1", each List.Sum([#"field_1"]), type nullable number}, {"sum_field_2", each List.Sum([field_32), type nullable number}, {"Ssum_field_3", each List.Sum([#"field_3"]), type nullable number}, {"sum_field_4", each List.Sum([#"field_4"]), type nullable number}}), #"MergedQueries" = Table.NestedJoin(#"Table2_source", {"key_2"}, #"GroupedLines", {"key_1"}, "Table1_sum", JoinKind.LeftOuter), #"Tableexpanded" = Table.ExpandTableColumn(#"MergedQueries", "Table1_sum", {"sum_field_1", "sum_field_2", "sum_field_3", "sum_field_4"}, {"sum_field_1", "sum_field_2", "sum_field_3", "sum_field_4"}) in #"Tableexpanded",
No!
You have to use a separate expression for
Table.SelectColumns(Table.SelectRows(Table1, (sel)=> sel[#"filter_item"] = 1)),{"key_1", "field_1", "field_2", "field_3", "field_4", "field_5"})
for instance:
TABBUFF=Table.Buffer(Table.SelectColumns(Table.SelectRows(Table1, (sel)=> sel[#"filter_item"] = 1)),{"key_1", "field_1", "field_2", "field_3", "field_4", "field_5"}),
then the rest (check the sintax: ican't do)
#"join_sum_T1_T2_" = let #"GroupedLines" = Table.Group(TABBUFF, {"key", "filter_item"}, {{"sum_field_1", each List.Sum([#"field_1"]), type nullable number}, {"sum_field_2", each List.Sum([field_32), type nullable number}, {"Ssum_field_3", each List.Sum([#"field_3"]), type nullable number}, {"sum_field_4", each List.Sum([#"field_4"]), type nullable number}}), #"MergedQueries" = Table.NestedJoin(#"Table2_source", {"key_2"},
#"GroupedLines", {"key_1"}, "Table1_sum", JoinKind.LeftOuter),
#"Tableexpanded" = Table.ExpandTableColumn(#"MergedQueries", "Table1_sum", {"sum_field_1", "sum_field_2", "sum_field_3", "sum_field_4"}, {"sum_field_1", "sum_field_2", "sum_field_3", "sum_field_4"})
in #"Tableexpanded",
Thank you all guys.
It worked that way !
"It worked that way !"
Could You quantify the time difference between before and after "that way"?
It drastically decreased from 10' to 4'
Hello @SaaM
do you how long it takes to load after the join? Do you there some other transformation of the joined data? What data you are using out of the joined data?
Could you post the M-code here `?
If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too
Have fun
Jimmy
Hello,
Here is my M-code for one of the queries to create my columns and it takes too much time (around 10 mn)
I follow the same steps for other columns
Just for information my Table1 takes around 50 sec to add the 38 columns to the original source
my key_1 and key_2 are Text and I sum quantities
#"join_sum_T1_T2_" = let
#"GroupedLines" = Table.Group(Table.SelectColumns(Table.SelectRows(Table1, (sel)=> sel[#"filter_item"] = 1),{"key_1", "field_1", "field_2", "field_3", "field_4", "field_5"}), {"key", "filter_item"}, {{"sum_field_1", each List.Sum([#"field_1"]), type nullable number}, {"sum_field_2", each List.Sum([field_32), type nullable number}, {"Ssum_field_3", each List.Sum([#"field_3"]), type nullable number}, {"sum_field_4", each List.Sum([#"field_4"]), type nullable number}}),
#"MergedQueries" = Table.NestedJoin(#"Table2_source", {"key_2"}, #"GroupedLines", {"key_1"}, "Table1_sum", JoinKind.LeftOuter),
#"Tableexpanded" = Table.ExpandTableColumn(#"MergedQueries", "Table1_sum", {"sum_field_1", "sum_field_2", "sum_field_3", "sum_field_4"}, {"sum_field_1", "sum_field_2", "sum_field_3", "sum_field_4"})
in
#"Tableexpanded",
The equivalent SQL code would be:
with GroupedLines as (select key, sum(field_1) as sum_field_1, sum(field_2) as sum_field_2, sum(field_3) as sum_field_3, sum(field_4) as sum_field_4
from Table1
where filter_item = 1
group by key)
select * from Table2_source
left join GroupedLines
on key_1 = key_2
Thanks
Kind regards
Hello @SaaM
you are referencing 2 external tables. One is the Table1 that takes 10 seconds to load. What about the table Table2_source? How long this takes to load?
At the end your final query needs to load both tables and then apply also your steps.
I also tried to apply some changes to your code. Check out if this helps also
#"join_sum_T1_T2_" = let
BufferTable1 = Table.Buffer(Table.SelectColumns(Table.SelectRows(Table1, (sel)=> sel[#"filter_item"] = 1),{"key_1", "field_1", "field_2", "field_3", "field_4", "field_5"})),
BufferTable2 = Table.Buffer(#"Table2_source"),
#"GroupedLines" = Table.Group(BufferTable1 , {"key", "filter_item"}, {{"sum_field_1", each List.Sum([#"field_1"]), type nullable number}, {"sum_field_2", each List.Sum([field_32), type nullable number}, {"Ssum_field_3", each List.Sum([#"field_3"]), type nullable number}, {"sum_field_4", each List.Sum([#"field_4"]), type nullable number}}),
#"MergedQueries" = Table.NestedJoin(BufferTable2 , {"key_2"}, #"GroupedLines", {"key_1"}, "Table1_sum", JoinKind.LeftOuter),
#"Tableexpanded" = Table.ExpandTableColumn(#"MergedQueries", "Table1_sum", {"sum_field_1", "sum_field_2", "sum_field_3", "sum_field_4"}, {"sum_field_1", "sum_field_2", "sum_field_3", "sum_field_4"})
in
#"Tableexpanded",
If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too
Have fun
Jimmy
If adding 38 columns takes 50 ", the 10 'depends on the rest of the code not on recalculating the columns. try to strip the table.SelectRows and table select column statements from within the table.group and buffer the last table before grouby.
You can try to use Table.Buffer() around the last step of one of your tables. It would be helpful though to know what the 38 custom columns are - (calculated columns are a DAX thing, not Power Query) and then we could see if you had a model issue or a better way to approach this.
How to get good help fast. Help us help you.
How To Ask A Technical Question If you Really Want An Answer
How to Get Your Question Answered Quickly - Give us a good and concise explanation
How to provide sample data in the Power BI Forum - Provide data in a table format per the link, or share an Excel/CSV file via OneDrive, Dropbox, etc.. Provide expected output using a screenshot of Excel or other image. Do not provide a screenshot of the source data. I cannot paste an image into Power BI tables.
DAX is for Analysis. Power Query is for Data Modeling
Proud to be a Super User!
MCSA: BI ReportingThanks for your reply.
I tried your solution Table.Buffer(myTable), but it reacts the same if i did a reference or a duplicate, i.e it executes all the previous queries again and again.
What I meant by calculated columns, it's added columns to the original Source with power query formulas (sums of columns with group by key, row counts with group by key etc...).
I start in my model with my original Table with 20 columns and end with a Table with 58 columns (20+38 added columns)
What I am looking for is something similar or that does the same thing as in Excel (Copy and paste table values without the formulas and calculus).
Kind regards