cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
TheCreepster1
Helper I
Helper I

Very Slow Performance on Group By

Hi,


Unfortunately, our inventory and sales data reside in different systems so I have had to construct two data queries to bring this information in to Power Query. I have constructed the two queries so that in they are laid out in the same format:

 

TheCreepster1_0-1647599904876.png

I've then appended the two queries and performed a group by on the Location and Product, aggregating the measure fields. However, the volume of data is such that it's taken nearly an hour so far to load the data into the Power BI data model, with no sign that is near to being finished. In total I have approx 13.5m rows of inventory data and 2.5m rows of sales data. Below is the actual M code being generated by Power Query. Is there anything I can do to make this more efficient? I thought the append rather than a merge with full outer join would be a more efficient way of doing this, but so far it's not proving that way!

 

 

let
    #"Sales Query" = SapHana.Database("xxxxxxxx:xxxxx", [Query="select#(tab)CREATION_WEEK as WEEK_ID,#(lf)#(tab)#(tab)ltrim(gsv.PRCTR, '0') as SITE,#(lf)#(tab)#(tab)GENERIC,#(lf)#(tab)#(tab)sum(UNITS) as UNITS,#(lf)#(tab)#(tab)sum(NET_VALUE) as NET_VALUE,#(lf)#(tab)#(tab)sum(NDW) as COST_OF_SALES,#(lf)#(tab)#(tab)sum(GROSS_MARGIN) as GROSS_MARGIN,#(lf)#(tab)#(tab)0 as STOCK#(lf)#(lf)from aggregates.global_single_view gsv#(lf)#(lf)left join ""_SYS_BIC"".""EuropeMerch/CV_MER_LOCATION_CHANNEL_HIERARCHY"" lh on case when gsv.vtweg = '10' then gsv.SOLD_TO else gsv.PRCTR end = lh.LOCATION_JOIN and lh.LOCATION_JOIN != ''#(lf)#(lf)where LEDGER = 'LOCAL'#(lf)and TYPE like 'SALE%'#(lf)and lh.REGION = 'Americas'#(lf)and lh.CHANNEL = 'Direct to Consumer'#(lf)and CREATION_WEEK >201899#(lf)group by CREATION_WEEK,#(lf)#(tab)#(tab)gsv.PRCTR,#(lf)#(tab)#(tab)GENERIC#(lf)", Implementation="2.0"]),
    #"Appended Query" = Table.Combine({#"Sales Query", #"Stock Query"}),
    #"Grouped Rows" = Table.Group(#"Appended Query", {"WEEK_ID", "SITE", "GENERIC"}, {{"UNITS", each List.Sum([UNITS]), type nullable number}, {"NET_VALUE", each List.Sum([NET_VALUE]), type nullable number}, {"COST_OF_SALES", each List.Sum([COST_OF_SALES]), type nullable number}, {"GROSS_MARGIN", each List.Sum([GROSS_MARGIN]), type nullable number}, {"STOCK", each List.Sum([STOCK]), type nullable number}})
in
    #"Grouped Rows"

 

 

1 ACCEPTED SOLUTION
BA_Pete
Super User
Super User

Hi @TheCreepster1 ,

 

Don't append and merge at all.

 

Set your model up in the STAR SCHEMA structure, where your Sales and Inventory tables are the FACT tables, and any common fields (e.g. Location, Product etc.) are aggregated into DIMENSION tables that are then related to both fact tables.

You can then write measures that span both fact tables, as well as display data from both in visuals using the dimension table fields to maintain commonality between them.

https://docs.microsoft.com/en-us/power-bi/guidance/star-schema 

 

Another option would be to have your SQL query written on the source as a view, rather than being a native query from Power Query. Presuming both of your tables are on the same DB/server, your append/group steps should be able to be folded back to the source, which would be significantly quicker at performing these functions than Power Query.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




View solution in original post

3 REPLIES 3
BA_Pete
Super User
Super User

Hi @TheCreepster1 ,

 

Don't append and merge at all.

 

Set your model up in the STAR SCHEMA structure, where your Sales and Inventory tables are the FACT tables, and any common fields (e.g. Location, Product etc.) are aggregated into DIMENSION tables that are then related to both fact tables.

You can then write measures that span both fact tables, as well as display data from both in visuals using the dimension table fields to maintain commonality between them.

https://docs.microsoft.com/en-us/power-bi/guidance/star-schema 

 

Another option would be to have your SQL query written on the source as a view, rather than being a native query from Power Query. Presuming both of your tables are on the same DB/server, your append/group steps should be able to be folded back to the source, which would be significantly quicker at performing these functions than Power Query.

 

Pete



Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!




Hi,

 

Don't know why this didn't even cross my mind! Everything was set up to join as a star schema but I had it in my head I needed to smash the inventory and sales together for the end user. Unfortunately the two data sources are coming from different databases, hence the inability to do it in the SQL back end but I've managed to sort it through your first suggestion, thank you!

Vijay_A_Verma
Super User
Super User

Try buffering the table to improve the performance.

Replace 

#"Appended Query" = Table.Combine({#"Sales Query", #"Stock Query"}),

with 

#"Appended Query" = Table.Buffer(Table.Combine({#"Sales Query", #"Stock Query"})),

Helpful resources

Announcements
Carousel_PBI_Wave1

2023 Release Wave 1 Plans

Power BI release plans for 2023 release wave 1 describes all new features releasing from April 2023 through September 2023.

Power BI Summit Carousel 2

Global Power BI Training

Make sure you register today for the Power BI Summit 2023. Don't miss all of the great sessions and speakers!

Thank you 2022 Review

2022 Monthly Feature Releases

We had a great 2022 with a ton of feature releases to help you drive a data culture.

Top Solution Authors
Top Kudoed Authors