cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
addicted87
Helper I
Helper I

Compare to reference group

hey power bi users,

i've got following issue with my power bi report:

 

What I'm going to do?

I want to compare one single supplier to a reference group consisting of 1...n different suppliers with same properties.

 

For Example:

 

Supplier 1:

column_customers = '50 - 100 customers'

column_region = germany

column_employees = '<100 employees'

 

So the reference group consists of all suppliers, which have the same properties (except supplier 1) as well:

column_customers = '50 - 100 customers'

column_region = germany

column_employees = '<100 employees'

 

Lets say, there are 200000 suppliers and 5 of them (No. 7, 100, 450, 1230, 1999) meet the requirements/properties of supplier 1.

 

Here is my dax formula:

 

number of suppliers in reference group =
VAR customers= max(supplier[column_customers])
VAR region = max(supplier[column_region])
VAR employees = max(supplier[column_employees])

VAR supplier = max(supplier[column_supplier_id])

 

RETURN CALCULATE(
distinctcount(supplier[column_supplier_id]); ALL(supplier[column_supplier_id]);supplier[column_customers]=customers;supplier[column_region]=region;supplier[column_employees] = employees; supplier[column_supplier_id] <> supplier)

 

So the result in a straight table looks like this:

 

 |

supplier_idnumber of suppliers in reference group
15
213
325
......
200000133

 

 

This one is already working, but pretty slow. Is there a way to optimize the formula? There are millions of rows in my data model, so i have to increase the performance.

 

Hopefully someone of you can help me 😄

1 ACCEPTED SOLUTION
OwenAuger
Super User
Super User

Hey @addicted87 

 

Here is a measure you could try. Using some test data at my end, it performed significantly better than the original measure:

number of suppliers in reference group =
VAR FilteredSupplierKey =
    VALUES ( supplier[column_supplier_id] )
VAR SimilarSupplierKey =
    CALCULATETABLE (
        VALUES ( supplier[column_supplier_id] );
        ALL ( supplier[column_supplier_id] );
        SUMMARIZE (
            supplier;
            supplier[column_customers];
            supplier[column_region];
            supplier[column_employees]
        )
    )
RETURN
    COUNTROWS ( EXCEPT ( SimilarSupplierKey; FilteredSupplierKey ) )

This should follow the same logic as your original measure when filtered on a single Supplier ID.

This measure also handles cases where you have filtered on multiple Supplier IDs, by counting the union of "similar" Suppliers.

 

Does this perform better with your data, and still produce correct results?

 

Regards

Owen


Owen Auger

Did I answer your question? Mark my post as a solution!

My Blog
Connect on Twitter
Connect on LinkedIn

View solution in original post

3 REPLIES 3
OwenAuger
Super User
Super User

Hey @addicted87 

 

Here is a measure you could try. Using some test data at my end, it performed significantly better than the original measure:

number of suppliers in reference group =
VAR FilteredSupplierKey =
    VALUES ( supplier[column_supplier_id] )
VAR SimilarSupplierKey =
    CALCULATETABLE (
        VALUES ( supplier[column_supplier_id] );
        ALL ( supplier[column_supplier_id] );
        SUMMARIZE (
            supplier;
            supplier[column_customers];
            supplier[column_region];
            supplier[column_employees]
        )
    )
RETURN
    COUNTROWS ( EXCEPT ( SimilarSupplierKey; FilteredSupplierKey ) )

This should follow the same logic as your original measure when filtered on a single Supplier ID.

This measure also handles cases where you have filtered on multiple Supplier IDs, by counting the union of "similar" Suppliers.

 

Does this perform better with your data, and still produce correct results?

 

Regards

Owen


Owen Auger

Did I answer your question? Mark my post as a solution!

My Blog
Connect on Twitter
Connect on LinkedIn

View solution in original post

@OwenAugerThank you so much for your solution. This one is incredible fast compared to my solution. My load time has decreased from 30 seconds to 1 second...amazing 😉

 

regards

robert

V-pazhen-msft
Community Support
Community Support

@addicted87 
Create variables in a measure can lead to extra calculations, try put them into the return expression see if it goes faster:

 

number of suppliers in reference group =
CALCULATE(
distinctcount(supplier[column_supplier_id]); Filter(ALL(supplier[column_supplier_id]);
supplier[column_customers]=max(supplier[column_customers]);
supplier[column_region]=max(supplier[column_region]);
supplier[column_employees] = max(supplier[column_employees]); 
supplier[column_supplier_id] <> max(supplier[column_supplier_id]))

 

 

However, there is no way to increase performance speed dramatically since measure is calculated at the time you interact. The other option is to change the measure to a calculated column, that would increase pbix size, it pre-calculates the formula at the time you open the pbix, so it would be faster when you interact on the visuals.


Paul Zheng _ Community Support Team
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
  

 

Helpful resources

Announcements
UG GA Amplification 768x460.png

Launching new user group features

Learn how to create your own user groups today!

November Power BI Update 768x460.png

Check it Out!

Click here to read more about the November 2021 Updates!

M365 768x460.jpg

Microsoft 365 Collaboration Conference | December 7–9, 2021

Join us, in-person, December 7–9 in Las Vegas, for the largest gathering of the Microsoft community in the world.