Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
addicted87
Helper I
Helper I

Compare to reference group

hey power bi users,

i've got following issue with my power bi report:

 

What I'm going to do?

I want to compare one single supplier to a reference group consisting of 1...n different suppliers with same properties.

 

For Example:

 

Supplier 1:

column_customers = '50 - 100 customers'

column_region = germany

column_employees = '<100 employees'

 

So the reference group consists of all suppliers, which have the same properties (except supplier 1) as well:

column_customers = '50 - 100 customers'

column_region = germany

column_employees = '<100 employees'

 

Lets say, there are 200000 suppliers and 5 of them (No. 7, 100, 450, 1230, 1999) meet the requirements/properties of supplier 1.

 

Here is my dax formula:

 

number of suppliers in reference group =
VAR customers= max(supplier[column_customers])
VAR region = max(supplier[column_region])
VAR employees = max(supplier[column_employees])

VAR supplier = max(supplier[column_supplier_id])

 

RETURN CALCULATE(
distinctcount(supplier[column_supplier_id]); ALL(supplier[column_supplier_id]);supplier[column_customers]=customers;supplier[column_region]=region;supplier[column_employees] = employees; supplier[column_supplier_id] <> supplier)

 

So the result in a straight table looks like this:

 

 |

supplier_idnumber of suppliers in reference group
15
213
325
......
200000133

 

 

This one is already working, but pretty slow. Is there a way to optimize the formula? There are millions of rows in my data model, so i have to increase the performance.

 

Hopefully someone of you can help me 😄

1 ACCEPTED SOLUTION
OwenAuger
Super User
Super User

Hey @addicted87 

 

Here is a measure you could try. Using some test data at my end, it performed significantly better than the original measure:

number of suppliers in reference group =
VAR FilteredSupplierKey =
    VALUES ( supplier[column_supplier_id] )
VAR SimilarSupplierKey =
    CALCULATETABLE (
        VALUES ( supplier[column_supplier_id] );
        ALL ( supplier[column_supplier_id] );
        SUMMARIZE (
            supplier;
            supplier[column_customers];
            supplier[column_region];
            supplier[column_employees]
        )
    )
RETURN
    COUNTROWS ( EXCEPT ( SimilarSupplierKey; FilteredSupplierKey ) )

This should follow the same logic as your original measure when filtered on a single Supplier ID.

This measure also handles cases where you have filtered on multiple Supplier IDs, by counting the union of "similar" Suppliers.

 

Does this perform better with your data, and still produce correct results?

 

Regards

Owen


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

View solution in original post

3 REPLIES 3
OwenAuger
Super User
Super User

Hey @addicted87 

 

Here is a measure you could try. Using some test data at my end, it performed significantly better than the original measure:

number of suppliers in reference group =
VAR FilteredSupplierKey =
    VALUES ( supplier[column_supplier_id] )
VAR SimilarSupplierKey =
    CALCULATETABLE (
        VALUES ( supplier[column_supplier_id] );
        ALL ( supplier[column_supplier_id] );
        SUMMARIZE (
            supplier;
            supplier[column_customers];
            supplier[column_region];
            supplier[column_employees]
        )
    )
RETURN
    COUNTROWS ( EXCEPT ( SimilarSupplierKey; FilteredSupplierKey ) )

This should follow the same logic as your original measure when filtered on a single Supplier ID.

This measure also handles cases where you have filtered on multiple Supplier IDs, by counting the union of "similar" Suppliers.

 

Does this perform better with your data, and still produce correct results?

 

Regards

Owen


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

@OwenAugerThank you so much for your solution. This one is incredible fast compared to my solution. My load time has decreased from 30 seconds to 1 second...amazing 😉

 

regards

robert

V-pazhen-msft
Community Support
Community Support

@addicted87 
Create variables in a measure can lead to extra calculations, try put them into the return expression see if it goes faster:

 

number of suppliers in reference group =
CALCULATE(
distinctcount(supplier[column_supplier_id]); Filter(ALL(supplier[column_supplier_id]);
supplier[column_customers]=max(supplier[column_customers]);
supplier[column_region]=max(supplier[column_region]);
supplier[column_employees] = max(supplier[column_employees]); 
supplier[column_supplier_id] <> max(supplier[column_supplier_id]))

 

 

However, there is no way to increase performance speed dramatically since measure is calculated at the time you interact. The other option is to change the measure to a calculated column, that would increase pbix size, it pre-calculates the formula at the time you open the pbix, so it would be faster when you interact on the visuals.


Paul Zheng _ Community Support Team
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
  

 

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.