Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
hey power bi users,
i've got following issue with my power bi report:
What I'm going to do?
I want to compare one single supplier to a reference group consisting of 1...n different suppliers with same properties.
For Example:
Supplier 1:
column_customers = '50 - 100 customers'
column_region = germany
column_employees = '<100 employees'
So the reference group consists of all suppliers, which have the same properties (except supplier 1) as well:
column_customers = '50 - 100 customers'
column_region = germany
column_employees = '<100 employees'
Lets say, there are 200000 suppliers and 5 of them (No. 7, 100, 450, 1230, 1999) meet the requirements/properties of supplier 1.
Here is my dax formula:
number of suppliers in reference group =
VAR customers= max(supplier[column_customers])
VAR region = max(supplier[column_region])
VAR employees = max(supplier[column_employees])
VAR supplier = max(supplier[column_supplier_id])
RETURN CALCULATE(
distinctcount(supplier[column_supplier_id]); ALL(supplier[column_supplier_id]);supplier[column_customers]=customers;supplier[column_region]=region;supplier[column_employees] = employees; supplier[column_supplier_id] <> supplier)
So the result in a straight table looks like this:
|
supplier_id | number of suppliers in reference group |
1 | 5 |
2 | 13 |
3 | 25 |
... | ... |
200000 | 133 |
This one is already working, but pretty slow. Is there a way to optimize the formula? There are millions of rows in my data model, so i have to increase the performance.
Hopefully someone of you can help me 😄
Solved! Go to Solution.
Hey @addicted87
Here is a measure you could try. Using some test data at my end, it performed significantly better than the original measure:
number of suppliers in reference group =
VAR FilteredSupplierKey =
VALUES ( supplier[column_supplier_id] )
VAR SimilarSupplierKey =
CALCULATETABLE (
VALUES ( supplier[column_supplier_id] );
ALL ( supplier[column_supplier_id] );
SUMMARIZE (
supplier;
supplier[column_customers];
supplier[column_region];
supplier[column_employees]
)
)
RETURN
COUNTROWS ( EXCEPT ( SimilarSupplierKey; FilteredSupplierKey ) )
This should follow the same logic as your original measure when filtered on a single Supplier ID.
This measure also handles cases where you have filtered on multiple Supplier IDs, by counting the union of "similar" Suppliers.
Does this perform better with your data, and still produce correct results?
Regards
Owen
Hey @addicted87
Here is a measure you could try. Using some test data at my end, it performed significantly better than the original measure:
number of suppliers in reference group =
VAR FilteredSupplierKey =
VALUES ( supplier[column_supplier_id] )
VAR SimilarSupplierKey =
CALCULATETABLE (
VALUES ( supplier[column_supplier_id] );
ALL ( supplier[column_supplier_id] );
SUMMARIZE (
supplier;
supplier[column_customers];
supplier[column_region];
supplier[column_employees]
)
)
RETURN
COUNTROWS ( EXCEPT ( SimilarSupplierKey; FilteredSupplierKey ) )
This should follow the same logic as your original measure when filtered on a single Supplier ID.
This measure also handles cases where you have filtered on multiple Supplier IDs, by counting the union of "similar" Suppliers.
Does this perform better with your data, and still produce correct results?
Regards
Owen
@OwenAugerThank you so much for your solution. This one is incredible fast compared to my solution. My load time has decreased from 30 seconds to 1 second...amazing 😉
regards
robert
@addicted87
Create variables in a measure can lead to extra calculations, try put them into the return expression see if it goes faster:
number of suppliers in reference group =
CALCULATE(
distinctcount(supplier[column_supplier_id]); Filter(ALL(supplier[column_supplier_id]);
supplier[column_customers]=max(supplier[column_customers]);
supplier[column_region]=max(supplier[column_region]);
supplier[column_employees] = max(supplier[column_employees]);
supplier[column_supplier_id] <> max(supplier[column_supplier_id]))
However, there is no way to increase performance speed dramatically since measure is calculated at the time you interact. The other option is to change the measure to a calculated column, that would increase pbix size, it pre-calculates the formula at the time you open the pbix, so it would be faster when you interact on the visuals.
Paul Zheng _ Community Support Team
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
User | Count |
---|---|
125 | |
106 | |
99 | |
63 | |
62 |
User | Count |
---|---|
135 | |
116 | |
101 | |
71 | |
61 |