Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hey All,
I have data at the below grouping level and there are around more than a billion rows in the fact table. As you can see in below example, we have a particular prescription and that contains multiple products. In case of prescription #1, this prescription contains product #1, 2 & 7. This prescription with the particular product combinatin was prescribed to 200 patients. Overall, i have 700 Patients from below dataset. For example, product #1, was presribed to 700 patients & Product #7, was prescribed to 200 patients.
Goal: Create a measure for # of Patients with optimized performance.
Note:
Technically, I am able to achive this, but it's taking around 5 mins to return the results which is unacceptable. Below implementation works, but takes lot of time and this is on the server with huge configuration.
SUMX(DISTINCT(FactTable[PrescriptionKey]),FIRSTNONBLANK(FactTable[NoOfPatient],0))
I am aware that we can break up these tables and create multiple fact tables, but then the Prescription Key records are in around 50-60 million and product is around 1 billion. So, in that case as well, the performance is not good.
PrescriptionKey | ProductKey | NoOfPatients | QtyPrescribed |
1 | 1 | 200 | 50 |
1 | 2 | 200 | 25 |
1 | 7 | 200 | 100 |
2 | 1 | 500 | 75 |
2 | 2 | 500 | 100 |
2 | 3 | 500 | 250 |
2 | 4 | 500 | 525 |
Total | 700 | 1125 |
Any help is appreciated.
Thanks
Amit
@AmitTiwari , I think this should be done like
sumx(Values(FactTable[PrescriptionKey]), max(FactTable[NoOfPatient])) // or use Min in place of Max
or
sumx(summarize(FactTable,FactTable[PrescriptionKey],"_1", max(FactTable[NoOfPatient])),[_1])
Hi @amitchandak
Thanks for the quick response. Below dax does not give correct results. It takes the max/min of all the result sets and sums it up. Gives me a result of 1000 in case of MAX and 400 in case of MIN.
sumx(Values(FactTable[PrescriptionKey]), max(FactTable[NoOfPatient]))
This one is working correctly, but for my prod data, it's taking around 485 seconds to complete.
sumx(summarize(FactTable,FactTable[PrescriptionKey],"_1", max(FactTable[NoOfPatient])),[_1])
Is there any way we can get faster results?
Hi, @AmitTiwari , you might try these measures out,
SUMX (
VALUES ( FactTable[PrescriptionKey] ),
CALCULATE ( MAX ( FactTable[NoOfPatients] ) )
)
and
SUMX (
SUMMARIZECOLUMNS (
FactTable[PrescriptionKey],
"col", MAX ( FactTable[NoOfPatients] )
),
[col]
)
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
@AmitTiwari , if there is guaranteed product key 1 always there then it would be the fastest
calculate(sum(FactTable[NoOfPatient]) , FactTable[productkey]=1)
Or create a column they use in measure as filter , like above. This will distribute cost
if(minx(filter(FactTable,[PrescriptionKey] = earlier([PrescriptionKey])),[productkey]) = [productkey],1,0)
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
45 | |
21 | |
20 | |
15 | |
13 |
User | Count |
---|---|
45 | |
41 | |
39 | |
19 | |
19 |