Solved: Best Modeling - Joining to multiple records withou...

CB · ‎06-25-2018

Hello,

I have a dataset where there are multiple patients and each patient has multiple values (several lab results ) . The report needs to have the ability to filter/slice on the diagnosis of the patient such as (Diabetes 1, Diabetes 2). Each patient can have 1 or more diagnoses.

We need to show the average lab result per patient/patients for the diagnoses selected but we don't want to incorrectly reflect a multiplied value of the lab results based on how many diagnoses exist.

Example: (patient abc ghi has 2 diagnoses so their lab events are duplicated, 1 for each diagnosis).
labeventId Patient Lab result Diagnosis

1 abc 1.6 Diabetes 1

1 abc 1.6 Diabetes 2

2 abc 5.0 Diabetes 1
2 abc 5.0 Diabetes 2

3 def 4 Diabetes 1

4 ghi 2.0 Diabetes 1

4 ghi 2,0 Diabetes 2

Using the above would result in counting a single labevent twice (1.6, 1.6). We want to count each event once only no matter how many diagnoses there are.

What is the best way to model this:

a. Include the diagnosis in the lab result dataset

b. have a separate table with the patientid and diagnoses and join to the labevent based on patient.

In either one how can we ensure the actual lab results don't get duplicated/inflated based on the number of diagnoses?

CB · ‎07-11-2018

I resolved this by including the diagnosis in the main result data. Even though there are multiple records per lab event if there is more than one diagnosis to a patient, the measures are not duplicating their values.

I believe the true correct model would be to have a separate table for patients and a separate table with patientId, diagnosis. Then to relate Patient > PatientDiagnosis and relate "labEvent" to Patient. However, when I did this, regardless of the crossjoin and cardinality configuration of the joins/relationships, the measures were not calculating correctly so I had to resort to including the diagnosis in the main labEvent table and not use relationships.

View solution in original post

v-piga-msft · ‎06-27-2018

Hi @CB,

You may create a calculated table with the formula below.

Table =
SUMMARIZE (
    'Table1';
    'Table1'[Labeventid];
    'Table1'[Patient];
    "result"; CALCULATE (
        MAX ( Table1[Lab result] );
        FILTER (
            ALL ( Table1 );
            'Table1'[Labeventid] = MAX ( 'Table1'[Labeventid] )
                && 'Table1'[Patient] = MAX ( 'Table1'[Patient] )
        )
    )
)

Then you will get the out put below.

For more details, you could have a reference of the test file attached below.

Best Regards,

Cherry

Community Support Team _ Cherry Gao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

CB · ‎07-04-2018

Hello,

Thank you for your input (apologies for the delay in getting back to this). The calculated table would not suffice as all my slicers still need to be taken into account. Is there a way I can do something like a subquery that says "pull all the patients that have a diagnosis in ( table that holds patientid , diagnosis) " rather than say join the lab event data to the diagnosis data which would pull multiple records of each labevent if there were multiple diagnoses?

CB · ‎07-11-2018

I resolved this by including the diagnosis in the main result data. Even though there are multiple records per lab event if there is more than one diagnosis to a patient, the measures are not duplicating their values.

I believe the true correct model would be to have a separate table for patients and a separate table with patientId, diagnosis. Then to relate Patient > PatientDiagnosis and relate "labEvent" to Patient. However, when I did this, regardless of the crossjoin and cardinality configuration of the joins/relationships, the measures were not calculating correctly so I had to resort to including the diagnosis in the main labEvent table and not use relationships.

Best Modeling - Joining to multiple records without duplicating/multiplying the values

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly