Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
karkar
Helper III
Helper III

Summarizing data

 

 

Hello,


I am new to using Power Bi. Any help would be highly appreciated.

 

I have the same patient listed several times since every hour we are checking if they are on medication. I have the hospital hours which is basically the difference between the admit and discharge date (in hours). While summarizing the on_med, i need to consider the hosp_hours field only once per patient (its repeating currently). I have shown only 2 patients here for simplicity but i have around 1000 patients in my data.

 

HAVE


patient            admit_date                      discha_date              on_med        hosp_hrs


pat1      Dec20,2016 4:20pm      Dec21,2016 4:20pm                0                     24                              


pat1     Dec20,2016 4:20pm      Dec21,2016 4:20pm                 1                    24


pat1    Dec20,2016 4:20pm      Dec21,2016 4:20pm                  1                    24


pat1    Dec20,2016 4:20pm      Dec21,2016 4:20pm                  1                    24

 

pat2      Dec20,2016 4:20pm      Dec21,2016 4:20am                0                     12                              


pat2     Dec20,2016 4:20pm      Dec21,2016 4:20am                 1                    12

 

WANT

patient           on_med        hosp_hrs     percent_hours

pat1                    3                     24         3/24*100= 12.5% 

pat2                    1                     12         3/24*100= 8.3% 

TOTAL                4                     36          4/36*100=11.11%

                  

 

        

2 ACCEPTED SOLUTIONS
Anonymous
Not applicable

@karkar,

Adapting the SUMMARIZE idea from @tommcmaster to get rid of the duplicate records for admit_date and discha_date, you can get the result you want as follows, without creating extra tables:

 

Create a measure to calculate the hours in hospital, that caters for the weird way that a Power BI matrix or Excel PowerPivot total works (see https://www.powerpivotpro.com/2012/03/subtotals-and-grand-totals-that-add-up-correctly/ as background):

Hospital Hours =
IF (
    HASONEVALUE ( PatientData[patient] ),
    MAX ( [hosp_hrs] ),
    SUMX (
        SUMMARIZE ( PatientData, PatientData[patient], PatientData[hosp_hrs] ),
        [hosp_hrs]
    )
)

If you're only dealing with a few thouand patient records, this should perform fine.

 

I prefer to create an explicit measure for Total Meds, for readability and traceability (it looks better as a column header, and you don't have to dig into the visual in a few months time to see that it's a Sum, not an Average etc.:

Total Meds = SUM( PatientData[on_med] )

Create a measure for your percent_hours, and format as a percentage via the Modelling tab:

Meds Percentage = DIVIDE ( [Total Meds], [Hospital Hours] )

 

Drag a Matrix visual onto your canvas, with patient as Rows, and your three new measures as Values.

 

Done.

 

 

 

View solution in original post

Hello Steve,

 

Thanks for the reply and your recommendations helped me get what i wanted.

So it works fine when we create measures rather than row by row for percentage calculations?

in the hospital Hours calculation below :

  1. first the IF condition is used to get a single value using HASONEVALUE (since we have the same value of hosp_hrs repeated several times for a patient) and then take the max value ?
  2. What is SUmmarize function within SUMX for?
  3. If there was a single value for Hosp_hrs (basically single row per patient) in my dataset , can I bypass all these steps? or would power Bi still add up the percentages by default and give me percentages greater than 100%?

Thanks a lot

 

 

Hospital Hours =
IF (
    HASONEVALUE ( PatientData[patient] ),
    MAX ( [hosp_hrs] ),
    SUMX (
        SUMMARIZE ( PatientData, PatientData[patient], PatientData[hosp_hrs] ),
        [hosp_hrs]
    )
)

View solution in original post

9 REPLIES 9
tommcmaster
Frequent Visitor

Hi,

 

I think you could use CALCULATE or SUMMARIZE to do this. Will all the values in hosp_hours all be the same per each patient? So will pat1 always have 24 hours?

 

http://www.wiseowl.co.uk/blog/s2480/summarize.htm gives a good first explanation of SUMMARIZE.

 

In the Data view you could try.

 

NewTable =SUMMARIZE('YourDataSet', 'YourDataSet'[Patient], 'YourDataSet'[hosp_hours],"new_onmed", SUM('YourDataSet'[on_med]))

 

That will give you a new table and you can then add a calculated column to the end for your percent_hours.

 

Tom.

Hello Tom, Thanks for the reply. Yes, the values in hosp_hours will always be the same for each patient.

Also i wrote a SQL code to come up with the data i currently have. The new table calcualtion you have mentioned uses my current table and creates another table?

 

Thanks

Hi,

 

Yes, you can click on the Data icon in the menu on the left and use the New Table button. If you add the formula to the forumla bar that appears then you can find the new table that's made in your data sets.

 

Tom.

Hello Tom,

 

Thanks for the detailed explanation. Every thing works fine except that the percentage column is adding up all the percentages instead of performing (4/36)*100.

 

HAVE

patient           on_med        hosp_hrs     percent_hours

pat1                    3                     24         3/24*100= 12.5% 

pat2                    1                     12         3/24*100= 8.3% 

TOTAL                4                     36         12.5+8.3=20.8% (not as desired)

                  

 

WANT

patient           on_med        hosp_hrs     percent_hours

pat1                    3                     24         3/24*100= 12.5% 

pat2                    1                     12         3/24*100= 8.3% 

TOTAL                4                     36          4/36*100=11.11%

                  

Hi Karkar,

 

I've put your data into a CSV and created a new .PBX from it. I think I have solved the issue but I have used two new tables to do it. The first gives you information summarized by patient and the second uses that to get your overall percentages. I'm very new to Power BI so there may be a more efficient way to solve the problem.

 

1) Load your data into a data set and call it Patient.

 

2) Create a new table and use the following formula:

 

SUMMARIZEBYPATIENT = SUMMARIZE(
'Patient',
Patient[patient],
Patient[hosp_hrs],
"SUMONMEDHRS",SUM(Patient[on_med]))

 

That will summarize all your data but group by patient.

 

3) Create a further new table which summarizes that created in step 2) 

 

OVERALLSUMMARY = SUMMARIZE('SUMMARIZEBYPATIENT', "SUMONMED", SUM(SUMMARIZEBYPATIENT[SUMONMEDHRS]), "SUMHOSPHRS", SUM(SUMMARIZEBYPATIENT[hosp_hrs]), "PERCENTAGE", SUM(SUMMARIZEBYPATIENT[SUMONMEDHRS])/SUM(SUMMARIZEBYPATIENT[hosp_hrs])*100)

 

That will give you all the details you need to get the overall totals and percentages.

 

Hopefully someone else can use this as an example to make it more efficient and in one DAX statement. 

 

Let me know how it goes. It's an interesting problem 🙂

 

Tom.

Hello Tom,

 

First;y,  want to thank you for working on this issue and also wish you a happy new year,

Unfortunately, the step you suggested did not work for me....As you can see below, to the left side is STEP-2 where we had the issue with percentage calculation by default(it was adding up to give 21913.38%)..., This was got by dividing both the varaibles and clicking on % symbol in the ribbon.

To the right is the variables after creating another table using SUMMARIZE as you suugested...SUMHSPHOURS is the same as HOURS_IN_HOSP on the left......and interestingly they both are difierent now and same is the case with SUMONMED which is the also numerically same as NEW_MEAS_VALUE on the left..  i dont understand the resason why they have to differ now....This step to the right also gives the same numbers to all the rows....

 

 

 

 

Anonymous
Not applicable

@karkar,

Adapting the SUMMARIZE idea from @tommcmaster to get rid of the duplicate records for admit_date and discha_date, you can get the result you want as follows, without creating extra tables:

 

Create a measure to calculate the hours in hospital, that caters for the weird way that a Power BI matrix or Excel PowerPivot total works (see https://www.powerpivotpro.com/2012/03/subtotals-and-grand-totals-that-add-up-correctly/ as background):

Hospital Hours =
IF (
    HASONEVALUE ( PatientData[patient] ),
    MAX ( [hosp_hrs] ),
    SUMX (
        SUMMARIZE ( PatientData, PatientData[patient], PatientData[hosp_hrs] ),
        [hosp_hrs]
    )
)

If you're only dealing with a few thouand patient records, this should perform fine.

 

I prefer to create an explicit measure for Total Meds, for readability and traceability (it looks better as a column header, and you don't have to dig into the visual in a few months time to see that it's a Sum, not an Average etc.:

Total Meds = SUM( PatientData[on_med] )

Create a measure for your percent_hours, and format as a percentage via the Modelling tab:

Meds Percentage = DIVIDE ( [Total Meds], [Hospital Hours] )

 

Drag a Matrix visual onto your canvas, with patient as Rows, and your three new measures as Values.

 

Done.

 

 

 

Hello Steve,

 

Thanks for the reply and your recommendations helped me get what i wanted.

So it works fine when we create measures rather than row by row for percentage calculations?

in the hospital Hours calculation below :

  1. first the IF condition is used to get a single value using HASONEVALUE (since we have the same value of hosp_hrs repeated several times for a patient) and then take the max value ?
  2. What is SUmmarize function within SUMX for?
  3. If there was a single value for Hosp_hrs (basically single row per patient) in my dataset , can I bypass all these steps? or would power Bi still add up the percentages by default and give me percentages greater than 100%?

Thanks a lot

 

 

Hospital Hours =
IF (
    HASONEVALUE ( PatientData[patient] ),
    MAX ( [hosp_hrs] ),
    SUMX (
        SUMMARIZE ( PatientData, PatientData[patient], PatientData[hosp_hrs] ),
        [hosp_hrs]
    )
)

Anonymous
Not applicable

@karkar,

 

  1. first the IF condition is used to get a single value using HASONEVALUE (since we have the same value of hosp_hrs repeated several times for a patient) and then take the max value ?
    1. The matrix is aggregating/filtering your data down to one row per patient.  The HASONEVALUE is testing if you are on one patient's row of the matrix, or on a Total row (with many patients). If you are on one row for one patient, take the MAX of hosp_hrs for that patient's records, which should all be the same.  
    2. NOTE: This assumes that each patient only presents once - i.e. admit_date and disch_date are always the same. If not, you'll need more complex measures or a different data model.
  2. What is SUmmarize function within SUMX for?
    1. Your dataset has duplicate rows per patient for admit_date and discha_date and hosp_hrs.  The SUMMARIZE groups each patient's records down to one row, so you can add the hosp_hrs for each patient to get the total.  Again, if you can have a patient with different admit_date and discha_date, you'll need more complex measures or a different data model.
  3. If there was a single value for Hosp_hrs (basically single row per patient) in my dataset , can I bypass all these steps? or would power Bi still add up the percentages by default and give me percentages greater than 100%?
    1. Probably.  If you can SUMMARIZE down to one row per patient before import, you can use simpler measures.

Without seeing your full dataset and what else you are trying to visualise, it's hard to recommend something that will work for your complete solution - e.g. do you have other patient data you want to slice by such as age, gender, location, surgery type etc.?  To simplify just this piece though, you could:

  1. SUMMARIZE PatientData up front via a GROUP BY in your SQL code (or in Power Query) to create a data table with just one row per patient by admit_date, with a SUM of on_med.  This would also better allow for patients with more that one presentation - i.e. different admit_date and discha_date values.
  2. Add 3 x simple measures - Total Meds = SUM([on_med]), Total Hours = SUM ([hosp_hrs]), and Percent Meds=DIVIDE([Total Meds], [Total Hours]).  These measures should work on all patient rows (a different row , AND in the total row in a Matrix or Table. 

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.