Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Calculate STDEV include 0 values - PROBLEM IS WITH MY DATA LINEAGE

Hi,

I'm pretty new to Power BI. I have learned a TON from this forum so thank you to the many contributors.

 

I have seen a post about this problem before and understand the theory of joining with another table to ensure 0 values get included. https://community.powerbi.com/t5/Desktop/Including-0-values-when-calculating-STDEV/m-p/195269 . However, I am struggling mightily implementing it due to my data structure (and lack of DAX experience beyond basic measures).

 

My source data is from .txt files where each line corresponds to a worked shift by an employee. A slightly simplified table structure is:

[Date][EmployeeID][WorkerType][Location][ShiftCode][ShiftLength][ShiftTOD][DayOfSched]

 

Descriptions:

[Location] = A unit of the hospital

[WorkerType] = RN, RPN, PSW 

[ShiftCode] = Characters that correspond to types of shift, such as "straight time" or "overtime"

[ShiftLength] = total minutes worked (typical 450 = 1 shift)

[ShiftTOD] = 1, 2, or 3 (corresponding to Day, Evening, Night shift)

[DayOfSched] = a number from 1 to 28 (4 week schedule = 28 days) - essentially a repeating index

 

In Power Query I have brought the data in and created additional columns that break out the ShiftLength for TOD1, TOD2, TOD3 using an IF statement. I did this in PQ because I thought it would be better to calculate once than do in memory for hundreds of thousands or millions of rows of data.

 

I have a Date Dimension table, and Dimensions for Worker Type, Location, and Shift Codes. 

 

My goal is to calculate the standard deviation (and other similar calculations) of the number of shifts over a given schedule (28 days) as a measure of how good our schedules are (STDEV = 0 being good). For the calculation to be meaningful, I need to use slicers on [Location], [WorkerType], and [ShiftTOD] because we desire consistent numbers of each worker type for each shift, on each unit, on each day.

 

I successfullydid this using the below, which I learned from this forum:

 

 

 

 

Measure StdDev = CALCULATE(
VAR __table = SUMMARIZE('Future Schedule',[Date],"__Measure",[FutureShifts])
RETURN STDEVX.P(__table,[__Measure]) ,'Shift Code'[ShiftType] = "Straight Time")

 

 

 

 

However, I've realized that for some [Locations] they may have dates where NO staff were scheduled, meaning that __table  will be missing that date and therefore STDEVX.P will be wrong. 

 

I have tried a few methods with no success. 

1) Create a calculated table that covers all dates and uses = 0+SUMMARIZE(...) to populate a second column with the number of shifts. This works, but IS NOT DYNAMIC so I can't filter it by slicers.

 

2) Dabbled in creating a Measure using VAR to declare virtual tables to attempt the same thing, but I do not understand the syntax well enough to create the table I need. I gather there might be issues in that VAR is also a scalar. I have not wrapped my brain around this one. 

 

3) In Excel I created a base table with first column [DayOfSched] populated with numbers 1 thru 28 (recall schedules have 28 days and that is my unit of analysis for calculating STDEV) and second column [Shifts] populated with 0's (zero). Imported using PQ. and linked DayOfSched to my Fact table. My goal was to use DAX Measure to call this base column, add values from a Summarize expression, and return the STDEV. Something like:

 

 

 

Measure StdDev = CALCULATE(
VAR __table = SUMMARIZE('Future Schedule',[DayOfSched],"__Measure",[FutureShifts])
VAR __ZerosPlusShifts = NATURALLEFTOUTERJOIN('ScheduleShiftsTable',__table)
RETURN STDEVX.P(__ZerosPlusShifts,[__Measure]) ,'Shift Code'[ShiftType] = "Straight Time")

 

 

 

However, I get error "No common join columns detected. The join function 'NATURALLEFTOUTERJOIN' requires at-least one common join column."

 

4) The last idea I have not fully contemplated would be a merge in PQ to insert 0 data into my fact table. However, I am not clear on whether I would need to create a dummy data for EVERY SINGLE [location], [TOD], [StaffType], etc. and would that would take to make and how it would balloon my data set.

 

I have tried a few other iterations with no success. I would appreciate any help!

1 ACCEPTED SOLUTION
StefanoGrimaldi
Resident Rockstar
Resident Rockstar

try on power query create 2 table, one with the range of dates you want, another with all posible combination of location, stafftype etc etc, with a column tha concatenate all columns as a reference, now on both tables create a dummy column named "ref" and with a "." as data on all records xD, merge both tables using that last column as reference you will have all possible locations ,stadtype etc combination by each date on the range, load this to the data model, now using dax create a new column to read or sum the data from the fact table and populate the new reference table you did, or using a merge again with the fact and new table on power query both final apporach should work fine 





Did I answer your question? Mark my post as a solution! / Did it help? Give some Kudos!

Proud to be a Super User!




View solution in original post

3 REPLIES 3
StefanoGrimaldi
Resident Rockstar
Resident Rockstar

try on power query create 2 table, one with the range of dates you want, another with all posible combination of location, stafftype etc etc, with a column tha concatenate all columns as a reference, now on both tables create a dummy column named "ref" and with a "." as data on all records xD, merge both tables using that last column as reference you will have all possible locations ,stadtype etc combination by each date on the range, load this to the data model, now using dax create a new column to read or sum the data from the fact table and populate the new reference table you did, or using a merge again with the fact and new table on power query both final apporach should work fine 





Did I answer your question? Mark my post as a solution! / Did it help? Give some Kudos!

Proud to be a Super User!




Anonymous
Not applicable

Hi Stefano,

Thanks for your advice. I followed your suggestion, though I did't use a concatenated column (I was unclear the benefit). The dummy column method was perfect for creating my "0" data. It is avery large set but seems manageable. 

 

Thanks!

the first option give you the oportunity to use dax to change blank results to 0, on power query you should use replace vaue to replace blank results by 0





Did I answer your question? Mark my post as a solution! / Did it help? Give some Kudos!

Proud to be a Super User!




Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.