cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Anonymous
Not applicable

Categorizing workers based on shifts worked

Hi everyone! I need help with this: I have to categorize workers based on their seasonality. The model is very simple, i'll illustrate it on the table:

 

IDTotal_ShiftsMonthYear

Year_Month

112122019

2019-12

2141220192019-12
3151220192019-12
133012020

2020-01

2240120202020-01
330120202020-01

 

So, I have every worker with their ID and the total of shifts they worked every month of the given year (the database covers 3 years with hundreds of workers).

 

I have to group everyone based on this:

Occasional worker: every worker that works from 15 to 25 shifts a year.

Seasonal worker: every worker that works an average of 5,5 to 20,5 shifts by month, but only if worked for four months or less of the given year.

Sporadic worker: every worker that works an average of 5,5 to 20,5 shifts by month.

Regular: from 15 to 25 shifts by month.

The others just fit into the "Other" category.

 

I think I have to make this with a calculated column and a bunch of ifs.
If I'd have to make a measure of this I guess it wouldn't be that difficult because the chart in itself groups the values, but I don't know how can I do to group this calculations year by year as it happens with the Occasional worker or how can I use moving months for the Seasonal worker (the one I think is the most difficult).

After categorizing, I have to calculate how many people fit into every category and calculate the average shifts for every category.

I'd very much appreciate all the help you can give me,

Thank you for your time,
Best regards,
Gabriela

 

1 ACCEPTED SOLUTION

Ok, I think I have a better understanding, and I'd do this in Power Query. A simple SWITCH() won't cut it. A more complex one would but you need more table transformations. Thank you for the data. Insanely helpful understanding the issue.

 

See my PBIX file here. It is linked to your data, so you'll need to change it so it points to the "BD Sample.xlsx" on your computer via the Data Sources icon, Change Source item.

 

Here is what I did:

  1. I wanted to verify there were no duplicate data. There was, so I grouped it together. You can see this data in Power Query called [Hoja1 DUPLICATES] - each of these has 2 records with the same year and month. I added the Total Shifts together. This was important for the next step.
    1. 2020-03-11 19_51_29-20200311 - Categorizing Workers - Power Query Editor.png
    2. If that is wrong, then you need to fix your data, or change how my "Grouped Rows to combine duplicates" step works in the [Hoja1] query.
  2. Then I did a 2nd group by by ID and year only, adding the count of records as "Month" and Total Shifts. The count works because now every ID/Month/Year is unique, so grouping by ID/Month and counting how many records are there are the number of months.
    1. 2020-03-11 19_56_02-Group By.png
  3. Then I added an "Average Shifts" column which is just Total Shifts / Months. Now I have Average shifts per month. I arbitrarly rounded to 2 decimals. Do what you want there.
  4. Then I added the Status column. This is where the work is.

 

if [Total Shifts] >= 15 and [Total Shifts] <= 25 then "Occasional" else
if [Months Worked] =4 and [Average Shifts] >= 5.5 and [Average Shifts] <= 20.5 then "Seasonal" else
if [Months Worked] > 4 and [Average Shifts] >= 5.5 and [Average Shifts] <= 20.5 then "Sporadic" else
if [Average Shifts] >= 15 and [Average Shifts] <=25 then "Regular" else 
"Other"

It goes through the 4 IF statements, and if none match, it defaults to "Other."

 

If my logic is wrong because I misunderstood your requirements, you should be able to modify that code. The way I did it is, as I understood your requirements:

  1. if between 15-25 shifts per year, "Occasional"
  2. if months worked = 4 and average shifts were between 5.5 and 20.5, "Seasonal"
  3. if shifts were more than 4 per year and average was between 5.5 and 20.5, "Sporadic"
  4. If shifts were an average between 15 and 25 per month, "Regular"
  5. Everything else falls out as "Other"

You can change that logic, and add an additional if/then/else line before the "other." Just be careful not to cause any to overlap. For example, if you created a new 5th line of "If months worked = r and average shifts between 15 and 25" that would not work as rule #2 would grab all of those in the 5.5-20.5 range and new rule #5 would only get those between 20.5 and 25.

 

If you need to do some OR logic, just replace and with or, and you can use parentheses.

if ([Months Worked] < 3 or [Months Worked > 12) and [Total Shifts] < 25

as an example would first return TRUE if months were 1-3 or > 12, and then look at total shifts.

 

Ping back any questions.

 

Note: You can do this like you'd do in Excel too perhaps. Do one column per critera, then when done, combine those column logics into a single formula, but the if/then/else logic is pretty easy to construct, but you have to watch out for overlaps.

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

5 REPLIES 5
edhans
Super User III
Super User III

I don't think you gave us enough info to do this, but I agree, as much as I hate calculated columns, this is what you need. And you want to use SWITCH() here.

 

Here is an example of how this would work:

Category = 
VAR CurrentID = [ID]
VAR AverageShifts = 
    AVERAGEX(
        FILTER(
            'Table',
            'Table'[ID]=CurrentID
        ),
        'Table'[Total_Shifts]
    )
RETURN
SWITCH(
    TRUE(),
    [Total_Shifts] >= 15 && [Total_Shifts] <= 25, "Occasional",
    AverageShifts >= 5.5 && AverageShifts <= 20.5, "Seasonal"
)

 

That returns the following:

2020-03-10 20_04_42-Untitled - Power BI Desktop.png

Ignore the "Average Shift" column. I used that to make everything between the first VAR and the RETURN logic that I then copied and pasted into the Category measure as shown above. But you can see worker #1 has 22.5 average shifts, and worker 2 has 19, etc.

 

But I didn't see how I was supposed to determine how many months in a year they worked with that data. But you can just add more conditions to each SWITCH line like you would in Excel. Use && for AND, and || for OR, and group with parenthesis as needed. THIS && (THAT || SOMETHIGN ELSE) for example

Hopefully that is enough to get you started. 

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
Anonymous
Not applicable

Hi @edhans 
I tried your solution and though the code works it doesn't quite fits to what I need: the Occasional worker is the one that works from 15 to 25 shifts a year; and the Seasonal worker is the one that worked that amount of average shifts but for four months or less so for example if a worker worked 6 shifts every month of the year he would be Sporadic (AVG=6) but if he worked 20 shifts for only four months he would be Seasonal (AVG = 20*4/12 = 6,6) he would be Seasonal. The database considers more than one year, and hundreds of workers, so I have to take account of the years. I share a sample DB Sample 

Best regards,
Gabriela

 

Ok, I think I have a better understanding, and I'd do this in Power Query. A simple SWITCH() won't cut it. A more complex one would but you need more table transformations. Thank you for the data. Insanely helpful understanding the issue.

 

See my PBIX file here. It is linked to your data, so you'll need to change it so it points to the "BD Sample.xlsx" on your computer via the Data Sources icon, Change Source item.

 

Here is what I did:

  1. I wanted to verify there were no duplicate data. There was, so I grouped it together. You can see this data in Power Query called [Hoja1 DUPLICATES] - each of these has 2 records with the same year and month. I added the Total Shifts together. This was important for the next step.
    1. 2020-03-11 19_51_29-20200311 - Categorizing Workers - Power Query Editor.png
    2. If that is wrong, then you need to fix your data, or change how my "Grouped Rows to combine duplicates" step works in the [Hoja1] query.
  2. Then I did a 2nd group by by ID and year only, adding the count of records as "Month" and Total Shifts. The count works because now every ID/Month/Year is unique, so grouping by ID/Month and counting how many records are there are the number of months.
    1. 2020-03-11 19_56_02-Group By.png
  3. Then I added an "Average Shifts" column which is just Total Shifts / Months. Now I have Average shifts per month. I arbitrarly rounded to 2 decimals. Do what you want there.
  4. Then I added the Status column. This is where the work is.

 

if [Total Shifts] >= 15 and [Total Shifts] <= 25 then "Occasional" else
if [Months Worked] =4 and [Average Shifts] >= 5.5 and [Average Shifts] <= 20.5 then "Seasonal" else
if [Months Worked] > 4 and [Average Shifts] >= 5.5 and [Average Shifts] <= 20.5 then "Sporadic" else
if [Average Shifts] >= 15 and [Average Shifts] <=25 then "Regular" else 
"Other"

It goes through the 4 IF statements, and if none match, it defaults to "Other."

 

If my logic is wrong because I misunderstood your requirements, you should be able to modify that code. The way I did it is, as I understood your requirements:

  1. if between 15-25 shifts per year, "Occasional"
  2. if months worked = 4 and average shifts were between 5.5 and 20.5, "Seasonal"
  3. if shifts were more than 4 per year and average was between 5.5 and 20.5, "Sporadic"
  4. If shifts were an average between 15 and 25 per month, "Regular"
  5. Everything else falls out as "Other"

You can change that logic, and add an additional if/then/else line before the "other." Just be careful not to cause any to overlap. For example, if you created a new 5th line of "If months worked = r and average shifts between 15 and 25" that would not work as rule #2 would grab all of those in the 5.5-20.5 range and new rule #5 would only get those between 20.5 and 25.

 

If you need to do some OR logic, just replace and with or, and you can use parentheses.

if ([Months Worked] < 3 or [Months Worked > 12) and [Total Shifts] < 25

as an example would first return TRUE if months were 1-3 or > 12, and then look at total shifts.

 

Ping back any questions.

 

Note: You can do this like you'd do in Excel too perhaps. Do one column per critera, then when done, combine those column logics into a single formula, but the if/then/else logic is pretty easy to construct, but you have to watch out for overlaps.

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

Anonymous
Not applicable

Hi @edhans !! thank you very much for the dedication and for explaining step by step, I didn't know you could do all this on Power Query it seems a lot more simple.

I tried to do it on DAX mixing a bunch of other formulas I found and got a solution with a calculated column, here it is: 

 

Category = 
VAR ANUAL_SUM = 
CALCULATE(SUM('DAX_Test'[Total Shifts]),
        FILTER(ALLEXCEPT('DAX_Test', 'DAX_Test'[Year]),
        'DAX_Test'[ID]=EARLIER('DAX_Test'[ID])
            )
)

VAR ANUAL_AVG = ANUAL_SUM/12

VAR WORKEDMONTHS =
CALCULATE(DISTINCTCOUNT('DAX_Test'[Month]),
        FILTER(ALLEXCEPT('DAX_Test','DAX_Test'[Year]),
        'DAX_Test'[ID]=EARLIER('DAX_Test'[ID])
            )
)

RETURN
    /*Occasional: 15 to 25 shifts per year*/
    if(
        (ANUAL_SUM <= 25 && ANUAL_SUM >= 15) ,
            "Occasional" ,

                /*Seasonal: average between 5,5 to 20,5 per month on 4 months or less*/
                if(
                    (ANUAL_AVG <= 20,5 && ANUAL_AVG >= 5,5) && (WORKEDMONTHS <= 4 ) ,
                    "Seasonal",

                        /*Sporadic: 5,5 to 20,5 shifts per month*/
                        if(
                            ANUAL_AVG <= 20,5 && ANUAL_AVG >= 5,5 ,
                            "Sporadic" ,

                                /*Regular: 9 to 24 shifts per month*/
                                if(
                                    (WORKEDMONTHS >= 10) && (MAX('DAX_Test'[Total Shifts]) <= 24 && MIN('DAX_Test'[Total Shifts]) >= 9) ,
                                    "Regular" ,
                                                                               
                                    "Other"
                                )
                        )
                )
    )

 


I validated every category with your solution and mine and got some differences that I changed on Power Query and finally got a perfect match for Sporadic, Occasional and Seasonal but not for Regular. The differences were:
- The average I have to calculate takes no account of how many months the person worked, it always considers 12. This I changed and worked well.

 

- The Regular worker has to work from 15 to 25 shifts per month every month of the year except for two to consider vacations, so I solve that with an "and 'months worked' >=10", but couldn't figure out how to put that the range for every month must be between 15 to 25, that's different from the average and I wouldn't know how to add it on Power Query. On DAX I put a MIN and a MAX, but I'm not sure that solves it.
Could you please check this out to see if we can get to a perfect match on Regular too? Well it seems by the data that it will be almost impossible for a worker to get a Regular Status, given so many rules, but the data from one worker could be change to see if this works.
Here is the link for the pbix file 

Thank you for your time,
Best regards,
Gabriela

 

Yes, you can do this, but I had to think about it. See my PBIX file again and look at the [Hoja1] query. The original stil lthere in [Hoja1 Original]

 

I basically had to figure out who was Regular (15-25 shifts) and worked 10+ months that way.

  1. I started with the original query.
  2. After the step "Grouped Rows to combine duplicates" I added 4 steps:
    1. First I added a simple column called Temp Status that if [Total Shifts] is between 15 and 25, categorize as Regular, otherwise null.
    2. I then modified the "Group Rows by ID and Year" step to add an additional column, [All Rows] - it is in red below
      1. 2020-03-12 19_01_53-Group By.png
      2. This returns a column with a table in it. In the one I've shown below, it has 9 records with "Regular" in it, and 3 that are null. This particular record will fail the next test as we need 10-12 without null.
      3. 2020-03-12 19_02_43-20200312 - Categorizing Workers - Power Query Editor.png
    3. Added the "Added Filtered List of Rows <> null" step which returns filters out the nulls from the previous step.
      1. =Table.SelectRows([All Rows], each [Temp Status] <> null)
      2. You can see on the 2nd record, this returns a nested table of just two records. This will also fail
      3. 2020-03-12 19_07_52-20200312 - Categorizing Workers - Power Query Editor.png
    4. Next I added a formula to count those filtered rows
      1. = Table.RowCount([Regular Filter Out Null])
      2. Now I know 100% for sure who worked 10+ months each year between 15-25 shifts
  3. I then changed the step that adds the status to be the formula below. The "Regular" assignment was near the end. Now it is first and only checks for the row count of the filtered table.

 

if [All Rows Count] >= 10 then "Regular" else 
if [Total Shifts] >= 15 and [Total Shifts] <= 25 then "Occasional" else
if [Months Worked] <= 4 and [Average Shifts] >= 5.5 and [Average Shifts] <= 20.5 then "Seasonal" else
if [Months Worked] > 4 and [Average Shifts] >= 5.5 and [Average Shifts] <= 20.5 then "Sporadic" else
"Other"

 

 

Last step was to keep only those columns I needed and remove all of that intermediate junk.

 

Kudos appreciated on this one if it is helpful. The query was pretty simple, but I had to noodle through the logic of how to get a reliable count since Average wasn't going to assure it met your requirements.

 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Helpful resources

Announcements
PBI User Groups

Welcome to the User Group Public Preview

Check out new user group experience and if you are a leader please create your group!

MBAS on Demand

Microsoft Business Applications Summit sessions

On-demand access to all the great content presented by the product teams and community members! #MSBizAppsSummit #CommunityRocks

MBAS Attendee Badge

Claim Your Badge & Digital Swag!

Check out how to claim yours today!

secondImage

Are You Ready?

Test your skills now with the Cloud Skills Challenge.

Top Solution Authors