Need Help with Data Shaping for performance and us...

thmonte · ‎02-09-2019

I am currently building reports for some survey data that asks a bunch of questions. Each question response is its own column in the data. The survey can be completed for multiple people in one entry so there is columns for Person 1, Person 2, Person 3.

For the purpose of reporting on a specific questions values (Ex. How many times quesiton 1 was a no) and bucketing some questions into groups I felt like I needed to unpivot the question columns so they each have their own row per question but then I ran into the issue of reporting on people individually so I then thought to create 3 seperate query for each person and combine them. That creates alot of rows for just one survey record and can put a toll on running the query the bigger the dataset gets. Is there a better way to do it, acheive the same result and not hurt the performance?

Here is an example Data Set

Date

Location

Staff 1 Name

Staff 1 ID

Staff 2 Name

Staff 2 ID

Staff 3 Name

Staff 3 ID

Question 1

Question 2

Question 3

Question 4

Question 5

Question 6

Question 7

Question 8

Question 9

Question 10

2/1/2019

Location A

Person 1

001

Person 8

008

Person 15

015

yes

no

yes

no

yes

no

yes

2/1/2019

Location B

Person 2

yes

no

yes

no

yes

no

yes

2/2/2019

Location C

Person 3

003

Person 10

010

Person 17

017

yes

no

yes

no

yes

no

yes

2/2/2019

Location D

Person 4

004

Person 11

011

yes

no

yes

no

yes

no

yes

2/3/2019

Location E

Person 5

005

Person 12

012

Person 19

019

yes

no

yes

no

yes

no

yes

2/3/2019

Location F

Person 6

yes

no

yes

no

yes

no

yes

2/4/2019

Location G

Person 7

007

Person 14

014

yes

no

yes

no

yes

no

yes

I thought about not unpivoting the question columns which would save alot of rows but then I felt like I would lose the grouping of the questions. For example:

If I want to group questions 1-5 as Section 1 and 6-10 as Section 2 and then say Section 1 is being answered as yes 20% of the time and Section 2 as yes 80% of the time in a matrix visual (each section getting its own row and then the column would be the percentage value) is that possible?

Hoping I'm getting the end goal across in text the best I can!

Anonymous · ‎10-28-2019

@thmonte

In my experience in IRT, the optimal model - even for large scale - is one row per respondent and one column per question.

I would split up the different rows per person and add a column with a an identifier per grouped survey (repeating 1 to 3 times based on the number of people participating).

Lastly, performance wise, you should remap your yes/no into 0/1 (as integer). If it is just one table, Power BI should be able to perform well, even with a large number of rows.

Need Help with Data Shaping for performance and use

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly