Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
thmonte
Helper IV
Helper IV

Need Help with Data Shaping for performance and use

I am currently building reports for some survey data that asks a bunch of questions.  Each question response is its own column in the data.  The survey can be completed for multiple people in one entry so there is columns for Person 1, Person 2, Person 3.

 

For the purpose of reporting on a specific questions values (Ex. How many times quesiton 1 was a no) and bucketing some questions into groups I felt like I needed to unpivot the question columns so they each have their own row per question but then I ran into the issue of reporting on people individually so I then thought to create 3 seperate query for each person and combine them.  That creates alot of rows for just one survey record and can put a toll on running the query the bigger the dataset gets.  Is there a better way to do it, acheive the same result and not hurt the performance?

 

Here is an example Data Set

 

DateLocationStaff 1 NameStaff 1 IDStaff 2 NameStaff 2 IDStaff 3 NameStaff 3 IDQuestion 1Question 2Question 3Question 4Question 5Question 6Question 7Question 8Question 9Question 10
2/1/2019Location APerson 1001Person 8008Person 15015yesyesyesnoyesnoyesnonoyes
2/1/2019Location BPerson 2     yesyesyesnoyesnoyesnonoyes
2/2/2019Location CPerson 3003Person 10010Person 17017yesyesyesnoyesnoyesnonoyes
2/2/2019Location DPerson 4004Person 11011  yesyesyesnoyesnoyesnonoyes
2/3/2019Location EPerson 5005Person 12012Person 19019yesyesyesnoyesnoyesnonoyes
2/3/2019Location FPerson 6     yesyesyesnoyesnoyesnonoyes
2/4/2019Location GPerson 7007Person 14014  yesyesyesnoyesnoyesnono

yes

 

I thought about not unpivoting the question columns which would save alot of rows but then I felt like I would lose the grouping of the questions.  For example:

 

If I want to group questions 1-5 as Section 1 and 6-10 as Section 2 and then say Section 1 is being answered as yes 20% of the time and Section 2 as yes 80% of the time in a matrix visual (each section getting its own row and then the column would be the percentage value) is that possible?

Hoping I'm getting the end goal across in text the best I can!

1 REPLY 1
Anonymous
Not applicable

@thmonte 

 

In my experience in IRT, the optimal model - even for large scale - is one row per respondent and one column per question.

I would split up the different rows per person and add a column with a an identifier per grouped survey (repeating 1 to 3 times based on the number of people participating).

 

Lastly, performance wise, you should remap your yes/no into 0/1 (as integer). If it is just one table, Power BI should be able to perform well, even with a large number of rows.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.