Hello!
I've been racking my brain over this.
My goal is to produce Exhibit 3 below.
Thank you in advance!
So here's my situation.
Exhibit 1 is my raw data in Excel. These are survey responses: we asked people to rank the importance of these four elements from 1 (most important) to 4 (least important).
For example, Person A says Family is the most important, followed by Money, Health, and then Friends, and so on.
Exhibit 1
| Name | Age | Gender | Q1. Family | Q1. Money | Q1. Health | Q1. Friends |
|---|---|---|---|---|---|---|
| Person A | 22 | F | 1 | 2 | 3 | 4 |
| Person B | 21 | F | 3 | 1 | 2 | 4 |
| Person C | 25 | F | 4 | 3 | 1 | 2 |
| Person D | 26 | M | 2 | 1 | 4 | 3 |
| Person E | 27 | M | 2 | 4 | 1 | 3 |
| Person F | 22 | M | 1 | 3 | 2 | 4 |
Exhibit 2 is where I have rearranged it in Excel. I paste-transposed the survey answers vertically, with the ratings 1 - 4 across the first row. The values are the count of each rating for each survey answer.
For example, in the Rating 1 column: 2 people rated Family 1, 2 people rated Money 1, 2 people rated Health 1, and 0 people rated Friends 1, and so on.
Exhibit 2
| Rating | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Q1. Family | 2 | 2 | 1 | 1 |
| Q1. Money | 2 | 1 | 2 | 1 |
| Q1. Health | 2 | 2 | 1 | 1 |
| Q1. Friends | 0 | 1 | 2 | 3 |
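For anyone who wants to sanity-check the counting step, here is a small Python sketch (my illustration, not part of the thread) that tallies Exhibit 2 from the Exhibit 1 data:

```python
# Exhibit 1 raw data: each person's rank (1 = most important) per option.
raw = {
    "Person A": {"Family": 1, "Money": 2, "Health": 3, "Friends": 4},
    "Person B": {"Family": 3, "Money": 1, "Health": 2, "Friends": 4},
    "Person C": {"Family": 4, "Money": 3, "Health": 1, "Friends": 2},
    "Person D": {"Family": 2, "Money": 1, "Health": 4, "Friends": 3},
    "Person E": {"Family": 2, "Money": 4, "Health": 1, "Friends": 3},
    "Person F": {"Family": 1, "Money": 3, "Health": 2, "Friends": 4},
}

# Count how many people gave each rating to each option (Exhibit 2).
options = ["Family", "Money", "Health", "Friends"]
counts = {opt: {r: 0 for r in range(1, 5)} for opt in options}
for answers in raw.values():
    for opt, rating in answers.items():
        counts[opt][rating] += 1

print(counts["Friends"])  # {1: 0, 2: 1, 3: 2, 4: 3}
```

Each row of the dictionary printed this way matches the corresponding row of Exhibit 2.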
Exhibit 3 is where I need to calculate the weighted average (or at least that's what they call it). Each count is multiplied by a weight based on the level of importance: rating 1 carries weight 4, rating 2 weight 3, rating 3 weight 2, and rating 4 weight 1.
For example, all of the values under Rating 1 are multiplied by 4, hence the 8, 8, 8, and 0 in the Rating 1 column. Likewise, the values under Rating 2 are multiplied by 3 (giving 6, 3, 6, 3), the values under Rating 3 by 2 (giving 2, 4, 2, 4), and the values under Rating 4 by 1 (giving 1, 1, 1, 3).
After that I sum up each row, hence the Total column. The Total is then divided by 6, as there are 6 people on the list.
Exhibit 3
| Rating | 1 | 2 | 3 | 4 | Total | Weighted average |
|---|---|---|---|---|---|---|
| Q1. Family | 8 | 6 | 2 | 1 | 17 | 2.83 |
| Q1. Money | 8 | 3 | 4 | 1 | 16 | 2.67 |
| Q1. Health | 8 | 6 | 2 | 1 | 17 | 2.83 |
| Q1. Friends | 0 | 3 | 4 | 3 | 10 | 1.67 |
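The same arithmetic can be verified in a few lines of Python (a quick sketch, not part of the thread): each count is multiplied by the weight 5 − rating, each row is summed, and the total is divided by the 6 respondents:

```python
# Exhibit 2 counts: option -> {rating: number of people who gave that rating}.
counts = {
    "Family":  {1: 2, 2: 2, 3: 1, 4: 1},
    "Money":   {1: 2, 2: 1, 3: 2, 4: 1},
    "Health":  {1: 2, 2: 2, 3: 1, 4: 1},
    "Friends": {1: 0, 2: 1, 3: 2, 4: 3},
}
people = 6

# Weight is 5 - rating, so rating 1 -> weight 4, ..., rating 4 -> weight 1.
totals = {opt: sum(n * (5 - r) for r, n in row.items()) for opt, row in counts.items()}
weighted_avg = {opt: round(t / people, 2) for opt, t in totals.items()}

print(totals)        # {'Family': 17, 'Money': 16, 'Health': 17, 'Friends': 10}
print(weighted_avg)  # {'Family': 2.83, 'Money': 2.67, 'Health': 2.83, 'Friends': 1.67}
```

The totals and averages reproduce the Total and Weighted average columns of Exhibit 3.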
Hi @Keith011 ,
Are you bringing the data into Power BI to be reported on, or is it all staying in Excel?
The reason I ask is that the most efficient way to create your output would be to normalise your raw data and then create measures for the values, but this is harder in Excel, so I'd stick to a PQ-only solution if you're not using Power BI.
Pete
Proud to be a Datanaut!
Hi @BA_Pete,
I'm doing it in Power BI using Power Query. Is this possible?
When you say normalise my raw data, do you mean preparing my raw data in Excel the way I intend it, and only then visualising it in Power BI?
No, I mean that we'll leave your raw data in Excel, import it into PBI Power Query, transform it there into the most efficient structure for storage and scanning, then write the measures over it that will allow you to visualise it in Power BI how you want it.
Leave it with me, I'll have a look first thing this morning and get back to you.
Pete
Proud to be a Datanaut!
Here's example code for normalising your raw data in PQ:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkgtKs7PU3BU0lEyMgISbkBsCOIAsTEQmyjF6sBVOYEkDKGqjJFUoqhyBgmaQlWZoKhEUuUCEjADEr5QMwzhqpFUuYIkzZFUmUBVoqhyg7neFy4Ld1csAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Age = _t, Gender = _t, #"Q1. Family" = _t, #"Q1. Money" = _t, #"Q1. Health" = _t, #"Q1. Friends" = _t]),
unpivOthCols = Table.UnpivotOtherColumns(Source, {"Name", "Age", "Gender"}, "Option", "Response"),
addWeight = Table.AddColumn(unpivOthCols, "Weight", each (4 - Number.From([Response])) + 1, type number)
in
addWeight
It's literally just unpivoting and adding a semi-hardcoded [Weight] column.
In a production model, you would further normalise this by removing all the 'Person' fields except the person identifier (in this case [Name]), and using a related dimension table for the person information fields (in this case [Age] and [Gender]). There's an example of this in the working PBIX I've attached below if you want to take this extra step (you should).
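If it helps to see the same reshape outside of M, here is a rough Python equivalent of the unpivot-plus-weight step (the column names match the M code; the rest is purely illustrative, with only two sample rows shown):

```python
# One row per survey respondent, as in the Source step of the M query.
rows = [
    {"Name": "Person A", "Age": 22, "Gender": "F",
     "Q1. Family": 1, "Q1. Money": 2, "Q1. Health": 3, "Q1. Friends": 4},
    {"Name": "Person B", "Age": 21, "Gender": "F",
     "Q1. Family": 3, "Q1. Money": 1, "Q1. Health": 2, "Q1. Friends": 4},
]

keep = {"Name", "Age", "Gender"}  # columns left in place by UnpivotOtherColumns

# Unpivot: one output row per (person, option) pair, then add the Weight column.
fact = []
for row in rows:
    for col, val in row.items():
        if col not in keep:
            fact.append({
                "Name": row["Name"],
                "Option": col,
                "Response": val,
                "Weight": (4 - val) + 1,  # rating 1 -> weight 4, rating 4 -> weight 1
            })

print(fact[0])  # {'Name': 'Person A', 'Option': 'Q1. Family', 'Response': 1, 'Weight': 4}
```

Two input rows become eight fact rows, one per person/option pair, which is exactly the long format the measures below expect.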
Once you have your data in this efficient format, you just need a few small measures:
// Count the number of responses:
_noofResponses = COUNTROWS(factTable)
// Get the sum of the weight values:
_sumWeight = SUM(factTable[Weight])
// Calculate weighted average:
_weightedAvg = DIVIDE([_sumWeight], [_noofResponses], 0)
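As a sanity check on the measure logic (sketched here in Python, not DAX), the weighted average per option is just the sum of weights divided by the response count within that option's filter context:

```python
# Ratings per option, straight from Exhibit 1 (one value per respondent).
ratings = {
    "Family":  [1, 3, 4, 2, 2, 1],
    "Money":   [2, 1, 3, 1, 4, 3],
    "Health":  [3, 2, 1, 4, 1, 2],
    "Friends": [4, 4, 2, 3, 3, 4],
}

def weighted_avg(option):
    # Mirrors _sumWeight / _noofResponses evaluated in the option's filter context.
    weights = [5 - r for r in ratings[option]]
    return round(sum(weights) / len(weights), 2)

print(weighted_avg("Family"))   # 2.83
print(weighted_avg("Friends"))  # 1.67
```

The results match the Weighted average column of Exhibit 3, without ever building the intermediate pivoted tables.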
Example Output:
The beauty of setting it up like this is that you can now calculate any of your measures over any dimensions, so you're not stuck with a single hard-coded data format. It's also very fast and disk-space efficient.
Let me know how you get on.
Pete
Proud to be a Datanaut!
@BA_Pete Worked like a charm!! Thanks for this, Pete.
I have a question: in the column added in PQ in your solution (below), can the "4" be swapped for a MAX formula? For instance, another survey might have ratings from 1 to 13 rather than just 1 to 4, so I'd like to tell PQ to always take the MAX number instead of a hard-coded "4".
each (4 - Number.From([Response])) + 1
is that possible?
Yes, that's possible. Note that [Response] is still text at that step, so convert the values to numbers before taking the max (a plain List.Max over text would rank "9" above "13"). Use this in your [Weight] column instead:
( List.Max(List.Transform(previousStepName[Response], Number.From)) - Number.From([Response]) ) + 1
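One caveat with dynamic maxima (illustrated here in Python, not part of the thread): if the responses are still text, a plain max compares lexicographically, so "9" sorts above "13". Convert to numbers before taking the max:

```python
responses = ["1", "9", "13", "4"]  # ratings imported as text

# Lexicographic max on strings picks the wrong value.
print(max(responses))  # '9'

# Convert to numbers first, then compute weight = (max - rating) + 1.
nums = [int(r) for r in responses]
weights = [(max(nums) - r) + 1 for r in nums]
print(weights)  # [13, 5, 1, 10]
```

With the numeric conversion, a 1-to-13 survey correctly gives rating 1 a weight of 13 and rating 13 a weight of 1.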
Pete
Proud to be a Datanaut!