
Keith011
Helper III

Shaping data and calculating a weighted average

Hello!

 

I've been cracking my head over this.

My goal: to get to Exhibit 3 below.

Thank you in advance!

So here's my situation.

 

Exhibit 1 is my raw data in Excel. It's a survey-style response where we ask people to rank the importance of these 4 elements:

  • 1 = the most important
  • 4 = the least important

For example, Person A says Family is the most important, followed by Money, Health, and then Friends. And so on.

Exhibit 1

Name      Age  Gender  Q1. Family  Q1. Money  Q1. Health  Q1. Friends
Person A  22   F       1           2          3           4
Person B  21   F       3           1          2           4
Person C  25   F       4           3          1           2
Person D  26   M       2           1          4           3
Person E  27   M       2           4          1           3
Person F  22   M       1           3          2           4

 

Exhibit 2 is where I have re-arranged the data in Excel. I paste-transposed the survey questions vertically, with the ratings 1 - 4 across the first row. The values are the count of each rating for each survey question.

 

For example, in the Rating 1 column: 2 people rated Family 1, 2 people rated Money 1, 2 people rated Health 1, and 0 people rated Friends 1. And so on.

 

Exhibit 2

Rating       1  2  3  4
Q1. Family   2  2  1  1
Q1. Money    2  1  2  1
Q1. Health   2  2  1  1
Q1. Friends  0  1  2  3

 

Exhibit 3 is where I need to calculate the weighted average (or at least that's what they call it). The formula multiplies each count by the weight of its rating level:

  • Values under Rating 1 are multiplied by 4, giving 8, 8, 8, 0 in the Rating 1 column.
  • Values under Rating 2 are multiplied by 3, giving 6, 3, 6, 3 in the Rating 2 column.
  • Values under Rating 3 are multiplied by 2, giving 2, 4, 2, 4 in the Rating 3 column.
  • Values under Rating 4 are multiplied by 1, giving 1, 1, 1, 3 in the Rating 4 column.

 

After that, I sum up each row, hence the Total column. The Total is then divided by 6, as there are 6 people on the list.

 

Exhibit 3

Rating       1  2  3  4  Total  Weighted average
Q1. Family   8  6  2  1  17     2.83
Q1. Money    8  3  4  1  16     2.67
Q1. Health   8  6  2  1  17     2.83
Q1. Friends  0  3  4  3  10     1.67
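
For example, the Q1. Family row works out as (2 × 4) + (2 × 3) + (1 × 2) + (1 × 1) = 17, and 17 ÷ 6 ≈ 2.83.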

6 REPLIES
BA_Pete
Super User

Hi @Keith011 ,

 

Are you bringing the data into Power BI to be reported on, or is it all staying in Excel?

The reason I ask is that the most efficient way to create your output would be to normalise your raw data and then create measures for the values, but this is harder in Excel, so I would stick to a PQ-only solution if you're not using Power BI.

 

Pete







Hi @BA_Pete,

 

I'm doing it in Power BI using Power Query. Is this possible?

 

When you say normalise my raw data, do you mean preparing my raw data the way I intended it in Excel, and only then visualising it in Power BI?

 

No, I mean that we'll leave your raw data in Excel, import it into PBI Power Query, transform it there into the most efficient structure for storage and scanning, then write the measures over it that will allow you to visualise it in Power BI how you want it.

 

Leave it with me, I'll have a look first thing this morning and get back to you.

 

Pete







 

Here's example code for normalising your raw data in PQ:

let
    // Raw survey data pasted in via 'Enter Data' (matches Exhibit 1)
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkgtKs7PU3BU0lEyMgISbkBsCOIAsTEQmyjF6sBVOYEkDKGqjJFUoqhyBgmaQlWZoKhEUuUCEjADEr5QMwzhqpFUuYIkzZFUmUBVoqhyg7neFy4Ld1csAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Age = _t, Gender = _t, #"Q1. Family" = _t, #"Q1. Money" = _t, #"Q1. Health" = _t, #"Q1. Friends" = _t]),
    // Unpivot the four question columns into [Option] / [Response] pairs
    unpivOthCols = Table.UnpivotOtherColumns(Source, {"Name", "Age", "Gender"}, "Option", "Response"),
    // Convert the 1-4 response into a weight: 1 -> 4, 2 -> 3, 3 -> 2, 4 -> 1
    addWeight = Table.AddColumn(unpivOthCols, "Weight", each (4 - Number.From([Response])) + 1, type number)
in
    addWeight

It's literally just unpivoting and adding a semi-hardcoded [Weight] column.

In a production model, you would further normalise this by removing all the 'Person' fields except the person identifier (in this case [Name]), and using a related dimension table for the person information fields (in this case [Age] and [Gender]). There's an example of this in the working PBIX I've attached below if you want to take this extra step (you should).
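
A minimal sketch of that split, assuming the normalised query above is saved as a query named SurveyNormalised (the query and step names here are just placeholders for illustration) - you'd create two reference queries, something like:

// Query 1 - dimPerson: one row per person, carrying the person attributes
let
    Source = SurveyNormalised,
    dimPerson = Table.Distinct(Table.SelectColumns(Source, {"Name", "Age", "Gender"}))
in
    dimPerson

// Query 2 - factTable: keep only the person key plus the survey facts
let
    Source = SurveyNormalised,
    factTable = Table.SelectColumns(Source, {"Name", "Option", "Response", "Weight"})
in
    factTable

You'd then relate dimPerson[Name] to factTable[Name] (one-to-many) in the model, which is the structure the measures below assume.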

 

Once you have your data in this efficient format, you just need a few small measures:

// Count the number of responses:
_noofResponses = COUNTROWS(factTable)

// Sum the Weight values:
_sumWeight = SUM(factTable[Weight])

// Calculate weighted average:
_weightedAvg = DIVIDE([_sumWeight], [_noofResponses], 0)

 

Example Output:

[Example output screenshot: BA_Pete_0-1669972494947.png]

 

The beauty of setting it up like this is that you can now calculate any of your measures over any dimensions, so you're not stuck with a single hard-coded data format. It's also very fast and storage-efficient.

 

Let me know how you get on.

 

Pete







@BA_Pete Worked like a charm!! Thanks for this, Pete.

 

I have a question: in your solution, can the "4" in the added PQ column (like below) be swapped with a MAX formula? For instance, another survey might have ratings from 1 to 13 rather than just 1 to 4, so I'd like to tell PQ to always take the MAX number instead of a hard-coded "4".

each (4 - Number.From([Response])) + 1

 

Is that possible?

 

Yes, that's possible. Use this in your [Weight] column instead:

( Number.From(List.Max(previousStepName[Response])) - Number.From([Response]) ) + 1
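
For context, slotted into the example query above (using unpivOthCols as the previous step name), that might look something like the sketch below. One caveat, based on the sample query where [Response] comes through as text: List.Max over the raw text values compares ratings alphabetically ("9" beats "13"), so converting to numbers before taking the max keeps it correct for ratings above 9.

addWeight = Table.AddColumn(
    unpivOthCols,
    "Weight",
    // Read the top rating from the previous step, so 1-13 surveys work too;
    // List.Transform(..., Number.From) ensures a numeric (not text) max
    each ( List.Max(List.Transform(unpivOthCols[Response], Number.From)) - Number.From([Response]) ) + 1,
    type number
)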

 

Pete


