cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
BobBonner Frequent Visitor
Frequent Visitor

Iterative matching using weighting of multiple columns

I have to come up with an algorithm to find suitable matches for mentors and proteges.  To simplify it (maybe), we have a table with employee id, role (mentor or protegee), 5 columns of answers to preferences (where the weighting will happen).  In the columns for preferences, an employee has ranked his/her preferences 1 through 5.  E.g. PreferenceTopic1 = 4, PreferenceTopic2 = 3, PreferenceTopic3 = 1, PreferenceTopic4 = 5, and PreferenceTopic5 = 2.  If this person were a protege, a good mentor would have rank similarly.  So, I have to iterate through combinations of employee ids who are proteges against the employee ids who are mentors and weight them on the shortest differences in preference topics.  E.g. If possible mentor rates PT1 = 4, PT2 = 3, PT3 = 2, PT4 = 5, PT5 = 1, then they would be a really good match for the original example.  There low number (good match) would be 0+0+1+0+1 = 2.  I'm not worried about perfectly matching every mentor to a protege, but care about ranking them as "best fits".  Not sure if this is best suited for m language in editor or DAX in formula bar.

3 REPLIES 3
Super User
Super User

Re: Iterative matching using weighting of multiple columns

You can do this in the query editor:

1) create a list with the profiles attributes for both

2) do a cartesian join, so that each item will be compared with all of the others.

3) For the comparison use List.Sum(List.Transform(List.Zip({protegeList, mentorList}), each Number.Abs(_{0}-_{1}))). That will add up all deviations

 

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




Super User
Super User

Re: Iterative matching using weighting of multiple columns

@BobBonner

Assuming you use @ImkeF's method to calculate the "difference" between all possible pairs of Mentors and Proteges, do you also want to come up with an optimal pairing of Mentors and Proteges?

 

This would be an example of the Assignment Problem (assuming you define your objective as minimizing the sum of differences between pairs selected), which you could solve using lp.assign in the lpSolve R package, within Power Query.

I have previously used the lpSolve package in the Power BI service to solve linear programming problems, so there shouldn't be a problem using that package in the service.

 

It would be possible but difficult to write your own M code to solve the Assignment Problem in Power Query without using R, so would certainly recommend R.

 

Post back if you need help with that.

 

Regards,

Owen



Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!




Super User
Super User

Re: Iterative matching using weighting of multiple columns

Thanks @OwenAuger for pointing towards this pattern! 

As this seems to be a general pattern, I thought it is worth a try in M ;-) 

 

 
let
    CatFields = {"pref1","pref2","pref3","pref4","pref5"}, 
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUcpNzSvJLwIyQBwjIDYGYhMgNlWK1YkGi8CVwKQg0jAlxuhKjKDYEIxBSkAaCoryS1LTU6FaTZDUQtSYoqjB7hozFDXGUGwIxUA1sQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"employee id" = _t, role = _t, pref1 = _t, pref2 = _t, pref3 = _t, pref4 = _t, pref5 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"employee id", Int64.Type}, {"role", type text}, {"pref1", Int64.Type}, {"pref2", Int64.Type}, {"pref3", Int64.Type}, {"pref4", Int64.Type}, {"pref5", Int64.Type}}),
    Mentors = Table.Buffer(Table.SelectRows(#"Changed Type", each ([role] = "mentor"))),
    Proteges = Table.SelectRows(#"Changed Type", each ([role] = "protege")),

	IterativeCombine = List.Skip(List.Generate( ()=>
		[Proteges = Proteges, Counter = 0],
		each [Counter]<=Table.RowCount(Mentors),
		each [	SelectedMentor = Record.FieldValues(Record.SelectFields(Mentors{[Counter]}, CatFields)),
				SelectVals = Table.AddColumn([Proteges], "CatVals", each Record.FieldValues(Record.SelectFields(_, CatFields))),
				CombineWithSelectedMentor = Table.AddColumn(SelectVals, "Combine", (x) => List.Zip({x[CatVals],List.Repeat(SelectedMentor, List.Count(x[CatVals]))})),
				Score = Table.AddColumn(CombineWithSelectedMentor, "Score", each List.Sum(List.Transform([Combine], each Number.Abs(_{0}-_{1})))),
				EmpID = Table.SelectRows(Score, each ([Score] = List.Min(Score[Score]))){0}[employee id],
				Proteges = Table.SelectRows([Proteges], each [employee id] <> EmpID),
				Counter = [Counter]+1],
		each [EmpID]
	)),
    CreateTable = Table.FromColumns({Mentors[employee id], IterativeCombine}, {"Mentor", "Protege"})
in
    CreateTable

It doesn't use the algorithm you mentioned, just iterates through the (table of) mentors, checks who is closest, and returns the first of that result. Then it eliminates the allocated protege from the table of proteges that goes into the next round of matching.

So not a real optimization, as it is dependent on the order of the elements, but a reasonalbe result.

 

Would be interested if that solves your problem @BobBonner and how the performance of this code is. Thx!

(You have to replace the code in step "Source" by a reference to your table and eventually adjust the values in step "CatFields")

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries