cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
macas01
Frequent Visitor

Group data according to a lookup table and select the first non-null value

Hello all,

I am trying to solve the following problem in in Power Bi using Power Query. I have found some stuff around this forum but it is not exactly what i need to do, and i am not so expert to figure out by myself.

 

I have a lookup table like this:

NAME  ID
aaaid12
aaaid34
aaaid23
bbbid15
bbbid25
cccid67
cccid54
cccid33
dddid20
dddid65
dddid78

 

And a Raw Data Table like this:

(The very first row is not part of table. I added it for the sake of clarity)

 

  #aaa#aaa#aaa#bbb#bbb#ccc#ccc#ccc#ddd#ddd#ddd
timestamp              object  id12id34id23id15id25id67id54id33id20id65id78
13/04/2022 13:54L13202  4841  5548   7142
12/04/2022 04:22L32541  8963  5047   7596
11/04/2022 06:47N6 6165 28348034  35697984  
10/04/2022 18:42M42586 8555 61359539  6527  
09/04/2022 11:11M41290 9642 99445077  3355  
08/04/2022 12:11L28060  5835  8539   6354
07/04/2022 14:00N6 6910 39553967  28689192  
06/04/2022 13:56N6 2370 30024200  72578699  
05/04/2022 01:11L17230  4520  1606   3594
04/04/2022 08:37M43205 5283 89868943  5553  
03/04/2022 23:37L14740  6182  9407   6881
02/04/2022 10:00L21445  6379  1282   7989
01/04/2022 14:37L19878  3534  7907   2312

 

I would like to create an output table like the following:

timestamp              object   aaabbbcccddd
13/04/2022 13:54L13202484155487142
12/04/2022 04:22L32541896350477596
11/04/2022 06:47N66165283435697984
10/04/2022 18:42M42586613595396527
09/04/2022 11:11M41290994450773355
08/04/2022 12:11L28060583585396354
07/04/2022 14:00N66910395528689192
06/04/2022 13:56N62370300272578699
05/04/2022 01:11L17230452016063594
04/04/2022 08:37M43205898689435553
03/04/2022 23:37L14740618294076881
02/04/2022 10:00L21445637912827989
01/04/2022 14:37L19878353479072312

 

So that, it reprocess the raw data table replacing the "idxx" columns with "name" according to the lookup table and assign for that "name" only one value. The value must be the first non-null value of that "name" group for each row

Hope that i have been clear enough. The output example should help to better understand what i am trying to do here

 

Please note that the Lookup table and the Raw data table are both queried from MySQL database, so if necessary i could rearrange those two tables when loading them in power bi.

 

Thank you

1 ACCEPTED SOLUTION
Vijay_A_Verma
Super User
Super User

See the working here - Open a blank query - Home - Advanced Editor - Remove everything from there and paste the below code to test (later on when you use the query on your dataset, you will have to change the source appropriately. If you have columns other than these, then delete Changed type step and do a Changed type for complete table from UI again)

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("VZPZjQQhDERTWfX3SINPbGLo3QRGk38ay2EakBCX4FG2i8/n4jfQGxPiD1ARvl7XDbWjulWH0dgYnoUI27OoLQPj9X11Eg4SF2yXb6odynbXXGmBEucDJK4TBAOkpR/503FCQWXM0KhJtdSH0UjUG8Zt7gUsRXxWuMn65S7LAmoiMvHUZi7kD1QF80HzgEEBmDBAT+OQK0fa3Jl7kHkFSTSfCpgFDAfsxh6TppUjI3kWtgtr2qjWa4BygLiktCfMIVjk/WlyXXLQtFXSwfFQpcsPusGQ8oSlbg7GtJRmlEY2dT9gErCIEPpRWvdYcC1Ak+4RkviMkAfHCuWZ9urRSI5UP0yH9bqa82Y0ETo0heGRBqyL4sxLh4It9zunw6dqBgEKv0Maae/1A+ZVMqW8Sga4UYdTfXr0qd9S5JbXRyPZnJ79VIQE9Qt+/wE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [timestamp = _t, object = _t, id12 = _t, id34 = _t, id23 = _t, id15 = _t, id25 = _t, id67 = _t, id54 = _t, id33 = _t, id20 = _t, id65 = _t, id78 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"timestamp", type datetime}, {"object", type text}, {"id12", Int64.Type}, {"id34", Int64.Type}, {"id23", Int64.Type}, {"id15", Int64.Type}, {"id25", Int64.Type}, {"id67", Int64.Type}, {"id54", Int64.Type}, {"id33", Int64.Type}, {"id20", Int64.Type}, {"id65", Int64.Type}, {"id78", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"timestamp", "object", "Index"}, "Attribute", "Value"),
    #"Added Custom" = Table.AddColumn(#"Unpivoted Other Columns", "Custom", each Table1{[ID=[Attribute]]}[NAME]),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Attribute"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Custom]), "Custom", "Value", List.Sum),
    #"Sorted Rows" = Table.Sort(#"Pivoted Column",{{"Index", Order.Ascending}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Sorted Rows",{"Index"})
in
    #"Removed Columns1"

Test code for Table1

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSkxMVNJRykwxNFKK1UFwjU1QuEbGYG5SUhJEsSkK1wjCTU5OBnPNzFG4piYoXGOIUSkpKRC9BihcM1MUrrmFUmwsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [NAME = _t, ID = _t])
in
    Source

View solution in original post

4 REPLIES 4
Vijay_A_Verma
Super User
Super User

See the working here - Open a blank query - Home - Advanced Editor - Remove everything from there and paste the below code to test (later on when you use the query on your dataset, you will have to change the source appropriately. If you have columns other than these, then delete Changed type step and do a Changed type for complete table from UI again)

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("VZPZjQQhDERTWfX3SINPbGLo3QRGk38ay2EakBCX4FG2i8/n4jfQGxPiD1ARvl7XDbWjulWH0dgYnoUI27OoLQPj9X11Eg4SF2yXb6odynbXXGmBEucDJK4TBAOkpR/503FCQWXM0KhJtdSH0UjUG8Zt7gUsRXxWuMn65S7LAmoiMvHUZi7kD1QF80HzgEEBmDBAT+OQK0fa3Jl7kHkFSTSfCpgFDAfsxh6TppUjI3kWtgtr2qjWa4BygLiktCfMIVjk/WlyXXLQtFXSwfFQpcsPusGQ8oSlbg7GtJRmlEY2dT9gErCIEPpRWvdYcC1Ak+4RkviMkAfHCuWZ9urRSI5UP0yH9bqa82Y0ETo0heGRBqyL4sxLh4It9zunw6dqBgEKv0Maae/1A+ZVMqW8Sga4UYdTfXr0qd9S5JbXRyPZnJ79VIQE9Qt+/wE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [timestamp = _t, object = _t, id12 = _t, id34 = _t, id23 = _t, id15 = _t, id25 = _t, id67 = _t, id54 = _t, id33 = _t, id20 = _t, id65 = _t, id78 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"timestamp", type datetime}, {"object", type text}, {"id12", Int64.Type}, {"id34", Int64.Type}, {"id23", Int64.Type}, {"id15", Int64.Type}, {"id25", Int64.Type}, {"id67", Int64.Type}, {"id54", Int64.Type}, {"id33", Int64.Type}, {"id20", Int64.Type}, {"id65", Int64.Type}, {"id78", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"timestamp", "object", "Index"}, "Attribute", "Value"),
    #"Added Custom" = Table.AddColumn(#"Unpivoted Other Columns", "Custom", each Table1{[ID=[Attribute]]}[NAME]),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Attribute"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns", List.Distinct(#"Removed Columns"[Custom]), "Custom", "Value", List.Sum),
    #"Sorted Rows" = Table.Sort(#"Pivoted Column",{{"Index", Order.Ascending}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Sorted Rows",{"Index"})
in
    #"Removed Columns1"

Test code for Table1

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSkxMVNJRykwxNFKK1UFwjU1QuEbGYG5SUhJEsSkK1wjCTU5OBnPNzFG4piYoXGOIUSkpKRC9BihcM1MUrrmFUmwsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [NAME = _t, ID = _t])
in
    Source

Hello Vijay_A_Verma,

Sorry to bug you again, but i was too excited for the quick solution (i have been on it for 1 entire week) that i didn't notice that it does not cover all cases.

There are some cases in the raw data table with more the one value for each id group, for example:

  #aaa#aaa#aaa#bbb#bbb#ccc#ccc#ccc#ddd#ddd#ddd
timestamp              object  id12id34id23id15id25id67id54id33id20id65id78
10/04/2022 18:42M42586 8555 61359539  6527  
09/04/2022 11:11M41290 9642 99445077  3355  

 

Here for the "aaa" there are 2 values, and i would like to take only the first non-null value, so i am expecting 

timestamp              object   aaabbbcccddd
10/04/2022 18:42M42586613595396527
09/04/2022 11:11M41290994450773355

 

While with your solutions, it is adding all the "idxx" of same group:

Screenshot 2022-05-30 at 16.25.02.png

Do you know how it could be possible to manage this cases as well?

Thank you!

@Vijay_A_Verma 

Never mind, i figured it out

I added a "remove duplicate" steps, selecting the "custom" and the "timestamp" (in order to have an unique releation for each row). Is not so elegant as solution but it does the trick!

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("VZPbjcQgDEVbWeV7pMFPbGrIbgOj6b+N5WECSIgAgcO1ffl8Ln4DvTEh/gAV4et13VA7qkv1MxobwzMRYXsmtWVgvL6vTsJB4oLt8E21Q9nOmistUOJ8gMR1gmCAtPQtfzp2KKiMERo1qZb6ZzQS9YZxm2sBSxGfFW6yfrnLsoCaiEw8tZEL+QNVwXzQPGBQACYM0NPY5MqRNnfmHmReQRLNqwJmAcMBu7HHpGnlyEieie3Cmjaq9RqgHCAuKe0JcwgWeb+aXJccNG2VdHA8VOnyg24wpDxhqZuDMS2lGaWRTd0PmAQsIoQ9ABZcANCk+08Sn9HxYFihPFNe/RmJkeqF6a5eU3PeTCZCh54wO9KAdUGceelQsOV853R4VM0gQOF1SCPlvXbAvMqllFe5ADfqcKlPfz61W4rc8npkJJvLs5+KkKA+v+8/", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [timestamp = _t, object = _t, id12 = _t, id34 = _t, id23 = _t, id15 = _t, id25 = _t, id67 = _t, id54 = _t, id33 = _t, id20 = _t, id65 = _t, id78 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"timestamp", type datetime}, {"object", type text}, {"id12", Int64.Type}, {"id34", Int64.Type}, {"id23", Int64.Type}, {"id15", Int64.Type}, {"id25", Int64.Type}, {"id67", Int64.Type}, {"id54", Int64.Type}, {"id33", Int64.Type}, {"id20", Int64.Type}, {"id65", Int64.Type}, {"id78", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"timestamp", "object", "Index"}, "Attribute", "Value"),
    #"Added Custom" = Table.AddColumn(#"Unpivoted Other Columns", "Custom", each Table1{[ID=[Attribute]]}[NAME]),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Attribute"}),
    #"Removed Duplicates" = Table.Distinct(#"Removed Columns", {"Custom", "timestamp"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Custom]), "Custom", "Value", List.Sum),
    #"Sorted Rows" = Table.Sort(#"Pivoted Column",{{"Index", Order.Ascending}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Sorted Rows",{"Index"})
in
    #"Removed Columns1"

 Thanks!

Thank you very much Vijay_A_Verma!

It seems exactly what i was looking for. I am going to try it on my actual data (that are a bit different) and let you know!

But it looks like its working, so i am not expecting surprises.
Thanks again!
 

Helpful resources

Announcements
collabdays lisbon

CollabDays Lisbon - 26 November 2022

Sessions include practical, hands-on experience that will help you take the next step in your career and know-how.

Difinity Conference

Difinity Conference 2022

Difinity is the largest Microsoft Data, AI, Power BI, Power Platform and Business Applications Conference in New Zealand.

European Share Point Conference

European Share Point Conference

This conference returns live and in-person in Copenhagen this 28 November-01 December with 4 Microsoft Keynotes, 9 Tutorials, and 120 Sessions.

Microsoft 365 Conference â__ December 6-8, 2022

Microsoft 365 Conference - 06-08 December

Join us in Las Vegas to experience community, incredible learning opportunities, and connections that will help grow skills, know-how, and more.