Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Trying to parse a JSON object where one of the values is a list of multiple integers. In a separate table (the lookup table), I have a list of integers which map to a text string. For example (this data is from themoviedb.org😞
Data Table
Title genre_ids Wonder Woman 28,12,14,878
Lookup Table
genre.id genre.name 28 Action 12 Adventure 16 Animation 35 Comedy ...
Question
In the Data Table, how would I make a new column (e.g. genre_fullname) where the contents are the string name taken from the look up table? In other words, how can I make this?
Title genre_ids genre_fullname Wonder Woman 28,12,14,878 Action, Adventure, Fantasy, Science Fiction
Bonus Question
Is this type of multiple data within a cell the right way to go about this? In my data analysis with Python I strive for tidy data, with one value per cell (so the above row would actually be 4 rows). I see thatPower BI has an option to expand lists into individual rows rather than the comma separated values like I have above. Which approach is the *right* way? (End goal: Have a list of movies with their genres listed next to them, and allow filtering the list by genre)
Solved! Go to Solution.
You can do this by edit query, and the final power query looke like:
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCs/PS0ktUgjPz03MU9JRMrLQMTTSMTTRsTC3UIrVgcv7gmWBUsZmOkA1xqZKsbEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Title = _t, Genre_ids = _t]), #"Split Column by Delimiter" = Table.SplitColumn(Source, "Genre_ids", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Genre_ids.1", "Genre_ids.2", "Genre_ids.3", "Genre_ids.4"}), #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Split Column by Delimiter", {"Title"}, "Attribute", "Value"), #"Merged Queries" = Table.NestedJoin(#"Unpivoted Columns",{"Value"},Lookup,{"Genre_ids"},"NewColumn",JoinKind.LeftOuter), #"Expanded NewColumn" = Table.ExpandTableColumn(#"Merged Queries", "NewColumn", {"Genre.name"}, {"NewColumn.Genre.name"}), #"Grouped Rows" = Table.Group(#"Expanded NewColumn", {"Title"}, {{"Genre_ids", each Text.Combine([Value],","), type text}, {"Genre_fullname", each Text.Combine([NewColumn.Genre.name],","), type text}}) in #"Grouped Rows"
Regards,
Charlie Liao
A better alternative is to split the cells to nested lists and expand these.
When splitting into columns, you get code with hard coded column names which will not be adjusted if you have more than 4 gneres (in this case).
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCs/PS0ktUgjPz03MU9JRMrLQMTTSMTTRsTC3UIrVgcv7gmWBUsZmOkA1xqZKsbEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Title = _t, Genre_ids = _t]), Splitted = Table.TransformColumns(Source,{{"Genre_ids", each Text.Split(_,",")}}), Expanded1 = Table.ExpandListColumn(Splitted, "Genre_ids"), Merged = Table.NestedJoin(Expanded1,{"Genre_ids"},Lookup,{"Genre_ids"},"NewColumn",JoinKind.LeftOuter), Expanded2 = Table.ExpandTableColumn(Merged, "NewColumn", {"Genre.name"}, {"Genre.name"}) in Expanded2
I also left out the last "group by" step as I understand this is not desired.
Thank you to @v-caliao-msft and @MarcelBeug for your help! I combined both of your work into what I needed.
For anyone interested, here is code that creates two tables, a main (data) table and a lookup table that demonstrates what I originally set out to accomplish.
Main Table
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSsssKi5RKMkoSk1V0lEy1FEw0lEwVorViVZKLUvNAwoB+SY6CmY6ChZg0fyUFIg6Yx0FU6XYWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Group = _t, Numbers = _t]), #"Changed Type" = Table.TransformColumnTypes(Source, {{"Group", type text}, {"Numbers", type text}}), #"Split on commas" = Table.TransformColumns(Source, {{"Numbers", each Text.Split(_, ", ")}}), #"Expand Split" = Table.ExpandListColumn(#"Split on commas", "Numbers"), #"Lookup Values" = Table.NestedJoin(#"Expand Split", {"Numbers"}, #"Lookup Table", {"numeral"}, "number as word", JoinKind.LeftOuter), #"Expand Lookup" = Table.ExpandTableColumn(#"Lookup Values", "number as word", {"word"}, {"number as word"}), #"Combine Rows" = Table.Group(#"Expand Lookup", {"Group"}, { {"Numbers", each Text.Combine([Numbers], ", "), type text}, {"numbers as word", each Text.Combine([number as word], ", "), type text} }) in #"Combine Rows"
Lookup Table
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("Lc09DsAgCIDRuzA71P73LMYRK4sm1toev4DdvhcIOAcWDOSE4I2Dkbs+WXuSjgX7ZGaFfBfFIqDWJyvjold7k8aGSbWzkM5YVQcr0f/IDnJd9vwH", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [numeral = _t, word = _t]) in Source
You can do this by edit query, and the final power query looke like:
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCs/PS0ktUgjPz03MU9JRMrLQMTTSMTTRsTC3UIrVgcv7gmWBUsZmOkA1xqZKsbEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Title = _t, Genre_ids = _t]), #"Split Column by Delimiter" = Table.SplitColumn(Source, "Genre_ids", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Genre_ids.1", "Genre_ids.2", "Genre_ids.3", "Genre_ids.4"}), #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Split Column by Delimiter", {"Title"}, "Attribute", "Value"), #"Merged Queries" = Table.NestedJoin(#"Unpivoted Columns",{"Value"},Lookup,{"Genre_ids"},"NewColumn",JoinKind.LeftOuter), #"Expanded NewColumn" = Table.ExpandTableColumn(#"Merged Queries", "NewColumn", {"Genre.name"}, {"NewColumn.Genre.name"}), #"Grouped Rows" = Table.Group(#"Expanded NewColumn", {"Title"}, {{"Genre_ids", each Text.Combine([Value],","), type text}, {"Genre_fullname", each Text.Combine([NewColumn.Genre.name],","), type text}}) in #"Grouped Rows"
Regards,
Charlie Liao
A better alternative is to split the cells to nested lists and expand these.
When splitting into columns, you get code with hard coded column names which will not be adjusted if you have more than 4 gneres (in this case).
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCs/PS0ktUgjPz03MU9JRMrLQMTTSMTTRsTC3UIrVgcv7gmWBUsZmOkA1xqZKsbEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Title = _t, Genre_ids = _t]), Splitted = Table.TransformColumns(Source,{{"Genre_ids", each Text.Split(_,",")}}), Expanded1 = Table.ExpandListColumn(Splitted, "Genre_ids"), Merged = Table.NestedJoin(Expanded1,{"Genre_ids"},Lookup,{"Genre_ids"},"NewColumn",JoinKind.LeftOuter), Expanded2 = Table.ExpandTableColumn(Merged, "NewColumn", {"Genre.name"}, {"Genre.name"}) in Expanded2
I also left out the last "group by" step as I understand this is not desired.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
111 | |
100 | |
80 | |
64 | |
58 |
User | Count |
---|---|
146 | |
110 | |
93 | |
84 | |
67 |