Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hi all!
I am trying to combine two data sets which contain duplicate information. Instead of getting my desired outcome of assigning a unique identifier, I am left with more duplicates.
Data set 1
Name | Role |
a | x |
b | x |
c | y |
d | y |
Data set 2
Role | Position ID |
x | 1 |
x | 2 |
y | 3 |
y | 4 |
I am trying to assign the name to a unique position ID.
Instead I just get the following:
Name | Position ID |
a | 1 |
a | 1 |
b | 1 |
b | 1 |
c | 3 |
c | 3 |
Really hope you can help!!!
Thanks.
@Anonymous, try that:
Query name = ds1
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlTSUapQitWJVkqCs5KBrEowKwXOSgWyqsCsNDgrHc7KALLKwaxMICtRKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Role = _t]),
#"Grouped Roles" = Table.Group(Source, {"Role"}, {{"Names", each [Name], type {text}}})
in
#"Grouped Roles"
Query name = ds2
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlTSUdI1VIrVgTANwKwKIMsQzjICsyqBLGM4ywTMqgKyTOEsMzDLL79EIVGhKD8nVSEzTyGl2BAoY64UGwsA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Role = _t, #"Position ID" = _t]),
#"Grouped Roles" = Table.Group(Source, {"Role"}, {{"Positions", each [Position ID], type {text}}})
in
#"Grouped Roles"
Query name = anything you want
let
#"Merge per Role" = Table.ExpandTableColumn(
Table.NestedJoin(ds1, "Role", ds2, "Role", "ds2", JoinKind.LeftOuter),
"ds2", {"Positions"}, {"Positions"}),
#"Magically Combine Names and Positions" = Table.CombineColumns(#"Merge per Role",
{"Names", "Positions"},
(x) => List.Zip({x{0}??{}, x{1}??{} }),
"Names & Positions"),
#"Expanded Names & Positions" = Table.ExpandListColumn(#"Magically Combine Names and Positions", "Names & Positions")[#"Names & Positions"],
#"Converted to Table" = Table.FromList(#"Expanded Names & Positions", (x) => x, {"Name", "Position ID"})
in
#"Converted to Table"
Best,
Spyros
Hi @Anonymous
How does PQ know what ID to assign to each Name?
In Data Set 2 there are roles that are associated with multipe ID's. How do we know that a has ID 1 and b has ID 2? Why isn't it the other way around or some other combination?
What's the logic in assigning 1 to a , 2 to b , 3 to c etc?
Maybe a fuller explanation would shed light on this?
It's not as simple as just adding an Index column to Data Set 1 then deleting the Role column?
let
Source = Table.FromRecords({[Name = "a", Role = "x"],[Name = "b", Role = "x"],[Name = "c", Role = "y"],[Name = "d", Role = "y"]}),
#"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1, Int64.Type),
#"Removed Columns" = Table.RemoveColumns(#"Added Index",{"Role"})
in
#"Removed Columns"
Phil
If I answered your question please mark my post as the solution.
If my answer helped solve your problem, give it a kudos by clicking on the Thumbs Up.
Proud to be a Super User!
Thanks for getting back to me!
You are exactly right. There is no valid logic in the function I am performing.
I was hoping to find an answer which would perhaps exclude the Position ID if it had previously been included. However, I am not sure if this is an available function.
Thanks again!!!
What is the desired outcome with the example data? Show us
Thanks for getting back to me HotChilli!! I would like to get the following data:
Name | Position ID |
a | 1 |
b | 2 |
c | 3 |
d | 4 |
Each row in 'Name' should be linked to an appropriate, unique 'Position ID' via the type of 'Role'.
Does that make sense?
Let me know if you have any questions, or if you require a greater data set!
Thanks again!!!
You can add a sub index for role column in both tables and then do a merge using both columns. However, if you truly have the same number of rows in both tables, you can just use Table.FromColumns with the formula shown in the pic below to get the shown table. DS1 and DS2 and the names I gave to the two queries/tables.
Regards,
Pat
To learn more about Power BI, follow me on Twitter or subscribe on YouTube.
Hi @mahoneypat!!
Thanks for getting back to me 🙂
A sub-index is a great idea! However, my DS is quite large and it would be time consuming to add in a sub-index to specific each specific role and ensure they matched up correctly.
I was hoping to be able to implement logic which could exclude previoulsy used Position IDs.
Thanks again Pat you are a legend
peraphs to get a good specific solution you should provide more info about your data sets.
Some, but not all possible question about your datasets
How big is your data set?
The names have a predetermined order can they be associated with the first available id or is there some constraint?
Do the two tables have the same number of rows?
If the structure is the same as the examples shown, an immediate solution is to sort the two data sets primarily on the "role" fields and secondarily on the other field and then build a table with the two columns "name" and "id".
Hi @Anonymous
Thanks for getting back to me!
In answer to your questions:
How big is your data set? 100+
The names have a predetermined order can they be associated with the first available id or is there some constraint? No constraint, any 'Name' can be associated with any of the correct available 'Roles'
Do the two tables have the same number of rows? No
Unfortunately the structure isn't the same... 😞
Thanks again!!!!