Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Combining data sets with duplicate information (real tester!!)

Hi all!

 

I am trying to combine two data sets which contain duplicate information. Instead of getting my desired outcome of assigning a unique identifier, I am left with more duplicates. 

 

Data set 1

NameRole
ax
bx
cy
dy

 

Data set 2

RolePosition ID
x1
x2
y3
y4

 

I am trying to assign the name to a unique position ID.

 

Instead I just get the following:

NamePosition ID
a1
a1
b1
b1
c3
c3

 

 

Really hope you can help!!!

 

Thanks.

9 REPLIES 9
Smauro
Solution Sage
Solution Sage

@Anonymous, try that:

Query name = ds1

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlTSUapQitWJVkqCs5KBrEowKwXOSgWyqsCsNDgrHc7KALLKwaxMICtRKTYWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t, Role = _t]),
    #"Grouped Roles" = Table.Group(Source, {"Role"}, {{"Names", each [Name], type {text}}})
in
    #"Grouped Roles"

Query name = ds2

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlTSUdI1VIrVgTANwKwKIMsQzjICsyqBLGM4ywTMqgKyTOEsMzDLL79EIVGhKD8nVSEzTyGl2BAoY64UGwsA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Role = _t, #"Position ID" = _t]),
    #"Grouped Roles" = Table.Group(Source, {"Role"}, {{"Positions", each [Position ID], type {text}}})
in
    #"Grouped Roles"

Query name = anything you want

let
    #"Merge per Role" = Table.ExpandTableColumn(
        Table.NestedJoin(ds1, "Role", ds2, "Role", "ds2", JoinKind.LeftOuter),
        "ds2", {"Positions"}, {"Positions"}),
    
    #"Magically Combine Names and Positions" = Table.CombineColumns(#"Merge per Role",
        {"Names", "Positions"},
        (x) => List.Zip({x{0}??{}, x{1}??{} }),
        "Names & Positions"),
    #"Expanded Names & Positions" = Table.ExpandListColumn(#"Magically Combine Names and Positions", "Names & Positions")[#"Names & Positions"],
    #"Converted to Table" = Table.FromList(#"Expanded Names & Positions", (x) => x, {"Name", "Position ID"})
in
    #"Converted to Table"

 


Best,
Spyros




Feel free to connect with me:
LinkedIn

PhilipTreacy
Super User
Super User

Hi @Anonymous 

How does PQ know what ID to assign to each Name?

In Data Set 2 there are roles that are associated with multipe ID's.  How do we know that a has ID 1 and b has ID 2? Why isn't it the other way around or some other combination?

What's the logic in assigning 1 to a , 2 to b , 3 toetc?

Maybe a fuller explanation would shed light on this?

It's not as simple as just adding an Index column to Data Set 1 then deleting the Role column?

let
    Source = Table.FromRecords({[Name = "a", Role = "x"],[Name = "b", Role = "x"],[Name = "c", Role = "y"],[Name = "d", Role = "y"]}),
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1, Int64.Type),
    #"Removed Columns" = Table.RemoveColumns(#"Added Index",{"Role"})
in
    #"Removed Columns"

Phil


If I answered your question please mark my post as the solution.

If my answer helped solve your problem, give it a kudos by clicking on the Thumbs Up.



Did I answer your question? Then please mark my post as the solution.
If I helped you, click on the Thumbs Up to give Kudos.


Blog :: YouTube Channel :: Connect on Linkedin


Proud to be a Super User!


Anonymous
Not applicable

Hi @PhilipTreacy 

 

Thanks for getting back to me!

 

You are exactly right. There is no valid logic in the function I am performing. 

 

I was hoping to find an answer which would perhaps exclude the Position ID if it had previously been included. However, I am not sure if this is an available function. 

 

Thanks again!!!

HotChilli
Super User
Super User

What is the desired outcome with the example data?  Show us 

Anonymous
Not applicable

Thanks for getting back to me HotChilli!! I would like to get the following data:

NamePosition ID
a1
b2
c3
d4

 

Each row in 'Name' should be linked to an appropriate, unique 'Position ID' via the type of 'Role'. 

 

Does that make sense?

 

Let me know if you have any questions, or if you require a greater data set!

 

Thanks again!!!

You can add a sub index for role column in both tables and then do a merge using both columns.  However, if you truly have the same number of rows in both tables, you can just use Table.FromColumns with the formula shown in the pic below to get the shown table.  DS1 and DS2 and the names I gave to the two queries/tables.

 

mahoneypat_0-1603237657423.png

 

Regards,

Pat

 





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


Anonymous
Not applicable

Hi @mahoneypat!!

 

Thanks for getting back to me 🙂 

 

A sub-index is a great idea! However, my DS is quite large and it would be time consuming to add in a sub-index to specific each specific role and ensure they matched up correctly.

 

I was hoping to be able to implement logic which could exclude previoulsy used Position IDs.

 

Thanks again Pat you are a legend

Anonymous
Not applicable

peraphs to get a good specific solution you should provide more info about your data sets.

 

Some, but not all possible question about your datasets
How big is your data set?
The names have a predetermined order can they be associated with the first available id or is there some constraint?

Do the two tables have the same number of rows?

 

 

If the structure is the same as the examples shown, an immediate solution is to sort the two data sets primarily on the "role" fields and secondarily on the other field and then build a table with the two columns "name" and "id".

 

 

 

image.png

 

image.png

 

 

 

 

image.png

Anonymous
Not applicable

Hi @Anonymous 

 

Thanks for getting back to me!

 

In answer to your questions: 

How big is your data set? 100+
The names have a predetermined order can they be associated with the first available id or is there some constraint? No constraint, any 'Name' can be associated with any of the correct available 'Roles'

Do the two tables have the same number of rows? No

 

 

Unfortunately the structure isn't the same... 😞

 

 

Thanks again!!!!

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors