Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply

finding superset and subset

Hi team
i am having table called "PACKAGE" which is having two coumns as below

rajeshapunu1234_0-1709348409479.png

now wanted to find superset package and supset package from the table
note: the above is sample data the package is more the 700 and need to find the superset and subset package from all the 700 

can someone please help me out to get it done?

1 ACCEPTED SOLUTION

Hello again @rajeshapunu1234, with @jgeddes help you can find new code here:

 

You can decide whether you want to include supersets with 0 subsets or not:

dufoq3_0-1709905607909.png

 

Now this query takes around a minute on my PC with whole dataset (4806 rows) and finishes with 718 rows and 303 subset columns

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkhMzk5MT1UwVNJRCkktLgEyYnWwCBthFzZGETbCbogRdkOMCBhigiJsChM2xS5shl0Y1SUm2A0xJiCMarYZdtVmBFQbYhcG+jIWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [p = _t, t = _t]),
    // use "yes" or "no"
    #"IncludeSupersetsWithoutSubets?" = "yes",
    StepBack = Source
,
    RenamedColumns = Table.RenameColumns(StepBack,{{"p", "Superset"}, {"t", "Sub"}}),
    GroupedRows = Table.Group(RenamedColumns, {"Superset"}, {{"Sub", each [Sub], type list}}),
    Ad_PrevTable = Table.AddColumn(GroupedRows, "PrevTable", each GroupedRows, type table),
    ExpandedGroupedRows = Table.ExpandTableColumn(Ad_PrevTable, "PrevTable", {"Superset", "Sub"}, {"Superset2", "Sub2"}),
    Ad_SubCheck = Table.AddColumn(ExpandedGroupedRows, "Sub Check", each 
       if [Superset] = [Superset2] then true
       //else if not List.Contains([Sub], [Sub2]{0}) then false
       else if List.ContainsAll([Sub], [Sub2]) then true else false
     , type logical),
    FilteredRows = Table.SelectRows(Ad_SubCheck, each ([Sub Check] = true)),
    Ad_Order = Table.AddColumn(FilteredRows, "Order", each List.Count([Sub2]), Int64.Type),
    RemovedOtherColumns = Table.SelectColumns(Ad_Order,{"Superset", "Superset2", "Order"}),
    GroupedRows2 = Table.Group(RemovedOtherColumns, {"Superset"}, {{"Subsets", each 
       Table.Transpose(Table.RemoveFirstN(Table.SelectColumns(Table.Sort(_, {{"Order", Order.Descending}}), {"Superset2"}), 1))
 , type table}}),
    Ad_ColCount = Table.AddColumn(GroupedRows2, "ColCount", each Table.ColumnCount([Subsets]), Int64.Type),
    // Based on "IncludeSupersetsWithoutSubets?" parameter
    FilteredColCountParameter = Table.SelectRows(Ad_ColCount, each if Text.Trim(Text.Lower(#"IncludeSupersetsWithoutSubets?")) = "yes" then [ColCount] <> -1 else [ColCount] <> 0),
    ColNames = [ a = List.Max(FilteredColCountParameter[ColCount]), 
    b = List.Buffer(List.Generate(
            () => 1,
            each _ <= a,
            each _ +1,
            each { "Column" & Text.From(_), "Subset" & Text.From(_) } ))
  ][b],
    StepBack2 = FilteredColCountParameter,
    RemovedColumns = Table.RemoveColumns(StepBack2,{"ColCount"}),
    ExpandedSubsets = Table.ExpandTableColumn(RemovedColumns, "Subsets", List.Transform(ColNames, each _{0}), List.Transform(ColNames, each _{1}))
in
    ExpandedSubsets

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

View solution in original post

11 REPLIES 11

Sir actually my need to to identify the superset and supset packages example

rajeshapunu1234_1-1709370073373.png

 

from above table i want to have list of superset pakcage in first column its subset package in second column and if any subset package is coming from the another subset should come in the third column with the same package row as follow

rajeshapunu1234_2-1709370111453.png

can u please help me to acheive it?

 



 

Hi @rajeshapunu1234,

for future requests: provide sample data as table so we can copy/paste and don't forget to provide also expected result please.

 

Result

dufoq3_0-1709385797357.png

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkhMzk5MT1UwVNJRCkktLgEyYnWwCBthFzZGETbCbogRdkOMCBhigiJsDBM2xS5shiJsgl21CQHVqO42RTIkFgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [p = _t, t = _t]),
    GroupedRows = Table.Group(Source, {"p"}, {{"t1", each [t], type list}}),
    Ad_Subset = Table.AddColumn(GroupedRows, "Subset", each List.RemoveItems(List.RemoveNulls(List.Accumulate(
     { 0..Table.RowCount(GroupedRows) -1 },
     {},
     (s,c)=> s & { if List.ContainsAll([t1], GroupedRows{c}[t1]) then GroupedRows{c}[p] else null }
     )), {[p]}), type list),
    FilterSupersets = Table.SelectRows(Ad_Subset, each not List.Contains(List.Combine(Ad_Subset[Subset]), [p])),
    Ad_FinalTable = Table.AddColumn(FilterSupersets, "FinalTable", each List.Accumulate( 
     List.Zip({{0..List.Count([Subset]) -1}, [Subset]}),
     #table(type table[Superset = text], {{[p]}}),
     (s,c)=> Table.AddColumn(s, "Subset" & Text.From(c{0} +1), each c{1}, type text)), type table),
    FinalTable = Table.Combine(Ad_FinalTable[FinalTable])
in
    FinalTable

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Hi it is working for small set of data but not working for the huge data
like i am having 700 packages with above 4000 test

I used same column names but with your code it is taking time but not giving result

 

and i dont know how to attach excel here can you please guide me to attach so that you can help me out

You can upload your file to google drive or one drive and share link with us. Don't for get to grant public permissions.


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Hi @rajeshapunu1234, I tried my best.

 

  • If you try code below you will see that for first 1000 rows it finishes in a few seconds wit 14 subset columns
  • With first 2000 rows, it takes around 30seconds and 47 subset columns.
  • With first 3000 rows, it takes 1min 50sec and 77 subset columns
  • I tried to run my query with all 4806 rows, it takes 9minutes and 169 subset columns (327 rows).
  • You need to understand that the more lines there are, the number of combinations grows exponentially because we have to compare every single row with all other rows. To be honest I'm not sure if there is such way to speed this up. Maybe someone else could help.

You can download the result here.

 

 

Old code removed

 

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Hello again @rajeshapunu1234, with @jgeddes help you can find new code here:

 

You can decide whether you want to include supersets with 0 subsets or not:

dufoq3_0-1709905607909.png

 

Now this query takes around a minute on my PC with whole dataset (4806 rows) and finishes with 718 rows and 303 subset columns

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCkhMzk5MT1UwVNJRCkktLgEyYnWwCBthFzZGETbCbogRdkOMCBhigiJsChM2xS5shl0Y1SUm2A0xJiCMarYZdtVmBFQbYhcG+jIWAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [p = _t, t = _t]),
    // use "yes" or "no"
    #"IncludeSupersetsWithoutSubets?" = "yes",
    StepBack = Source
,
    RenamedColumns = Table.RenameColumns(StepBack,{{"p", "Superset"}, {"t", "Sub"}}),
    GroupedRows = Table.Group(RenamedColumns, {"Superset"}, {{"Sub", each [Sub], type list}}),
    Ad_PrevTable = Table.AddColumn(GroupedRows, "PrevTable", each GroupedRows, type table),
    ExpandedGroupedRows = Table.ExpandTableColumn(Ad_PrevTable, "PrevTable", {"Superset", "Sub"}, {"Superset2", "Sub2"}),
    Ad_SubCheck = Table.AddColumn(ExpandedGroupedRows, "Sub Check", each 
       if [Superset] = [Superset2] then true
       //else if not List.Contains([Sub], [Sub2]{0}) then false
       else if List.ContainsAll([Sub], [Sub2]) then true else false
     , type logical),
    FilteredRows = Table.SelectRows(Ad_SubCheck, each ([Sub Check] = true)),
    Ad_Order = Table.AddColumn(FilteredRows, "Order", each List.Count([Sub2]), Int64.Type),
    RemovedOtherColumns = Table.SelectColumns(Ad_Order,{"Superset", "Superset2", "Order"}),
    GroupedRows2 = Table.Group(RemovedOtherColumns, {"Superset"}, {{"Subsets", each 
       Table.Transpose(Table.RemoveFirstN(Table.SelectColumns(Table.Sort(_, {{"Order", Order.Descending}}), {"Superset2"}), 1))
 , type table}}),
    Ad_ColCount = Table.AddColumn(GroupedRows2, "ColCount", each Table.ColumnCount([Subsets]), Int64.Type),
    // Based on "IncludeSupersetsWithoutSubets?" parameter
    FilteredColCountParameter = Table.SelectRows(Ad_ColCount, each if Text.Trim(Text.Lower(#"IncludeSupersetsWithoutSubets?")) = "yes" then [ColCount] <> -1 else [ColCount] <> 0),
    ColNames = [ a = List.Max(FilteredColCountParameter[ColCount]), 
    b = List.Buffer(List.Generate(
            () => 1,
            each _ <= a,
            each _ +1,
            each { "Column" & Text.From(_), "Subset" & Text.From(_) } ))
  ][b],
    StepBack2 = FilteredColCountParameter,
    RemovedColumns = Table.RemoveColumns(StepBack2,{"ColCount"}),
    ExpandedSubsets = Table.ExpandTableColumn(RemovedColumns, "Subsets", List.Transform(ColNames, each _{0}), List.Transform(ColNames, each _{1}))
in
    ExpandedSubsets

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Vijay_A_Verma
Super User
Super User

If I understand correctly, a simple group operation should meet your needs. See the code below

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WKjBU0lEqMVSK1QGyjUBsIwjbGIlthK7GGMI2QRI3hKqPBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [p = _t, t = _t]),
    #"Grouped Rows" = Table.Group(Source, {"p"}, {{"t", each Text.Combine([t], ", ")}})
in
    #"Grouped Rows"

where can i have another column which mention which package is superset and which package is subest of the superset with identification?

 

in the mean time want to identify the unique package as well
can you please support here in both need?

column p is superset which is unique and column t is subset. If you need, you can rename the columns. Use the revised code where I have sorted and prepared unique list of subsets.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WKjBU0lEqMVSK1QGyjUBsIwjbGIlthEONCZK4IVQ8FgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [p = _t, t = _t]),
    #"Grouped Rows" = Table.Group(Source, {"p"}, {{"Subset", each Text.Combine(List.Sort(List.Distinct([t])), ", ")}})
in
    #"Grouped Rows"

 

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

Top Solution Authors
Top Kudoed Authors