Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Philip_Warriner
Regular Visitor

Splitting a repeating set of columns in a row into separate rows

My source data has a dynamic number of repeating sections within a single record. Each section may also have a varying number of fields but will always start and end with the same fields.  I need to be able to separate the repeating sections into their own records and also retain the header and footer fields from the original record.  The record header fields are A, B, C, and D, the repeating section are fields E, F, G, and somtimes Ga, and there is a footer field H.

I have included links to the Data CSV, PIBX, and a text file of the current query below.

 

The original data source is a two-column table containing the Field Names and Data, so I need to get from:

 

Philip_Warriner_0-1619777690040.png   

To

 

Philip_Warriner_1-1619777733414.png

* I have removed my current intrim soloution.  Some of the suggested soloutions below replicated this instead of the desired result above.

 

If I had a fixed number of sections in each the record then I would use the code in the PIBX at the links below to split it up, but I am not sure how to do it with an unknown number of repeating sections and a varying number of fields in each section.

 

Query Text File:  https://drive.google.com/file/d/1eurMyvL8nM7brUjs-3hqP6UmRr9VSoqY/view?usp=sharing

CSV data: https://drive.google.com/file/d/1990kevmTABgx5kMq3ybD_AKhANkq_rkJ/view?usp=sharing

PIBX:     https://drive.google.com/file/d/19WFDlQoQNFlzQXJE57E2FrWmjoq1P6fy/view?usp=sharing

 

2 ACCEPTED SOLUTIONS
Anonymous
Not applicable

Try this version: I have implemented, I believe, all conceivable generalizations based on your example. Perhaps they are not all necessary in concrete cases and therefore the code could be simplified. The problem as it is posed seems very complex. Perhaps you could act in the previous steps to try to better prepare the data so that it can be transferred later more easily.

 

View solution in original post

Anonymous
Not applicable

9 REPLIES 9
Anonymous
Not applicable

try this

Jakinta
Solution Sage
Solution Sage

Hooh...

Try this in blank query, might be dynamic solution.

I had huge trouble to order columns as desired.

 

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("TZMxbsMwDEXv4jmDKUqiNLZNk97ByBCgHQp0a++PyrJBvu0beCKfbP9tW27fXz+fv8tluT7/nsvjsi0v40Fmeh0pzfQ2ks50HSnP9D5SOcDbHg/yvkcNIAdQAqgBWAAtgD7jx4jVpcylmkt1l5LVZ4qElkh4iSiYDKaAqWAMTAMTeuf4+dLUBc/hu+E5eypitEHRoGhQNCgaFI1zoGhQNCi2UOzxYVdXTOKKKfnopKGYNBSTKpgMpoCpYAxMAxOKKbtiKqFYQ9FCscXoDsUOxX4o3p/zIeNAwYGKAwamgelg8IuNB64W7j7/g/1euvq9NDqlKFW0SkNT0StFsRTNUlRL0S1FuRTtUtRLz37N9zOWrdzM1cLdvNfY8fgH", BinaryEncoding.Base64), Compression.Deflate)), {"Column1","Column2"}),
    Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    Index = Table.AddIndexColumn(Promoted, "Index", 0, 1, Int64.Type),
    RKey = Table.AddColumn(Index, "Record Key", each if [Fields] = "A" then [Index] else null),
    FilledD = Table.FillDown(RKey,{"Record Key"}),
    Grouped = Table.Group(FilledD, {"Fields", "Record Key"}, {{"GR", each _, type table [Fields=nullable text, Data=nullable text, Index=number, Record Key=number]}, {"Count", each Table.RowCount(_), Int64.Type}}),
    FieldIndex = Table.AddColumn(Grouped, "Custom", each if [Count]>1 then Table.AddIndexColumn([GR], "FieldsIndex", 1) else Table.AddIndexColumn([GR], "FieldsIndex", 0)),
    #"Expanded Custom" = Table.ExpandTableColumn(FieldIndex, "Custom", {"Data", "Index", "FieldsIndex"}, {"Data", "Index", "FieldsIndex"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Custom",{"GR"}),
    #"Sorted Rows" = Table.Sort(#"Removed Columns",{{"Index", Order.Ascending}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Sorted Rows",{{"FieldsIndex", type text}, {"Index", Int64.Type}}),
    FieldNames = Table.AddColumn(#"Changed Type", "Fields.2", each if [FieldsIndex]="0" and Text.Length([Fields]) = 1 
then [Fields]
else if Text.Length([Fields]) = 2 
then Text.Combine({[Fields], #"Changed Type"{[Index]-1}[FieldsIndex]}) else  Text.Combine({[Fields], [FieldsIndex]})),
    Clean = Table.RemoveColumns(FieldNames,{"Fields", "Count", "Index", "FieldsIndex"}),
    Renamed = Table.RenameColumns(Clean,{{"Fields.2", "Fields"}}),
    Pivoted = Table.Pivot(Renamed, List.Distinct(Renamed[Fields]), "Fields", "Data"),
    ToTable = Table.FromList(List.RemoveFirstN(Table.ColumnNames(Pivoted),1), Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Sorted Rows1" = Table.Sort(#"ToTable",{{"Column1", Order.Ascending}}),
    #"Added Index" = Table.AddIndexColumn(#"Sorted Rows1", "Index", 0, 1, Int64.Type),
    #"Duplicated Column" = Table.DuplicateColumn(#"Added Index", "Column1", "Column1 - Copy"),
    #"Reordered Columns" = Table.ReorderColumns(#"Duplicated Column",{"Column1", "Column1 - Copy", "Index"}),
    Split = Table.SplitColumn(#"Reordered Columns", "Column1 - Copy", Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), {"Column1 - Copy.1", "Column2"}),
    #"Replaced Value" = Table.ReplaceValue(Split,null,"0",Replacer.ReplaceValue,{"Column2"}),
    #"Grouped Rows" = Table.Group(#"Replaced Value", {"Column2"}, {{"Gr", each _, type table [Column1=text, Column2=nullable text, Index=number]}}),
    Removed = Table.SelectRows(#"Grouped Rows", each ([Column2] <> "0")),
    #"01" = Table.SelectColumns(Table.AddIndexColumn(Table.ExpandTableColumn(#"Removed Top Rows", "Gr", {"Column1", "Index"}, {"Column1", "Index"}),"Index1"),{"Column1", "Index1"}),
    #"02" = Table.AddIndexColumn(Table.RemoveColumns(Table.SelectRows(Split, each ([Column2] <> null)),{"Column1 - Copy.1", "Column2"}),"Index1"),
    MRG = Table.RemoveColumns(Table.NestedJoin(#"02", {"Index1"}, #"01", {"Index1"}, "01", JoinKind.LeftOuter), "Column1"),
    #"Expanded 01" = Table.ExpandTableColumn(MRG, "01", {"Column1"}, {"Column1.1"}),
    #"03" = Table.SelectColumns(#"Expanded 01",{"Column1.1", "Index"}),
    #"04" = Table.SelectColumns(Table.SelectRows(Split, each ([Column2] = null)),{"Column1","Index"}),
    #"05" = Table.RemoveColumns(Table.Sort( Table.CombineColumns(Table.Combine({#"04", #"03"}),{"Column1", "Column1.1"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Column1"),{{"Index", Order.Ascending}}),{"Index"}),
    ColNames = Table.AddColumn(#"05", "Names", each Text.Combine({Text.End([Column1], 1), Text.Reverse(Text.Middle(Text.Reverse([Column1]), 1))}), type text),
    #"06" = Table.ReorderColumns(Pivoted, List.Combine({List.FirstN(Table.ColumnNames(Pivoted),1), ColNames[Column1]})),
    #"07" = Table.ReplaceValue(#"06",null,"",Replacer.ReplaceValue, Table.ColumnNames(#"06")),
    FINAL = Table.RenameColumns( #"07", List.Zip({Table.ColumnNames(#"07"), List.Combine({List.FirstN(Table.ColumnNames(#"07"),1), ColNames[Names]})}))
in
    FINAL

 

 

 

 

Thanks for your post.  The output matches what I already have.  I have removed my intrim result as this may have caused some confusion over what I require.  I each "numbered" column represents a repeated section within a single record.  I need each repeated section to be its own record/row as shown above.

Anonymous
Not applicable

fai un tentativo con questo:

 

 

 

let
    Source = Csv.Document(File.Contents("C:\Users\xxxxxxxx\BI\Repeating Data.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Fields", type text}, {"Data", type text}}),
    #"Raggruppate righe" = Table.Group(#"Changed Type", {"Fields"}, {{"all", each Table.FromColumns(Table.Group(_,{"Fields"},{"al",each [Data]})[al],Table.Group(_,{"Fields"},{"al",each [Data]})[Fields])}},GroupKind.Local,(x,y)=>Number.From(y[Fields]="A")),
    #"Tabella all espansa" = Table.ExpandTableColumn(#"Raggruppate righe", "all", {"A", "B", "C", "D", "E", "F", "G", "H", "Ga"}, {"A", "B", "C", "D", "E", "F", "G", "H", "Ga"}),
    #"Ricopiato in basso" = Table.FillDown(#"Tabella all espansa",{"A", "B", "C", "D"})
in
    #"Ricopiato in basso"

 

Thanks for your post.  This is the closest soloution so far, but it has the same problems as I had when I first tried to split the repeating sections up.  The highlighted data point below is with the wrong record.  The "Ga" field is not in every repeating section, so the table grouping has put it in the first section, even though it belongs in the third.  This is why I used numbers as my test data, it makes it easier to check if the repeated sections have been kept together.

 

Philip_Warriner_0-1620032132374.png

 

 

Anonymous
Not applicable

Try this version: I have implemented, I believe, all conceivable generalizations based on your example. Perhaps they are not all necessary in concrete cases and therefore the code could be simplified. The problem as it is posed seems very complex. Perhaps you could act in the previous steps to try to better prepare the data so that it can be transferred later more easily.

 

Anonymous
Not applicable

Hi @Philip_Warriner 

You should better explain which logic to apply, in general terms since the solution is required to be general, to establish the position / selection of Ga with respect to the other fields.

 

Why, for example, must it be the third position and not the second or fourth?

Can this also happen for the other fields E, F G?

In this case, you could provide a more comprehensive example to test on.

Vera_33
Resident Rockstar
Resident Rockstar

Hi @Philip_Warriner 

 

I am not sure why Ga has Ga2 in your expected result...I've got a different one based on my understanding, it's a quick one, there must be better ways...use your previous steps until Filled Down, so paste from there

    #"Filled Down" = Table.FillDown(#"Added Conditional Column",{"Record Key"}),
    #"Grouped Rows" = Table.Group(#"Filled Down", {"Record Key"}, {{"allrows", each _, type table }}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Group([allrows],{"Fields"},{{"Count",  List.Count },{"all",  each _, type table }})),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"Fields", "Count", "all"}, {"Fields", "Count", "all"}),
    #"Added Custom1" = Table.AddColumn(#"Expanded Custom", "Custom", (OT)=> 
[a=Table.AddIndexColumn(OT[all],"new",1,1),
b=Table.AddColumn(a,"newFields", each if OT[Count]=1 then OT[Fields] else Text.From([new])&[Fields])][b]),
    #"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"Custom"}),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Custom", {"newFields","Data","Record Key"}),
    #"Grouped Rows1" = Table.Group(#"Expanded Custom1", {"Record Key"}, {{"allrows", each _, type table }}),
    #"Added Custom2" = Table.AddColumn(#"Grouped Rows1", "Custom", each [a = Table.RemoveColumns([allrows],"Record Key"),
b= Table.PromoteHeaders( Table.Transpose(a))][b]),
    Custom1 = Table.Combine(  #"Added Custom2"[Custom])
in
    Custom1

 

Vera_33_0-1619792253179.png

 

Thanks for your post.  This seems to replicate my intrim result,except the "Ga" fields have become detached from their original sections. The numbered column names represent the repeated section number the fields belonged to, 1Ga was in the first repeated section and 3Ga was in the third repeating section.  I need each repeating section to be its own record, as shown above.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors