Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
My source data has a dynamic number of repeating sections within a single record. Each section may also have a varying number of fields but will always start and end with the same fields. I need to be able to separate the repeating sections into their own records and also retain the header and footer fields from the original record. The record header fields are A, B, C, and D, the repeating section are fields E, F, G, and somtimes Ga, and there is a footer field H.
I have included links to the Data CSV, PIBX, and a text file of the current query below.
The original data source is a two-column table containing the Field Names and Data, so I need to get from:
To
* I have removed my current intrim soloution. Some of the suggested soloutions below replicated this instead of the desired result above.
If I had a fixed number of sections in each the record then I would use the code in the PIBX at the links below to split it up, but I am not sure how to do it with an unknown number of repeating sections and a varying number of fields in each section.
Query Text File: https://drive.google.com/file/d/1eurMyvL8nM7brUjs-3hqP6UmRr9VSoqY/view?usp=sharing
CSV data: https://drive.google.com/file/d/1990kevmTABgx5kMq3ybD_AKhANkq_rkJ/view?usp=sharing
PIBX: https://drive.google.com/file/d/19WFDlQoQNFlzQXJE57E2FrWmjoq1P6fy/view?usp=sharing
Solved! Go to Solution.
Try this version: I have implemented, I believe, all conceivable generalizations based on your example. Perhaps they are not all necessary in concrete cases and therefore the code could be simplified. The problem as it is posed seems very complex. Perhaps you could act in the previous steps to try to better prepare the data so that it can be transferred later more easily.
Hooh...
Try this in blank query, might be dynamic solution.
I had huge trouble to order columns as desired.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("TZMxbsMwDEXv4jmDKUqiNLZNk97ByBCgHQp0a++PyrJBvu0beCKfbP9tW27fXz+fv8tluT7/nsvjsi0v40Fmeh0pzfQ2ks50HSnP9D5SOcDbHg/yvkcNIAdQAqgBWAAtgD7jx4jVpcylmkt1l5LVZ4qElkh4iSiYDKaAqWAMTAMTeuf4+dLUBc/hu+E5eypitEHRoGhQNCgaFI1zoGhQNCi2UOzxYVdXTOKKKfnopKGYNBSTKpgMpoCpYAxMAxOKKbtiKqFYQ9FCscXoDsUOxX4o3p/zIeNAwYGKAwamgelg8IuNB64W7j7/g/1euvq9NDqlKFW0SkNT0StFsRTNUlRL0S1FuRTtUtRLz37N9zOWrdzM1cLdvNfY8fgH", BinaryEncoding.Base64), Compression.Deflate)), {"Column1","Column2"}),
Promoted = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
Index = Table.AddIndexColumn(Promoted, "Index", 0, 1, Int64.Type),
RKey = Table.AddColumn(Index, "Record Key", each if [Fields] = "A" then [Index] else null),
FilledD = Table.FillDown(RKey,{"Record Key"}),
Grouped = Table.Group(FilledD, {"Fields", "Record Key"}, {{"GR", each _, type table [Fields=nullable text, Data=nullable text, Index=number, Record Key=number]}, {"Count", each Table.RowCount(_), Int64.Type}}),
FieldIndex = Table.AddColumn(Grouped, "Custom", each if [Count]>1 then Table.AddIndexColumn([GR], "FieldsIndex", 1) else Table.AddIndexColumn([GR], "FieldsIndex", 0)),
#"Expanded Custom" = Table.ExpandTableColumn(FieldIndex, "Custom", {"Data", "Index", "FieldsIndex"}, {"Data", "Index", "FieldsIndex"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Custom",{"GR"}),
#"Sorted Rows" = Table.Sort(#"Removed Columns",{{"Index", Order.Ascending}}),
#"Changed Type" = Table.TransformColumnTypes(#"Sorted Rows",{{"FieldsIndex", type text}, {"Index", Int64.Type}}),
FieldNames = Table.AddColumn(#"Changed Type", "Fields.2", each if [FieldsIndex]="0" and Text.Length([Fields]) = 1
then [Fields]
else if Text.Length([Fields]) = 2
then Text.Combine({[Fields], #"Changed Type"{[Index]-1}[FieldsIndex]}) else Text.Combine({[Fields], [FieldsIndex]})),
Clean = Table.RemoveColumns(FieldNames,{"Fields", "Count", "Index", "FieldsIndex"}),
Renamed = Table.RenameColumns(Clean,{{"Fields.2", "Fields"}}),
Pivoted = Table.Pivot(Renamed, List.Distinct(Renamed[Fields]), "Fields", "Data"),
ToTable = Table.FromList(List.RemoveFirstN(Table.ColumnNames(Pivoted),1), Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Sorted Rows1" = Table.Sort(#"ToTable",{{"Column1", Order.Ascending}}),
#"Added Index" = Table.AddIndexColumn(#"Sorted Rows1", "Index", 0, 1, Int64.Type),
#"Duplicated Column" = Table.DuplicateColumn(#"Added Index", "Column1", "Column1 - Copy"),
#"Reordered Columns" = Table.ReorderColumns(#"Duplicated Column",{"Column1", "Column1 - Copy", "Index"}),
Split = Table.SplitColumn(#"Reordered Columns", "Column1 - Copy", Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), {"Column1 - Copy.1", "Column2"}),
#"Replaced Value" = Table.ReplaceValue(Split,null,"0",Replacer.ReplaceValue,{"Column2"}),
#"Grouped Rows" = Table.Group(#"Replaced Value", {"Column2"}, {{"Gr", each _, type table [Column1=text, Column2=nullable text, Index=number]}}),
Removed = Table.SelectRows(#"Grouped Rows", each ([Column2] <> "0")),
#"01" = Table.SelectColumns(Table.AddIndexColumn(Table.ExpandTableColumn(#"Removed Top Rows", "Gr", {"Column1", "Index"}, {"Column1", "Index"}),"Index1"),{"Column1", "Index1"}),
#"02" = Table.AddIndexColumn(Table.RemoveColumns(Table.SelectRows(Split, each ([Column2] <> null)),{"Column1 - Copy.1", "Column2"}),"Index1"),
MRG = Table.RemoveColumns(Table.NestedJoin(#"02", {"Index1"}, #"01", {"Index1"}, "01", JoinKind.LeftOuter), "Column1"),
#"Expanded 01" = Table.ExpandTableColumn(MRG, "01", {"Column1"}, {"Column1.1"}),
#"03" = Table.SelectColumns(#"Expanded 01",{"Column1.1", "Index"}),
#"04" = Table.SelectColumns(Table.SelectRows(Split, each ([Column2] = null)),{"Column1","Index"}),
#"05" = Table.RemoveColumns(Table.Sort( Table.CombineColumns(Table.Combine({#"04", #"03"}),{"Column1", "Column1.1"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Column1"),{{"Index", Order.Ascending}}),{"Index"}),
ColNames = Table.AddColumn(#"05", "Names", each Text.Combine({Text.End([Column1], 1), Text.Reverse(Text.Middle(Text.Reverse([Column1]), 1))}), type text),
#"06" = Table.ReorderColumns(Pivoted, List.Combine({List.FirstN(Table.ColumnNames(Pivoted),1), ColNames[Column1]})),
#"07" = Table.ReplaceValue(#"06",null,"",Replacer.ReplaceValue, Table.ColumnNames(#"06")),
FINAL = Table.RenameColumns( #"07", List.Zip({Table.ColumnNames(#"07"), List.Combine({List.FirstN(Table.ColumnNames(#"07"),1), ColNames[Names]})}))
in
FINAL
Thanks for your post. The output matches what I already have. I have removed my intrim result as this may have caused some confusion over what I require. I each "numbered" column represents a repeated section within a single record. I need each repeated section to be its own record/row as shown above.
fai un tentativo con questo:
let
Source = Csv.Document(File.Contents("C:\Users\xxxxxxxx\BI\Repeating Data.csv"),[Delimiter=",", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Fields", type text}, {"Data", type text}}),
#"Raggruppate righe" = Table.Group(#"Changed Type", {"Fields"}, {{"all", each Table.FromColumns(Table.Group(_,{"Fields"},{"al",each [Data]})[al],Table.Group(_,{"Fields"},{"al",each [Data]})[Fields])}},GroupKind.Local,(x,y)=>Number.From(y[Fields]="A")),
#"Tabella all espansa" = Table.ExpandTableColumn(#"Raggruppate righe", "all", {"A", "B", "C", "D", "E", "F", "G", "H", "Ga"}, {"A", "B", "C", "D", "E", "F", "G", "H", "Ga"}),
#"Ricopiato in basso" = Table.FillDown(#"Tabella all espansa",{"A", "B", "C", "D"})
in
#"Ricopiato in basso"
Thanks for your post. This is the closest soloution so far, but it has the same problems as I had when I first tried to split the repeating sections up. The highlighted data point below is with the wrong record. The "Ga" field is not in every repeating section, so the table grouping has put it in the first section, even though it belongs in the third. This is why I used numbers as my test data, it makes it easier to check if the repeated sections have been kept together.
Try this version: I have implemented, I believe, all conceivable generalizations based on your example. Perhaps they are not all necessary in concrete cases and therefore the code could be simplified. The problem as it is posed seems very complex. Perhaps you could act in the previous steps to try to better prepare the data so that it can be transferred later more easily.
You should better explain which logic to apply, in general terms since the solution is required to be general, to establish the position / selection of Ga with respect to the other fields.
Why, for example, must it be the third position and not the second or fourth?
Can this also happen for the other fields E, F G?
In this case, you could provide a more comprehensive example to test on.
I am not sure why Ga has Ga2 in your expected result...I've got a different one based on my understanding, it's a quick one, there must be better ways...use your previous steps until Filled Down, so paste from there
#"Filled Down" = Table.FillDown(#"Added Conditional Column",{"Record Key"}),
#"Grouped Rows" = Table.Group(#"Filled Down", {"Record Key"}, {{"allrows", each _, type table }}),
#"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom", each Table.Group([allrows],{"Fields"},{{"Count", List.Count },{"all", each _, type table }})),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom",{"Custom"}),
#"Expanded Custom" = Table.ExpandTableColumn(#"Removed Other Columns", "Custom", {"Fields", "Count", "all"}, {"Fields", "Count", "all"}),
#"Added Custom1" = Table.AddColumn(#"Expanded Custom", "Custom", (OT)=>
[a=Table.AddIndexColumn(OT[all],"new",1,1),
b=Table.AddColumn(a,"newFields", each if OT[Count]=1 then OT[Fields] else Text.From([new])&[Fields])][b]),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Custom1",{"Custom"}),
#"Expanded Custom1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Custom", {"newFields","Data","Record Key"}),
#"Grouped Rows1" = Table.Group(#"Expanded Custom1", {"Record Key"}, {{"allrows", each _, type table }}),
#"Added Custom2" = Table.AddColumn(#"Grouped Rows1", "Custom", each [a = Table.RemoveColumns([allrows],"Record Key"),
b= Table.PromoteHeaders( Table.Transpose(a))][b]),
Custom1 = Table.Combine( #"Added Custom2"[Custom])
in
Custom1
Thanks for your post. This seems to replicate my intrim result,except the "Ga" fields have become detached from their original sections. The numbered column names represent the repeated section number the fields belonged to, 1Ga was in the first repeated section and 3Ga was in the third repeating section. I need each repeating section to be its own record, as shown above.