Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Dion-NZ
Frequent Visitor

Help Mapping Dynamic Number of Columns with Different Names Together

Hi there,

 

Hopefully someone can help me with this because I'm running out of hacky ideas to transform a table I'm working with. I'll do my best to explain...

 

I have a table that contains a log of data posts - and I'm trying to split the parameters of the data post column up into more user-friendly data. The columns are generally like this:

  1. LogID
  2. DateSent
  3. DataPost

The Data Post column is in the format of a URI query string (KEY1=This&KEY2=That etc) in a single column with key and value pairs (some are NULL, like &KEY3=&KEY4=Something). The datapost basically contains a payload of data that has the details of a common parent item (the "Job" in my example below), followed by multiple child items related to that parent ("Record" in my example below). 

 

The query string starts with the parent data first, and these pairings only appear once - such as...

  1. JobID = 1
  2. OrganisationID = NZL

However, after that, it goes into a repetitive pattern of sequenced groups of data related to the child data, like...

  1. Record_ID_1=123
  2. Cost_1=100
  3. Record_ID_2=124
  4. Cost_2=150

So my example above would contain data for the one parent 'job' and two child "records", in the format...

JobID=1&OrganisationID=NZL&Record_ID_1=123&Cost_1=100&Record_ID_2=124&Cost_2=150 

... all of this in a single text column called 'DataPost'.

 

Another tricky thing is that I don't know how many child items will be in the query string...

 

Can anyone help me with somehow extract the records to a single separate related table, or any other ways of splitting out these child records and mapping them equally despite their order in the sequence?

 

In my mind, I'd have one table for the jobs, and another table for the records in a structure like...

  1. JobID
  2. Record_ID
  3. Cost

 

1 ACCEPTED SOLUTION

Hi @Dion-NZ 

 

No, it was based on the sample, I did not think about ID greater than 1 digit...so we need to adjust the steps split RECORD ID and digit a little bit

let
    Source = #"SYSTEMDataPostAudit (Source)",
        #"Filtered Rows" = Table.SelectRows(dbo_SYSTEMDataPostAudit, each ([IsRequest] = true)),
        #"Removed Irrelevant Columns" = Table.RemoveColumns(#"Filtered Rows",{"IsRequest", "SYSTEMDataPostActionId", "JobQueueItem", "SYSTEMDataPostAction"}),
        #"Added Custom" = Table.AddColumn(Source, "Custom", each Text.Split([DataPost],"&")),
        #"Split Column by Delimiter" = Table.SplitColumn(#"Expanded Custom", "Custom", Splitter.SplitTextByDelimiter("="), {"Custom", "Custom2"}),
        #"Added Custom1" = Table.AddColumn(#"Split Column by Delimiter", "index", each try Number.From(Text.Reverse( Text.BeforeDelimiter( Text.Reverse([Custom]),"_"))) otherwise 0),
        #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom1", each if [index] = 0 then [Custom] else Text.Reverse( Text.AfterDelimiter( Text.Reverse([Custom]),"_"))),
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"DataPost", "Custom"}),
        JobTable = Table.SelectRows(#"Removed Columns", each ([index] = 0)),
        RecordTable = Table.SelectRows(#"Removed Columns", each ([index] <> 0)),
        PivotedTable = Table.Pivot(RecordTable, List.Distinct(RecordTable[Custom1]), "Custom1", "Custom2"),
        Custom1 = Table.Pivot(JobTable, List.Distinct(JobTable[Custom1]), "Custom1", "Custom2"),
        #"Removed Columns1" = Table.RemoveColumns(Custom1,{"index"}),
        #"Merged Queries" = Table.NestedJoin(#"Removed Columns1", {"LogID"}, PivotedTable, {"LogID"}, "Custom1", JoinKind.LeftOuter),
        #"Expanded Custom1" = Table.ExpandTableColumn(#"Merged Queries", "Custom1", List.RemoveItems( Table.ColumnNames( #"Merged Queries"[Custom1]{0}),{"LogID"}))
in
    #"Expanded Custom1"

 

You can try to work from here, and yes, it is VERY interesting journey, enjoy:)

View solution in original post

10 REPLIES 10
Vera_33
Resident Rockstar
Resident Rockstar

Hi @Dion-NZ 

 

You need to make the stucture like this

Vera_33_0-1627021947598.png

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8spP8nSxNYwpNTAwMvMvSk/MyyxOLMnMzwOK+kX5QMSDUpPzi1LiPV3iDW0NjYwhgs75xSUgvoEBuiIjoCITJEVAvqmBko6SoVJsLAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [DataPost = _t, LogID = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Custom", each Text.Split([DataPost],"&")),
    #"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Expanded Custom", "Custom", Splitter.SplitTextByDelimiter("=", QuoteStyle.Csv), {"Custom", "Custom2"})
in
    #"Split Column by Delimiter" 

 

or like this? (added LogID to differentiate)

Vera_33_1-1627022010468.png

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45W8spP8nSxNYwpNTAwMvMvSk/MyyxOLMnMzwOK+kX5QMSDUpPzi1LiPV3iDW0NjYwhgs75xSUgvoEBuiIjoCITJEVAvqmBko6SoVKsDn1tNFKKjQUA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [DataPost = _t, LogID = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Custom", each Text.Split([DataPost],"&")),
    originTable = Table.AddColumn(#"Added Custom", "JobID", each Text.AfterDelimiter( [Custom]{0},"=")),
    #"Expanded Custom" = Table.ExpandListColumn(originTable, "Custom"),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Expanded Custom", "Custom", Splitter.SplitTextByDelimiter("=", QuoteStyle.Csv), {"Custom", "Custom2"}),
    #"Added Custom1" = Table.AddColumn(#"Split Column by Delimiter", "Custom1", each if Text.Contains([Custom],"Record_ID") then Text.BeforeDelimiter([Custom],"_",1)
else Text.BeforeDelimiter([Custom],"_")),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom1] = "Cost" or [Custom1] = "Record_ID")),
    #"Added Custom5" = Table.AddColumn(#"Filtered Rows", "index", each Text.End([Custom],1)),
    #"Removed Other Columns1" = Table.SelectColumns(#"Added Custom5",{"LogID", "Custom1", "index", "Custom2"}),
    PivotedTable = Table.Pivot(#"Removed Other Columns1", List.Distinct(#"Removed Other Columns1"[Custom1]), "Custom1", "Custom2"),
    #"Merged Queries" = Table.NestedJoin(originTable, {"LogID"}, PivotedTable, {"LogID"}, "PivotedTable", JoinKind.LeftOuter),
    #"Expanded PivotedTable" = Table.ExpandTableColumn(#"Merged Queries", "PivotedTable", {"Record_ID", "Cost"}, {"Record_ID", "Cost"})
in
    #"Expanded PivotedTable"

 

Hi @Vera_33 , thanks for your help!

 

I need a structure closer to the second example - like...

 

DataPostLogIDJobIDOrganisationIDRecord_IDCost
Etc11NZL123100
Etc11NZL124150
      

 

So we basically have a separate row for each record within a job (even though I don't know how many records will be associated with each job in the data post). In the example above, I could group two distinct records (with their own cost values) to the common JobID "1" (with the common OrganisationID "NZL").

Hi @Dion-NZ 

 

The second one is the output you want, or you mean you have more than one JobID in the datapost? Can you provide a sample?

Thanks, sorry for being unclear - I'll only one one job number in the datapost.

 

Here's a true datapost string - as you can see, they're a bit longer than my sample mockup... (and I made the job-level data bold, whereas the regular font is where the repetitive sequential record data starts - in this example, there were three records within one job).

 

ACTION=Add&CLIENT_NUMBER=TEST&AC=Y8&AD_ENTRY=EOL&FIRST_INSERTION=28%2f09%2f2016&COMPOSITE=N&COMPOSITE_CAPTION=&COMPOSITE_WIDTH=&COMPOSITE_DEPTH=&COMPOSITE_RATE=&COMPOSITE_RATE_CODE=&KEY_NUMBER=TEST&CREATIVE_NUMBER=&STATUS=P&PLAN_NUMBER_1=001223&RECORD_ID_1=&INSTRUCTION_NUMBER_1=&CAPTION_1=TESTCAPTION&ORDER_NUMBER_1=TESTORDER&PRODUCT_NUMBER_1=ZE&ADDITIONAL_DATA_A_1=&ADDITIONAL_DATA_B_1=&ADDITIONAL_DATA_C_1=&MEDIA_NUMBER_1=10015&POSITION_1=PND&INSERTION_DATE_1=28%2f09%2f2016&DATE_DESCRIPTION_1=&RATE_1=16.24&RATE_CODE_1=C&DEPTH_1=&WIDTH_1=2&SIZE_CODE_1=&COLOUR_CODE_1=&MEDIA_COM_1=20.00&REQD_COM_1=20.00&COMPANY_DISCOUNT_1=0.00&VACANCY_REF_1=&PLAN_NUMBER_2=001223&RECORD_ID_2=&INSTRUCTION_NUMBER_2=&CAPTION_2=TESTCAPTION&ORDER_NUMBER_2=TESTORDER&PRODUCT_NUMBER_2=ZE&ADDITIONAL_DATA_A_2=&ADDITIONAL_DATA_B_2=&ADDITIONAL_DATA_C_2=&MEDIA_NUMBER_2=10075&POSITION_2=PND&INSERTION_DATE_2=28%2f09%2f2016&DATE_DESCRIPTION_2=&RATE_2=4.76&RATE_CODE_2=C&DEPTH_2=&WIDTH_2=2&SIZE_CODE_2=&COLOUR_CODE_2=&MEDIA_COM_2=20.00&REQD_COM_2=20.00&COMPANY_DISCOUNT_2=0.00&VACANCY_REF_2=&PLAN_NUMBER_3=001223&RECORD_ID_3=&INSTRUCTION_NUMBER_3=&CAPTION_3=TESTCAPTION&ORDER_NUMBER_3=TESTORDER&PRODUCT_NUMBER_3=ZE&ADDITIONAL_DATA_A_3=&ADDITIONAL_DATA_B_3=&ADDITIONAL_DATA_C_3=&MEDIA_NUMBER_3=10134&POSITION_3=PND&INSERTION_DATE_3=28%2f09%2f2016&DATE_DESCRIPTION_3=&RATE_3=8.05&RATE_CODE_3=C&DEPTH_3=&WIDTH_3=2&SIZE_CODE_3=&COLOUR_CODE_3=&MEDIA_COM_3=20.00&REQD_COM_3=20.00&COMPANY_DISCOUNT_3=0.00&VACANCY_REF_3=

That's a lot🤣 you need to format all these? And from RECORD_ID, you need to "group" them within each ID?

Vera_33_0-1627277020511.png

 

Haha yes it is 😁 I want to sort of unpivot it so all the records (and their related data, such as COST, CAPTION, ORDER_NUMBER, etc) gets treated equally - so I'm imaging a separate related table of RECORD_ID, COST, ORDER NUMBER, etc... but that is tricky when even if I split using delimiters, the columns are technically named differently (_1, _2, etc... 😔

Hi @Dion-NZ 

 

Here is one attempt, you can work from here, the index should be the record counts

Vera_33_0-1627286509007.png

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jZPPbtwgEMbfZaXeqhWe2Wy2Bw4EiIrqxS7GiZw04hL1Dfr+BYPZ9R92c7HM9zEznpmf3993jFvVaMo+P//8IwSOvFZSW6f785M01MrORp1xOpzSq3D+ihmobOqoPCvTWad0J82YDU7f4C/54R9AqmNK3JzbplNWUr0QHGftGLbUX5WwP1eqkO2GapjPvCU63ojJ+SWHdWfcSGbVi5ycqHaW2b6jbTy1NdPJdxUlpALA6BjJGyOcEl6Pih+DNf041ktIKhUb9edQPp2i5ZP4i/l+8Ecp1TeN8Ckv/pucdiFUSMJqJ5hljuVaS+ep6PDsnKVQ7FKk8o0+pA8IA42f3mqRG437Dmmkd7bWPlpCdtyoqfc0uBhTHfdwuFLCtrzMU3TYdQ4ZeQh10orUW74/bb5uejPXYk+eiRBI9oRMe/st1mpAh+nBCdXxpvf/gV92Nl8YZ5oPzsjnnP0aDCiAAWUwYAEG3AED7oABN8CAIhglh2dnBgYEMB6XYEARDPgaGHANBtDD/vG45ALmXMCMC1hzARtcwIoL2OQCbnIBBS5ggwsscIFlLnDBBd7hAu9wgTe4wCIXJYdnZ8YFei4qPCy4wCIX+DUu8JoLpKc9eVhygXMucMYFrrnADS5wxQVucoE3ucACF0h333fV7uPjPw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [DataPost = _t, LogID = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Custom", each Text.Split([DataPost],"&")),
    #"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Expanded Custom", "Custom", Splitter.SplitTextByDelimiter("=", QuoteStyle.Csv), {"Custom", "Custom2"}),
    #"Added Custom1" = Table.AddColumn(#"Split Column by Delimiter", "index", each try Number.From( Text.End([Custom],1)) otherwise 0),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom1", each if [index] = 0 then [Custom] else Text.Middle( [Custom],0,Text.Length([Custom])-2)),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"DataPost", "Custom"}),
    JobTable = Table.SelectRows(#"Removed Columns", each ([index] = 0)),
    RecordTable = Table.SelectRows(#"Removed Columns", each ([index] <> 0)),
    PivotedTable = Table.Pivot(RecordTable, List.Distinct(RecordTable[Custom1]), "Custom1", "Custom2"),
    Custom1 = Table.Pivot(JobTable, List.Distinct(JobTable[Custom1]), "Custom1", "Custom2"),
    #"Removed Columns1" = Table.RemoveColumns(Custom1,{"index"}),
    #"Merged Queries" = Table.NestedJoin(#"Removed Columns1", {"LogID"}, PivotedTable, {"LogID"}, "Custom1", JoinKind.LeftOuter),
    #"Expanded Custom1" = Table.ExpandTableColumn(#"Merged Queries", "Custom1", List.RemoveItems( Table.ColumnNames( #"Merged Queries"[Custom1]{0}),{"LogID"}))
in
    #"Expanded Custom1"

 

Thanks again for all your help on this! I'm almost there using your Power Query (and my Power Query knowledge is extremely basic, so it's been an itneresting journey walking through it step by step!), but I'm getting an error...

 

Expression.Error: The 'offset' argument is out of range.

Details:

0

 

And at one point it had a Go to error link that seems to point at the PivotedTable step as the culprit. I've tried reading up on it, but I can't find an 'offset' argument in these statements..?

 

I'll post my Power Query below - please note that my original source is actually from an Azure SQL Server database, so for now I'm referencing that original untouched query as "SYSTEMDataPostAudit (Source)".

 

let
    Source = #"SYSTEMDataPostAudit (Source)",
        #"Filtered Rows" = Table.SelectRows(dbo_SYSTEMDataPostAudit, each ([IsRequest] = true)),
        #"Removed Irrelevant Columns" = Table.RemoveColumns(#"Filtered Rows",{"IsRequest", "SYSTEMDataPostActionId", "JobQueueItem", "SYSTEMDataPostAction"}),
        #"Added Custom" = Table.AddColumn(Source, "Custom", each Text.Split([DataPost],"&")),
        #"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
        #"Split Column by Delimiter" = Table.SplitColumn(#"Expanded Custom", "Custom", Splitter.SplitTextByDelimiter("="), {"Custom", "Custom2"}),
        #"Added Custom1" = Table.AddColumn(#"Split Column by Delimiter", "index", each try Number.From( Text.End([Custom],1)) otherwise 0),
        #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom1", each if [index] = 0 then [Custom] else Text.Middle( [Custom],0,Text.Length([Custom])-2)),
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"DataPost", "Custom"}),
        JobTable = Table.SelectRows(#"Removed Columns", each ([index] = 0)),
        RecordTable = Table.SelectRows(#"Removed Columns", each ([index] <> 0)),
        PivotedTable = Table.Pivot(RecordTable, List.Distinct(RecordTable[Custom1]), "Custom1", "Custom2"),
        Custom1 = Table.Pivot(JobTable, List.Distinct(JobTable[Custom1]), "Custom1", "Custom2"),
        #"Removed Columns1" = Table.RemoveColumns(Custom1,{"index"}),
        #"Merged Queries" = Table.NestedJoin(#"Removed Columns1", {"LogID"}, PivotedTable, {"LogID"}, "Custom1", JoinKind.LeftOuter),
        #"Expanded Custom1" = Table.ExpandTableColumn(#"Merged Queries", "Custom1", List.RemoveItems( Table.ColumnNames( #"Merged Queries"[Custom1]{0}),{"LogID"}))
in
    #"Expanded Custom1"

 

 

Oh, and one more question - the statement being used for truncating the record sequence off the end (RECORD_ID_1 becomes RECORD_ID, COST_1 becomes COST, etc) ... will that also cater for when that sequence has more than one digit? I had a quick look in the database, and found that we definitely have job's with 10+ associated records, which would mean RECORD_ID_23, COST_23, etc...

Hi @Dion-NZ 

 

No, it was based on the sample, I did not think about ID greater than 1 digit...so we need to adjust the steps split RECORD ID and digit a little bit

let
    Source = #"SYSTEMDataPostAudit (Source)",
        #"Filtered Rows" = Table.SelectRows(dbo_SYSTEMDataPostAudit, each ([IsRequest] = true)),
        #"Removed Irrelevant Columns" = Table.RemoveColumns(#"Filtered Rows",{"IsRequest", "SYSTEMDataPostActionId", "JobQueueItem", "SYSTEMDataPostAction"}),
        #"Added Custom" = Table.AddColumn(Source, "Custom", each Text.Split([DataPost],"&")),
        #"Split Column by Delimiter" = Table.SplitColumn(#"Expanded Custom", "Custom", Splitter.SplitTextByDelimiter("="), {"Custom", "Custom2"}),
        #"Added Custom1" = Table.AddColumn(#"Split Column by Delimiter", "index", each try Number.From(Text.Reverse( Text.BeforeDelimiter( Text.Reverse([Custom]),"_"))) otherwise 0),
        #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom1", each if [index] = 0 then [Custom] else Text.Reverse( Text.AfterDelimiter( Text.Reverse([Custom]),"_"))),
        #"Removed Columns" = Table.RemoveColumns(#"Added Custom2",{"DataPost", "Custom"}),
        JobTable = Table.SelectRows(#"Removed Columns", each ([index] = 0)),
        RecordTable = Table.SelectRows(#"Removed Columns", each ([index] <> 0)),
        PivotedTable = Table.Pivot(RecordTable, List.Distinct(RecordTable[Custom1]), "Custom1", "Custom2"),
        Custom1 = Table.Pivot(JobTable, List.Distinct(JobTable[Custom1]), "Custom1", "Custom2"),
        #"Removed Columns1" = Table.RemoveColumns(Custom1,{"index"}),
        #"Merged Queries" = Table.NestedJoin(#"Removed Columns1", {"LogID"}, PivotedTable, {"LogID"}, "Custom1", JoinKind.LeftOuter),
        #"Expanded Custom1" = Table.ExpandTableColumn(#"Merged Queries", "Custom1", List.RemoveItems( Table.ColumnNames( #"Merged Queries"[Custom1]{0}),{"LogID"}))
in
    #"Expanded Custom1"

 

You can try to work from here, and yes, it is VERY interesting journey, enjoy:)

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.