Solved: Re: Split text based on every second delimiter

PhilC · ‎12-04-2019

Hi, looking for assistance with manipulating columns in Power Query.

I need to split a column based on a Comma delimiter, however I need to only split every second one. The data is formatted Surname, Initials with a Comma delimiter between people. There are multiple people in each row (can be between 1 and 1,000 values in each row). I am not looking to split into columns as part of this process.

I want to extract into a list, but have each Surname, Initials combination, not Surname then Initials on the next row.

Example Data

Person

Chapman, D.A., Baker, R., Thomas, B.L.

Davis, Q., Grant, M.R.

Desired Result

Person

Chapman, D.A.

Baker, R.

Thomas, B.L.

Grant, M.R.

Davis, Q.

Have been trying the code from here: https://community.powerbi.com/t5/Desktop/Text-Replace-specific-character-only-if-it-occurs-as-the-la...

Table.TransformColumns(#"Changed Type",{{"Column1", each Text.Split(_," ")}})

Hope that explains what I am after.

Cheers

Phil

az38 · ‎12-04-2019

Hi @PhilC

try split by ".,"

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcs5ILMhNzNNRcNFz1NNRcErMTi3SUQgCMkMy8nMTi4FCej56SrE60UouiWWZQH4gUM69KDGvREfBVy8IKBULAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Person = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Person", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Person", Splitter.SplitTextByDelimiter(".,", QuoteStyle.Csv), {"Person.1", "Person.2", "Person.3"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Person.1", type text}, {"Person.2", type text}, {"Person.3", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type1", "Index", 0, 1),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),
    #"Trimmed Text" = Table.TransformColumns(#"Unpivoted Other Columns",{{"Value", Text.Trim, type text}}),
    #"Added Custom" = Table.AddColumn(#"Trimmed Text", "Custom", each [Value]&"."),
    #"Replaced Value" = Table.ReplaceValue(#"Added Custom","..",".",Replacer.ReplaceText,{"Custom"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Replaced Value",{"Custom"})
in
    #"Removed Other Columns"

do not hesitate to give a kudo to useful posts and mark solutions as solution
Linkedin

do not hesitate to give a kudo to useful posts and mark solutions as solution
LinkedIn

View solution in original post

Jimmy801 · ‎12-04-2019

Hello @PhilC

had created once a nice function that can be used here. Soltuion by @az38 is nice, but would work only in for this 3 cells. This function can be applied dynamically

(tSplitText as text, 
tDelimiter as text, 
nOccurance as nullable number) as list =>

let
    nOccuranceIntern = if nOccurance = null then 2 else nOccurance,
    Source = "Chapman, D.A., Baker, R., Thomas, B.L.",
    ListFromText = Text.ToList(tSplitText),
    CreateListOfDelimiter = List.Accumulate
        (
            ListFromText,
            [CurrentRow = -1, DelimiterFound={}],
            (state, current)=>
                if current = tDelimiter then
                        Record.TransformFields(state, {{"DelimiterFound", each _ & {state[CurrentRow]}},{"CurrentRow", each _ +1 }})
                        else
                        Record.TransformFields(state, {{"CurrentRow", each _ +1 }})
        )[DelimiterFound],
    ListAlternate = List.Alternate(CreateListOfDelimiter,nOccuranceIntern-1,nOccuranceIntern-1),
    ChangeTextList = List.Accumulate
        (
            ListAlternate,
            ListFromText,
            (state, current)=>
                List.ReplaceRange(state,current+1,1,{"%%%"})
        ),
    SplitTextByNewDelimiter = Text.Split
        (
            List.Accumulate
                (
                    ChangeTextList,
                    "",
                    (state, current)=> 
                        state & current
                ),
            "%%%"
         )


in
    SplitTextByNewDelimiter

If this post helps or solves your problem, please mark it as solution.
Kudos are nice to - thanks
Have fun

Jimmy

View solution in original post

Mariusz · ‎12-05-2019

Hi @PhilC

a third option for you to choose.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcs5ILMhNzNNRcNFz1NNRcErMTi3SUQgCMkMy8nMTi4FCej56SrE60UouiWWZQH4gUM69KDGvREfBVy8IKBULAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Person = _t]),
    #"Replaced Value" = Table.ReplaceValue(Source,".,",".#",Replacer.ReplaceText,{"Person"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Person", Splitter.SplitTextByDelimiter("#", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Person"),
    #"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter",{{"Person", Text.Trim, type text}})
in
    #"Trimmed Text"

Best Regards,
Mariusz

If this post helps, then please consider Accepting it as the solution.

View solution in original post

Mariusz · ‎12-05-2019

Hi @PhilC

a third option for you to choose.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcs5ILMhNzNNRcNFz1NNRcErMTi3SUQgCMkMy8nMTi4FCej56SrE60UouiWWZQH4gUM69KDGvREfBVy8IKBULAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Person = _t]),
    #"Replaced Value" = Table.ReplaceValue(Source,".,",".#",Replacer.ReplaceText,{"Person"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Person", Splitter.SplitTextByDelimiter("#", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Person"),
    #"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter",{{"Person", Text.Trim, type text}})
in
    #"Trimmed Text"

Best Regards,
Mariusz

If this post helps, then please consider Accepting it as the solution.

PhilC · ‎12-05-2019

Hi @Mariusz , that is a good approach, keeps the "." portion of the delimiter, and results in the List that I am after.

Thanks.

Jimmy801 · ‎12-04-2019

Hello @PhilC

had created once a nice function that can be used here. Soltuion by @az38 is nice, but would work only in for this 3 cells. This function can be applied dynamically

(tSplitText as text, 
tDelimiter as text, 
nOccurance as nullable number) as list =>

let
    nOccuranceIntern = if nOccurance = null then 2 else nOccurance,
    Source = "Chapman, D.A., Baker, R., Thomas, B.L.",
    ListFromText = Text.ToList(tSplitText),
    CreateListOfDelimiter = List.Accumulate
        (
            ListFromText,
            [CurrentRow = -1, DelimiterFound={}],
            (state, current)=>
                if current = tDelimiter then
                        Record.TransformFields(state, {{"DelimiterFound", each _ & {state[CurrentRow]}},{"CurrentRow", each _ +1 }})
                        else
                        Record.TransformFields(state, {{"CurrentRow", each _ +1 }})
        )[DelimiterFound],
    ListAlternate = List.Alternate(CreateListOfDelimiter,nOccuranceIntern-1,nOccuranceIntern-1),
    ChangeTextList = List.Accumulate
        (
            ListAlternate,
            ListFromText,
            (state, current)=>
                List.ReplaceRange(state,current+1,1,{"%%%"})
        ),
    SplitTextByNewDelimiter = Text.Split
        (
            List.Accumulate
                (
                    ChangeTextList,
                    "",
                    (state, current)=> 
                        state & current
                ),
            "%%%"
         )


in
    SplitTextByNewDelimiter

If this post helps or solves your problem, please mark it as solution.
Kudos are nice to - thanks
Have fun

Jimmy

gauravsukhadia · ‎11-07-2022

Hi @Jimmy801 above function is excellent and I have a similar problem where I want to get result in the group of 10 values ( trying to pass 10 containers to the string including '&' symbol except in the 10th posiiton which is api call to get the status) and then split into rows until the last. Could you please advise where do i need to modify value in this function ? Hi Team above function is excellent and I have a similar problem where I want to get result in the group of 10 values and then split into rows. Could you please advise where do i need to modify value in this function ?

PhilC · ‎12-05-2019

Hi @Jimmy801 , the function works, thanks. I think splitting by ".," / ".#" is a simpler solution so will go with that one.

az38 · ‎12-04-2019

Hi @PhilC

try split by ".,"

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcs5ILMhNzNNRcNFz1NNRcErMTi3SUQgCMkMy8nMTi4FCej56SrE60UouiWWZQH4gUM69KDGvREfBVy8IKBULAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Person = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Person", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Person", Splitter.SplitTextByDelimiter(".,", QuoteStyle.Csv), {"Person.1", "Person.2", "Person.3"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Person.1", type text}, {"Person.2", type text}, {"Person.3", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type1", "Index", 0, 1),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),
    #"Trimmed Text" = Table.TransformColumns(#"Unpivoted Other Columns",{{"Value", Text.Trim, type text}}),
    #"Added Custom" = Table.AddColumn(#"Trimmed Text", "Custom", each [Value]&"."),
    #"Replaced Value" = Table.ReplaceValue(#"Added Custom","..",".",Replacer.ReplaceText,{"Custom"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Replaced Value",{"Custom"})
in
    #"Removed Other Columns"

do not hesitate to give a kudo to useful posts and mark solutions as solution
Linkedin

do not hesitate to give a kudo to useful posts and mark solutions as solution
LinkedIn

PhilC · ‎12-05-2019

Hi @az38

Thanks for that, the suggestion for using ".," is what I needed, no idea why I did not see that. Thank you.

Edit: just realised that it means the "." will be dropped when splitting, which is not ideal.

Regarding the code, I need an approach that does not split to columns due to the size of the contents (up to 1,000) - this was mentioned in the post, but did not mention that I have three other fields that also need to be expanded, so need to use Lists in the columns.

Cheers, Phil

Jimmy801 · ‎12-06-2019

Hello @PhilC

that is the downside using this "simply" splitting. You have to remaintain data afterwards. And if you applying the logic to a larger dataset, the chances are quite big, this "simplier" solutions won't work (more than 3 parts, ".," not present, but only ","

So "simply" use my function and you are at the save side 🙂

If this post helps or solves your problem, please mark it as solution.
Kudos are nice to - thanks
Have fun

Jimmy

PhilC · ‎12-08-2019

Hi @Jimmy801 ,

I'd love to better understand your thoughts to ensure am using best options.

What did you mean by?

You have to remaintain data afterwards.

How does your approach work better for larger datasets compared to using the Splitter function? I guess an explanation of the steps in the function would help so I can follow what is being done and understand how it is more robust as an approach.

Cheers

Phil

Jimmy801 · ‎12-08-2019

Hello @PhilC

you have shown us only a dataset of 3 rows. So my thoughts on this are as follows

- there might be rows with more than 3 names in it

- there might be rows where you can't apply the ".," but only the real delimiter that is a "," and not the ".,"

- there might be the necessity to get one object to be handled afterwards (nested list) and not new columns based on the content (you would not be able to handle this in a dynamically way afterwards)

So whenever one of this occurs, the other solutions will fail. Maybe also my solution, because i didn't test it on a dataset like this. But I foreseen it. So applying my function, the above mentioned topics should be handled

All the best

Jimmy

Split text based on every second delimiter

Helpful resources

Power BI Monthly Update - April 2024