cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
PhilC Regular Visitor
Regular Visitor

Split text based on every second delimiter

Hi, looking for assistance with manipulating columns in Power Query.

 

I need to split a column based on a Comma delimiter, however I need to only split every second one.  The data is formatted Surname, Initials with a Comma delimiter between people.  There are multiple people in each row (can be between 1 and 1,000 values in each row).  I am not looking to split into columns as part of this process.

 

I want to extract into a list, but have each Surname, Initials combination, not Surname then Initials on the next row.

 

Example Data

Person
Chapman, D.A., Baker, R., Thomas, B.L.
Davis, Q., Grant, M.R.

 

Desired Result

Person
Chapman, D.A.
Baker, R.
Thomas, B.L.
Grant, M.R.
Davis, Q.

 

Have been trying the code from here:  https://community.powerbi.com/t5/Desktop/Text-Replace-specific-character-only-if-it-occurs-as-the-la...

Table.TransformColumns(#"Changed Type",{{"Column1", each Text.Split(_," ")}})

 

Hope that explains what I am after.

 

Cheers

Phil

 

3 ACCEPTED SOLUTIONS

Accepted Solutions
Highlighted
az38 Super Contributor
Super Contributor

Re: Split text based on every second delimiter

Hi @PhilC 

try split by ".,"

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcs5ILMhNzNNRcNFz1NNRcErMTi3SUQgCMkMy8nMTi4FCej56SrE60UouiWWZQH4gUM69KDGvREfBVy8IKBULAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Person = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Person", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Person", Splitter.SplitTextByDelimiter(".,", QuoteStyle.Csv), {"Person.1", "Person.2", "Person.3"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Person.1", type text}, {"Person.2", type text}, {"Person.3", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type1", "Index", 0, 1),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),
    #"Trimmed Text" = Table.TransformColumns(#"Unpivoted Other Columns",{{"Value", Text.Trim, type text}}),
    #"Added Custom" = Table.AddColumn(#"Trimmed Text", "Custom", each [Value]&"."),
    #"Replaced Value" = Table.ReplaceValue(#"Added Custom","..",".",Replacer.ReplaceText,{"Custom"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Replaced Value",{"Custom"})
in
    #"Removed Other Columns"

do not hesitate to give a kudo to useful posts and mark solutions as solution
Linkedin

View solution in original post

Jimmy801 New Contributor
New Contributor

Re: Split text based on every second delimiter

Hello @PhilC 

 

had created once a nice function that can be used here. Soltuion by @az38  is nice, but would work only in for this 3 cells. This function can be applied dynamically

(tSplitText as text, 
tDelimiter as text, 
nOccurance as nullable number) as list =>

let
    nOccuranceIntern = if nOccurance = null then 2 else nOccurance,
    Source = "Chapman, D.A., Baker, R., Thomas, B.L.",
    ListFromText = Text.ToList(tSplitText),
    CreateListOfDelimiter = List.Accumulate
        (
            ListFromText,
            [CurrentRow = -1, DelimiterFound={}],
            (state, current)=>
                if current = tDelimiter then
                        Record.TransformFields(state, {{"DelimiterFound", each _ & {state[CurrentRow]}},{"CurrentRow", each _ +1 }})
                        else
                        Record.TransformFields(state, {{"CurrentRow", each _ +1 }})
        )[DelimiterFound],
    ListAlternate = List.Alternate(CreateListOfDelimiter,nOccuranceIntern-1,nOccuranceIntern-1),
    ChangeTextList = List.Accumulate
        (
            ListAlternate,
            ListFromText,
            (state, current)=>
                List.ReplaceRange(state,current+1,1,{"%%%"})
        ),
    SplitTextByNewDelimiter = Text.Split
        (
            List.Accumulate
                (
                    ChangeTextList,
                    "",
                    (state, current)=> 
                        state & current
                ),
            "%%%"
         )


in
    SplitTextByNewDelimiter

 

If this post helps or solves your problem, please mark it as solution.
Kudos are nice to - thanks
Have fun

Jimmy

View solution in original post

Super User
Super User

Re: Split text based on every second delimiter

Hi @PhilC 

 

a third option for you to choose.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcs5ILMhNzNNRcNFz1NNRcErMTi3SUQgCMkMy8nMTi4FCej56SrE60UouiWWZQH4gUM69KDGvREfBVy8IKBULAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Person = _t]),
    #"Replaced Value" = Table.ReplaceValue(Source,".,",".#",Replacer.ReplaceText,{"Person"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Person", Splitter.SplitTextByDelimiter("#", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Person"),
    #"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter",{{"Person", Text.Trim, type text}})
in
    #"Trimmed Text"

 

Best Regards,
Mariusz

If this post helps, then please consider Accepting it as the solution.

 

View solution in original post

9 REPLIES 9
Highlighted
az38 Super Contributor
Super Contributor

Re: Split text based on every second delimiter

Hi @PhilC 

try split by ".,"

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcs5ILMhNzNNRcNFz1NNRcErMTi3SUQgCMkMy8nMTi4FCej56SrE60UouiWWZQH4gUM69KDGvREfBVy8IKBULAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Person = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Person", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Person", Splitter.SplitTextByDelimiter(".,", QuoteStyle.Csv), {"Person.1", "Person.2", "Person.3"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Person.1", type text}, {"Person.2", type text}, {"Person.3", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type1", "Index", 0, 1),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Added Index", {"Index"}, "Attribute", "Value"),
    #"Trimmed Text" = Table.TransformColumns(#"Unpivoted Other Columns",{{"Value", Text.Trim, type text}}),
    #"Added Custom" = Table.AddColumn(#"Trimmed Text", "Custom", each [Value]&"."),
    #"Replaced Value" = Table.ReplaceValue(#"Added Custom","..",".",Replacer.ReplaceText,{"Custom"}),
    #"Removed Other Columns" = Table.SelectColumns(#"Replaced Value",{"Custom"})
in
    #"Removed Other Columns"

do not hesitate to give a kudo to useful posts and mark solutions as solution
Linkedin

View solution in original post

Jimmy801 New Contributor
New Contributor

Re: Split text based on every second delimiter

Hello @PhilC 

 

had created once a nice function that can be used here. Soltuion by @az38  is nice, but would work only in for this 3 cells. This function can be applied dynamically

(tSplitText as text, 
tDelimiter as text, 
nOccurance as nullable number) as list =>

let
    nOccuranceIntern = if nOccurance = null then 2 else nOccurance,
    Source = "Chapman, D.A., Baker, R., Thomas, B.L.",
    ListFromText = Text.ToList(tSplitText),
    CreateListOfDelimiter = List.Accumulate
        (
            ListFromText,
            [CurrentRow = -1, DelimiterFound={}],
            (state, current)=>
                if current = tDelimiter then
                        Record.TransformFields(state, {{"DelimiterFound", each _ & {state[CurrentRow]}},{"CurrentRow", each _ +1 }})
                        else
                        Record.TransformFields(state, {{"CurrentRow", each _ +1 }})
        )[DelimiterFound],
    ListAlternate = List.Alternate(CreateListOfDelimiter,nOccuranceIntern-1,nOccuranceIntern-1),
    ChangeTextList = List.Accumulate
        (
            ListAlternate,
            ListFromText,
            (state, current)=>
                List.ReplaceRange(state,current+1,1,{"%%%"})
        ),
    SplitTextByNewDelimiter = Text.Split
        (
            List.Accumulate
                (
                    ChangeTextList,
                    "",
                    (state, current)=> 
                        state & current
                ),
            "%%%"
         )


in
    SplitTextByNewDelimiter

 

If this post helps or solves your problem, please mark it as solution.
Kudos are nice to - thanks
Have fun

Jimmy

View solution in original post

Super User
Super User

Re: Split text based on every second delimiter

Hi @PhilC 

 

a third option for you to choose.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcs5ILMhNzNNRcNFz1NNRcErMTi3SUQgCMkMy8nMTi4FCej56SrE60UouiWWZQH4gUM69KDGvREfBVy8IKBULAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Person = _t]),
    #"Replaced Value" = Table.ReplaceValue(Source,".,",".#",Replacer.ReplaceText,{"Person"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Person", Splitter.SplitTextByDelimiter("#", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Person"),
    #"Trimmed Text" = Table.TransformColumns(#"Split Column by Delimiter",{{"Person", Text.Trim, type text}})
in
    #"Trimmed Text"

 

Best Regards,
Mariusz

If this post helps, then please consider Accepting it as the solution.

 

View solution in original post

PhilC Regular Visitor
Regular Visitor

Re: Split text based on every second delimiter

Hi @az38

 

Thanks for that, the suggestion for using ".," is what I needed, no idea why I did not see that.  Thank you.

 

Edit: just realised that it means the "." will be dropped when splitting, which is not ideal.

 

Regarding the code, I need an approach that does not split to columns due to the size of the contents (up to 1,000) - this was mentioned in the post, but did not mention that I have three other fields that also need to be expanded, so need to use Lists in the columns.

Cheers, Phil

PhilC Regular Visitor
Regular Visitor

Re: Split text based on every second delimiter

Hi @Jimmy801 , the function works, thanks.  I think splitting by ".," / ".#" is a simpler solution so will go with that one.

PhilC Regular Visitor
Regular Visitor

Re: Split text based on every second delimiter

Hi @Mariusz , that is a good approach, keeps the "." portion of the delimiter, and results in the List that I am after.

 

Thanks.

Jimmy801 New Contributor
New Contributor

Re: Split text based on every second delimiter

Hello @PhilC 

 

that is the downside using this "simply" splitting. You have to remaintain data afterwards. And if you applying the logic to a larger dataset, the chances are quite big, this "simplier" solutions won't work (more than 3 parts, ".," not present, but only ","

So "simply" use my function and you are at the save side 🙂

 

If this post helps or solves your problem, please mark it as solution.
Kudos are nice to - thanks
Have fun

Jimmy

PhilC Regular Visitor
Regular Visitor

Re: Split text based on every second delimiter

Hi @Jimmy801 ,

 

I'd love to better understand your thoughts to ensure am using best options. 

 

What did you mean by?


You have to remaintain data afterwards.

 

How does your approach work better for larger datasets compared to using the Splitter function?  I guess an explanation of the steps in the function would help so I can follow what is being done and understand how it is more robust as an approach.

 

Cheers

Phil

Jimmy801 New Contributor
New Contributor

Re: Split text based on every second delimiter

Hello @PhilC 

 

you have shown us only a dataset of 3 rows. So my thoughts on this are as follows

- there might be rows with more than 3 names in it

- there might be rows where you can't apply the ".," but only the real delimiter that is a "," and not the ".,"

- there might be the necessity to get one object to be handled afterwards (nested list) and not new columns based on the content (you would not be able to handle this in a dynamically way afterwards)

So whenever one of this occurs, the other solutions will fail. Maybe also my solution, because i didn't test it on a dataset like this. But I foreseen it. So applying my function, the above mentioned topics should be handled

 

All the best

 

Jimmy

Helpful resources

Announcements
New Topics Started Badges Coming

New Topics Started Badges Coming

We're releasing new versions of the badge that everyone's talking about. ;) Check your inbox for notifications.

MBAS 2020

Save the new date (and location)!

Our business applications community is growing—so we needed a different venue, resulting in a new date and location. See you there!

Difinity Conference

Difinity Conference

The largest Power BI, Power Platform, and Data conference in New Zealand

Power Platform 2019 release wave 2 plan

Power Platform 2019 release wave 2 plan

Features releasing from October 2019 through March 2020

Top Solution Authors
Top Kudoed Authors (Last 30 Days)