Solved: Re: Split Column by First Letter

surajch · ‎02-10-2020

Hi All,

I have a dataset that combines order and seller. I am attempting to split them by first letter due to the variance of delimeter in each cell. Is it possible to build a solution for just the first letter split rather than building one for each delimeter?

Current	Goal	Goal
5H-15-2/Bob	5H-15-2	Bob
9SH-12-21-Ted	9SH-12-21	Ted
15-01-2 Marissa	15-01-2	Marissa

dax · ‎02-10-2020

Hi @surajch ,

You could try below M code to see whteher it work or not.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMvXQNTTVNdJ3yk9SitWJVrIMBgoY6RoZ6oakpoBFgNIGhrpGCr6JRZnFxYlKsbEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Current = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Current", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each  let
        MyList = Text.ToList([Current]),
        ListCount = List.Count(MyList),
        ListArr = {"A".."Z"}
    in
        Text.Combine(List.Generate(()=> [
            a = 0, 
            b = MyList {a}
        ],
        each [a] < ListCount, 
        each [
            a = [a] + 1, 
            b = if List.Contains(ListArr, MyList {a}) then " " & MyList {a} else MyList {a}
        ], 
        each [b]
        ))),
    #"Inserted Text After Delimiter" = Table.AddColumn(#"Added Custom", "Text After Delimiter", each Text.AfterDelimiter([Custom], " ", {0, RelativePosition.FromEnd}), type text),
    #"Removed Columns" = Table.RemoveColumns(#"Inserted Text After Delimiter",{"Custom"})
in
    #"Removed Columns"

Best Regards,
Zoe Zhi

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

Mariusz · ‎02-11-2020

Hi @surajch

Try this.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMvXQNTTVNdJ3yk9SitWJVrIMBgoY6RoZ6oakpoBFgNIGhrpGCr6JRZnFxYlgMV8IALMTQaAYCFJSgBpiAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Current = _t]),
    #"Added Custom" = Table.AddColumn(Source, "records", 
        ( rec ) => let 
            position = Text.PositionOfAny( rec[Current], {"A".."Z"}, 1 ), 
            result = 
                [
                    Code = Text.Start( rec[Current], position -1 ), 
                    Name = Text.Range( rec[Current], position )
                ]  
        in 
            result
        ),
    #"Expanded records" = Table.ExpandRecordColumn(#"Added Custom", "records", {"Code", "Name"}, {"Code", "Name"}),
    #"Replaced Errors" = Table.ReplaceErrorValues(#"Expanded records", {{"Code", null}, {"Name", null}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Replaced Errors",{{"Code", type text}, {"Name", type text}})
in
    #"Changed Type"

Best Regards,
Mariusz

If this post helps, then please consider Accepting it as the solution.

Please feel free to connect with me.
LinkedIn

View solution in original post

darrenfishell · ‎11-11-2022

This would be so incredibly simple with RegEx support. 😭

Mariusz · ‎02-11-2020

Hi @surajch

Try this.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMvXQNTTVNdJ3yk9SitWJVrIMBgoY6RoZ6oakpoBFgNIGhrpGCr6JRZnFxYlgMV8IALMTQaAYCFJSgBpiAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Current = _t]),
    #"Added Custom" = Table.AddColumn(Source, "records", 
        ( rec ) => let 
            position = Text.PositionOfAny( rec[Current], {"A".."Z"}, 1 ), 
            result = 
                [
                    Code = Text.Start( rec[Current], position -1 ), 
                    Name = Text.Range( rec[Current], position )
                ]  
        in 
            result
        ),
    #"Expanded records" = Table.ExpandRecordColumn(#"Added Custom", "records", {"Code", "Name"}, {"Code", "Name"}),
    #"Replaced Errors" = Table.ReplaceErrorValues(#"Expanded records", {{"Code", null}, {"Name", null}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Replaced Errors",{{"Code", type text}, {"Name", type text}})
in
    #"Changed Type"

Best Regards,
Mariusz

If this post helps, then please consider Accepting it as the solution.

Please feel free to connect with me.
LinkedIn

TheDataMustFlow · ‎02-10-2020

Hi surajch,

The following code will give you the desired result provided that:

- there are no trailing spaces at the end of the "Current" key

- the name is alway at the end of the key and doesn't contain numbers

let
    Source = Excel.CurrentWorkbook(){[Name="Table2"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Current", type text}, {"Goal", type text}, {"Goal2", type text}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Changed Type", "Current", "Current - Copy"),
   
   /* The split column by position breaks down the characters of "Current - Copy" one by row while replicating the corresponding "Current"
    key by row*/
    #"Split Column by Position" = Table.ExpandListColumn(
                                    Table.TransformColumns(
                                        #"Duplicated Column",
                                        {{"Current - Copy", Splitter.SplitTextByRepeatedLengths(1), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}),
                                        "Current - Copy"),
                             
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Position",{{"Current - Copy", type text}}),

    //Grouped rows creates a nested table for each key
    #"Grouped Rows" = Table.Group(
            #"Changed Type1",
            {"Current"},
            {{"GroupedRows", each _, type table [Current=text, #"Current - Copy"=text, Goal=text, Goal2=text, Custom=text]}}),

    //Then we add an index column to each nested table
    #"Added Custom1" = Table.AddColumn(
            #"Grouped Rows",
            "Index",
            each Table.AddIndexColumn([GroupedRows],
            "rank",
            1)),
    //we expand the nested tables into the main table
    #"Expanded Index" = Table.ExpandTableColumn(#"Added Custom1", "Index", {"Current - Copy", "rank"}, {"Current - Copy", "rank"}),

    //then we add a custom column that will identify the first rank where last characters of "Current" include non-alpha characters
    #"Added Custom2" = Table.AddColumn(
        #"Expanded Index",
        "Custom",
        each Text.Length(Text.End([Current],[rank]))-Text.Length(Text.Select(Text.End([Current],[rank]),{"a".."z","A".."Z"}))),
    
    //then we filter out all the ranks that would include non-alpha characters
    #"Filtered Rows" = Table.SelectRows(#"Added Custom2", each ([Custom] = 0)),

    /*finally, we group again by "Current" and retrieve the max rank for each key 
    (this will be the number of characters at the end of the key that correspond to the name)*/
    #"Grouped Rows1" = Table.Group(#"Filtered Rows", {"Current"}, {{"Counter", each List.Max([rank]), type number}}),
    //the rest is just basic custom columns using the obtained number as the variable
    #"Added Custom3" = Table.AddColumn(#"Grouped Rows1", "Name", each Text.End([Current],[Counter]),type text),
    #"Added Custom4" = Table.AddColumn(#"Added Custom3", "ID", each Text.Start([Current],Text.Length([Current])-([Counter]+1)), type text)
in
    #"Added Custom4"

Let me know if this solves your issue.

If not, can you share what failed and, if possible, the code for your query?

dax · ‎02-10-2020

Hi @surajch ,

You could try below M code to see whteher it work or not.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMvXQNTTVNdJ3yk9SitWJVrIMBgoY6RoZ6oakpoBFgNIGhrpGCr6JRZnFxYlKsbEA", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Current = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Current", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each  let
        MyList = Text.ToList([Current]),
        ListCount = List.Count(MyList),
        ListArr = {"A".."Z"}
    in
        Text.Combine(List.Generate(()=> [
            a = 0, 
            b = MyList {a}
        ],
        each [a] < ListCount, 
        each [
            a = [a] + 1, 
            b = if List.Contains(ListArr, MyList {a}) then " " & MyList {a} else MyList {a}
        ], 
        each [b]
        ))),
    #"Inserted Text After Delimiter" = Table.AddColumn(#"Added Custom", "Text After Delimiter", each Text.AfterDelimiter([Custom], " ", {0, RelativePosition.FromEnd}), type text),
    #"Removed Columns" = Table.RemoveColumns(#"Inserted Text After Delimiter",{"Custom"})
in
    #"Removed Columns"

Best Regards,
Zoe Zhi

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Split Column by First Letter

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024