Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
agc1234
New Member

Extracting a Specific string format from a string of text

Good Day Everyone,

 

I am trying to extract a specific format of: ##a# or (number)(number)(letter)(number)

 

I need first 4 parts of the serial number. I've attached examples of source data and what I'd like to extract:

 

agc1234_0-1707419347433.png

 

Does anyone have any ideas? Really stuck trying to get a reliable formula for this.

 

Thanks Everyone!

 

6 REPLIES 6
AlienSx
Super User
Super User

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jY4xCgIxEEWvMqQ2kMxMJoldomAhpLERYgqFLRa0Wdn7q5scwPLNPPivVmVNEiaPqu2quhSDhyCGqeO0zPcnlPX1mBaY37AHS+cYkXjo30MsJOy8BLitxqCAxZML/ip+c6xAAgkkfQFRZ+8Cx05WZybNTrprtpR8HPQrSyMtjecfw6q1Dw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Serial #" = _t]),
    pattern = {false, false, true, false},
    chars = List.Buffer({"0".."9", "A".."Z"}),    
    finder = (t as text) =>
        [lst = List.Buffer(List.Select(Text.ToList(t), (x) => List.Contains(chars, x))),
        gen = List.Generate(
            () => [ch = lst, okay = false, serial = {}],
            (x) => not List.IsEmpty(x[ch]) or x[okay], 
            (x) => 
                [okay = List.Transform(List.FirstN(x[ch], 4), (w) => (try Number.From(w))[HasError]) = pattern,
                serial =  if okay then List.FirstN(x[ch], 4) else {},
                ch = List.Skip(x[ch], if okay then 4 else 1)],
            (x) => Text.Combine(x[serial])
        ),
        z = Text.Combine(List.Select(gen, each _ <> ""), ", ")][z],
    out = Table.AddColumn(Source, "Desired Outcome", (x) => finder(x[#"Serial #"]))
in
    out
dufoq3
Super User
Super User

Hello @agc1234,
same result as @Vijay_A_Verma but different approach.

 

See line four. This one also matches the criteria.

dufoq3_0-1707514941652.png

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("LY4xCwIxDEb/SuhsoU3StHVrFRyELi5C7aBww8G5nNz/V68dX/Lge7Uqa5IweVTtUNWtGDwFMUwdp3V+LlC292taYf7AESxdY0Tiof8OsZCw8xLgsRmDAhYvLvi7+N2xAgkkkPQFRJ29Cxw7WZ2ZNDvprtlT8nnQvyyNtDSeqrUv", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Serial #" = _t]),
    UppercasedText = Table.TransformColumns(Source,{{"Serial #", Text.Upper, type text}}),
    Ad_Cleaned = Table.AddColumn(UppercasedText , "Cleaned", each Text.Select([#"Serial #"], List.Buffer({"0".."9", "A".."Z"}))),
    Ad_EachFourCombination = Table.AddColumn(Ad_Cleaned, "Each 4 Combination", each List.Accumulate( 
    {0..Text.Length([Cleaned])-1},
    {},
    (s,c)=> List.RemoveNulls(s & { try Text.Range([Cleaned], c, 4) otherwise null })
), type list),
    Ad_OnlyMatching = Table.AddColumn(Ad_EachFourCombination, "Only Matching", each List.RemoveNulls(List.Transform([Each 4 Combination], each 
      if  (try Value.Is(Number.From(Text.ToList(_){0}), Number.Type) otherwise false) //1st digit check if is number
      and (try Value.Is(Number.From(Text.ToList(_){1}), Number.Type) otherwise false) //2nd digit check if is number
      and (try not Value.Is(Number.From(Text.ToList(_){2}), Number.Type) otherwise true) //3rd digit check if is text
      and (try Value.Is(Number.From(Text.ToList(_){3}), Number.Type) otherwise false) //4th digit check if is number
      then _ else null)), type list),
    Ad_FourDigits = Table.AddColumn(Ad_OnlyMatching, "4 digits", each Text.Combine([Only Matching], ", "), type text),
    RemovedOtherColumns = Table.SelectColumns(Ad_FourDigits,{"Serial #", "4 digits"})
in
    RemovedOtherColumns

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

latimeria
Solution Specialist
Solution Specialist

Hi @agc1234 ,

Another approach

 

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Serial#", type text}}),
    #"Added Desired Output" = Table.AddColumn(
        #"Changed Type", 
        "Desired Output", 
        each 
            List.Transform(
                Text.Split(Text.Remove([#"Serial#"],{" ", "-"}),"&"), 
                each Text.Start(
                    Text.RemoveRange(_,
                    0,
                    Text.PositionOfAny( 
                        _,
                        {"0".."9"},Occurrence.First
                    )
                ),
                4
            )
        )
    ),
    #"Extracted Values" = Table.TransformColumns(
        #"Added Desired Output", 
        {"Desired Output", each Text.Combine(List.Transform(_, Text.From), ", "), type text}
        )
in
    #"Extracted Values"

 

Hi @latimeria,

 

These should be excluded.

dufoq3_1-1707515318711.png

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Vijay_A_Verma
Super User
Super User

Use below code

Note - In row 4, 87X6 is also a pattern matching your requirement.

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("LY4xCwIxDEb/SuhsoU3StHVrFRyELi5C7aBww8G5nNz/V68dX/Lge7Uqa5IweVTtUNWtGDwFMUwdp3V+LlC292taYf7AESxdY0Tiof8OsZCw8xLgsRmDAhYvLvi7+N2xAgkkkPQFRJ29Cxw7WZ2ZNDvprtlT8nnQvyyNtDSeqrUv", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Serial #" = _t]),
    #"Added Custom2" = Table.AddColumn(Source, "Custom.1", (x)=> 
    let
        p = Text.Select(x[#"Serial #"], {"0".."9", "A".."Z"}),
        c = Text.Length(p)-3,
        r = List.Generate(()=>[i=0, a=Text.Start(p,4)], each [i]<c, each [i=[i]+1, a=Text.Middle(p,i,4)], each [a])
    in
        Text.Combine(List.Select(r, (x)=> List.ContainsAll({"0".."9"}, Text.ToList(Text.Start(x, 2) & Text.End(x, 1))) and Text.Select(x, {"A".."Z"}) = Text.At(x,2)), "; ")    )
in
    #"Added Custom2"

 

 

v-stephen-msft
Community Support
Community Support

Hi @agc1234 ,

 

In Power BI, when we want to process complex and unordered data, adding many steps using power query may not get the desired results. As we know, power query cannot use regular expressions. If we call Python script and use regular expressions to process the data this time, it will become simple. Please refer to

Using regular expressions in power bi desktop - Microsoft Fabric Community

 

Best Regards,

Stephen Tao

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.           

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel1

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.

Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

Top Solution Authors
Top Kudoed Authors