Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
neophyte
New Member

Text/List to (new) Column OR regex help

Disclaimer: I am a few days new to Power Query, please be patient if I don't already know fundamental concepts.

 

I need a function to convert either Text or List into a Column, where:

  • the text will take the form of double-quoted comma-separated values (e.g. "000000,123456, ...")
  • the list will just be a Text.Split of that text.

The text comes from an HTML-reading function named "regex" (see below).

 

let
    // ...
    #"Duplicated Column" = Table.DuplicateColumn(#"Inserted Text Between Delimiters", "Name", "Effective Date"),
    #"Reordered Columns" = Table.ReorderColumns(#"Duplicated Column",{"Vendor", "Name", "Extension", "Effective Date", "Folder Path"}),
    
    // Regex match then isolate a 6-digit number representing the real Effective Date.
    // [Effective] is going to be the input
    regex = let
        f = (input) =>
            // Creates and accesses an in-app browser...
            Web.Page(
                // To run JS inside!
                "<body onload='w();'><div id='b' onload='w();'>nngn</div><script>
                    var x = '"&input&"';
                    var b = x.match(/(?:19|2[012])\d{4}/gm) ?? ['fdfd'];	// Matches every substring beginning with 19,20,21,22 followed by 4 digits
                    function w() {
                        if (document.readyState == 'interactive') {
                            document.getElementById('b').innerHTML = 'dsds';
                            document.write('<h1>'+b+'</h1>');
                        }
                    }
                    document.onreadystatechange = w();
                </script></body>")[Data]{0}[Children]{0}{1}[Children]	// This part traverses down the DOM hierarchy to get the document.write
                // [Data]{0}[Children]{0} = shows head, body
    in f,
    //Stuff = Table.FromList(Text.Split(regex(#"Reordered Columns"[Copy]),","),null,{"Effective Date"})
    //Stuff = Table.AddColumn(#"Reordered Columns1", "Effective Date", each regex(#"Reordered Columns"[Copy]))
in
    #"Reordered Columns"

 

(haha code sample doesn't even support M; that's what it gets for not supporting regex)

Things to note:

  • the column "Copy": which is what I want to apply regex on;
    • duplicate of column "Name" consisting of filenames;
    • filenames DO NOT have consistent format (approximately "text text 6-digit number *text .extension") but they do have a 6-digit number which I want to use regex to get;
  • the function "regex";
    • uses a browser trick (adapted from here and here, but the Tables.AddColumn each regex() doesn't work) 
    • takes column ("Copy") as input;
    • outputs a Text/string in the form of "000000,123456, ..." (includes the quotation marks)

What I've tried (both written in the code):

  • Table.FromList (Expression.Error: There weren't enough elements in the enumeration to complete the operation. Details: [List])
  • Table.AddColumn (mentioned above; Expression.Error: There weren't enough elements in the enumeration to complete the operation. Details: [List])

Important information: the browser (Web.Page) is IE 11 (up to date)

What is optimal:

Either:

  • find the function which converts Text->Column, or
  • help me come up with a better way to do the regex (so that I get a column of 6-digit numbers in each row, from the column "Copy")

A final note: When searching for the answer to this, I found nothing about converting to columns, and I wonder if I am even on the right track or completely misunderstanding the role of columns and lists in Power Query.

1 ACCEPTED SOLUTION

This solution adapts the code from here and here, respectively. The only lines i added was the "TrimesToYears" statement, and from #"Changed Type" onwards.

 

// ...
    // To get only the numerals from a string,
    // split string into individual characters,
    // keep the characters which are numbers,
    // combine the list back into one string
    #"Duplicate Column" = Table.AddColumn(#"Extracted Text Between Delimiters", "Effective", each Text.Combine(List.RemoveNulls(List.Transform(Text.ToList([Name]), each if Value.Is(Value.FromText(_), type number) then _ else null))), type text),
// ...
// Helper function which turns txt argument into year argument
// txt: original text
// year: 6-digit number, a substring somewhere in txt
    GetDate = (txt as text, year as text) =>
        let 
            CountOccurences = Text.PositionOf(txt, year, Occurrence.All),
            CheckSequence = List.Transform(CountOccurences, each try Number.From(Text.Range(txt, _, 6)) otherwise null),
            SelectSuccess = List.Select(CheckSequence, each _ <> null), // null-checking
            Output = List.First(SelectSuccess)
        in Output,

// (function) Extracts date based on matching with specific "regex"
// the so-called "regex": must begin with 19,20,...,25 (last two digits of year)
// AND the third character (first digit of month) must be 0 or 1 (i.e. excludes "2022xx")
// i added the TrimesToYears statement
    ExtractDate = (txt as text) =>
        let         
            TrimesToYears = List.Select({190..252}, each List.Contains({0, 1}, Number.Mod(_, 10))),
            YearsToCheck = List.Transform(TrimesToYears, Text.From),
            CheckPosition = List.Transform(YearsToCheck, each GetDate(txt, _)),
            OnlySuccessful = List.Select(CheckPosition, each _ > 0),
            Result = List.First(OnlySuccessful, null)
        in
            Result,
    
    #"Changed Type" = Table.TransformColumnTypes(#"Reordered Columns", {{"Effective", type text}}),    
    // i use ExtractDate function in this line
    #"Customized Column" = Table.AddColumn(#"Changed Type", "Customized", each ExtractDate([Effective])),
    // Really ugly way to convert 6-digit string to date
    #"Created Date" = Table.AddColumn(#"Customized Column" , "Effective Date", each Date.From(#datetime(
        Number.FromText(Text.Range(Number.ToText([Customized]),0,2))+2000, // year
        Number.FromText(Text.Range(Number.ToText([Customized]),2,2)), // month
        Number.FromText(Text.Range(Number.ToText([Customized]),4,2)),0,0,0))), // day, etc.
    #"Transformed Type" = Table.TransformColumnTypes(#"Created Date", {{"Effective Date", type date}})

// ...

 

View solution in original post

6 REPLIES 6
jbwtp
Memorable Member
Memorable Member

Hi @neophyte,

 

does this do what you need?

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("TZC5DcNQDEN3cZ1CB3XNYrjMBPH+yA8CmL98EHiI53nc788NdMkc1+uP0T6dD0KhQ9RpSePVZRwPWlsgHnSYznaFTxF7WYUTK3SUmC3QTYsSOteIbLmVbd4PirootWjL2jpPwPhv6cphjVnaoFUEfKvR1mlCKxOMb2t01NbZBlbERCVzTVOaqJa6zZ5L3AxKSCX/9XT7db6+", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}}),
    #"Duplicated Column" = Table.AddColumn(#"Changed Type", "Custom", each Text.Combine(List.RemoveNulls(List.Transform(Text.ToList([Name]),each if Value.Is(Value.FromText(_), type number) then _ else null))), type text)
in #"Duplicated Column"

 

 

Thanks to  @v-shex-msft forproviding a solution to extracting the number from text: Solved: PowerQuery | Extract Numbers from A string (eg ABC... - Microsoft Power BI Community

 

Kind regards,

John

 

This solution adapts the code from here and here, respectively. The only lines i added was the "TrimesToYears" statement, and from #"Changed Type" onwards.

 

// ...
    // To get only the numerals from a string,
    // split string into individual characters,
    // keep the characters which are numbers,
    // combine the list back into one string
    #"Duplicate Column" = Table.AddColumn(#"Extracted Text Between Delimiters", "Effective", each Text.Combine(List.RemoveNulls(List.Transform(Text.ToList([Name]), each if Value.Is(Value.FromText(_), type number) then _ else null))), type text),
// ...
// Helper function which turns txt argument into year argument
// txt: original text
// year: 6-digit number, a substring somewhere in txt
    GetDate = (txt as text, year as text) =>
        let 
            CountOccurences = Text.PositionOf(txt, year, Occurrence.All),
            CheckSequence = List.Transform(CountOccurences, each try Number.From(Text.Range(txt, _, 6)) otherwise null),
            SelectSuccess = List.Select(CheckSequence, each _ <> null), // null-checking
            Output = List.First(SelectSuccess)
        in Output,

// (function) Extracts date based on matching with specific "regex"
// the so-called "regex": must begin with 19,20,...,25 (last two digits of year)
// AND the third character (first digit of month) must be 0 or 1 (i.e. excludes "2022xx")
// i added the TrimesToYears statement
    ExtractDate = (txt as text) =>
        let         
            TrimesToYears = List.Select({190..252}, each List.Contains({0, 1}, Number.Mod(_, 10))),
            YearsToCheck = List.Transform(TrimesToYears, Text.From),
            CheckPosition = List.Transform(YearsToCheck, each GetDate(txt, _)),
            OnlySuccessful = List.Select(CheckPosition, each _ > 0),
            Result = List.First(OnlySuccessful, null)
        in
            Result,
    
    #"Changed Type" = Table.TransformColumnTypes(#"Reordered Columns", {{"Effective", type text}}),    
    // i use ExtractDate function in this line
    #"Customized Column" = Table.AddColumn(#"Changed Type", "Customized", each ExtractDate([Effective])),
    // Really ugly way to convert 6-digit string to date
    #"Created Date" = Table.AddColumn(#"Customized Column" , "Effective Date", each Date.From(#datetime(
        Number.FromText(Text.Range(Number.ToText([Customized]),0,2))+2000, // year
        Number.FromText(Text.Range(Number.ToText([Customized]),2,2)), // month
        Number.FromText(Text.Range(Number.ToText([Customized]),4,2)),0,0,0))), // day, etc.
    #"Transformed Type" = Table.TransformColumnTypes(#"Created Date", {{"Effective Date", type date}})

// ...

 

Thank you so much for your reply, John!

Unfortunately, my filenames are much too unpredictable for your extraction method. For example, it doesn't work for "Scan220122_145632", which results in "220722145644", whereas I only wanted "220122".

Note to future repliers: the only thing in common between all the filenames is that they have 6-contiguous-digits representing the date (the only thing I can guarantee), and that there may be more (extraneous) numbers.

So again, John, I appreciate your help, but my demands are way too exacting!

Regards

Hi @neophyte,

 

Can you just truncate the remaining digits in the Custom column? I think this is Extract->Firs characters in the main menu/ribbon.

 

Cheers,

John

Hi John,

 

Thank you for your sugggestion! Unfortunately, as I found out, the 6-digit number can be anywhere in the string, before or after other numbers. As such, it doesn't work for filenames like "XXXX XXX XXXXXXXXXXX Increase 4 percent 220110", where it would just reduce to "422011".

I had an idea to filter out only the numbers beginning with 19-22 (representing the year) followed by 4 other numbers (as can be seen above in my updated regex). Could this perhaps be implemented in plain M?

 

Thanks again,

Hi @neophyte,

 

This is possible, but the code is a bit more complex. Still reasoably fast even on my old laptop, though:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("TZC5DcNQDEN3cZ1CB3XNYrjMBPH+yA8CmL98EHiI53nc788NdMkc1+uP0T6dD0KhQ9RpSePVZRwPWlsgHnSYznaFTxF7WYUTK3SUmC3QTYsSOteIbLmVbd4PirootWjL2jpPwPhv6cphjVnaoFUEfKvR1mlCKxOMb2t01NbZBlbERCVzTVOaqJa6zZ5L3AxKSCX/9XT7db6+", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}}),
    //txt = "test220202_19_22_21",
    
    GetDate = (txt as text, year as text) =>
        let 
            CountOccurences = Text.PositionOf(txt, year, Occurrence.All),
            CheckSequence = List.Transform(CountOccurences, each try Number.From(Text.Range(txt, _, 6)) otherwise null),
            SelectSuccess = List.Select(CheckSequence, each _ <> null),
            Output = List.First(SelectSuccess)
        in Output,

    ExtractDate = (txt as text) =>
        let         
            YearsToCheck = List.Transform({19..25}, Text.From),
            CheckPosition = List.Transform(YearsToCheck, each GetDate(txt, _)),
            OnlySuccessful = List.Select(CheckPosition, each _ > 0),
            Result = List.First(OnlySuccessful, null)
        in
            Result,
    Output = Table.AddColumn(#"Changed Type", "First Characters", each ExtractDate([Name]))
in Output

 

See if this does what you want.

 

Kind regards,

John

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors