Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Disclaimer: I am a few days new to Power Query, please be patient if I don't already know fundamental concepts.
I need a function to convert either Text or List into a Column, where:
The text comes from an HTML-reading function named "regex" (see below).
let
// ...
#"Duplicated Column" = Table.DuplicateColumn(#"Inserted Text Between Delimiters", "Name", "Effective Date"),
#"Reordered Columns" = Table.ReorderColumns(#"Duplicated Column",{"Vendor", "Name", "Extension", "Effective Date", "Folder Path"}),
// Regex match then isolate a 6-digit number representing the real Effective Date.
// [Effective] is going to be the input
regex = let
f = (input) =>
// Creates and accesses an in-app browser...
Web.Page(
// To run JS inside!
"<body onload='w();'><div id='b' onload='w();'>nngn</div><script>
var x = '"&input&"';
var b = x.match(/(?:19|2[012])\d{4}/gm) ?? ['fdfd']; // Matches every substring beginning with 19,20,21,22 followed by 4 digits
function w() {
if (document.readyState == 'interactive') {
document.getElementById('b').innerHTML = 'dsds';
document.write('<h1>'+b+'</h1>');
}
}
document.onreadystatechange = w();
</script></body>")[Data]{0}[Children]{0}{1}[Children] // This part traverses down the DOM hierarchy to get the document.write
// [Data]{0}[Children]{0} = shows head, body
in f,
//Stuff = Table.FromList(Text.Split(regex(#"Reordered Columns"[Copy]),","),null,{"Effective Date"})
//Stuff = Table.AddColumn(#"Reordered Columns1", "Effective Date", each regex(#"Reordered Columns"[Copy]))
in
#"Reordered Columns"
(haha code sample doesn't even support M; that's what it gets for not supporting regex)
Things to note:
What I've tried (both written in the code):
Important information: the browser (Web.Page) is IE 11 (up to date)
What is optimal:
Either:
A final note: When searching for the answer to this, I found nothing about converting to columns, and I wonder if I am even on the right track or completely misunderstanding the role of columns and lists in Power Query.
Solved! Go to Solution.
This solution adapts the code from here and here, respectively. The only lines i added was the "TrimesToYears" statement, and from #"Changed Type" onwards.
// ...
// To get only the numerals from a string,
// split string into individual characters,
// keep the characters which are numbers,
// combine the list back into one string
#"Duplicate Column" = Table.AddColumn(#"Extracted Text Between Delimiters", "Effective", each Text.Combine(List.RemoveNulls(List.Transform(Text.ToList([Name]), each if Value.Is(Value.FromText(_), type number) then _ else null))), type text),
// ...
// Helper function which turns txt argument into year argument
// txt: original text
// year: 6-digit number, a substring somewhere in txt
GetDate = (txt as text, year as text) =>
let
CountOccurences = Text.PositionOf(txt, year, Occurrence.All),
CheckSequence = List.Transform(CountOccurences, each try Number.From(Text.Range(txt, _, 6)) otherwise null),
SelectSuccess = List.Select(CheckSequence, each _ <> null), // null-checking
Output = List.First(SelectSuccess)
in Output,
// (function) Extracts date based on matching with specific "regex"
// the so-called "regex": must begin with 19,20,...,25 (last two digits of year)
// AND the third character (first digit of month) must be 0 or 1 (i.e. excludes "2022xx")
// i added the TrimesToYears statement
ExtractDate = (txt as text) =>
let
TrimesToYears = List.Select({190..252}, each List.Contains({0, 1}, Number.Mod(_, 10))),
YearsToCheck = List.Transform(TrimesToYears, Text.From),
CheckPosition = List.Transform(YearsToCheck, each GetDate(txt, _)),
OnlySuccessful = List.Select(CheckPosition, each _ > 0),
Result = List.First(OnlySuccessful, null)
in
Result,
#"Changed Type" = Table.TransformColumnTypes(#"Reordered Columns", {{"Effective", type text}}),
// i use ExtractDate function in this line
#"Customized Column" = Table.AddColumn(#"Changed Type", "Customized", each ExtractDate([Effective])),
// Really ugly way to convert 6-digit string to date
#"Created Date" = Table.AddColumn(#"Customized Column" , "Effective Date", each Date.From(#datetime(
Number.FromText(Text.Range(Number.ToText([Customized]),0,2))+2000, // year
Number.FromText(Text.Range(Number.ToText([Customized]),2,2)), // month
Number.FromText(Text.Range(Number.ToText([Customized]),4,2)),0,0,0))), // day, etc.
#"Transformed Type" = Table.TransformColumnTypes(#"Created Date", {{"Effective Date", type date}})
// ...
Hi @neophyte,
does this do what you need?
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("TZC5DcNQDEN3cZ1CB3XNYrjMBPH+yA8CmL98EHiI53nc788NdMkc1+uP0T6dD0KhQ9RpSePVZRwPWlsgHnSYznaFTxF7WYUTK3SUmC3QTYsSOteIbLmVbd4PirootWjL2jpPwPhv6cphjVnaoFUEfKvR1mlCKxOMb2t01NbZBlbERCVzTVOaqJa6zZ5L3AxKSCX/9XT7db6+", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}}),
#"Duplicated Column" = Table.AddColumn(#"Changed Type", "Custom", each Text.Combine(List.RemoveNulls(List.Transform(Text.ToList([Name]),each if Value.Is(Value.FromText(_), type number) then _ else null))), type text)
in #"Duplicated Column"
Thanks to @v-shex-msft forproviding a solution to extracting the number from text: Solved: PowerQuery | Extract Numbers from A string (eg ABC... - Microsoft Power BI Community
Kind regards,
John
This solution adapts the code from here and here, respectively. The only lines i added was the "TrimesToYears" statement, and from #"Changed Type" onwards.
// ...
// To get only the numerals from a string,
// split string into individual characters,
// keep the characters which are numbers,
// combine the list back into one string
#"Duplicate Column" = Table.AddColumn(#"Extracted Text Between Delimiters", "Effective", each Text.Combine(List.RemoveNulls(List.Transform(Text.ToList([Name]), each if Value.Is(Value.FromText(_), type number) then _ else null))), type text),
// ...
// Helper function which turns txt argument into year argument
// txt: original text
// year: 6-digit number, a substring somewhere in txt
GetDate = (txt as text, year as text) =>
let
CountOccurences = Text.PositionOf(txt, year, Occurrence.All),
CheckSequence = List.Transform(CountOccurences, each try Number.From(Text.Range(txt, _, 6)) otherwise null),
SelectSuccess = List.Select(CheckSequence, each _ <> null), // null-checking
Output = List.First(SelectSuccess)
in Output,
// (function) Extracts date based on matching with specific "regex"
// the so-called "regex": must begin with 19,20,...,25 (last two digits of year)
// AND the third character (first digit of month) must be 0 or 1 (i.e. excludes "2022xx")
// i added the TrimesToYears statement
ExtractDate = (txt as text) =>
let
TrimesToYears = List.Select({190..252}, each List.Contains({0, 1}, Number.Mod(_, 10))),
YearsToCheck = List.Transform(TrimesToYears, Text.From),
CheckPosition = List.Transform(YearsToCheck, each GetDate(txt, _)),
OnlySuccessful = List.Select(CheckPosition, each _ > 0),
Result = List.First(OnlySuccessful, null)
in
Result,
#"Changed Type" = Table.TransformColumnTypes(#"Reordered Columns", {{"Effective", type text}}),
// i use ExtractDate function in this line
#"Customized Column" = Table.AddColumn(#"Changed Type", "Customized", each ExtractDate([Effective])),
// Really ugly way to convert 6-digit string to date
#"Created Date" = Table.AddColumn(#"Customized Column" , "Effective Date", each Date.From(#datetime(
Number.FromText(Text.Range(Number.ToText([Customized]),0,2))+2000, // year
Number.FromText(Text.Range(Number.ToText([Customized]),2,2)), // month
Number.FromText(Text.Range(Number.ToText([Customized]),4,2)),0,0,0))), // day, etc.
#"Transformed Type" = Table.TransformColumnTypes(#"Created Date", {{"Effective Date", type date}})
// ...
Thank you so much for your reply, John!
Unfortunately, my filenames are much too unpredictable for your extraction method. For example, it doesn't work for "Scan220122_145632", which results in "220722145644", whereas I only wanted "220122".
Note to future repliers: the only thing in common between all the filenames is that they have 6-contiguous-digits representing the date (the only thing I can guarantee), and that there may be more (extraneous) numbers.
So again, John, I appreciate your help, but my demands are way too exacting!
Regards
Hi @neophyte,
Can you just truncate the remaining digits in the Custom column? I think this is Extract->Firs characters in the main menu/ribbon.
Cheers,
John
Hi John,
Thank you for your sugggestion! Unfortunately, as I found out, the 6-digit number can be anywhere in the string, before or after other numbers. As such, it doesn't work for filenames like "XXXX XXX XXXXXXXXXXX Increase 4 percent 220110", where it would just reduce to "422011".
I had an idea to filter out only the numbers beginning with 19-22 (representing the year) followed by 4 other numbers (as can be seen above in my updated regex). Could this perhaps be implemented in plain M?
Thanks again,
Hi @neophyte,
This is possible, but the code is a bit more complex. Still reasoably fast even on my old laptop, though:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("TZC5DcNQDEN3cZ1CB3XNYrjMBPH+yA8CmL98EHiI53nc788NdMkc1+uP0T6dD0KhQ9RpSePVZRwPWlsgHnSYznaFTxF7WYUTK3SUmC3QTYsSOteIbLmVbd4PirootWjL2jpPwPhv6cphjVnaoFUEfKvR1mlCKxOMb2t01NbZBlbERCVzTVOaqJa6zZ5L3AxKSCX/9XT7db6+", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Name = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Name", type text}}),
//txt = "test220202_19_22_21",
GetDate = (txt as text, year as text) =>
let
CountOccurences = Text.PositionOf(txt, year, Occurrence.All),
CheckSequence = List.Transform(CountOccurences, each try Number.From(Text.Range(txt, _, 6)) otherwise null),
SelectSuccess = List.Select(CheckSequence, each _ <> null),
Output = List.First(SelectSuccess)
in Output,
ExtractDate = (txt as text) =>
let
YearsToCheck = List.Transform({19..25}, Text.From),
CheckPosition = List.Transform(YearsToCheck, each GetDate(txt, _)),
OnlySuccessful = List.Select(CheckPosition, each _ > 0),
Result = List.First(OnlySuccessful, null)
in
Result,
Output = Table.AddColumn(#"Changed Type", "First Characters", each ExtractDate([Name]))
in Output
See if this does what you want.
Kind regards,
John
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.