Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

smpa01

Using JavaScript in power query for regex- Part2

TLDR: Using Power Query’s browser, a substring can be extracted now based on a pattern using regex in Web.Script.

 

In my previous post, I talked about how to use javascript inside Power Query. In this post, I want to talk about one of the practical examples for doing so. Regex which stands for Regular Expression is a very powerful string-searching algorithm technique used in the browser for substring extraction based on a pattern.

Since js can run inside pqwry, it means that regex extraction is natively possible without needing any other external package at all.

 

A regex extraction can be done as following where it is extracts out all the substring with digits.

 

 

 

 

let
   regex=let   fx=(input)=>
    Web.Page(
        "<script>
            var x='"&input&"'; // this is the input string for regex
            var b=x.match(/\d{3}/gm); // specify the desired regular expression inside string.match()
                                    //https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match 
            document.write(b);
        </script>"){0}[Data]{0}[Children]{1}[Children]{0}[Text]

in
fx,
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("HYqxCoAwEEN/JXTuoq06+BtupcNpDy1oB6+Kn+8phCS8JAQzbVmgIhz54QSpZy4rpGmd70aIej+YaIMhRT8jJZ3XLOnr3rUWs473kvi/g05G3Rg37ReLifEF", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each regex([Column1]))
in
    #"Added Custom" 

 

 

 

 

 

smpa01_0-1634141754734.png

 

 

How to test regex and performance ?  Go to  https://regex101.com/ and select ECMAScript (JavaScript) as FLAVOR

 

Regex resources - https://www.regular-expressions.info/tutorial.html

                             https://docs.microsoft.com/en-us/dotnet/standard/base-types/regular-expression-language-quick-refere...

                            https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions

 

 

 

 Some of the frequently used patterns are listed below

 

String Regex Alternative Regex Explanation Returns what? Workaround
This is a mixed string s12345; s345678 /\d/gm /[0-9]/gm Matches all individual digits 1,2,3,4,5,3,4,5,6,7,8  
This is a mixed string s12345; s345678 /\d+/gm /[0-9]+/gm Matches digits 12345,345678  
This is a mixed string s12345; s345678 /\d+/ /[0-9]+/ Matches digit and returns only first match 12345  
This is a mixed string s12345; s345678 /\d{3}/gm /[0-9]{3}/gm Matches 3 consecutive digits 123,345,678  
This is a test string 123 345 /[a-zA-Z0-9]/gm   Matches a single character in the range between a-z A-Z 0-9 T,h,i,s,i,s,a,t,e,s,t,s,t,r,i,n,g,1,2,3,3,4,5  
This is a test string 123 345 /[a-zA-Z]+/gm   Matches a single character in the range between a z unlimited times This,is,a,test,string  
This returns all word 123 345; except non-alphanumeric characters /\b\w+\b/gm   Matches any alphanumeric characters equivalent to [a-zA-Z0-9_] This,returns,all,word,123,345,except,non,alphanumeric,characters  
This extracts any word that are only 4 characters long a12345 1123 a112 /\b\d{4}\b/gm /\b[^a-zA-Z\W]{4}\b/gm Matches only 4 digit numbers 1123  
This extracts any word that are only 4 characters long a12345 1123 a112 /\b\w{4}\b/gm /\b[^\W]{4}\b/gm Matches word only 4 characters in length This,word,that,only,long,1123,a112  
This extracts any word that are only 4 characters long a12345 1123 a112 /\b[^\d\W]{4}\b/gm   Matches only 4 character words that are not digit This,word,that,only,long  
Journal entry for tran12345, tran3456, tran, 123456 /tran[0-9]+/gm /tran\d+/gm Matches string that starts with "tran" and unlimited digit tran12345,tran3456  
Power BI consists of power query and DAX /\bPower\b/gm   Matches the word - "Power", case sensitive Power  
Power BI consists of power query and DAX /\bPower\b/gmi   Matches the word - "Power", case insensitive Power,power  
Power BI consists of power query and DAX /\bpower\b/gmi   Matches the word - "Power", case insensitive Power,power  
This is a mixed string s12345; s34567 1234 four 3587 /\b[^a-zA-Z\W]{4}\b/gm   Matches only 4 digit numbers 1234,3587  
It costs 50€ in Germany, 72$ in Canada, 370¥ in China and 4300₹ in India, 111$€₹ /\d+(?=\$|€|₹)/gm   Poitive Lookahead - A(?=B)Match A only when A is followed by B
Poitive Lookahead - A(?=B|C)Match A only when A is followed by B or C
Poitive Lookahead - A(?=BC)Match A only when A is followed by B and C
- in the example, returns all digits only when followed by $ or € or ₹
- $ symbol is escaped in regex with \$
50,72,4300,111  
It costs 50€ in Germany, 72$ in Canada, 370¥ in China and 4300₹ in India, 111$€₹ /\d+(?=\$€₹)/gm   Poitive Lookahead - A(?=B)Match A only when A is followed by B
Poitive Lookahead - A(?=B|C)Match A only when A is followed by B or C
Poitive Lookahead - A(?=BC)Match A only when A is followed by B and C
 -in the example, returns all digits only when followed by $ and € and ₹
111  
The price 30 lt of milk is 110$ /\d+\b(?!\$)/gm   Negative Looahead- A(?!B)Match A only when A is not followed by B
- in the example, returns all digits when not followed by $
30  
1 steak costs $30; 2 steaks cost $55 /(?<=\$)\d+/gm   Positive lookbehind: (?<=B)A Match A when preceded by B
- does not work in PQ probably because PQ's browser not compatible with lookbehind
30,55 var a = '1 steak costs $30; 2 steaks cost $55';
var b = a.match(/\$\d+/gm);
var c = b.toString().replace(/\$/gm,'')

PQ Code attached [Code1]
The cost of 3 repairing and 4maintenance jobs are respectively $700 and $500 test123 456test789 /(?<!\$)\b\d+/gm   Negative Lookbehind: (?<!B)A Match A not preceded by B
- does not work in PQ PQ probably because PQ's browser not compatible with lookbehind
3,4,456 var a = 'The cost of 3 repairing and 4maintenance jobs are repectively $700 and $500 test123 456test789';
var b = a.match(/((?!\$)^)\d+|(?:[^\$])\b\d+/gm);
var c = b.toString().replace(/ /gm,'')

PQ Code attached [Code2]
power_bi /(power)/gm   Capturing Group; group syntax (..) power  
power_bi /(?:power)/gm   Non-capturing Group; group syntax (..)
Capturing and non-capturing groups return the same , but non-capturing groups are utilized in Lookarounds
power  

 

 

 

//code1
let   fx=(input)=>
    Web.Page(
        "<script>
            var x='"&input&"';
            var b = x.match(/\$\d+/gm);
            var c = b.toString().replace(/\$/gm,'');
            document.write(c);
        </script>"){0}[Data]{0}[Children]{1}[Children]{0}[Text],
        result = fx("1 steak costs $30; 2 steaks cost $55")

in
result
//code2
let   fx=(input)=>
    Web.Page(
        "<script>
            var x='"&input&"';
            var b = x.match(/((?!\$)^)\d+|(?:[^\$])\b\d+/gm);
            var c = b.toString().replace(/ /gm,'');
            document.write(c);
        </script>"){0}[Data]{0}[Children]{1}[Children]{0}[Text],
        result = fx("The cost of 3 repairing and 4maintenance jobs are respectively $700 and $500 test123 456test789")

in
result

 

 

 

 

Comments