Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

V-lianl-msft

Using regular expressions in power bi desktop

Scenario:
In Power BI, when we want to process complex and unordered data, adding many steps using power query may not get the desired results. As we know, power query cannot use regular expressions. If we call Python script and use regular expressions to process the data this time, it will become simple.


Table Used:

V-lianl-msft_0-1616657075629.png

As shown in the figure above, if the data has been imported into Power BI, we can't process the data effectively by using power query. At this time, we can call the regular expression of Python's re module.


Precondition:
To run Python scripts in Power BI desktop, you need to install Python and two Python packages (pandas and matplotlib) on your local computer. Please refer to this document for more details: Run Python scripts in Power BI Desktop.

 

Operation:
After importing the data into power bi desktop, we can run the python script in IDE and copy it to the python script editor in the pbi query editor after confirmation.

import re
import json

# Custom function to get e-mail address
def get_find_emails(text):
emails = re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", text)
emails=';'.join(emails)
return emails

# Custom function to get phone number
def get_find_mobiles(text):
mobiles = re.findall(r" [1-9][0-9]{4,} ", text)
mobiles =';'.join(mobiles)
return mobiles

email_list=[]
tele_list=[]
for i in range(len(dataset)):
text=dataset.iat[i,1]
email=get_find_emails(text)
email_list.append(email)
tele=get_find_mobiles(text)
tele_list.append(tele)

dataset['email']=email_list
dataset['tele']=tele_list

This code defines two functions: get_ find_ Email(Custom function to get phone number) and get_ find_ Mobile(Custom function to get phone number), which will get two lists and eventually place them into the data table.
By the way, the use of regular expressions, you can search for the related resources online to learn about it.
Click OK to return the result:

V-lianl-msft_1-1616657101752.png


After expanding the table, the next two columns are the phone numbers and e-mail addresses we want to get.

V-lianl-msft_2-1616657114209.png


Please check the attached files for details.

 

Author: Liang Lu
Reviewer: Ula & Kerry

Comments

Very cool feature, that Python is supported by Power BI. I think your code can still be simplified by using Pandas only:

 

 

import re
import pandas as pd

def extract_from_column(column: pd.Series, regex_string: str):
    return column.str.extract(regex_string, flags=re.I)

dataset['email']=extract_from_column(dataset['contact information'],
                                     r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+")
dataset['tele']=extract_from_column(dataset['contact information'],
                                    r" [1-9][0-9]{4,}")

 

 

I propose an other solution, which 1/ doesn't require to install Python or R (it uses javascript capabilities), and 2/ parses the result into a record.

 

// regular expression parser function
regexp.parse = (regexp as text, str as text, optional model as nullable record) as record => let
    html =  "<script>var regexp = " & regexp & "; var str = """ & Text.Replace(str, """", "\""") & """; var res = regexp.exec(str); document.write(res.join(String.fromCharCode(30)))</script>",
    res = Web.Page(html)[Data]{0}[Children]{0}[Children]{1}[Text]{0},
    match_list = try List.Transform(Text.Split(res, Character.FromNumber(30)), each if _="" then null else _) otherwise null,
    parse_to_record = 
        if match_list=null then
            error(Error.Record("Regexp.Exec error", "The regexp pattern doesn't match the string", "pattern=" & regexp & ", string=""" & str & """"))
        else if model=null then
            List.Accumulate(List.Positions(match_list), [], (rec, i) => Record.AddField(rec, "field" & Text.From(i), match_list{i}))
        else
            Record.FromList(List.Transform(Record.FieldValues(model), each match_list{_}), Record.FieldNames(model))
    in
        parse_to_record,

 

 

How to test the function ?

let's try to extract a date (dd/mm/yyyy) and a weigth (99 kg) from a given string  

// testing the regexp parser without an output model
   test1 = regexp.parse(
        "/(\d{2}\/\d{2}\/\d{4}).*[^\d](\d+\s*kg)/gi",
        "It's 26/03/2023 and this text contains a 10kg weigth to extract"
    ),
   result1 = [
        field0="It's 26/03/2023 and this text contains a 10kg weigth to extract",
        field1=”26/03/2023”,
        field2=”10 kg”
    ],

// testing the parser with an output model   
   test2 = regexp.parse(
        "/(\d{2}\/\d{2}\/\d{4}).*[^\d](\d+\s*kg)/gi",
        "It's 26/03/2023 and this text contains a 10kg weigth to extract",
        [date=1, weigth=2]
    ),
    result2 = [
       date=”26/03/2023”,
       weigth=”10 kg”
    ],

 

 

@gregnogier , this is a great solution. No need to install extra software, excellent solution.
The only downside is it doesn't work in direct query mode, only in import mode.