cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
ngct1112
Post Patron
Post Patron

Regex for subtracting texts by Power Query/ Dax

Hi, I would like to extract some data from a column which the input of the data are not consistant.

May I know is there any way I could clean the data by PowerQuery or DAX. Thanks

ColumnResult
1kg1kg
abc10kg10kg
10kg10kg
18kg hhh418kg
Abc:10kg10kg
5aaab3kg5kg
abc - 15kg15kg
3kg3kg
9kg9kg
VV - fg 8kg8kg
kk 5kg 00g5kg
001….abc..5kg…5kg
2 ACCEPTED SOLUTIONS

Hi @ngct1112 ,

please paste this code into the advanced editor and follow the steps:

 

let
    Source = Table.FromRows(
        Json.Document(
            Binary.Decompress(
                Binary.FromText(
                    "i45Wcs7PKc3NU9JRCkotLs0pUYrViVYyzE4HCoBIEC8xKdnQACJiABXC4FtkpytkZGSYgAQtoIKOSclW6ApNE7PTExOTjMGipggLFHQVDE0hSmGiEDXGUJ4lmGcJ5YWFATWkpStYgEVhFmZnKwA1KxgYIBtuYGD4qGGZHtAOPT2gGJANk4wFAA==", 
                    BinaryEncoding.Base64
                ), 
                Compression.Deflate
            )
        ), 
        let
            _t = ((type nullable text) meta [Serialized.Text = true])
        in
            type table[Column1 = _t, Column2 = _t]
    ),
    #"Duplicated Column" = Table.DuplicateColumn(Source, "Column1", "Column1 - Copy"),
    #"Changed Type" = Table.TransformColumnTypes(
        #"Duplicated Column", 
        {{"Column1", type text}, {"Column2", type text}}
    ),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars = true]),
    #"Added Index" = Table.AddIndexColumn(#"Promoted Headers", "Index", 0, 1),
    #"Changed Type1" = Table.TransformColumnTypes(
        #"Added Index", 
        {{"Column", type text}, {"Result", type text}}
    ),
    #"Split Column by Character Transition" = Table.SplitColumn(
        #"Changed Type1", 
        "Column", 
        Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), 
        {"Column.1", "Column.2", "Column.3"}
    ),
    GetSplittedValues = Table.AddColumn(
        #"Split Column by Character Transition", 
        "SplittedValues", 
        each Record.FieldValues(
            Record.SelectFields(
                _, 
                List.Difference(Record.FieldNames(_), Table.ColumnNames(#"Changed Type1"))
            )
        )
    ),
    #"Expanded SplittedValues" = Table.ExpandListColumn(GetSplittedValues, "SplittedValues"),
    #"Extracted Text Before Delimiter" = Table.TransformColumns(
        #"Expanded SplittedValues", 
        {{"SplittedValues", each Text.BeforeDelimiter(_, " "), type text}}
    ),
    FilterOnlyRowsWithNumbers = Table.SelectRows(
        #"Extracted Text Before Delimiter", 
        each (List.Contains({"1".."9"}, Text.Start([SplittedValues], 1)))
    ),
    FilterOnlyRowsWithKg = Table.SelectRows(
        FilterOnlyRowsWithNumbers, 
        each Text.Contains([SplittedValues], "kg", Comparer.OrdinalIgnoreCase)
    ),
    #"Added Custom1" = Table.AddColumn(
        FilterOnlyRowsWithKg, 
        "Custom", 
        each try Number.From(Text.BeforeDelimiter([SplittedValues], "kg")) otherwise null
    ),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom] <> null)),
    #"Added Suffix" = Table.TransformColumns(
        #"Filtered Rows", 
        {{"Custom", each Text.From(_, "en-GB") & "kg", type text}}
    ),
    #"Removed Other Columns" = Table.SelectColumns(
        #"Added Suffix", 
        {"Column_1", "Result", "Custom"}
    )
in
    #"Removed Other Columns"

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

CNENFRNL
Super User III
Super User III

the real power of PQ + RegEx

let
    RE = (regex as text, str as text) =>
    let
        html =
            "<script>var regex = " & regex & "; var str = """ & str & """; var res = str.match(regex); document.write(res)</script>",
        res = Web.Page(html)[Data]{0}[Children]{0}[Children]{1}[Text]{0}
    in res,

    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMsxOV4rViVZKTEo2NICyDQ28oQyL7HSFjIwMEzDPMSnZCq7GNDE7PTExyRihXUFXwdAUyjX2dgfTltkQOiwMKJuWrmABlc/OVgAqVTAwgHANDAwfNSzTAxqipwcUB7KVYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Custom", each RE("/\d+kg/gi", [Column]))
in
    #"Added Custom"

Screenshot 2021-07-04 221030.png

View solution in original post

8 REPLIES 8
CNENFRNL
Super User III
Super User III

the real power of PQ + RegEx

let
    RE = (regex as text, str as text) =>
    let
        html =
            "<script>var regex = " & regex & "; var str = """ & str & """; var res = str.match(regex); document.write(res)</script>",
        res = Web.Page(html)[Data]{0}[Children]{0}[Children]{1}[Text]{0}
    in res,

    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMsxOV4rViVZKTEo2NICyDQ28oQyL7HSFjIwMEzDPMSnZCq7GNDE7PTExyRihXUFXwdAUyjX2dgfTltkQOiwMKJuWrmABlc/OVgAqVTAwgHANDAwfNSzTAxqipwcUB7KVYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column = _t]),
    #"Added Custom" = Table.AddColumn(Source, "Custom", each RE("/\d+kg/gi", [Column]))
in
    #"Added Custom"

Screenshot 2021-07-04 221030.png

View solution in original post

Hi @CNENFRNL ,

this is truly awesomesauce !!

Now we just have to wait and hope that this can also be refreshed in the service some day.
Looks like it is also prohibited through a gateway, unfortunately.

Cheers, Imke

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Thank you for your kind reminder. Indeed, I must take such restrictions into consideration as I'm now addicted to such a tricky way to wield regex in PQ instead of embedding R or Python scripts.😂

BTW, I own you a big thank as I'm enlighted by many of your blogs on PQ!👍 

@CNENFRNL This method looks so brilliant.

May I ask how it works? using parameter to create regrex?

It's a bit tricky. I used Web.Page to parse html containing js script; as you can see, embedded regex object can be used in such js snippets. It works in your scenairo but it's not a cure-all. Pls refer to restrictions in ImkeF's reply.

amitchandak
Super User IV
Super User IV

@ImkeF , can help on this ?



Did I answer your question? Mark my post as a solution! Appreciate your Kudos!!
Dashboard of My Blogs !! YouTube Channel !! Connect on Linkedin

Proud to be a Super User!

Hi @ngct1112 ,

please paste this code into the advanced editor and follow the steps:

 

let
    Source = Table.FromRows(
        Json.Document(
            Binary.Decompress(
                Binary.FromText(
                    "i45Wcs7PKc3NU9JRCkotLs0pUYrViVYyzE4HCoBIEC8xKdnQACJiABXC4FtkpytkZGSYgAQtoIKOSclW6ApNE7PTExOTjMGipggLFHQVDE0hSmGiEDXGUJ4lmGcJ5YWFATWkpStYgEVhFmZnKwA1KxgYIBtuYGD4qGGZHtAOPT2gGJANk4wFAA==", 
                    BinaryEncoding.Base64
                ), 
                Compression.Deflate
            )
        ), 
        let
            _t = ((type nullable text) meta [Serialized.Text = true])
        in
            type table[Column1 = _t, Column2 = _t]
    ),
    #"Duplicated Column" = Table.DuplicateColumn(Source, "Column1", "Column1 - Copy"),
    #"Changed Type" = Table.TransformColumnTypes(
        #"Duplicated Column", 
        {{"Column1", type text}, {"Column2", type text}}
    ),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars = true]),
    #"Added Index" = Table.AddIndexColumn(#"Promoted Headers", "Index", 0, 1),
    #"Changed Type1" = Table.TransformColumnTypes(
        #"Added Index", 
        {{"Column", type text}, {"Result", type text}}
    ),
    #"Split Column by Character Transition" = Table.SplitColumn(
        #"Changed Type1", 
        "Column", 
        Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}), 
        {"Column.1", "Column.2", "Column.3"}
    ),
    GetSplittedValues = Table.AddColumn(
        #"Split Column by Character Transition", 
        "SplittedValues", 
        each Record.FieldValues(
            Record.SelectFields(
                _, 
                List.Difference(Record.FieldNames(_), Table.ColumnNames(#"Changed Type1"))
            )
        )
    ),
    #"Expanded SplittedValues" = Table.ExpandListColumn(GetSplittedValues, "SplittedValues"),
    #"Extracted Text Before Delimiter" = Table.TransformColumns(
        #"Expanded SplittedValues", 
        {{"SplittedValues", each Text.BeforeDelimiter(_, " "), type text}}
    ),
    FilterOnlyRowsWithNumbers = Table.SelectRows(
        #"Extracted Text Before Delimiter", 
        each (List.Contains({"1".."9"}, Text.Start([SplittedValues], 1)))
    ),
    FilterOnlyRowsWithKg = Table.SelectRows(
        FilterOnlyRowsWithNumbers, 
        each Text.Contains([SplittedValues], "kg", Comparer.OrdinalIgnoreCase)
    ),
    #"Added Custom1" = Table.AddColumn(
        FilterOnlyRowsWithKg, 
        "Custom", 
        each try Number.From(Text.BeforeDelimiter([SplittedValues], "kg")) otherwise null
    ),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom] <> null)),
    #"Added Suffix" = Table.TransformColumns(
        #"Filtered Rows", 
        {{"Custom", each Text.From(_, "en-GB") & "kg", type text}}
    ),
    #"Removed Other Columns" = Table.SelectColumns(
        #"Added Suffix", 
        {"Column_1", "Result", "Custom"}
    )
in
    #"Removed Other Columns"

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

@ImkeF the way you doing this is incredible. Appreciated with your help.

Helpful resources

Announcements
PBI_User Group Leader_768x460.jpg

Manage your user group events

Check out the News & Announcements to learn more.

MBAS on Demand

2021 Release Wave 2 Plan

Power Platform release plan for the 2021 release wave 2 describes all new features releasing from October 2021 through March 2022.

Get Ready for Power BI Dev Camp

Microsoft named a Leader in The Forrester Wave

Microsoft received the highest score of any vendor in both the strategy and current offering categories.

R2 (Green) 768 x 460px.png

Microsoft Dynamics 365 & Power Platform User Professionals

DynamicsCon is a FREE, 4 half-day virtual learning experience for 11,000+ Microsoft Business Application users and professionals.