Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hi, I would like to extract some data from a column which the input of the data are not consistant.
May I know is there any way I could clean the data by PowerQuery or DAX. Thanks
Column | Result |
1kg | 1kg |
abc10kg | 10kg |
10kg | 10kg |
18kg hhh4 | 18kg |
Abc:10kg | 10kg |
5aaab3kg | 5kg |
abc - 15kg | 15kg |
3kg | 3kg |
9kg | 9kg |
VV - fg 8kg | 8kg |
kk 5kg 00g | 5kg |
001….abc..5kg… | 5kg |
Solved! Go to Solution.
Hi @ngct1112 ,
please paste this code into the advanced editor and follow the steps:
let
Source = Table.FromRows(
Json.Document(
Binary.Decompress(
Binary.FromText(
"i45Wcs7PKc3NU9JRCkotLs0pUYrViVYyzE4HCoBIEC8xKdnQACJiABXC4FtkpytkZGSYgAQtoIKOSclW6ApNE7PTExOTjMGipggLFHQVDE0hSmGiEDXGUJ4lmGcJ5YWFATWkpStYgEVhFmZnKwA1KxgYIBtuYGD4qGGZHtAOPT2gGJANk4wFAA==",
BinaryEncoding.Base64
),
Compression.Deflate
)
),
let
_t = ((type nullable text) meta [Serialized.Text = true])
in
type table[Column1 = _t, Column2 = _t]
),
#"Duplicated Column" = Table.DuplicateColumn(Source, "Column1", "Column1 - Copy"),
#"Changed Type" = Table.TransformColumnTypes(
#"Duplicated Column",
{{"Column1", type text}, {"Column2", type text}}
),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars = true]),
#"Added Index" = Table.AddIndexColumn(#"Promoted Headers", "Index", 0, 1),
#"Changed Type1" = Table.TransformColumnTypes(
#"Added Index",
{{"Column", type text}, {"Result", type text}}
),
#"Split Column by Character Transition" = Table.SplitColumn(
#"Changed Type1",
"Column",
Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}),
{"Column.1", "Column.2", "Column.3"}
),
GetSplittedValues = Table.AddColumn(
#"Split Column by Character Transition",
"SplittedValues",
each Record.FieldValues(
Record.SelectFields(
_,
List.Difference(Record.FieldNames(_), Table.ColumnNames(#"Changed Type1"))
)
)
),
#"Expanded SplittedValues" = Table.ExpandListColumn(GetSplittedValues, "SplittedValues"),
#"Extracted Text Before Delimiter" = Table.TransformColumns(
#"Expanded SplittedValues",
{{"SplittedValues", each Text.BeforeDelimiter(_, " "), type text}}
),
FilterOnlyRowsWithNumbers = Table.SelectRows(
#"Extracted Text Before Delimiter",
each (List.Contains({"1".."9"}, Text.Start([SplittedValues], 1)))
),
FilterOnlyRowsWithKg = Table.SelectRows(
FilterOnlyRowsWithNumbers,
each Text.Contains([SplittedValues], "kg", Comparer.OrdinalIgnoreCase)
),
#"Added Custom1" = Table.AddColumn(
FilterOnlyRowsWithKg,
"Custom",
each try Number.From(Text.BeforeDelimiter([SplittedValues], "kg")) otherwise null
),
#"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom] <> null)),
#"Added Suffix" = Table.TransformColumns(
#"Filtered Rows",
{{"Custom", each Text.From(_, "en-GB") & "kg", type text}}
),
#"Removed Other Columns" = Table.SelectColumns(
#"Added Suffix",
{"Column_1", "Result", "Custom"}
)
in
#"Removed Other Columns"
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
the real power of PQ + RegEx
let
RE = (regex as text, str as text) =>
let
html =
"<script>var regex = " & regex & "; var str = """ & str & """; var res = str.match(regex); document.write(res)</script>",
res = Web.Page(html)[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in res,
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMsxOV4rViVZKTEo2NICyDQ28oQyL7HSFjIwMEzDPMSnZCq7GNDE7PTExyRihXUFXwdAUyjX2dgfTltkQOiwMKJuWrmABlc/OVgAqVTAwgHANDAwfNSzTAxqipwcUB7KVYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column = _t]),
#"Added Custom" = Table.AddColumn(Source, "Custom", each RE("/\d+kg/gi", [Column]))
in
#"Added Custom"
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
the real power of PQ + RegEx
let
RE = (regex as text, str as text) =>
let
html =
"<script>var regex = " & regex & "; var str = """ & str & """; var res = str.match(regex); document.write(res)</script>",
res = Web.Page(html)[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in res,
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMsxOV4rViVZKTEo2NICyDQ28oQyL7HSFjIwMEzDPMSnZCq7GNDE7PTExyRihXUFXwdAUyjX2dgfTltkQOiwMKJuWrmABlc/OVgAqVTAwgHANDAwfNSzTAxqipwcUB7KVYmMB", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column = _t]),
#"Added Custom" = Table.AddColumn(Source, "Custom", each RE("/\d+kg/gi", [Column]))
in
#"Added Custom"
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
I'm trying to grasp what you've done here. But failing.
I exported the code, edited the source table, and changed the regex formula to match my requirment. but getting an error:
= Table.AddColumn(#"Renamed Columns", "Custom", each RE("/\w+", [Idealink]))
I get the following error
Expression.Error: There weren't enough elements in the enumeration to complete the operation.
Hi @CNENFRNL ,
this is truly awesomesauce !!
Now we just have to wait and hope that this can also be refreshed in the service some day.
Looks like it is also prohibited through a gateway, unfortunately.
Cheers, Imke
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Thank you for your kind reminder. Indeed, I must take such restrictions into consideration as I'm now addicted to such a tricky way to wield regex in PQ instead of embedding R or Python scripts.😂
BTW, I own you a big thank as I'm enlighted by many of your blogs on PQ!👍
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
@CNENFRNL This method looks so brilliant.
May I ask how it works? using parameter to create regrex?
It's a bit tricky. I used Web.Page to parse html containing js script; as you can see, embedded regex object can be used in such js snippets. It works in your scenairo but it's not a cure-all. Pls refer to restrictions in ImkeF's reply.
Thanks to the great efforts by MS engineers to simplify syntax of DAX! Most beginners are SUCCESSFULLY MISLED to think that they could easily master DAX; but it turns out that the intricacy of the most frequently used RANKX() is still way beyond their comprehension! |
DAX is simple, but NOT EASY! |
@ImkeF , can help on this ?
Hi @ngct1112 ,
please paste this code into the advanced editor and follow the steps:
let
Source = Table.FromRows(
Json.Document(
Binary.Decompress(
Binary.FromText(
"i45Wcs7PKc3NU9JRCkotLs0pUYrViVYyzE4HCoBIEC8xKdnQACJiABXC4FtkpytkZGSYgAQtoIKOSclW6ApNE7PTExOTjMGipggLFHQVDE0hSmGiEDXGUJ4lmGcJ5YWFATWkpStYgEVhFmZnKwA1KxgYIBtuYGD4qGGZHtAOPT2gGJANk4wFAA==",
BinaryEncoding.Base64
),
Compression.Deflate
)
),
let
_t = ((type nullable text) meta [Serialized.Text = true])
in
type table[Column1 = _t, Column2 = _t]
),
#"Duplicated Column" = Table.DuplicateColumn(Source, "Column1", "Column1 - Copy"),
#"Changed Type" = Table.TransformColumnTypes(
#"Duplicated Column",
{{"Column1", type text}, {"Column2", type text}}
),
#"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars = true]),
#"Added Index" = Table.AddIndexColumn(#"Promoted Headers", "Index", 0, 1),
#"Changed Type1" = Table.TransformColumnTypes(
#"Added Index",
{{"Column", type text}, {"Result", type text}}
),
#"Split Column by Character Transition" = Table.SplitColumn(
#"Changed Type1",
"Column",
Splitter.SplitTextByCharacterTransition((c) => not List.Contains({"0".."9"}, c), {"0".."9"}),
{"Column.1", "Column.2", "Column.3"}
),
GetSplittedValues = Table.AddColumn(
#"Split Column by Character Transition",
"SplittedValues",
each Record.FieldValues(
Record.SelectFields(
_,
List.Difference(Record.FieldNames(_), Table.ColumnNames(#"Changed Type1"))
)
)
),
#"Expanded SplittedValues" = Table.ExpandListColumn(GetSplittedValues, "SplittedValues"),
#"Extracted Text Before Delimiter" = Table.TransformColumns(
#"Expanded SplittedValues",
{{"SplittedValues", each Text.BeforeDelimiter(_, " "), type text}}
),
FilterOnlyRowsWithNumbers = Table.SelectRows(
#"Extracted Text Before Delimiter",
each (List.Contains({"1".."9"}, Text.Start([SplittedValues], 1)))
),
FilterOnlyRowsWithKg = Table.SelectRows(
FilterOnlyRowsWithNumbers,
each Text.Contains([SplittedValues], "kg", Comparer.OrdinalIgnoreCase)
),
#"Added Custom1" = Table.AddColumn(
FilterOnlyRowsWithKg,
"Custom",
each try Number.From(Text.BeforeDelimiter([SplittedValues], "kg")) otherwise null
),
#"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom] <> null)),
#"Added Suffix" = Table.TransformColumns(
#"Filtered Rows",
{{"Custom", each Text.From(_, "en-GB") & "kg", type text}}
),
#"Removed Other Columns" = Table.SelectColumns(
#"Added Suffix",
{"Column_1", "Result", "Custom"}
)
in
#"Removed Other Columns"
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
User | Count |
---|---|
141 | |
113 | |
104 | |
77 | |
64 |
User | Count |
---|---|
135 | |
123 | |
101 | |
71 | |
61 |