Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi,
I have a table (report table) that contains a column with a large text including multiple substrings that I would like to extract and create a new row in a different table (extract table).
My substring always have the same length and is based on a sequence so its easy to find.
To illustrate values in my report table:
Report column (with report id 1)
"The grey fox crossed the road at midnight. It was dark and cold, Y19-421, Y21-002 and Y22-317.
Next the ducks crossed the road."
The string is very long (+2000 chars) and will contain different types of chars (letters, special chars like =, > and ;). Example of a substring: Y22-123
I would like to extract these substrings into a table:
Report ID | Substring |
1 | Substring1 |
1 | Substring2 |
Do you have any good suggestion of how to do that?
Solved! Go to Solution.
Hi @AndreasAh ,
My Sample:
Here I suggest you to add a special delimiter that not contained in your orginal table like “+” before "Y".
Then Split Column by Delimiter"+" by rows.
Then add a custom column to check whether your row contains "-"
Filter "True" in Custom column> Replace " "(space) by nothing. Spliter by position 7 by rows.
Finally remove some columns you don't need. Result is as below.
Whole M code:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ZY6xCsIwGIRf5cjchP5/FHHUzcXJpZQOoYltqW0giVjf3lCLi9txx313dS1IFOLWO3TBvXH3C9rgY3QWKZvBGwuTMA12Hro+KVwSXibCmjDCzBatf9gCFR3ljikLJlmWvEYVs9R0ULi6Ja04+2zH+DegRFPUgvOPE4b5W2OCDxuBFc6/QOuNrSXpfa42Hw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Report ID" = _t, Substring = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Report ID", Int64.Type}, {"Substring", type text}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type","Y","+Y",Replacer.ReplaceText,{"Substring"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Substring", Splitter.SplitTextByDelimiter("+", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Substring"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Substring", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type1", "Custom", each Text.Contains([Substring],"-")),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] = true)),
#"Replaced Value1" = Table.ReplaceValue(#"Filtered Rows"," ","",Replacer.ReplaceText,{"Substring"}),
#"Split Column by Position" = Table.SplitColumn(#"Replaced Value1", "Substring", Splitter.SplitTextByRepeatedLengths(7), {"Substring.1", "Substring.2"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Position",{{"Substring.1", type text}, {"Substring.2", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Substring.2", "Custom"})
in
#"Removed Columns"
Best Regards,
Rico Zhou
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi @AndreasAh ,
If your substring in same ID is separated by commas, here I suggest you to try Split Column by Delimiter"," by rows.
For reference: Split columns by delimiter
My Sample:
Result is as below.
Best Regards,
Rico Zhou
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Hi Rico Zhou,
I realiezed that my post was not clear enough so I've updated it. Will your solution work for the following example:
"The grey fox crossed the road at midnight. It was dark and cold, Y19-421, Y21-002 and Y22-317.
Next the ducks crossed the road."
To my understanding I will get the following table:
The grey fox crossed the road at midnight. It was dark and cold
Y19-421
Y21-002 and Y22-317. Next the ducks crossed the road."
Best Regards
Andreas
Hi @AndreasAh ,
My Sample:
Here I suggest you to add a special delimiter that not contained in your orginal table like “+” before "Y".
Then Split Column by Delimiter"+" by rows.
Then add a custom column to check whether your row contains "-"
Filter "True" in Custom column> Replace " "(space) by nothing. Spliter by position 7 by rows.
Finally remove some columns you don't need. Result is as below.
Whole M code:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ZY6xCsIwGIRf5cjchP5/FHHUzcXJpZQOoYltqW0giVjf3lCLi9txx313dS1IFOLWO3TBvXH3C9rgY3QWKZvBGwuTMA12Hro+KVwSXibCmjDCzBatf9gCFR3ljikLJlmWvEYVs9R0ULi6Ja04+2zH+DegRFPUgvOPE4b5W2OCDxuBFc6/QOuNrSXpfa42Hw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Report ID" = _t, Substring = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Report ID", Int64.Type}, {"Substring", type text}}),
#"Replaced Value" = Table.ReplaceValue(#"Changed Type","Y","+Y",Replacer.ReplaceText,{"Substring"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Substring", Splitter.SplitTextByDelimiter("+", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Substring"),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Substring", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type1", "Custom", each Text.Contains([Substring],"-")),
#"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] = true)),
#"Replaced Value1" = Table.ReplaceValue(#"Filtered Rows"," ","",Replacer.ReplaceText,{"Substring"}),
#"Split Column by Position" = Table.SplitColumn(#"Replaced Value1", "Substring", Splitter.SplitTextByRepeatedLengths(7), {"Substring.1", "Substring.2"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Position",{{"Substring.1", type text}, {"Substring.2", type text}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Substring.2", "Custom"})
in
#"Removed Columns"
Best Regards,
Rico Zhou
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Your solution is good for the majority of cases therefore I accepted your solution as an answer.
For other readers intrest I will also post the solution that I ended up using:
I didn't solve anything in in PowerBI, instead I used an SQL query when I loaded my data:
1. To prepare my data for pattern matching I removed (replaced) all "," " " 'tab' and newline with ",". I did this by extracting my data off intrest into a temporary table. Then I used Split_String() with delemiter "," to get a table where each "word" was in its own row.
2. To dynamicaly extract the information I need, I used LIKE funktion.
This gave me a table with only Y19-421 in powerbi.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
15 | |
3 | |
1 | |
1 | |
1 |
User | Count |
---|---|
26 | |
3 | |
2 | |
2 | |
2 |