Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
AndreasAh
Frequent Visitor

Get multiple substrings and create a new row

Hi,

 

I have a table (report table) that contains a column with a large text including multiple substrings that I would like to extract and create a new row in a different table (extract table).

My substring always have the same length and is based on a sequence so its easy to find.

 

To illustrate values in my report table:

Report column (with report id 1)

"The grey fox crossed the road at midnight. It was dark and cold, Y19-421, Y21-002 and Y22-317.

Next the ducks crossed the road."

The string is very long (+2000 chars) and will contain different types of chars (letters, special chars like =, > and ;). Example of a substring: Y22-123

 

I would like to extract these substrings into a table:

 

Report IDSubstring
1Substring1
1Substring2

 

Do you have any good suggestion of how to do that?

1 ACCEPTED SOLUTION

Hi @AndreasAh ,

 

My Sample:

RicoZhou_5-1665654025584.png

Here I suggest you to add a special delimiter that not contained in your orginal table like “+” before "Y".

RicoZhou_0-1665653781458.png

Then Split Column by Delimiter"+" by rows.

RicoZhou_2-1665653877951.png

Then add a custom column to check whether your row contains "-"

RicoZhou_1-1665653863482.png

Filter "True" in Custom column> Replace " "(space) by nothing. Spliter by position 7 by rows.

RicoZhou_3-1665653966760.png

Finally remove some columns you don't need. Result is as below.

RicoZhou_4-1665654010800.png

Whole M code:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ZY6xCsIwGIRf5cjchP5/FHHUzcXJpZQOoYltqW0giVjf3lCLi9txx313dS1IFOLWO3TBvXH3C9rgY3QWKZvBGwuTMA12Hro+KVwSXibCmjDCzBatf9gCFR3ljikLJlmWvEYVs9R0ULi6Ja04+2zH+DegRFPUgvOPE4b5W2OCDxuBFc6/QOuNrSXpfa42Hw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Report ID" = _t, Substring = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Report ID", Int64.Type}, {"Substring", type text}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type","Y","+Y",Replacer.ReplaceText,{"Substring"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Substring", Splitter.SplitTextByDelimiter("+", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Substring"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Substring", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type1", "Custom", each Text.Contains([Substring],"-")),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] = true)),
    #"Replaced Value1" = Table.ReplaceValue(#"Filtered Rows"," ","",Replacer.ReplaceText,{"Substring"}),
    #"Split Column by Position" = Table.SplitColumn(#"Replaced Value1", "Substring", Splitter.SplitTextByRepeatedLengths(7), {"Substring.1", "Substring.2"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Position",{{"Substring.1", type text}, {"Substring.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Substring.2", "Custom"})
in
    #"Removed Columns"

 

Best Regards,
Rico Zhou

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

 

View solution in original post

4 REPLIES 4
v-rzhou-msft
Community Support
Community Support

Hi @AndreasAh ,

 

If your substring in same ID is separated by commas, here I suggest you to try Split Column by Delimiter"," by rows.

For reference: Split columns by delimiter

My Sample:

RicoZhou_0-1665561092894.png

RicoZhou_1-1665561297627.png

Result is as below.

RicoZhou_2-1665561316555.png

 

Best Regards,
Rico Zhou

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Hi Rico Zhou,

 

I realiezed that my post was not clear enough so I've updated it. Will your solution work for the following example:

"The grey fox crossed the road at midnight. It was dark and cold, Y19-421, Y21-002 and Y22-317.

Next the ducks crossed the road."

 

To my understanding I will get the following table:

The grey fox crossed the road at midnight. It was dark and cold

Y19-421

Y21-002 and Y22-317. Next the ducks crossed the road."

 

Best Regards
Andreas

Hi @AndreasAh ,

 

My Sample:

RicoZhou_5-1665654025584.png

Here I suggest you to add a special delimiter that not contained in your orginal table like “+” before "Y".

RicoZhou_0-1665653781458.png

Then Split Column by Delimiter"+" by rows.

RicoZhou_2-1665653877951.png

Then add a custom column to check whether your row contains "-"

RicoZhou_1-1665653863482.png

Filter "True" in Custom column> Replace " "(space) by nothing. Spliter by position 7 by rows.

RicoZhou_3-1665653966760.png

Finally remove some columns you don't need. Result is as below.

RicoZhou_4-1665654010800.png

Whole M code:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("ZY6xCsIwGIRf5cjchP5/FHHUzcXJpZQOoYltqW0giVjf3lCLi9txx313dS1IFOLWO3TBvXH3C9rgY3QWKZvBGwuTMA12Hro+KVwSXibCmjDCzBatf9gCFR3ljikLJlmWvEYVs9R0ULi6Ja04+2zH+DegRFPUgvOPE4b5W2OCDxuBFc6/QOuNrSXpfa42Hw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Report ID" = _t, Substring = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Report ID", Int64.Type}, {"Substring", type text}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type","Y","+Y",Replacer.ReplaceText,{"Substring"}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Substring", Splitter.SplitTextByDelimiter("+", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Substring"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Substring", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type1", "Custom", each Text.Contains([Substring],"-")),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Custom] = true)),
    #"Replaced Value1" = Table.ReplaceValue(#"Filtered Rows"," ","",Replacer.ReplaceText,{"Substring"}),
    #"Split Column by Position" = Table.SplitColumn(#"Replaced Value1", "Substring", Splitter.SplitTextByRepeatedLengths(7), {"Substring.1", "Substring.2"}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Position",{{"Substring.1", type text}, {"Substring.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type2",{"Substring.2", "Custom"})
in
    #"Removed Columns"

 

Best Regards,
Rico Zhou

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

 

Your solution is good for the majority of cases therefore I accepted your solution as an answer.

For other readers intrest I will also post the solution that I ended up using:

 

I didn't solve anything in in PowerBI, instead I used an SQL query when I loaded my data:

 

1. To prepare my data for pattern matching I removed (replaced) all "," " " 'tab' and newline with ",". I did this by extracting my data off intrest into a temporary table. Then I used Split_String() with delemiter "," to get a table where each "word" was in  its own row.

 

2. To dynamicaly extract the information I need, I used LIKE funktion.

 

This gave me a table with only Y19-421 in powerbi.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.