Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
rovers
Frequent Visitor

Extracting substrings from free text

Hi,

Seeking help to code the following in Power Query (preferred) or DAX.

 

Input Table:

ID; Free Text

Order123; PN113 and PN224 were used to fulfil the order. However, there was no stock of PN113. PN224 was ok.

Order124; There was an issue. PN445 was not found. To resolved this, PN556 was used in place of PN445.

 

Output Table:

Order123; PN113

Order123; PN224

Order124; PN445

Order124; PN556

 

The objective is to search for "PN" and list the full PN that appears in the free text. Each PN is standardized to always have 5 characters. 

1 ACCEPTED SOLUTION
lbendlin
Super User
Super User

Here is a crude way of doing this. It assumes that the string "PN"  does not appear anywhere else in the text. So "The driver was hypnotized"  wouldn't work.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("PY47DoMwEESvMqK2kACTM6QiFHSIwoK1QFjeyB+4fmyS0O7OvDfjWLzcQq6qm0IUfVdVDZRd0Hd1LXGSI0RPCwJDR6M3g7ASOFdKPPmkg5zItxQ8lYdl+MDzDta4aOUflZ68l8UkbqNMxuFuKovN+0i5IGX7owVojnYpMTAceTZHHrNuXqRY2z6u2LVws3gbNdPXnAjJNX0A", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Free Text" = _t]),
    #"Replaced Value" = Table.ReplaceValue(Source,"."," ",Replacer.ReplaceText,{"Free Text"}),
    #"Added Custom1" = Table.AddColumn(#"Replaced Value", "Custom.1", each List.Distinct(List.FindText(Text.Split([Free Text]," "),"PN"))),
    #"Expanded Custom.1" = Table.ExpandListColumn(#"Added Custom1", "Custom.1")
in
    #"Expanded Custom.1"

View solution in original post

5 REPLIES 5
lbendlin
Super User
Super User

Here is a crude way of doing this. It assumes that the string "PN"  does not appear anywhere else in the text. So "The driver was hypnotized"  wouldn't work.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("PY47DoMwEESvMqK2kACTM6QiFHSIwoK1QFjeyB+4fmyS0O7OvDfjWLzcQq6qm0IUfVdVDZRd0Hd1LXGSI0RPCwJDR6M3g7ASOFdKPPmkg5zItxQ8lYdl+MDzDta4aOUflZ68l8UkbqNMxuFuKovN+0i5IGX7owVojnYpMTAceTZHHrNuXqRY2z6u2LVws3gbNdPXnAjJNX0A", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Free Text" = _t]),
    #"Replaced Value" = Table.ReplaceValue(Source,"."," ",Replacer.ReplaceText,{"Free Text"}),
    #"Added Custom1" = Table.AddColumn(#"Replaced Value", "Custom.1", each List.Distinct(List.FindText(Text.Split([Free Text]," "),"PN"))),
    #"Expanded Custom.1" = Table.ExpandListColumn(#"Added Custom1", "Custom.1")
in
    #"Expanded Custom.1"

This works, thanks a lot. 

 

However, there were some entries that needed cleaning e.g. (PN789) where the brackets should be removed. I added two more lines of code to clean them.

 

#"Added Custom.2" = Table.AddColumn(#"Expanded Custom.1", "Custom.2", each Text.PositionOf([Custom.1],"P")),
#"Added Custom.3" = Table.AddColumn(#"Added Custom.2", "Custom.3", each Text.Middle([Custom.1], [Custom.2], 6))

 

 

 

 

I would recommend you add two more replacers instead.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("PY47DoMwEESvMnKVSJYlwOQMqQgFHaJAsAiE5Y38gevHkIR2d+a9aVvxciO5LC+EFHWVZQV6O6Ku8lxjJ0eInkYExhTNtBiEmcBHReHJO23k5HFLwb33sAwfeFjBE06a+qPSk1clOnkZdTI2V7O3WLyPpHCrK63L+48XMHG0o0LDcOTZbMecefEykcvyccbOjYvF2/QDfd0JkWzdBw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Free Text" = _t]),
    #"Replaced Value" = Table.ReplaceValue(Source,"."," ",Replacer.ReplaceText,{"Free Text"}),
    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value","("," ",Replacer.ReplaceText,{"Free Text"}),
    #"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1",")"," ",Replacer.ReplaceText,{"Free Text"}),
    #"Added Custom1" = Table.AddColumn(#"Replaced Value2", "Custom.1", each List.Distinct(List.FindText(Text.Split([Free Text]," "),"PN"))),
    #"Expanded Custom.1" = Table.ExpandListColumn(#"Added Custom1", "Custom.1")
in
    #"Expanded Custom.1"

Actually (PN789) was just an example. The additional characters could come in other forms such as -PN789. In anyway I used "PN" as a mockup, my real use case is a 3 letter code which is not commonly used in english words. 

Fair enough. I seem to recall that there are a couple of more elegant fuzzy replacer functions available - maybe something to consider. Or simply blank out everything that is not a letter or a number.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.