Good afternoon!
I have a general text field that could contain a paragraph of text and within it, will contain a code that I would like to extract and enter into a new column.
For example, I would be looking for " FKR-FY22-614 ". This can change in length based on the last numbers so like FKR-FY22-52 or FKR-FY23-1040 eventually. The one constant is it always starts with FKR-FY
I have found some solutions but the code gets complicated and I have a hard time understanding how to apply it in my situation.
Thanks in advance!!
Solved! Go to Solution.
To deal with text before the product number, my solution needs to be tweaked a bit. Try this as a custom column definition:
let
after_prefix = Text.AfterDelimiter([Column1], "FKR-FY" ),
remove_digits = Text.Remove(after_prefix, {"0".."9","-"}),
delimiter = Text.Start(remove_digits, 1),
trim_start = Text.AfterDelimiter([Column1], "FKR-FY"),
prod_num = Text.BeforeDelimiter(trim_start, delimiter),
result = if delimiter = ""
then "FKR-FY" & trim_start
else "FKR-FY" & prod_num
in
result
I'll test a few things tonight and tomorrow morning and come back and mark a solution.
Thanks!
If you have a list of characters to split on (space, period, etc.), then you write a custom column with Text.SplitAny:
List.First(Text.SplitAny([Column1], " ."))
A more robust solution might be to look for the first character after the number after the second hyphen and then return everything before that character.
let
after_hyphens = Text.AfterDelimiter([Column1], "-", 1),
remove_digits = Text.Remove(after_hyphens, {"0".."9"}),
delimiter = Text.Start(remove_digits, 1),
prod_num = Text.BeforeDelimiter([Column1], delimiter),
result = if delimiter = ""
then [Column1]
else prod_num
in
result
That's awesome, I will give that a shot. I like the idea of going in reverse too, that does make sense.
Hi @Dropsix
If I understand correctly you are looking for strings that start with "FKR-FY". What is the rule to know where the string ends? Is it at the next space?
Can you provide an example with a full paragraph of text?
|
Please accept the solution when done and consider giving a thumbs up if posts are helpful. Contact me privately for support with any larger-scale BI needs, tutoring, etc. |
Sometimes it's the next space, but sometimes I have seen periods right after.
So for example it could be:
FKR-FY22-42 was delivered via courier, contents were not verified etc.
Or
FKR-FY23-1003. delivered via courier, contents were not verified etc.
You´ll have to be more specific about the requirements to get a more robust solution.
Place the following M code in a blank query to see the steps.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("rc5BCoMwFATQqwxZG7GxeAQ33XVXggvRKX4oCSS/6vEN4glKdwMzPMZ7o9wV/eNp+5dz9u6wjRkzP7IyccYqI6b4TcJUlRCUQTO20iFERRnJW8qOOpmh8uaSWntrmrb+DapPKerCBGVWnB87/MkeDg==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each "FKR-FY" & Text.AfterDelimiter([Column1], "FKR-FY"), type text),
chars_ = {"0".."9", "a".."z", "A".."Z", "-"},
#"Split Column by Character Transition" = Table.SplitColumn(#"Added Custom", "Custom", Splitter.SplitTextByCharacterTransition(chars_, (c) => not List.Contains(chars_, c)), {"Custom"})
in
#"Split Column by Character Transition"
|
Please accept the solution when done and consider giving a thumbs up if posts are helpful. Contact me privately for support with any larger-scale BI needs, tutoring, etc. |
To be more specific, I am looking for any mention of FKR and the letters and number that follow it within a free text field. Sometimes it's at the beginning of the field, sometimes in the middle sometimes at the end followed by a period. The criteria is that I find anything that looks like FKR-FY**-** or FKR-FY**-*** or FKR-FY**-****
I know from a data standpoint this seems unreliable (what if someone doesn't copy and paste that properly or type it in properly) but this is for my reference only and not used in any official way. So I hope that is specific enough as can be, it's just kind of random.
Here are a couple of examples:
FKR-FY21-554. Due to COVID-19, containers were not opened to verify their contents. Containers were damaged during transport and had to be re-taped. |
"Delivery Company is UPS Express. FKR-FY19-112." |
To deal with text before the product number, my solution needs to be tweaked a bit. Try this as a custom column definition:
let
after_prefix = Text.AfterDelimiter([Column1], "FKR-FY" ),
remove_digits = Text.Remove(after_prefix, {"0".."9","-"}),
delimiter = Text.Start(remove_digits, 1),
trim_start = Text.AfterDelimiter([Column1], "FKR-FY"),
prod_num = Text.BeforeDelimiter(trim_start, delimiter),
result = if delimiter = ""
then "FKR-FY" & trim_start
else "FKR-FY" & prod_num
in
result
I think that's about it. I swapped out then "FKR-FY" to null instead and it looks perfect.
Thank you so much, I like this. I like AIB's solution also but frankly don't quite understand how it works.
1. The #"Added Custom" step does a bit of initial cleaning by removing whatever is before "FKR-FY"
2. From your requirements description (albeit not fully unambiguous), I surmise that the substring we are interested in extracting starts by "FKR-FY" and ends by any character that is not alphanumeric nor "-". From the step in point 1, we know the string starts by "FKR-FY so step " #"Split Column by Character Transition" looks for the position where there is a transition between the "allowed" characters (defined in variable chars_
) and other characters. That is the end of the substring of interest and the string is split there. The piece before the split is kept.
|
Please accept the solution when done and consider giving a thumbs up if posts are helpful. Contact me privately for support with any larger-scale BI needs, tutoring, etc. |
Ok that makes sense. I'll take another look at it
I like this and how it works when the text doesn't start with the product number! I also considered a character transition but didn't think of using the optional column parameter in Table.SplitColumn to just get a single column.