Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and a 50 percent discount on exams.
Get startedEarn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hiya good people of PQ,
I have a text column as containing various texts,e.g. xxyyzz, xxxabc, acdfgh.
I want to extract the the texts "x", "c" and "h" for example, using a pre-specified list {"x","c","h"}, if there is a match.
I am currently using if text.contains code, but its starting to get very complicated with every new data refresh adding more complexity to the text column.
If there was a simpler solution with List.accumulate, e..g, then I could just update the list.
Thanks and best regds.,
Solved! Go to Solution.
Table.AddColumn(Custom2, "Ingredient", each
List.Skip(
{{{"50%","sugarlo"},"SugarLo"},
{{"Stevia","Green"},"Stevia"},
{{"classic","cook","zero calorie"},"Sucralose"},
{{"original","Low Calories"},"Aspartame"},
{{"xylitol"},"Xylitol"}
},
(x)=>not List.Contains(x{0},[Title],(x,y)=>Text.Contains(y,x,Comparer.OrdinalIgnoreCase))
){0}?{1}?
)
please change List.ContainsAny to List.Contains
1 if use flat list, you can not get the replacement when more than one substrings be assigned a replacement. the list can be re-write as this
{"50%","SugarLo"},
{"sugarlo","SugarLo"},
{"Stevia","Stevia"},
{"Green","Stevia"},
{"classic","Sucralose"},
{"cook","Sucralose"},
{"zero calorie""Sucralose"},
{"original","Aspartame"},
{"Low Calories","Aspartame"},
{"xylitol","Xylitol"}
and re-write the second arguement as
(x)=>Text.Contains([Title],x,Comparer.OrdinalIgnoreCase)
I just intergerate the same replacement into one item.
2 (ItemFromSupportingList)=>not List.Contains(ItemFromSupportingList{0},[Title],(1stArguement,2ndArgument)=>Text.Contains(2ndArgument,1stArguement,Comparer.OrdinalIgnoreCase))
){0}?{1}?
{0} is select the first value of the list after skiping all the items not match the condition. this value will be in the structure as {list,text}, then {1} to get the "text" as the replacement.
so you get a blank list {}, so {}{0} is error, then {}{0}{1} is error.
can add a ? to tolerate the error, when there is no item on the index you want, it will give a null
each item should like {{"50%","sugarlo"},"SugarLo"}
when all the value in first item's list can not be found in the Title column, then this item {{"50%","sugarlo"},"SugarLo"} will be skipped.
Hey @Daryl-Lynch-Bzy ,
Below is a snapshot of the text column :
And here is the code I am using for extracting the key words like Stevia or Aspartame and I use similar code to extract the pack size of type (like sachets or tablets) :
Table.AddColumn(Custom2, "Ingredient", each if
Text.Contains([Title],"50%",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"sugarlo",Comparer.OrdinalIgnoreCase)then "SugarLo" else
if Text.Contains([Title],"Stevia",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"Green",Comparer.OrdinalIgnoreCase)
then "Stevia" else
if Text.Contains([Title],"classic",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"cook",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"zero calorie",Comparer.OrdinalIgnoreCase) then "Sucralose" else
if Text.Contains([Title],"original" ,Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"Low Calories", Comparer.OrdinalIgnoreCase)
then "Aspartame" else
if Text.Contains([Title],"xylitol",Comparer.OrdinalIgnoreCase) then "Xylitol" else null)
This code keeps getting longer and more complex with every refresh as the texts start getting new additions everytime so I have to mody this code post every refresh.
But what remains invariant is the list of actives or the list of pack sizes or the type etc - hence if I can work off a list, it will be a simple task just to update the relevant list
Thanks and appreciate the leg-up
Table.AddColumn(Custom2, "Ingredient", each
List.Skip(
{{{"50%","sugarlo"},"SugarLo"},
{{"Stevia","Green"},"Stevia"},
{{"classic","cook","zero calorie"},"Sucralose"},
{{"original","Low Calories"},"Aspartame"},
{{"xylitol"},"Xylitol"}
},
(x)=>not List.ContainsAny(x{0},[Title],(x,y)=>Text.Contains(y,x,Comparer.OrdinalIgnoreCase))
){0}?{1}?
)
Table.AddColumn(Custom2, "Ingredient", each
List.Skip(
{{{"50%","sugarlo"},"SugarLo"},
{{"Stevia","Green"},"Stevia"},
{{"classic","cook","zero calorie"},"Sucralose"},
{{"original","Low Calories"},"Aspartame"},
{{"xylitol"},"Xylitol"}
},
(x)=>not List.Contains(x{0},[Title],(x,y)=>Text.Contains(y,x,Comparer.OrdinalIgnoreCase))
){0}?{1}?
)
please change List.ContainsAny to List.Contains
Sorry to be a pest here but I am havig trouble working thru' the logic of the code. Appreciate your being patient with my dumb questions :
1. Why are we using a nested list - why cannot we use a flat list, e.g. {"50%", "sugarlo", "low calories","Aspartame"} and so on? The reaso I am asking is this is that the [Title] column also has some other information like forms, pack size etc for which we may not need a nested list as in this code.
2. I am also not clear as to how do you skip a nested list
3. In the last part of the code,
(x)=>not List.Contains(x{0},[Title],(x,y)=>Text.Contains(y,x,Comparer.OrdinalIgnoreCase)) ){0}?{1}?
I can see you are using the logic for skipping the list, but I did not understand the following :
a. where is variable (x) getting its value from? and why are using the 1st positing by using x{0}?
b. I am not able to understand the 3rd parameter within the list.contains
c. where is variable (y) getting its value from
d. Then in the last part why are we using {0} and then {1} positions and of which list?
Possibly all dumb questions but if I get this logis straight, I can reconstruct the code for extracting the other text strings :
a. Form - "tablets","jars","pouch","sachets"
b. Pack size - "100","300","500","50","100","150" etc
c. Multi pack - "1","2","3","4","5","6"
Appreciate the leg-up and apologies for the long-winded question!
1 if use flat list, you can not get the replacement when more than one substrings be assigned a replacement. the list can be re-write as this
{"50%","SugarLo"},
{"sugarlo","SugarLo"},
{"Stevia","Stevia"},
{"Green","Stevia"},
{"classic","Sucralose"},
{"cook","Sucralose"},
{"zero calorie""Sucralose"},
{"original","Aspartame"},
{"Low Calories","Aspartame"},
{"xylitol","Xylitol"}
and re-write the second arguement as
(x)=>Text.Contains([Title],x,Comparer.OrdinalIgnoreCase)
I just intergerate the same replacement into one item.
2 (ItemFromSupportingList)=>not List.Contains(ItemFromSupportingList{0},[Title],(1stArguement,2ndArgument)=>Text.Contains(2ndArgument,1stArguement,Comparer.OrdinalIgnoreCase))
){0}?{1}?
{0} is select the first value of the list after skiping all the items not match the condition. this value will be in the structure as {list,text}, then {1} to get the "text" as the replacement.
Thanks a lot @wdx223_Daniel for the clear explanation.
Suppose the text string also contains sub-strings like 50,500 and 5 - is there a way we can extrat the "5" from the single "5", but not from "50 and "500"? That along with a remodification of this code can help extract the pack sizes and selling units from the [Title] string?
could you provide some sample data along with the desire output?
The source data remains same...
The code I am using to extract the form is as below :
Table.AddColumn(Ingredients, "Form", each if Text.Contains([Title],"tablets",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"tablet",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"500 pcs",Comparer.OrdinalIgnoreCase)
or Text.Contains([Title],"330",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"440",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"550",Comparer.OrdinalIgnoreCase)
then "Tablets" else
if Text.Contains([Title],"sachet",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"sachets",Comparer.OrdinalIgnoreCase) then "Sachets" else
if Text.Contains([Title],"500 g",Comparer.OrdinalIgnoreCase) or Text.Contains([Title],"500g",Comparer.OrdinalIgnoreCase)
or Text.Contains([Title],"xylitol",Comparer.OrdinalIgnoreCase) then "Pouch" else
if Text.Contains([Title],"80",Comparer.OrdinalIgnoreCase) or
Text.Contains([Title],"150g",Comparer.OrdinalIgnoreCase) or
Text.Contains([Title],"150 g",Comparer.OrdinalIgnoreCase) or
Text.Contains([Title],"150gm",Comparer.OrdinalIgnoreCase) or
Text.Contains([Title],"150 GRAM",Comparer.OrdinalIgnoreCase)
then "Jar" else "Tablets"
)
The code I am using to extract the sizes is as below :
a. First I remove all text from the [Title] string :
Table.AddColumn(MPU, "Remove Text", each Text.Remove([Title],{"A".."Z","a".."z"}))
b. This returns a column as below :
c. from this column now, I feed a pre-prepared list :
into the following code to extract the matching strings :
Table.AddColumn(RemoveTextFromTitle, "Pack", each List.Select(PackList,
(x)=>
Text.Contains([Remove Text],x,Comparer.OrdinalIgnoreCase)
){0}?)
d. This returns a column as below :
e. I now clean up the above the column, by feeding another list with corrected packlist :
by using the following code :
List.Accumulate({0..Table.RowCount(PackSize)-1},
PackPosition,
(x,y)=>
Table.ReplaceValue(x,
PackSize[OldValue]{y}, PackSize[NewValue]{y},
Replacer.ReplaceValue, {"Pack"})
)
Little long-winded, but I could not find a simpler way to get to this in fewer steps.
Any help appreciated.
it looks like more complicated.
i can not get the whole story depending on what you showed
Sorry I confused you unintentionally. let me try and simplify this for you :
1. We are starting with the same column as above :
2. This text string, contains terms like sachets, tablets and jars - which are the forms. I want to extract these sub-strings from this column and the desired output should be to add a column headed "Form" with the strings as below :
3. Next if you notice, the text string column also includes number strings as - 100,300, 500, 80,50 etc. These are pack sizes. The desired output would be to extract these strings into a separate column headed "pack Size" :
Not sure if that has simplified things a bit and helps you better with the solution I am seeking.
Table.SplitColumn(Custom2, "Title", each
let
fx=(x,y)=>List.RemoveItems(Splitter.SplitTextByAnyDelimiter(x)(y),{"",null}),
Ingredient=List.Skip(
{{{"50%","sugarlo"},"SugarLo"},
{{"Stevia","Green"},"Stevia"},
{{"classic","cook","zero calorie"},"Sucralose"},
{{"original","Low Calories"},"Aspartame"},
{{"xylitol"},"Xylitol"}
},
(x)=>not List.Contains(x{0},_,(x,y)=>Text.Contains(y,x,Comparer.OrdinalIgnoreCase))
){0}?{1}?,
Form=fx(fx(YourFormList,_),_){0}?,
PackSize=fx(fx(YourPackSizeList,_),_){0}?,
MultiPack=fx(fx(YourMulitPackList,_),){0}?
in {Ingredient,Form,PackSize,MultiPack}
{"Ingredient","Form","PackSize","MultiPack"})
@wdx223_Daniel , I tried the code above only for ingredients :
Table.SplitColumn(ActiveNewWay.1, "Title", each
let
fx=(x,y)=>List.RemoveItems(Splitter.SplitTextByAnyDelimiter(x)(y),{"",null}),
Ingredient=List.Skip(
{{{"50%","sugarlo"},"SugarLo"},
{{"Stevia","Green"},"Stevia"},
{{"classic","cook","zero calorie"},"Sucralose"},
{{"original","Low Calories"},"Aspartame"},
{{"xylitol"},"Xylitol"}
},
(x)=>not List.Contains(x{0},_,(x,y)=>Text.Contains(y,x,Comparer.OrdinalIgnoreCase))
){0}?{1}?
// Form=fx(fx(YourFormList,_),_){0}?,
// PackSize=fx(fx(YourPackSizeList,_),_){0}?,
// MultiPack=fx(fx(YourMulitPackList,_),){0}?
in {Ingredient}
{"Ingredient"})
And this is returning the following error :
Am I missing something here?
regds.,
Thanks @wdx223_Daniel - the missing underscore probably is not causing the error since that is commented out. Will try the comma after {Ingredient} and check the result.
One question though :
when you are feeding the variable (x){0} into the list.contains function, will it not always check against the 1st item in the list (which in this case is "SugarLo"? I am slightly confused here. Should not the list.contains check each item in the list within the [Title] column? but (x){0} will always check for the 1st item and not list.contains will always return false, right? I am not able to figure out that if its not a recursive function, then how for each row in column [Title], each item of the list will be checked for true or false?
this function will be ran in each row of your table.
it firstly get the value of [Title] in this row
then, test if it contains any value in the first list-item of each row in your pre-prepared list
it will skip all the row of which the first list-item did not be found in Title,
then get the first row of the remain pre-prepared list, and get the 2nd item of the row as the replacement.
@wdx223_Daniel - I was trying out the code in a smaller bit as follows :
1. I have prepared a list called [strings} as follows :
let
Source = {
{{"SugarLo","50%"},"SugarLo"},
{{"Original", "Low Calories"},"Aspartame"},
{{"Classic","cook","zero calorie"},"Sucralose"},
{{"Stevia","green"},"Stevia"},
{{"Xylitol"},"Xylitol"}
}
in
Source
2. Then I am trying out your code on this list as follows :
List.Skip(Strings, (x)=>not List.Contains(x{0},"SugarLo"))
Since the condition is false it returns all the 5 lists and if we use {0}{1} on this list it correctly returns "SugarLo"
3. the problem starts when I change "SugarLo" to "Aspartame" e.g. -
List.Skip(Strings, (x)=>not List.Contains(x{0},"Aspartame"))
which returns an empty list and hence the {0}{1} does not work and returns an error.
Am I missing something in the logic here? I am really sorry but I am not able to work out how will (x){0} work on the list recursively?
can you help?
thanks and really appreciate and apologies for the bother!
so you get a blank list {}, so {}{0} is error, then {}{0}{1} is error.
can add a ? to tolerate the error, when there is no item on the index you want, it will give a null