cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Zelbinian Frequent Visitor
Frequent Visitor

Create new column based on text from a messy text blob

So, I have a column with data that looks like this:

#Research; #ourResearch-Cat1-Subcat1; #Benchmark; 2018; Benchmark
#Research; #OurResearch-Cat1-Subcat2; #Benchmark; Benchmark
#OURRESEARCH-NewCat; #Research
#Research; #ourresearch-Category3-Subcat 3;
#Research; #ourResearch-NewCat2

 

It's a text blob from a database and it's a little messy, as user-entered data often is. I've definitely not captured every variation, but you get the gist. Here's the result I want:

Category SubCategory

Cat1Subcat1
Cat1Subcat2
NewCat 
Category3Subcat 3
NewCat2 

 

I've been Googling around for solutions for hours and the issue is everything I've seen really depends on knowing the exact positioning of the text you're trying to grab. I do know, to an extent. Like, if I could write a regex this problem would have been solved in about 5 minutes. But I've yet to figure out a combination of replacer and splitter functions that can do the same job.

Edit: I've added some complexity to my example. The things in the sample data are just stand-ins for the real thing, categories and subcategories can be any alphanumeric string of any length.

2 ACCEPTED SOLUTIONS

Accepted Solutions
Phil_Seamark Super Contributor
Super Contributor

Re: Create new column based on text from a messy text blob

Hi @Zelbinian

 

Please try these two calculated columns as an alternative.  These are in DAX and again, I have attached a PBIX file that you can download to see/tweak etc.

 

Category = 
VAR LengthOfText = LEN('Table1'[Column1])
VAR StartingPointOR = SEARCH("#OurResearch",'Table1'[Column1],1,0)+1
VAR RestOfText1 =  MID('Table1'[Column1],StartingPointOR+12,LengthOfText-StartingPointOR+1) & ";"
VAR StartingPointSC = SEARCH(";",RestOfText1,1,0)-1
VAR RestOfText2 = LEFT(RestOfText1,StartingPointSC)
VAR StartingPointDash = SEARCH("-",RestOfText2,1,0)

RETURN 
    IF(
        StartingPointDash>0,
        LEFT(RestOfText2,StartingPointDash-1),
        RestOfText2
        )

and

 

Sub Category = 
VAR LengthOfText = LEN('Table1'[Column1])
VAR StartingPointOR = SEARCH("#OurResearch",'Table1'[Column1],1,0)+1
VAR RestOfText1 =  MID('Table1'[Column1],StartingPointOR+12,LengthOfText-StartingPointOR+1) & ";"
VAR StartingPointSC = SEARCH(";",RestOfText1,1,0)-1
VAR RestOfText2 = LEFT(RestOfText1,StartingPointSC)
VAR StartingPointDash = SEARCH("-",RestOfText2,1,0)

RETURN 
    IF(
        StartingPointDash>0,
        MID(RestOfText2,StartingPointDash+1,99)
        )

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Community Support Team
Community Support Team

Re: Create new column based on text from a messy text blob

Hi @Zelbinian,

 

The @Phil_Seamark's solution is wonderful. Here is a solution of M based on your sample data. The file is attached. 

Text.BeforeDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-")
Text.AfterDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-")

Create-new-column-based-on-text-from-a-messy-text-blob

 

Best Regards,
Dale

Community Support Team _ Dale
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
10 REPLIES 10
Highlighted
Phil_Seamark Super Contributor
Super Contributor

Re: Create new column based on text from a messy text blob

Hi @Zelbinian

 

Here is one approach.  Not sure if it works with your wider dataset, but hopefully you can see how to tweak it.  I have attached a PBIX file

 

Category = 
VAR LengthOfText = LEN('Table1'[Column1])
VAR StartingPointCat = FIND("-Cat",'Table1'[Column1],1,0)+1
VAR RestOfText =  MID('Table1'[Column1],StartingPointCat,LengthOfText-StartingPointCat+1)
VAR FinishPointCAT = {
                        {FIND("-",RestOfText,1,LengthOfText)},
                        {FIND(";",RestOfText,1,LengthOfText)}
                        }
RETURN 
    MID(
        'Table1'[Column1],
        StartingPointCat,MINX(FinishPointCAT,[Value])-1
        )

and 

Sub Category = 
VAR LengthOfText = LEN('Table1'[Column1])
VAR StartingPointCat = FIND("-Subcat",'Table1'[Column1],1,0)+1
VAR RestOfText =  MID('Table1'[Column1],StartingPointCat,LengthOfText-StartingPointCat+1)
VAR FinishPointCAT = {
                        {FIND("-",RestOfText,1,LengthOfText)},
                        {FIND(";",RestOfText,1,LengthOfText)}
                        }
RETURN 
    IF(StartingPointCat>1,
        MID(
        'Table1'[Column1],
        StartingPointCat,MINX(FinishPointCAT,[Value])-1
        ))

image.png


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Zelbinian Frequent Visitor
Frequent Visitor

Re: Create new column based on text from a messy text blob

The problem with this solution - and other solutions I've seen for similar problems - is that it assumes I can know what the data is going to be in advance. I use "Cat" and "Subcat" as standins for Category and Subcategory data but I can't actually know what those will be in advance. They can be of arbitrary length and composition. I can know that they will follow "#ourresearch-" (in various forms of capitalization), but that's it.

(Also, it would have been helpful for you to mention that this was DAX code. I assumed it was Power Query for about an hour lol)

Phil_Seamark Super Contributor
Super Contributor

Re: Create new column based on text from a messy text blob

HI @Zelbinian

 

My solution is based on the small dataset you provided, hence the approach.  If you have some definite "rules" then these can probably be used to help tweak the approach.  


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Zelbinian Frequent Visitor
Frequent Visitor

Re: Create new column based on text from a messy text blob

I updated the question, if that helps.

Phil_Seamark Super Contributor
Super Contributor

Re: Create new column based on text from a messy text blob

Hi @Zelbinian

 

Please try these two calculated columns as an alternative.  These are in DAX and again, I have attached a PBIX file that you can download to see/tweak etc.

 

Category = 
VAR LengthOfText = LEN('Table1'[Column1])
VAR StartingPointOR = SEARCH("#OurResearch",'Table1'[Column1],1,0)+1
VAR RestOfText1 =  MID('Table1'[Column1],StartingPointOR+12,LengthOfText-StartingPointOR+1) & ";"
VAR StartingPointSC = SEARCH(";",RestOfText1,1,0)-1
VAR RestOfText2 = LEFT(RestOfText1,StartingPointSC)
VAR StartingPointDash = SEARCH("-",RestOfText2,1,0)

RETURN 
    IF(
        StartingPointDash>0,
        LEFT(RestOfText2,StartingPointDash-1),
        RestOfText2
        )

and

 

Sub Category = 
VAR LengthOfText = LEN('Table1'[Column1])
VAR StartingPointOR = SEARCH("#OurResearch",'Table1'[Column1],1,0)+1
VAR RestOfText1 =  MID('Table1'[Column1],StartingPointOR+12,LengthOfText-StartingPointOR+1) & ";"
VAR StartingPointSC = SEARCH(";",RestOfText1,1,0)-1
VAR RestOfText2 = LEFT(RestOfText1,StartingPointSC)
VAR StartingPointDash = SEARCH("-",RestOfText2,1,0)

RETURN 
    IF(
        StartingPointDash>0,
        MID(RestOfText2,StartingPointDash+1,99)
        )

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Zelbinian Frequent Visitor
Frequent Visitor

Re: Create new column based on text from a messy text blob

I appreciate you giving me the fish, but, unfortunately, I do not understand your code (I've never written in DAX) nor do I have an explanation about the thinking that produced it. Can you start by explaining why you chose a calculated column / DAX vs doing Power Query / M in the query step and creating a new real column?

Phil_Seamark Super Contributor
Super Contributor

Re: Create new column based on text from a messy text blob

Hi @Zelbinian

 

For something like this, you can use either M or DAX.  I chose DAX because I'm more familiar with it, but it probably doesn't matter and there won't be a big performance difference.  Personally, I find DAX code easier to read than M, but I love anything you can do in M just from the UI.

 

If I was to do this in M, the principals would be the same.  Look for the guaranteed patters (#ourresearch) etc and then start breaking up the text using text functions.


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Community Support Team
Community Support Team

Re: Create new column based on text from a messy text blob

Hi @Zelbinian,

 

The @Phil_Seamark's solution is wonderful. Here is a solution of M based on your sample data. The file is attached. 

Text.BeforeDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-")
Text.AfterDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-")

Create-new-column-based-on-text-from-a-messy-text-blob

 

Best Regards,
Dale

Community Support Team _ Dale
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Zelbinian Frequent Visitor
Frequent Visitor

Re: Create new column based on text from a messy text blob

In case anybody wants to see the full text of @v-jiascu-msft's code without having to download the file, here it is:

 #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Text.BeforeDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-")),
 #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each Text.AfterDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-"))


The 'each' in each statement seems like a particularly important detail there.

Also, that last example does not actually have a semicolon for the Text.BetweenDelimiters to key off of. Yet it seems to work. The the documentation of the function does not describe how the function behaves if the second delimeter doesn't exist, how did you figure out that solution?