Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Zelbinian
Frequent Visitor

Create new column based on text from a messy text blob

So, I have a column with data that looks like this:

#Research; #ourResearch-Cat1-Subcat1; #Benchmark; 2018; Benchmark
#Research; #OurResearch-Cat1-Subcat2; #Benchmark; Benchmark
#OURRESEARCH-NewCat; #Research
#Research; #ourresearch-Category3-Subcat 3;
#Research; #ourResearch-NewCat2

 

It's a text blob from a database and it's a little messy, as user-entered data often is. I've definitely not captured every variation, but you get the gist. Here's the result I want:

Category SubCategory

Cat1Subcat1
Cat1Subcat2
NewCat 
Category3Subcat 3
NewCat2 

 

I've been Googling around for solutions for hours and the issue is everything I've seen really depends on knowing the exact positioning of the text you're trying to grab. I do know, to an extent. Like, if I could write a regex this problem would have been solved in about 5 minutes. But I've yet to figure out a combination of replacer and splitter functions that can do the same job.

Edit: I've added some complexity to my example. The things in the sample data are just stand-ins for the real thing, categories and subcategories can be any alphanumeric string of any length.

2 ACCEPTED SOLUTIONS

Hi @Zelbinian

 

Please try these two calculated columns as an alternative.  These are in DAX and again, I have attached a PBIX file that you can download to see/tweak etc.

 

Category = 
VAR LengthOfText = LEN('Table1'[Column1])
VAR StartingPointOR = SEARCH("#OurResearch",'Table1'[Column1],1,0)+1
VAR RestOfText1 =  MID('Table1'[Column1],StartingPointOR+12,LengthOfText-StartingPointOR+1) & ";"
VAR StartingPointSC = SEARCH(";",RestOfText1,1,0)-1
VAR RestOfText2 = LEFT(RestOfText1,StartingPointSC)
VAR StartingPointDash = SEARCH("-",RestOfText2,1,0)

RETURN 
    IF(
        StartingPointDash>0,
        LEFT(RestOfText2,StartingPointDash-1),
        RestOfText2
        )

and

 

Sub Category = 
VAR LengthOfText = LEN('Table1'[Column1])
VAR StartingPointOR = SEARCH("#OurResearch",'Table1'[Column1],1,0)+1
VAR RestOfText1 =  MID('Table1'[Column1],StartingPointOR+12,LengthOfText-StartingPointOR+1) & ";"
VAR StartingPointSC = SEARCH(";",RestOfText1,1,0)-1
VAR RestOfText2 = LEFT(RestOfText1,StartingPointSC)
VAR StartingPointDash = SEARCH("-",RestOfText2,1,0)

RETURN 
    IF(
        StartingPointDash>0,
        MID(RestOfText2,StartingPointDash+1,99)
        )

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

View solution in original post

Hi @Zelbinian,

 

The @Phil_Seamark's solution is wonderful. Here is a solution of M based on your sample data. The file is attached. 

Text.BeforeDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-")
Text.AfterDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-")

Create-new-column-based-on-text-from-a-messy-text-blob

 

Best Regards,
Dale

Community Support Team _ Dale
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

10 REPLIES 10

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.