Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
So, I have a column with data that looks like this:
#Research; #ourResearch-Cat1-Subcat1; #Benchmark; 2018; Benchmark |
#Research; #OurResearch-Cat1-Subcat2; #Benchmark; Benchmark |
#OURRESEARCH-NewCat; #Research |
#Research; #ourresearch-Category3-Subcat 3; |
#Research; #ourResearch-NewCat2 |
It's a text blob from a database and it's a little messy, as user-entered data often is. I've definitely not captured every variation, but you get the gist. Here's the result I want:
Category SubCategory
Cat1 | Subcat1 |
Cat1 | Subcat2 |
NewCat | |
Category3 | Subcat 3 |
NewCat2 |
I've been Googling around for solutions for hours and the issue is everything I've seen really depends on knowing the exact positioning of the text you're trying to grab. I do know, to an extent. Like, if I could write a regex this problem would have been solved in about 5 minutes. But I've yet to figure out a combination of replacer and splitter functions that can do the same job.
Edit: I've added some complexity to my example. The things in the sample data are just stand-ins for the real thing, categories and subcategories can be any alphanumeric string of any length.
Solved! Go to Solution.
Hi @Zelbinian
Please try these two calculated columns as an alternative. These are in DAX and again, I have attached a PBIX file that you can download to see/tweak etc.
Category = VAR LengthOfText = LEN('Table1'[Column1]) VAR StartingPointOR = SEARCH("#OurResearch",'Table1'[Column1],1,0)+1 VAR RestOfText1 = MID('Table1'[Column1],StartingPointOR+12,LengthOfText-StartingPointOR+1) & ";" VAR StartingPointSC = SEARCH(";",RestOfText1,1,0)-1 VAR RestOfText2 = LEFT(RestOfText1,StartingPointSC) VAR StartingPointDash = SEARCH("-",RestOfText2,1,0) RETURN IF( StartingPointDash>0, LEFT(RestOfText2,StartingPointDash-1), RestOfText2 )
and
Sub Category = VAR LengthOfText = LEN('Table1'[Column1]) VAR StartingPointOR = SEARCH("#OurResearch",'Table1'[Column1],1,0)+1 VAR RestOfText1 = MID('Table1'[Column1],StartingPointOR+12,LengthOfText-StartingPointOR+1) & ";" VAR StartingPointSC = SEARCH(";",RestOfText1,1,0)-1 VAR RestOfText2 = LEFT(RestOfText1,StartingPointSC) VAR StartingPointDash = SEARCH("-",RestOfText2,1,0) RETURN IF( StartingPointDash>0, MID(RestOfText2,StartingPointDash+1,99) )
Hi @Zelbinian,
The @Phil_Seamark's solution is wonderful. Here is a solution of M based on your sample data. The file is attached.
Text.BeforeDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-")
Text.AfterDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-")
Best Regards,
Dale
Hi @Zelbinian
Here is one approach. Not sure if it works with your wider dataset, but hopefully you can see how to tweak it. I have attached a PBIX file
Category = VAR LengthOfText = LEN('Table1'[Column1]) VAR StartingPointCat = FIND("-Cat",'Table1'[Column1],1,0)+1 VAR RestOfText = MID('Table1'[Column1],StartingPointCat,LengthOfText-StartingPointCat+1) VAR FinishPointCAT = { {FIND("-",RestOfText,1,LengthOfText)}, {FIND(";",RestOfText,1,LengthOfText)} } RETURN MID( 'Table1'[Column1], StartingPointCat,MINX(FinishPointCAT,[Value])-1 )
and
Sub Category = VAR LengthOfText = LEN('Table1'[Column1]) VAR StartingPointCat = FIND("-Subcat",'Table1'[Column1],1,0)+1 VAR RestOfText = MID('Table1'[Column1],StartingPointCat,LengthOfText-StartingPointCat+1) VAR FinishPointCAT = { {FIND("-",RestOfText,1,LengthOfText)}, {FIND(";",RestOfText,1,LengthOfText)} } RETURN IF(StartingPointCat>1, MID( 'Table1'[Column1], StartingPointCat,MINX(FinishPointCAT,[Value])-1 ))
The problem with this solution - and other solutions I've seen for similar problems - is that it assumes I can know what the data is going to be in advance. I use "Cat" and "Subcat" as standins for Category and Subcategory data but I can't actually know what those will be in advance. They can be of arbitrary length and composition. I can know that they will follow "#ourresearch-" (in various forms of capitalization), but that's it.
(Also, it would have been helpful for you to mention that this was DAX code. I assumed it was Power Query for about an hour lol)
HI @Zelbinian
My solution is based on the small dataset you provided, hence the approach. If you have some definite "rules" then these can probably be used to help tweak the approach.
I updated the question, if that helps.
Hi @Zelbinian
Please try these two calculated columns as an alternative. These are in DAX and again, I have attached a PBIX file that you can download to see/tweak etc.
Category = VAR LengthOfText = LEN('Table1'[Column1]) VAR StartingPointOR = SEARCH("#OurResearch",'Table1'[Column1],1,0)+1 VAR RestOfText1 = MID('Table1'[Column1],StartingPointOR+12,LengthOfText-StartingPointOR+1) & ";" VAR StartingPointSC = SEARCH(";",RestOfText1,1,0)-1 VAR RestOfText2 = LEFT(RestOfText1,StartingPointSC) VAR StartingPointDash = SEARCH("-",RestOfText2,1,0) RETURN IF( StartingPointDash>0, LEFT(RestOfText2,StartingPointDash-1), RestOfText2 )
and
Sub Category = VAR LengthOfText = LEN('Table1'[Column1]) VAR StartingPointOR = SEARCH("#OurResearch",'Table1'[Column1],1,0)+1 VAR RestOfText1 = MID('Table1'[Column1],StartingPointOR+12,LengthOfText-StartingPointOR+1) & ";" VAR StartingPointSC = SEARCH(";",RestOfText1,1,0)-1 VAR RestOfText2 = LEFT(RestOfText1,StartingPointSC) VAR StartingPointDash = SEARCH("-",RestOfText2,1,0) RETURN IF( StartingPointDash>0, MID(RestOfText2,StartingPointDash+1,99) )
I appreciate you giving me the fish, but, unfortunately, I do not understand your code (I've never written in DAX) nor do I have an explanation about the thinking that produced it. Can you start by explaining why you chose a calculated column / DAX vs doing Power Query / M in the query step and creating a new real column?
Hi @Zelbinian,
The @Phil_Seamark's solution is wonderful. Here is a solution of M based on your sample data. The file is attached.
Text.BeforeDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-")
Text.AfterDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-")
Best Regards,
Dale
In case anybody wants to see the full text of @v-jiascu-msft's code without having to download the file, here it is:
#"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each Text.BeforeDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-")), #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each Text.AfterDelimiter(Text.BetweenDelimiters([Column1], "-", ";"), "-"))
The 'each' in each statement seems like a particularly important detail there.
Also, that last example does not actually have a semicolon for the Text.BetweenDelimiters to key off of. Yet it seems to work. The the documentation of the function does not describe how the function behaves if the second delimeter doesn't exist, how did you figure out that solution?
Hi @Zelbinian,
I'm glad it can work. Firstly, we need to find the pattern that the computer can recognize. Secondly, find a function here that can do the job.
The functions will iterate every character of the parameter till the second delimiter or the end. So it won't be a problem if the second delimiter doesn't exist.
Best Regards,
Dale
Hi @Zelbinian
For something like this, you can use either M or DAX. I chose DAX because I'm more familiar with it, but it probably doesn't matter and there won't be a big performance difference. Personally, I find DAX code easier to read than M, but I love anything you can do in M just from the UI.
If I was to do this in M, the principals would be the same. Look for the guaranteed patters (#ourresearch) etc and then start breaking up the text using text functions.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
112 | |
97 | |
84 | |
67 | |
59 |
User | Count |
---|---|
150 | |
120 | |
99 | |
87 | |
68 |