ianbruckner
Frequent Visitor

Remove text between html tags

I'm looking for help to remove html tags from a string.

 

Example input: <div class="ExternalClass742C332E0D0340C598BC9A78413A04DE">Staff going to storage training</div>

Desired output: Staff going to storage training

 

I found @MarcelBeug's post in Robust-function-to-remove-HTML-tags very helpful, but I can't seem to nail the magic combo that also handles tags with attributes inside the <> brackets, like class=somethingIdon'tcaretosee

1 ACCEPTED SOLUTION
jsh121988
Microsoft

You could try using Power Query's Text.BetweenDelimiters function, with ">" and "</" as the delimiters.
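As a rough sketch of the Power Query route, assuming the HTML sits in a single text value (the Source step and its sample string are illustrative only, copied from the question):

```powerquery
let
    // Sample value from the question; in a real query this would come from a column
    Source = "<div class=""ExternalClass742C332E0D0340C598BC9A78413A04DE"">Staff going to storage training</div>",
    // Returns the text between the first ">" and the next "</"
    Inner = Text.BetweenDelimiters(Source, ">", "</")
in
    Inner
```

For this sample the query should evaluate to Staff going to storage training. It only returns the content of the first element, so it suits single-tag values like the example rather than nested or repeated tags.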

 

You can also do this in DAX using PATHITEM and SUBSTITUTE.

InnerHTML =
PATHITEM ( // Splits the string on the delimiter "|" and returns the 2nd item as text
    SUBSTITUTE ( // Output: <div class="Whatever"|Staff going to storage training|div|
        SUBSTITUTE ( [Html], ">", "|" ), // Output: <div class="Whatever"|Staff going to storage training</div|
        "</", "|"
    ),
    2,
    TEXT
)


5 REPLIES
Pascal_KTeam
Resolver I

Removing HTML tags can be a daunting task. I had to do this as well, and it always needed some maintenance as new cases came in. As an alternative, why don't you just use a custom visual that can actually deal with HTML?

Take a look at the HTML Text Styler visual which you can get from the AppSource: https://appsource.microsoft.com/en-us/product/power-bi-visuals/wa200002071?tab=overview

v-frfei-msft
Community Support

Hi @ianbruckner ,

 

We can create a calculated column using DAX as well.

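Column =
// Position of the first ">", i.e. the end of the opening tag
VAR openTagEnd =
    SEARCH ( ">", Table1[Column1],, BLANK () )
// Position where the closing </div> tag starts
VAR closeTagStart =
    SEARCH ( "</div>", Table1[Column1],, BLANK () )
RETURN
    // Text between the opening tag and the closing tag
    MID ( Table1[Column1], openTagEnd + 1, closeTagStart - openTagEnd - 1 )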


 

Regards,

Frank

 


I ended up using Text.BetweenDelimiters, and it certainly stripped out the first tag. Then I found enough other tags hidden inside the input that (1) it was impractical to list them all, and (2) I was worried I'd lose data if I fell back to the lowest common denominator of using > and < alone as the delimiters, so I gave up. Instead, I'll probably use the HTML visualization, filtered by the user selecting the row that contains the rest of the data. I really wish I could embed that in the table. Oh well, for now.

 

Thanks for the pointers!

Can you give a sample input and output where multiple tags are present? Maybe use 3 layers of nested tags if you have that example?
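For inputs with several or nested tags, one possible approach is to split the text on "<" and keep only what follows each ">". This is a sketch, not the method from the post referenced in the question. The function name StripTags and the sample string are hypothetical, and the sketch assumes that "<" and ">" never appear in the text content itself and that HTML entities such as &amp; can stay as they are.

```powerquery
let
    // Hypothetical helper: removes anything that looks like <...> from a text value
    StripTags = (html as nullable text) as nullable text =>
        if html = null then null
        else
            let
                // Every piece after the first starts just after a "<"
                Parts = Text.Split(html, "<"),
                // Text before the first tag is kept unchanged
                Head = List.First(Parts),
                // For the remaining pieces, keep only what follows the tag's ">"
                Tail = List.Transform(List.Skip(Parts), each Text.AfterDelimiter(_, ">")),
                Result = Text.Combine({Head} & Tail)
            in
                Result
in
    // Illustrative input with three layers of nested tags
    StripTags("<div class=""a""><p>Staff <b>going</b> to storage training</p></div>")
```

For the illustrative input this should evaluate to Staff going to storage training. To apply it to a column, the same function could be called from Table.AddColumn or Table.TransformColumns. Text inside script or style elements would be kept as ordinary text, so it is worth checking the result against real data before relying on it.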
