ianbruckner
Regular Visitor

Remove text between html tags

I'm looking for help to remove html tags from a string.

 

Example input: <div class="ExternalClass742C332E0D0340C598BC9A78413A04DE">Staff going to storage training</div>

Desired output: Staff going to storage training

 

I found @MarcelBeug's post in Robust-function-to-remove-HTML-tags very helpful, but I can't seem to nail the magic combo that also handles the attributes inside the <> brackets... like class=somethingIdon'tcaretosee

1 ACCEPTED SOLUTION

Accepted Solutions
jsh121988 Resolver I

Re: Remove text between html tags

You could try the Power Query Text.BetweenDelimiters function, using '>' and '</' as the delimiters.
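
As a rough illustration of the Power Query suggestion, here is a minimal sketch in M. The sample string is the one from the question; the step names Source and Inner are placeholders, and in a real query the input would be a column value rather than a literal.

```powerquery
let
    // Sample value from the question; in practice this would be a column such as [Html]
    Source = "<div class=""ExternalClass742C332E0D0340C598BC9A78413A04DE"">Staff going to storage training</div>",
    // Text between the first ">" and the first "</" that follows it
    Inner = Text.BetweenDelimiters(Source, ">", "</")
in
    Inner
```

For this single-tag input the result should be Staff going to storage training. With nested tags, only the text between the first '>' and the next '</' comes back, so inner tags can still leak through or cut the text short.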

 

You can also do this in DAX using PATHITEM and SUBSTITUTE.

InnerHTML =
PATHITEM (
    // PATHITEM splits the string on "|" and returns the 2nd item as text
    SUBSTITUTE (
        // After this step: <div class="Whatever"|Staff going to storage training|div|
        SUBSTITUTE ( [Html], ">", "|" ), // After this step: <div class="Whatever"|Staff going to storage training</div|
        "</",
        "|"
    ),
    2,
    TEXT
)


Replies

Community Support
Community Support

Re: Remove text between html tags

Hi @ianbruckner ,

 

We can create a calculated column using DAX as well.

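Column =
// Position of the first ">" (end of the opening tag); BLANK if not found
VAR OpenEnd =
    SEARCH ( ">", Table1[Column1],, BLANK () )
// Position of the closing "</div>" tag; BLANK if not found
VAR CloseStart =
    SEARCH ( "</div>", Table1[Column1],, BLANK () )
RETURN
    // Return the original text unchanged when either tag is missing
    IF (
        ISBLANK ( OpenEnd ) || ISBLANK ( CloseStart ),
        Table1[Column1],
        MID ( Table1[Column1], OpenEnd + 1, CloseStart - OpenEnd - 1 )
    )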


 

Regards,

Frank

 

ianbruckner
Regular Visitor

Re: Remove text between html tags

I ended up using Text.BetweenDelimiters, and it certainly stripped out the first tag. Then I found enough other tags hidden inside the input that (1) it was impractical to list them all, and (2) I was scared I'd lose data if I fell back to the lowest common denominator of > and < alone as the delimiters... so I gave up. Instead, I'll probably use the HTML visualization, filtered by the user selecting the row that contains the rest of the data. I really wish I could embed that in the table. Oh well, for now.

 

Thanks for the pointers!

jsh121988 Resolver I

Re: Remove text between html tags

Can you give a sample input and output where multiple tags are present? Maybe use 3 layers of nested tags if you have that example?
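
For the nested-tag case raised earlier in the thread, one possible approach is to drop everything between each < and the next >, without listing tag names. The following is only a sketch in Power Query M: StripTags is a hypothetical helper name (not a built-in), the sample input is made up, and it assumes every < in the data opens a tag. A literal < in the text would cause the text after it to be dropped, and HTML entities such as &amp; are not decoded.

```powerquery
let
    // StripTags is a hypothetical helper, not a built-in function
    StripTags = (html as nullable text) as nullable text =>
        if html = null then null
        else
            let
                // Every piece after the first begins inside a tag
                Pieces = Text.Split(html, "<"),
                // Keep only what follows the ">" that closes each tag
                AfterTags = List.Transform(List.Skip(Pieces, 1), each Text.AfterDelimiter(_, ">")),
                Result = Text.Combine({List.First(Pieces)} & AfterTags)
            in
                Result,
    // Made-up sample with three levels of nesting
    Example = StripTags("<div class=""x""><p>Staff going to <b>storage</b> training</p></div>")
in
    Example
```

To apply it to a column, something like Table.TransformColumns(Source, {{"Html", StripTags, type text}}) should work, where Source and "Html" stand in for the actual table and column names.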
