Anonymous

Removing HTML Tags and extracting text

Hey guys, 

 

I need help figuring out how to remove/clean HTML data so that only the text is extracted. I'll copy a sample below for your information.

 

Thanks in advance.

 

<div class="ExternalClass5796EA3CBFB24C2DA402D911F488833D"></p><p>Document Agresso Pipeline 2019-Q2 PID.&#160;</p><p><br>&#160;</p><p>Sue scheduled to work Monday next week and perhaps a couple of other days (tbc).?</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;"><strong style="margin&#58;0px;line-height&#58;20.8px;">AP127 - LATCo</strong>&#160;&#160;<br style="margin&#58;0px;line-height&#58;20.8px;"></p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#5...
<div class="ExternalClassC964E2629C2C4CD7B480E1C98C6DE34C"><p></p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;">Prepare Project Board slides in readiness for meeting to be held on 20/03/2019.<br></p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;">Arrange Sue's work schedule.</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line...
<div class="ExternalClassE55DE38454154EFEB55A531968AB8AC7"><p></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">ICT Team will be conducting Year-End activities next week so the time available to be spent on Agresso Pipeline activities will be limited.&#160;<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">Session arranged with Phil to handover all BA related&#160;Agresso Pipeline activities and to obtain a status update.<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><strong style="margin&#58;0px;line-height&#58;20.8px;"><br></strong></p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><strong style="margin&#58;0px;line-height&#58;20.8px;">AP127 - LATCo</strong>&#160;&#160;<br style="margin&#58;0px;line-height&#58;20.8px;"></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">Meeting arranged for&#160;26/03/2019.<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">Review a...
<div class="ExternalClassDB3FBF1C1A9F4B8CA432B1895A7EA234"><p>Prepare to start writing an EOP report and associated slides.</p>?????<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><br></p>?<br></p></div>
<div class="ExternalClassF8A3A37CEA9346639AA50BB961BDB29E">Once CJI3 available complete EOP slides.</p><p><span style="font&#58;400 13px/20.8px &quot;segoe ui&quot;,&quot;segoe&quot;,tahoma,helvetica,arial,sans-serif;text-align&#58;left;color&#58;#444444;text-transform&#58;none;text-indent&#58;0px;letter-spacing&#58;normal;text-decoration&#58;none;word-spacing&#58;0px;display&#58;inline !important;white-space&#58;normal;orphans&#58;2;font-size-adjust&#58;none;font-stretch&#58;normal;float&#58;none;background-color&#58;transparent;">Request CJI3 report once all known costs have been applied.</span><span style="font&#58;400 13px/20.8px &quot;segoe ui&quot;,&quot;segoe&quot;,tahoma,helvetica,arial,sans-serif;text-align&#58;left;color&#58;#444444;text-transform&#58;none;text-indent&#58;0px;letter-spacing&#58;normal;text-decoration&#58;none;word-spacing&#58;0px;display&#58;inline !important;white-space&#58;normal;orphans&#58;2;font-size-adjust&#58;none;font-stretch&#58;normal;float&#58;none;background-color&#58...
<div class="ExternalClass53ABD8C10790435FA20EA6B3C0658E1B"><p></p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;">Once CJI3 available complete EOP slides.</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;"><span style="text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#...
<div class="ExternalClass18D86FBC3818425FB569E604DBA97FA9"><p></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">Once CJI3 available complete EOP slides.</p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><span style="line-height&#58;20.8px;background-color&#58;transparent;">Request CJI3 report once all known costs have been applied?</span></p><br></p></div>
<div class="ExternalClass088C5894BE554D8BBFEC053C5FE81D55"><p></p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;">Once CJI3 available complete EOP slides.</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;"><span style="line-height&#58;20.8px;background-color&#58;transparent;">Request CJI3 report once all known costs have been applied??</span></p><br>&#160;</p></div>
<div class="ExternalClass9E6533746EAB4AF2BF3FB94E6A65AFED"><p></p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;">Once CJI3 available complete EOP slides.</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;"><span style="line-height&#58;20.8px;background-color&#58;transparent;">Request CJI3 report once all known costs have been applied???</span></p><br>&#160;</p></div>
<div class="ExternalClassCA561E6DB1F5409D857A1841B962611B">CR3 documented and submitted for approval. This CR requests that the implementation of CA client is restarted following the same deployment method as for LATCo.</p><p>CR2 will need to be revised as the licensing costs for CA will change due to a reduced number of client licences being required (6 as opposed to 20).</p><p>Sue not worked in PCC this week due to other commitments.?</p><p>First PCC/Lincs Agresso Forum meeting held on 05/03/2019, at which Agresso experiences were shared between PCC and LIns.&#160;<br></p><p><br>&#160;</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-weight&#58;400;text-decoration&#58;none;word-spacing&#58;0px;white-space&#58;normal;orphans&#58;2;background-color&#58;transparent;"><strong style="margin&#58;0px;line-height&#58;20.8px;">...
<div class="ExternalClassDBE9541976CE4649B188607B0788AE75"><p> <span style="font&#58;400 13px/20.8px &quot;segoe ui&quot;, segoe, tahoma, helvetica, arial, sans-serif;text-align&#58;left;color&#58;#444444;text-transform&#58;none;text-indent&#58;0px;letter-spacing&#58;normal;text-decoration&#58;none;word-spacing&#58;0px;display&#58;inline;white-space&#58;normal;orphans&#58;2;float&#58;none;background-color&#58;transparent;">Agresso Pipeline 2019-Q2 PID submitted and approved.&#160;</span> </p><p>Awaiting approval of CR3.</p><p>Awaiting license costs for CR2, unable to revise CR until license costs provided.</p><p>Sue has had problems logging onto PCC network remotely. She has continued to undertake contract related pipeline work for Richard McCarthy.<br></p><p><br>&#160;</p><p style="margin&#58;0px 0px 10px;text-align&#58;left;color&#58;#444444;text-transform&#58;none;line-height&#58;20.8px;text-indent&#58;0px;letter-spacing&#58;normal;font-size&#58;13px;font-style&#58;normal;font-variant&#58;normal;font-we...
<div class="ExternalClass7B68D75BF86F4A86837989FCAFCF3799"><p><span style="font-family&#58;&quot;segoe ui&quot;, segoe, tahoma, helvetica, arial, sans-serif;">Agresso Pipeline 2019-Q2 PID revised,&#160;submitted and approved.&#160;</span><br></p><p>CR3 approved.<br></p><p>Amended costs for CA licences added to CR2 and revised CR submitted to PMO.<br></p><p>Financial&#160;Year-End activities commence next week so the time spent on the&#160;Agresso Pipeline by the ICT Team will be limited.<br></p><p>Project Board meeting held on 20/03/2019. <br></p><p>Timesheet for Sue's time on spent on&#160;28/02 approved.<br></p><p></p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><strong style="margin&#58;0px;line-height&#58;20.8px;">AP99 - Combined Authority&#160;</strong><br style="margin&#58;0px;line-height&#58;20.8px;"></p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><span style="margin&#58;0px;line-height&#58;20.8px;">Completed enough of&#160;the configuration of the&#160...
<div class="ExternalClass217D5FF855074455A8242C61FC5C1EB8"><p><span style="font-family&#58;&quot;segoe ui&quot;, segoe, tahoma, helvetica, arial, sans-serif;">Agresso Pipeline 2019-Q2 PID</span> approved&#160;by PCC. <br></p><p>Agresso ICT Team undertook financial year-end activities which have restricted the amount of time available to spend on the project.<br></p><p>BA handover meeting held with Phil.<br></p><p>Amended&#160;CR2 presented and approved at Gate meeting on 29/03/2019.<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;"><strong style="margin&#58;0px;line-height&#58;20.8px;">AP99 - Combined Authority&#160;</strong> <br style="margin&#58;0px;line-height&#58;20.8px;">CR 2 approved at gate&#160;meeting.<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">Initial testing of system&#160;taking place.<br></p><p style="line-height&#58;20.8px;background-color&#58;transparent;">Meeting arranged on 01/04/2019 to discuss testing approach and monitoring requir...
2 ACCEPTED SOLUTIONS
Fowmy
Super User

@Anonymous 

I pasted your sample HTML data into Power Query using the Enter Data option; you can also import from your HTML file as a Web source.
The data then looks like this:

Fowmy_0-1597128347221.png


Then I added a custom column with the following code:

 

=Html.Table([Column1], {{"ExtractedText",":root"}})

 


Then I expanded the new column, which leaves only the text:
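
As a rough sketch, the two steps described above could be written as one complete query. The inline sample rows and the step names `Source`, `AddedCustom`, and `Expanded` are assumptions for illustration; only the `Html.Table` call is taken from the post.

```powerquery
let
    // Hypothetical sample rows standing in for the pasted HTML column
    Source = #table(
        type table [Column1 = text],
        {
            {"<div class=""ExternalClass""><p>Prepare to start writing an EOP report.</p></div>"},
            {"<p>CR3 approved.<br></p><p>Project Board meeting held on 20/03/2019.</p>"}
        }
    ),
    // Html.Table parses each row's HTML; the ":root" selector takes the text of the whole fragment
    AddedCustom = Table.AddColumn(
        Source,
        "Custom",
        each Html.Table([Column1], {{"ExtractedText", ":root"}})
    ),
    // Expand the nested table so the extracted text becomes an ordinary column
    Expanded = Table.ExpandTableColumn(AddedCustom, "Custom", {"ExtractedText"})
in
    Expanded
```

The result keeps the original `Column1` and adds an `ExtractedText` column next to it, so the raw HTML can be removed afterwards if it is no longer needed.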

Fowmy_1-1597128474182.png


If you want to combine all of the above lines into one cell, add the following step:

=Text.Combine(#"Expanded Custom"[ExtractedText],"#(lf)")
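
For context, here is a hedged sketch of where that line might sit in a full query. It assumes the preceding step is named `Expanded Custom`, as in the line above, and uses made-up sample rows. The `List.RemoveNulls` call and the `CombinedText` step name are additions for illustration, not part of the original answer.

```powerquery
let
    // Hypothetical sample rows; replace with your own source step
    Source = #table(
        type table [Column1 = text],
        {
            {"<p>CR3 approved.</p>"},
            {"<p>Awaiting approval of CR3.</p>"}
        }
    ),
    // Extract the text of each HTML fragment into a nested table
    #"Added Custom" = Table.AddColumn(
        Source,
        "Custom",
        each Html.Table([Column1], {{"ExtractedText", ":root"}})
    ),
    // Turn the nested tables into a plain text column
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"ExtractedText"}),
    // Join every row of the column into one text value, one line per row
    CombinedText = Text.Combine(
        List.RemoveNulls(#"Expanded Custom"[ExtractedText]),
        "#(lf)"
    )
in
    CombinedText
```

Note that this step returns a single text value rather than a table, so it is best kept as the final step or in a separate query.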

Fowmy_0-1597128910739.png

 







Anonymous

@Fowmy That's awesome, thanks for the advice. 

 

I've also discovered the HTML Content custom visual, which displays HTML rich text in its natural/intended form. It also works really nicely.

Karlos_0-1597139333548.png

 


3 REPLIES
edhans
Super User

Take a look at Chris's post on this: Removing HTML tags.


