cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
TerrificPoet Regular Visitor
Regular Visitor

Re: Robust function to remove HTML tags

Hello,

I have the same problem as you but i'm not sure where should i copy and paste that function. I have a table with multiple columns and i just need to clean the data of HTML TAGS in one of them could you please explain in more details how you did it 

Kind regards,

 

Nicolas

Grumelo Regular Visitor
Regular Visitor

Re: Robust function to remove HTML tags

This solution is capable of removing tags like

<p> </p>

but not something like

<div class="..."> </div>

 

 

An example of function call would be welcome. I don't understand what the meaning of "Iteration".

Is it the amount of tags you want to remove?

Highlighted
Grumelo Regular Visitor
Regular Visitor

Re: Robust function to remove HTML tags

My solution to clean HTML tags

 

let
   TextFromHtml = (HTML as any) =>
let
    Source = if HTML = null then
        ""
    else
        Text.From(HTML),
    SplitAny = Text.SplitAny(Source,"<>"),
    ListAlternate = List.Alternate(SplitAny,1,1,1),
    ListSelect = List.Select(ListAlternate, each _<>""),
    TextCombine = Text.Combine(ListSelect, "")
in
    TextCombine
in
    TextFromHtml

For beginners: create a blank query an copy paste the code above; rename the query as "TextFromHtml"

 

Example of calling this function on a column called Comment:
= Table.TransformColumns(#"Previous Step",{{"Comment", TextFromHtml, type text}})

 

Reference: https://social.technet.microsoft.com/Forums/office/en-US/b080d122-14f0-43bc-8a86-f50cdf1c32d8/removi...

mmayer Frequent Visitor
Frequent Visitor

Re: Robust function to remove HTML tags

I tried incorporating your code to clean up the HTML in only one of my columns. But I get the following error: "Expresion.Error: 1 arguments were passed to function which expects between 2 and 4.

Details:

Pattern=

Arguments=List"

 

This is the HTML I am trying to remove from the [Notes] column:

 

Before:

<div class="ExternalClass4B5FF8D025D448A5904CF90969BB5E7E">Request includes&#58; Status indicator for project costs and job progress, Email notification to supervisors of outstanding/incomplete items​.</div><div class="ExternalClass4B5FF8D025D448A5904CF90969BB5E7E">TSR has been submitted.&#160; Waiting on TEC.</div><div class="ExternalClass4B5FF8D025D448A5904CF90969BB5E7E">SWS clean-up and data update in progress in preparation of system modifications.</div><div class="ExternalClass4B5FF8D025D448A5904CF90969BB5E7E">System cleaned up and ready to work with TEC on revamp.<br></div>

 

After:

Request includes&#58; Status indicator for project costs and job progress, Email notification to supervisors of outstanding/incomplete items​.TSR has been submitted.&#160; Waiting on TEC. SWS clean-up and data update in progress in preparation of system modifications. System cleaned up and ready to work with TEC on revamp.

 

 

Here is the code:

 

let
  Source = SharePoint.Tables("https://orgname.sharepoint.com/sites/department/", [ApiVersion = 15]),
    #"2be78719-1e12-4827-8f88-d9edd1a7781f" = Source{[Id="2be78719-1e12-4827-8f88-d9edd1a7781f"]}[Items],
    #"Renamed Columns" = Table.RenameColumns(#"2be78719-1e12-4827-8f88-d9edd1a7781f",{{"ID", "ID.1"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"End Date", type date}, {"Start Date", type date}, {"Today", type date}, {"End Date (Actu", type date}, {"Created", type date}, {"Modified", type date}, {"EndDateT", type text}, {"Completion", Int64.Type}, {"Id", Int64.Type}, {"ID.1", Int64.Type}, {"EditorId", Int64.Type}}),
      fnRHTMLT = (String as text, StartPosition as number, optional Iteration as number, optional MaxIterations as number) as text =>
        let
            StringFromStartposition = Text.RemoveRange(String, 0, StartPosition),
            StartPositionEndTag = Text.PositionOf(StringFromStartposition, "</"),
            PositionsEndTag = if StartPositionEndTag = -1 
                                then -1 
                                else Text.PositionOf(Text.RemoveRange(StringFromStartposition, 0, StartPositionEndTag),">"),
            StartTag = if PositionsEndTag = -1 
                       then null 
                       else "<" & Text.Range(StringFromStartposition, StartPositionEndTag + 2, PositionsEndTag - 1),
            StartPositionStartTag = if PositionsEndTag = -1 
                                    then -1 
                                    else Text.PositionOf(Text.Start(String,StartPosition + StartPositionEndTag),StartTag,Occurrence.Last),
            NewString = if StartPositionStartTag = -1 
                        then String 
                        else Text.RemoveRange(Text.RemoveRange(String,StartPosition + StartPositionEndTag, PositionsEndTag + 1),StartPositionStartTag, PositionsEndTag),
            NextStartPosition = if PositionsEndTag = -1
                                then -1
                                else if StartPositionStartTag = -1
                                     then StartPosition + StartPositionEndTag + 1
                                     else StartPosition + StartPositionEndTag - PositionsEndTag,
            Result = if NextStartPosition = -1
                     then NewString
                     else if Iteration = null 
                          then @fnRHTMLT(NewString, NextStartPosition)
                          else if Iteration = MaxIterations
                               then NewString
                               else @fnRHTMLT(NewString, NextStartPosition, Iteration + 1, MaxIterations)

        in
            Result,      
    #"RemoveHTML" = Table.TransformColumns(#"Changed Type",{{"Notes", fnRHTMLT, type text}}),

in
    RemoveHTML