Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
ukeasyproj
Helper II
Helper II

I have a column in DQ that is stored in HTML format

Is there any way to convert this column to regular text?

 

test.PNG

1 ACCEPTED SOLUTION
v-yuezhe-msft
Employee
Employee

Hi @ukeasyproj,

In your scenario, you would need to firstly remove HTML tags in SQL Server data source, then load data from SQL Server database to Power BI using DirectQuery mode. Please check the following detailed steps.

1. Create a function in your database using the following codes.

IF OBJECT_ID(N'dbo.RegexReplace') IS NOT NULL 
   DROP FUNCTION dbo.RegexReplace
GO
CREATE FUNCTION dbo.RegexReplace
(
  @pattern VARCHAR(255),
  @replacement VARCHAR(255),
  @Subject VARCHAR(MAX),
  @global BIT = 1,
 @Multiline bit =1
)
RETURNS VARCHAR(MAX)

AS BEGIN
DECLARE @objRegexExp INT,
    @objErrorObject INT,
    @strErrorMessage VARCHAR(255),
    @Substituted VARCHAR(8000),
    @hr INT,
    @Replace BIT

SELECT  @strErrorMessage = 'creating a regex object'
EXEC @hr= sp_OACreate 'VBScript.RegExp', @objRegexExp OUT
IF @hr = 0 
    SELECT  @strErrorMessage = 'Setting the Regex pattern',
            @objErrorObject = @objRegexExp
IF @hr = 0 
    EXEC @hr= sp_OASetProperty @objRegexExp, 'Pattern', @pattern
IF @hr = 0 /*By default, the regular expression is case sensitive. Set the IgnoreCase property to True to make it case insensitive.*/
    SELECT  @strErrorMessage = 'Specifying the type of match' 
IF @hr = 0 
    EXEC @hr= sp_OASetProperty @objRegexExp, 'IgnoreCase', 1
IF @hr = 0 
    EXEC @hr= sp_OASetProperty @objRegexExp, 'MultiLine', @Multiline
IF @hr = 0 
    EXEC @hr= sp_OASetProperty @objRegexExp, 'Global', @global
IF @hr = 0 
    SELECT  @strErrorMessage = 'Doing a Replacement' 
IF @hr = 0 
    EXEC @hr= sp_OAMethod @objRegexExp, 'Replace', @Substituted OUT,
        @subject, @Replacement
 /*If the RegExp.Global property is False (the default), Replace will return the @subject string with the first regex match (if any) substituted with the replacement text. If RegExp.Global is true, the @Subject string will be returned with all matches replaced.*/   
IF @hr <> 0 
    BEGIN
        DECLARE @Source VARCHAR(255),
            @Description VARCHAR(255),
            @Helpfile VARCHAR(255),
            @HelpID INT

        EXECUTE sp_OAGetErrorInfo @objErrorObject, @source OUTPUT,
            @Description OUTPUT, @Helpfile OUTPUT, @HelpID OUTPUT
        SELECT  @strErrorMessage = 'Error whilst '
                + COALESCE(@strErrorMessage, 'doing something') + ', '
                + COALESCE(@Description, '')
        RETURN @strErrorMessage
    END
EXEC sp_OADestroy @objRegexExp
RETURN @Substituted
END
Go

2. Then use this function to  remove HTML tags from your column, there is an example four reference. My original table looks like below.
1.png

3. Change configure settings using the following codes, otherwise you will get error message “SQL Server blocked access to procedure ‘sys.sp_OACreate’ of component ‘Ole Automation Procedures’ because this component is turned off as part of the security configuration for this server” when running Update statements in Step 4.

sp_configure'show advanced options', 1 
GO 
RECONFIGURE; 
GO 
sp_configure'Ole Automation Procedures', 1 
GO 
RECONFIGURE; 
GO 
sp_configure'show advanced options', 1 
GO 
RECONFIGURE;

4. Update the HTML field using the function created in Step 1.
2.PNG

5. Import data from SQL Server database to Power BI Desktop.

3.png


Thanks,
Lydia Zhang

Community Support Team _ Lydia Zhang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

8 REPLIES 8
v-yuezhe-msft
Employee
Employee

Hi @ukeasyproj,

In your scenario, you would need to firstly remove HTML tags in SQL Server data source, then load data from SQL Server database to Power BI using DirectQuery mode. Please check the following detailed steps.

1. Create a function in your database using the following codes.

IF OBJECT_ID(N'dbo.RegexReplace') IS NOT NULL 
   DROP FUNCTION dbo.RegexReplace
GO
CREATE FUNCTION dbo.RegexReplace
(
  @pattern VARCHAR(255),
  @replacement VARCHAR(255),
  @Subject VARCHAR(MAX),
  @global BIT = 1,
 @Multiline bit =1
)
RETURNS VARCHAR(MAX)

AS BEGIN
DECLARE @objRegexExp INT,
    @objErrorObject INT,
    @strErrorMessage VARCHAR(255),
    @Substituted VARCHAR(8000),
    @hr INT,
    @Replace BIT

SELECT  @strErrorMessage = 'creating a regex object'
EXEC @hr= sp_OACreate 'VBScript.RegExp', @objRegexExp OUT
IF @hr = 0 
    SELECT  @strErrorMessage = 'Setting the Regex pattern',
            @objErrorObject = @objRegexExp
IF @hr = 0 
    EXEC @hr= sp_OASetProperty @objRegexExp, 'Pattern', @pattern
IF @hr = 0 /*By default, the regular expression is case sensitive. Set the IgnoreCase property to True to make it case insensitive.*/
    SELECT  @strErrorMessage = 'Specifying the type of match' 
IF @hr = 0 
    EXEC @hr= sp_OASetProperty @objRegexExp, 'IgnoreCase', 1
IF @hr = 0 
    EXEC @hr= sp_OASetProperty @objRegexExp, 'MultiLine', @Multiline
IF @hr = 0 
    EXEC @hr= sp_OASetProperty @objRegexExp, 'Global', @global
IF @hr = 0 
    SELECT  @strErrorMessage = 'Doing a Replacement' 
IF @hr = 0 
    EXEC @hr= sp_OAMethod @objRegexExp, 'Replace', @Substituted OUT,
        @subject, @Replacement
 /*If the RegExp.Global property is False (the default), Replace will return the @subject string with the first regex match (if any) substituted with the replacement text. If RegExp.Global is true, the @Subject string will be returned with all matches replaced.*/   
IF @hr <> 0 
    BEGIN
        DECLARE @Source VARCHAR(255),
            @Description VARCHAR(255),
            @Helpfile VARCHAR(255),
            @HelpID INT

        EXECUTE sp_OAGetErrorInfo @objErrorObject, @source OUTPUT,
            @Description OUTPUT, @Helpfile OUTPUT, @HelpID OUTPUT
        SELECT  @strErrorMessage = 'Error whilst '
                + COALESCE(@strErrorMessage, 'doing something') + ', '
                + COALESCE(@Description, '')
        RETURN @strErrorMessage
    END
EXEC sp_OADestroy @objRegexExp
RETURN @Substituted
END
Go

2. Then use this function to  remove HTML tags from your column, there is an example four reference. My original table looks like below.
1.png

3. Change configure settings using the following codes, otherwise you will get error message “SQL Server blocked access to procedure ‘sys.sp_OACreate’ of component ‘Ole Automation Procedures’ because this component is turned off as part of the security configuration for this server” when running Update statements in Step 4.

sp_configure'show advanced options', 1 
GO 
RECONFIGURE; 
GO 
sp_configure'Ole Automation Procedures', 1 
GO 
RECONFIGURE; 
GO 
sp_configure'show advanced options', 1 
GO 
RECONFIGURE;

4. Update the HTML field using the function created in Step 1.
2.PNG

5. Import data from SQL Server database to Power BI Desktop.

3.png


Thanks,
Lydia Zhang

Community Support Team _ Lydia Zhang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
MarcelBeug
Community Champion
Community Champion

Coincidentally, recently I created a recursive function (PowerQuery / M) to remove HTML tags.

It has 4 arguments (of which 2 optional to run the function in test mode):
String,
Startposition (typically 0 to start with)
Optional current iteration (typically 1 to start with)
Optional maximum number of iterations.

 

        fnRHTMLT = (String as text, StartPosition as number, optional Iteration as number, optional MaxIterations as number) as text =>
        let
            StringFromStartposition = Text.RemoveRange(String, 0, StartPosition),
            StartPositionEndTag = Text.PositionOf(StringFromStartposition, "</"),
            PositionsEndTag = if StartPositionEndTag = -1 
                                then -1 
                                else Text.PositionOf(Text.RemoveRange(StringFromStartposition, 0, StartPositionEndTag),">"),
            StartTag = if PositionsEndTag = -1 
                       then null 
                       else "<" & Text.Range(StringFromStartposition, StartPositionEndTag + 2, PositionsEndTag - 1),
            StartPositionStartTag = if PositionsEndTag = -1 
                                    then -1 
                                    else Text.PositionOf(Text.Start(String,StartPosition + StartPositionEndTag),StartTag,Occurrence.Last),
            NewString = if StartPositionStartTag = -1 
                        then String 
                        else Text.RemoveRange(Text.RemoveRange(String,StartPosition + StartPositionEndTag, PositionsEndTag + 1),StartPositionStartTag, PositionsEndTag),
            NextStartPosition = if PositionsEndTag = -1
                                then -1
                                else if StartPositionStartTag = -1
                                     then StartPosition + StartPositionEndTag + 1
                                     else StartPosition + StartPositionEndTag - PositionsEndTag,
            Result = if NextStartPosition = -1
                     then NewString
                     else if Iteration = null 
                          then @fnRHTMLT(NewString, NextStartPosition)
                          else if Iteration = MaxIterations
                               then NewString
                               else @fnRHTMLT(NewString, NextStartPosition, Iteration + 1, MaxIterations)

        in
            Result
Specializing in Power Query Formula Language (M)

@MarcelBeug

 

Hey, I am new to power bi, in the query editor, where would I need to insert your code?

You can incorporate the code in your query in the Advanced Editor.

 

Example:

 

let
    <Insert the function here followed by a comma>,
    TableWithDescription = Table.FromList({"<BODY><p>MarcelBeug</p></BODY>"},null,type table[Description = text]),
    #"Added Custom" = Table.AddColumn(TableWithDescription, "HTMLRemoved", each fnRHTMLT([Description],0))
in
    #"Added Custom"
Specializing in Power Query Formula Language (M)

@MarcelBeug

 

This is my current query as folllows:

 

let
    Source = Sql.Database(ServerName, DatabaseName),
    dbo_TT_Projects = Source{[Schema="dbo",Item="TT_Projects"]}[Data],
    #"Filtered Rows" = Table.SelectRows(dbo_TT_Projects, each [AccountId] = AccountID),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"ProjectID", "Name", "EndDate", "ActCompletionDate", "Duration", "StartDate", "CreationDate", "ProjectStatusID", "BillingType", "BillingAmount", "PriorityID", "Progress", "CustomerID", "CreatorID", "EstimatedHours", "Description", "Budget", "PortfolioID", "Billed", "MD_ProjectFields"}),
    #"Expanded MD_ProjectFields" = Table.ExpandTableColumn(#"Removed Other Columns", "MD_ProjectFields", {"ProjectDECIMAL0", "ProjectDECIMAL1", "ProjectDECIMAL2", "ProjectDECIMAL3", "ProjectDECIMAL4", "ProjectDATETIME0", "ProjectDATETIME1", "ProjectDATETIME2", "ProjectDATETIME3", "ProjectDATETIME4", "ProjectDATETIME5"}, {"ProjectDECIMAL0", "ProjectDECIMAL1", "ProjectDECIMAL2", "ProjectDECIMAL3", "ProjectDECIMAL4", "ProjectDATETIME0", "ProjectDATETIME1", "ProjectDATETIME2", "ProjectDATETIME3", "ProjectDATETIME4", "ProjectDATETIME5"}),
    #"Renamed CustomFields" = Table.RenameColumns(#"Expanded MD_ProjectFields",{{"ProjectDATETIME1", "Original End Date"}, {"ProjectDATETIME2", "Q1 End Date"}, {"ProjectDATETIME3", "Q2 End Date"}, {"ProjectDATETIME4", "Q3 End Date"}, {"ProjectDATETIME5", "Q4 End Date"}, {"ProjectDECIMAL0", "Original Estimated Hours"}, {"ProjectDECIMAL1", "Q1"}, {"ProjectDECIMAL2", "Q2"}, {"ProjectDECIMAL3", "Q3"}, {"ProjectDECIMAL4", "Q4"}}),
    #"Renamed Columns" = Table.RenameColumns(#"Renamed CustomFields",{{"Name", "Project Name"}, {"StartDate", "Start Date"}, {"EndDate", "End Date"}, {"ProjectDATETIME0", "PO ACK DATE"}})
in
    #"Renamed Columns"

Still confused where I should put your function, the column I want to invoke it on is called Description

You can put the code between "let" and "Source". And don't forget to put a comma after "Result".

Now you can use the function for instance by adding a custom column with code: fnRHTMLT([Description],0)

Specializing in Power Query Formula Language (M)

let

   fnRHTMLT = (String as text, StartPosition as number, optional Iteration as number, optional MaxIterations as number) as text =>
        let
            StringFromStartposition = Text.RemoveRange(String, 0, StartPosition),
            StartPositionEndTag = Text.PositionOf(StringFromStartposition, "</"),
            PositionsEndTag = if StartPositionEndTag = -1 
                                then -1 
                                else Text.PositionOf(Text.RemoveRange(StringFromStartposition, 0, StartPositionEndTag),">"),
            StartTag = if PositionsEndTag = -1 
                       then null 
                       else "<" & Text.Range(StringFromStartposition, StartPositionEndTag + 2, PositionsEndTag - 1),
            StartPositionStartTag = if PositionsEndTag = -1 
                                    then -1 
                                    else Text.PositionOf(Text.Start(String,StartPosition + StartPositionEndTag),StartTag,Occurrence.Last),
            NewString = if StartPositionStartTag = -1 
                        then String 
                        else Text.RemoveRange(Text.RemoveRange(String,StartPosition + StartPositionEndTag, PositionsEndTag + 1),StartPositionStartTag, PositionsEndTag),
            NextStartPosition = if PositionsEndTag = -1
                                then -1
                                else if StartPositionStartTag = -1
                                     then StartPosition + StartPositionEndTag + 1
                                     else StartPosition + StartPositionEndTag - PositionsEndTag,
            Result = if NextStartPosition = -1
                     then NewString
                     else if Iteration = null 
                          then @fnRHTMLT(NewString, NextStartPosition)
                          else if Iteration = MaxIterations
                               then NewString
                               else @fnRHTMLT(NewString, NextStartPosition, Iteration + 1, MaxIterations)

        in
            Result,



    Source = Sql.Database(ServerName, DatabaseName),
    dbo_TT_Projects = Source{[Schema="dbo",Item="TT_Projects"]}[Data],
    #"Filtered Rows" = Table.SelectRows(dbo_TT_Projects, each [AccountId] = AccountID),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"ProjectID", "Name", "EndDate", "ActCompletionDate", "Duration", "StartDate", "CreationDate", "ProjectStatusID", "BillingType", "BillingAmount", "PriorityID", "Progress", "CustomerID", "CreatorID", "EstimatedHours", "Description", "Budget", "PortfolioID", "Billed", "MD_ProjectFields"}),
    #"Expanded MD_ProjectFields" = Table.ExpandTableColumn(#"Removed Other Columns", "MD_ProjectFields", {"ProjectDECIMAL0", "ProjectDECIMAL1", "ProjectDECIMAL2", "ProjectDECIMAL3", "ProjectDECIMAL4", "ProjectDATETIME0", "ProjectDATETIME1", "ProjectDATETIME2", "ProjectDATETIME3", "ProjectDATETIME4", "ProjectDATETIME5"}, {"ProjectDECIMAL0", "ProjectDECIMAL1", "ProjectDECIMAL2", "ProjectDECIMAL3", "ProjectDECIMAL4", "ProjectDATETIME0", "ProjectDATETIME1", "ProjectDATETIME2", "ProjectDATETIME3", "ProjectDATETIME4", "ProjectDATETIME5"}),
    #"Renamed CustomFields" = Table.RenameColumns(#"Expanded MD_ProjectFields",{{"ProjectDATETIME1", "Original End Date"}, {"ProjectDATETIME2", "Q1 End Date"}, {"ProjectDATETIME3", "Q2 End Date"}, {"ProjectDATETIME4", "Q3 End Date"}, {"ProjectDATETIME5", "Q4 End Date"}, {"ProjectDECIMAL0", "Original Estimated Hours"}, {"ProjectDECIMAL1", "Q1"}, {"ProjectDECIMAL2", "Q2"}, {"ProjectDECIMAL3", "Q3"}, {"ProjectDECIMAL4", "Q4"}}),
    #"Renamed Columns" = Table.RenameColumns(#"Renamed CustomFields",{{"Name", "Project Name"}, {"StartDate", "Start Date"}, {"EndDate", "End Date"}, {"ProjectDATETIME0", "PO ACK DATE"}}),
    #"Added Custom" = Table.AddColumn(dbo_TT_Projects, "HTMLRemoved", each fnRHTMLT([Description],0))

in
    #"Added Custom"

 

I did the following but got an error saying it is not supported in DirectQuery Mode

Sorry, I didn't take into account any Direct Query limitations.

I can't think of any alternative that would work in DQ mode.
Maybe something could be done in your database environment, but that would be outside my scope.

 

Specializing in Power Query Formula Language (M)

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.