Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
JK_PowerBINew
Helper II
Helper II

Extracting names from html in a column with variable structure

Hi there,

 

I have connected to a SharePoint List which imports data from a Microsoft Form. When the data is imported it replaces some of the text with HTML code:

For Example

Appointed Candidate(s)
<div class="ExternalClassF09666AB6C1F4582804BC8F27B5D8669"><div style="font-family&#58;Calibri, Arial, Helvetica, sans-serif;font-size&#58;11pt;color&#58;rgb(0, 0, 0);"><span style="color&#58;black;">​Jane Smith</div>

I want to wrangle the column so that only the name within the HTML code would be present:

Appointed Candidate(s)
Jane Smith

 

I originally tried to combat this by creating a Custom Column from Examples and entering the name in the column to extract the text from between a set of delimiters, however, the HTML code is variable as some cells contain 2 or 3 peoples names and the data is inconsistent as it has been entered in free format on the Microsoft Form.

For example:

Appointed Candidate(s)

"<div class="ExternalClassF09666AB6C1F4582804BC8F27B5D8669"><div style="font-family&#58;Calibri, Arial, Helvetica, sans-serif;font-size&#58;11pt;color&#58;rgb(0, 0, 0);"><span style="color&#58;black;">​Jane Smith</span><br></div></div>"

"<div class="ExternalClass13330A41C3914588B1527918A7C4DA70"><div><span style="color&#58;black;"></span></div></div><div style="color&#58;rgb(29, 29, 29);background-color&#58;rgb(255, 255, 255);"><div><span style="color&#58;black;">00</span>1&#58;&#160;<span style="background-color&#58;rgb(255, 255, 255);display&#58;inline !important;">John Smith</span></div></div><div style="color&#58;rgb(29, 29, 29);background-color&#58;rgb(255, 255, 255);">002&#58;&#160;<span style="background-color&#58;rgb(255, 255, 255);display&#58;inline !important;">Jack Jill</span></div><div style="color&#58;rgb(29, 29, 29);background-color&#58;rgb(255, 255, 255);">003&#58;&#160;Peter Rabbit</div><div>"

 

This converts the DAX formula to one which is an if statement and replaces the HTML code with the desired name - which I don't think would work when data is refreshed and new entries are added as each if statement will be entirely unique.

 

Is there a way to tackle this and to just extract the names from the HTML code in the format I described?

 

Thanks!

1 ACCEPTED SOLUTION
Vijay_A_Verma
Super User
Super User

Use this in PQ. See the working here - Open a blank query - Home - Advanced Editor - Remove everything from there and paste the below code to test (later on when you use the query on your dataset, you will have to change the source appropriately. If you have columns other than these, then delete Changed type step and do a Changed type for complete table from UI again

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZNBT8IwFMe/SsWLJCO2G+u2LB62ITGcjB6BQzcKNJZuaQtRT979ln4SO4cMYRglMVxM1mVt3/vv/359HQ5boyWETjZhK5BxotTVqHX9qKkUhCflvA8DjHEU4wT1u65v+7AbJ37f9mK352McjCoBWsso/cSpkZnmQnemZMH4U7lp43PXDxPCWSqZBSLJCLfADeUrqllGLKCIUB1FJZuGH6mKPdM6EaFCh1nOc1mvyVl6AS1QPu3wqxFVELFxspOWcpI9bOLfXl4HRFBwv2B6XiVfmjKqzdbY+ifURKjM3lZL5fasBngMUOQ4Doy6KHECZID6MXJtL0B+5CXdXuTBPaDHlXWgksPe9w+wgbYdWKAa7TA1v5vJfCkmnaZI1zVh69fu6RxdFYR7daE6dv2FMAwbhH/ld8JUwclW4zLBmemSM7YocqmJ0BtPg3wuvu2eUzM3ISdnZHTBgHH+Q0R/j8RpRHJLzU0FdyRNmT7o7POuj98B", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Appointed Candidate(s)" = _t]),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(Source, {{"Appointed Candidate(s)", Splitter.SplitTextByDelimiter("</", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Appointed Candidate(s)"),
    #"Inserted Text After Delimiter" = Table.AddColumn(#"Split Column by Delimiter", "Text After Delimiter", each Text.AfterDelimiter([#"Appointed Candidate(s)"], ">", {0, RelativePosition.FromEnd}), type text),
    #"Added Custom" = Table.AddColumn(#"Inserted Text After Delimiter", "Extracted Name", each Text.Select([Text After Delimiter],{"a".."z","A".."Z"," "})),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Text After Delimiter"}),
    #"Filtered Rows1" = Table.SelectRows(#"Removed Columns", each ([Extracted Name] <> ""))
in
    #"Filtered Rows1"

 

View solution in original post

3 REPLIES 3
Vijay_A_Verma
Super User
Super User

Use this in PQ. See the working here - Open a blank query - Home - Advanced Editor - Remove everything from there and paste the below code to test (later on when you use the query on your dataset, you will have to change the source appropriately. If you have columns other than these, then delete Changed type step and do a Changed type for complete table from UI again

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZNBT8IwFMe/SsWLJCO2G+u2LB62ITGcjB6BQzcKNJZuaQtRT979ln4SO4cMYRglMVxM1mVt3/vv/359HQ5boyWETjZhK5BxotTVqHX9qKkUhCflvA8DjHEU4wT1u65v+7AbJ37f9mK352McjCoBWsso/cSpkZnmQnemZMH4U7lp43PXDxPCWSqZBSLJCLfADeUrqllGLKCIUB1FJZuGH6mKPdM6EaFCh1nOc1mvyVl6AS1QPu3wqxFVELFxspOWcpI9bOLfXl4HRFBwv2B6XiVfmjKqzdbY+ifURKjM3lZL5fasBngMUOQ4Doy6KHECZID6MXJtL0B+5CXdXuTBPaDHlXWgksPe9w+wgbYdWKAa7TA1v5vJfCkmnaZI1zVh69fu6RxdFYR7daE6dv2FMAwbhH/ld8JUwclW4zLBmemSM7YocqmJ0BtPg3wuvu2eUzM3ISdnZHTBgHH+Q0R/j8RpRHJLzU0FdyRNmT7o7POuj98B", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Appointed Candidate(s)" = _t]),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(Source, {{"Appointed Candidate(s)", Splitter.SplitTextByDelimiter("</", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Appointed Candidate(s)"),
    #"Inserted Text After Delimiter" = Table.AddColumn(#"Split Column by Delimiter", "Text After Delimiter", each Text.AfterDelimiter([#"Appointed Candidate(s)"], ">", {0, RelativePosition.FromEnd}), type text),
    #"Added Custom" = Table.AddColumn(#"Inserted Text After Delimiter", "Extracted Name", each Text.Select([Text After Delimiter],{"a".."z","A".."Z"," "})),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Text After Delimiter"}),
    #"Filtered Rows1" = Table.SelectRows(#"Removed Columns", each ([Extracted Name] <> ""))
in
    #"Filtered Rows1"

 

Thanks so much for this! I will try it out. Would you be able to clarify what you mean by UI? Sorry I am quite new to DAX!

UI means PQ user interface. But since you don't need Changed Type step, then you can ignore this.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors