cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
Helper I
Helper I

Extracting Hex Color Codes from Websites

I am looking for a way to extract hex color codes from the code of a webpage. I have a number of websites that we manage and would like to scrape the site to extract their color schemes for marketing research purposes. 

 

Hex color codes are always #CCC or #CCCCCC, example:

 

     color: #FFF;
     background: #57C2BD;

 

So if I can import the html for the site, I am fairly confident I can extract these color codes. The issue I am having is finding a way to get PowerBI to import the raw html of the page versus the document/table format which is cumbersom and only appears to let me get to the visible text on the page.

 

Any suggestions would be appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
Memorable Member
Memorable Member

Re: Extracting Hex Color Codes from Websites

A couple of ways.

 

Something like this with but it's not pretty and doesn't always get the right results depending on the page.

 

let
Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://www.microsoft.com/en-us/"))}),
#"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "http")),
#"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.Contains([Column1], "href")),
#"Replaced Value" = Table.ReplaceValue(#"Filtered Rows1","href","~href",Replacer.ReplaceText,{"Column1"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Column1", Splitter.SplitTextByEachDelimiter({"~"}, QuoteStyle.None, false), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
#"Filtered Rows2" = Table.SelectRows(#"Split Column by Delimiter", each Text.Contains([Column1], "href")),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Filtered Rows2", "Column1", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, false), {"Column1.1", "Column1.2"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column1.1", type text}, {"Column1.2", type text}}),
#"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type", "Column1.2", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, true), {"Column1.2.1", "Column1.2.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Column1.2.1", type text}, {"Column1.2.2", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Column1.2.1", "Url"}}),
#"Filtered Rows3" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Url], "http")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows3",{"Column1.1", "Column1.2.2"})
in
#"Removed Columns"

a more reliable way would be to use r and a webscraping library

 

library(rvest)
page=read_html("WEBSITE")
data <- as.data.frame(html_attr(html_nodes(page, "a"), "href"))

View solution in original post

5 REPLIES 5
Highlighted
Memorable Member
Memorable Member

Re: Extracting Hex Color Codes from Websites

If you have firefox there is an eyedropper in the developer tools that shows the hex value when you hover over.

 

https://developer.mozilla.org/en-US/docs/Tools/Eyedropper

 

Chrome dev tools have a similar tool, but its not as easy to get to.

https://paul.kinlan.me/eyedropper-chrome-dev-tools/

 

Highlighted
Helper I
Helper I

Re: Extracting Hex Color Codes from Websites

I aware of these tools, however we need a way to programatically extract the hex color codes. We have a large number of websites we need to scrape.

Highlighted
Memorable Member
Memorable Member

Re: Extracting Hex Color Codes from Websites

OK.

 

Something like this should work. I used this technique it to strip out links form a web site before.

From here you can split and filter your html source code.

let
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://www.microsoft.com/en-us/"))})
 in
    Source
Highlighted
Helper I
Helper I

Re: Extracting Hex Color Codes from Websites

Thanks, curious to know how you extract the URLs exactly.

Highlighted
Memorable Member
Memorable Member

Re: Extracting Hex Color Codes from Websites

A couple of ways.

 

Something like this with but it's not pretty and doesn't always get the right results depending on the page.

 

let
Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://www.microsoft.com/en-us/"))}),
#"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "http")),
#"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.Contains([Column1], "href")),
#"Replaced Value" = Table.ReplaceValue(#"Filtered Rows1","href","~href",Replacer.ReplaceText,{"Column1"}),
#"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Replaced Value", {{"Column1", Splitter.SplitTextByEachDelimiter({"~"}, QuoteStyle.None, false), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Column1"),
#"Filtered Rows2" = Table.SelectRows(#"Split Column by Delimiter", each Text.Contains([Column1], "href")),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Filtered Rows2", "Column1", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, false), {"Column1.1", "Column1.2"}),
#"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column1.1", type text}, {"Column1.2", type text}}),
#"Split Column by Delimiter2" = Table.SplitColumn(#"Changed Type", "Column1.2", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, true), {"Column1.2.1", "Column1.2.2"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter2",{{"Column1.2.1", type text}, {"Column1.2.2", type text}}),
#"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Column1.2.1", "Url"}}),
#"Filtered Rows3" = Table.SelectRows(#"Renamed Columns", each Text.StartsWith([Url], "http")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows3",{"Column1.1", "Column1.2.2"})
in
#"Removed Columns"

a more reliable way would be to use r and a webscraping library

 

library(rvest)
page=read_html("WEBSITE")
data <- as.data.frame(html_attr(html_nodes(page, "a"), "href"))

View solution in original post

Helpful resources

Announcements
Community Blog

Community Blog

Visit our Community Blog for articles, guides, and information created by fellow community members.

Using the Community

Using the Community

Need help with the Power BI Community? Our 'Using the Community' support articles are a great place to start.

Galleries

Galleries

Looking for inspiration on how to present your data? Need instructional videos? Check out our Galleries!

Community Summit North America

Community Summit North America

Innovate, Collaborate, Grow. The top training and networking event across the globe for Microsoft Business Applications

Top Solution Authors
Top Kudoed Authors