New Contributor
Posts: 460
Registered: ‎10-18-2016
Accepted Solution

How to scrape image URL from a webpage?

Hi All,

 

Is there a way to get the image URLs from a webpage? For example, the poster images from IMDb.

I don't want to manually right-click each of them and copy the image URL.

"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
www.linkedin.com/in/danebelarminocpa

Accepted Solutions
Established Member
Posts: 134
Registered: ‎11-20-2015

Re: How to scrape image URL from a webpage?

Just for fun, I scraped together some pretty abhorrent M using the GUI, pointed at a Wikipedia article. Here's what it looks like if you scrape all images from https://en.wikipedia.org/wiki/The_Lord_of_the_Rings:_The_Return_of_the_King

 

[Screenshot: PBIDesktop_2017-07-28_09-06-07.png]

 

After all the mangling and filtering, the M came out like this:

let
    // Download the page and load it as a one-column table, one row per line of HTML (65001 = UTF-8)
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://en.wikipedia.org/wiki/The_Lord_of_the_Rings:_The_Return_of_the_King"), null, null, 65001)}),
    // Keep only the lines that contain a protocol-relative image source (src="//upload...)
    #"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "src=""//upload")),
    // Split at the last src="// on each line, so Column1.2 starts at the image's host name
    #"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows", "Column1", Splitter.SplitTextByEachDelimiter({"src=""//"}, QuoteStyle.None, true), {"Column1.1", "Column1.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}}),
    // Split at the first closing quote, so Column1.2.1 holds the URL without its scheme
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type", "Column1.2", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, false), {"Column1.2.1", "Column1.2.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column1.2.1", type text}, {"Column1.2.2", type text}}),
    // Drop the leftover HTML on either side of the URL
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Column1.1", "Column1.2.2"}),
    // Keep only links that end in an image extension (the comparison is case-sensitive)
    #"Filtered Rows1" = Table.SelectRows(#"Removed Columns", each Text.EndsWith([Column1.2.1], ".jpg") or Text.EndsWith([Column1.2.1], ".png") or Text.EndsWith([Column1.2.1], ".gif")),
    // Prepend "https://" to turn each link into a full URL
    #"Added Custom" = Table.AddColumn(#"Filtered Rows1", "https", each "https://"),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Custom",{"https", "Column1.2.1"}),
    #"Merged Columns" = Table.CombineColumns(#"Reordered Columns",{"https", "Column1.2.1"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
    // Name the result "Images" and keep a second copy of the column as "ImageURLs"
    #"Renamed Columns" = Table.RenameColumns(#"Merged Columns",{{"Merged", "Images"}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Renamed Columns", "Images", "Images - Copy"),
    #"Renamed Columns1" = Table.RenameColumns(#"Duplicated Column",{{"Images - Copy", "ImageURLs"}})
in
    #"Renamed Columns1"

There are definitely a few things I could do here to clean it up, but the point is that you have to load the webpage as a text file and filter down to the context around the links you need. Once you have isolated the links as full URLs, you can set the column's data category to "Image URL" and use those images in your Power BI report.
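For anyone who wants a tidier starting point, here is a minimal hand-written sketch of the same "load the page as text, then filter down to the links" idea. It is not the GUI-generated query, just one way it could be condensed. The step names (PageUrl, Html, Chunks and so on) are my own, and it assumes the images appear as src="..." attributes in the raw HTML of the page.

```powerquery
let
    // Assumption: the same Wikipedia article as in the example; swap in any URL you like.
    PageUrl = "https://en.wikipedia.org/wiki/The_Lord_of_the_Rings:_The_Return_of_the_King",
    // Read the whole page as a single text value (65001 = UTF-8).
    Html = Text.FromBinary(Web.Contents(PageUrl), 65001),
    // Split on src=" and drop the chunk that comes before the first match.
    Chunks = List.Skip(Text.Split(Html, "src="""), 1),
    // Keep only the text up to the closing quote of each attribute.
    Sources = List.Transform(Chunks, each Text.BeforeDelimiter(_, """")),
    // Protocol-relative links (//host/path) need a scheme before they are full URLs.
    Absolute = List.Transform(Sources, each if Text.StartsWith(_, "//") then "https:" & _ else _),
    // True when the link ends in a common image extension, ignoring case.
    IsImage = (url as text) as logical =>
        List.AnyTrue(List.Transform(
            {".jpg", ".jpeg", ".png", ".gif"},
            (ext) => Text.EndsWith(url, ext, Comparer.OrdinalIgnoreCase))),
    // Keep image links only and drop repeats.
    Images = List.Distinct(List.Select(Absolute, IsImage)),
    Result = Table.FromList(Images, Splitter.SplitByNothing(), {"ImageURLs"})
in
    Result
```

Relative paths such as /static/logo.png pass through unchanged, so you may still need to prepend the site's host. Images that a page adds with JavaScript after it loads will not be in the raw HTML, so this approach will not find them.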

 



All Replies
Super User
Posts: 7,946
Registered: ‎07-11-2015

Re: How to scrape image URL from a webpage?

You could do a web query



Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!




New Contributor
Posts: 460
Registered: ‎10-18-2016

Re: How to scrape image URL from a webpage?

@Greg_Deckler

 

All I know is how to get table or text data. Is there a link to a tutorial on how to do this?

Moderator
Posts: 3,051
Registered: ‎03-06-2016

Re: How to scrape image URL from a webpage?



@danextian

I'd say it is outside the scope of Power BI. As for a tutorial, you can find various blog posts with the help of Google.

New Contributor
Posts: 460
Registered: ‎10-18-2016

Re: How to scrape image URL from a webpage?

@Eric_Zhang,

 

So it is not possible?

Established Member
Posts: 134
Registered: ‎11-20-2015

Re: How to scrape image URL from a webpage?

It's possible, but it's a bit finicky at best. You can get the page as a text document and strip out the URL you're looking for.

If it's specifically IMDb or movie information you're looking for, I'd suggest finding an API rather than attempting to scrape.
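As a rough sketch of what the API route could look like in M: the endpoint, the title query parameter, and the poster_url field below are all hypothetical placeholders, not a real service, so substitute whatever your chosen API actually documents (most also require a key).

```powerquery
let
    // Hypothetical endpoint, for illustration only; replace it with your API's URL.
    BaseUrl = "https://api.example.com/movies",
    // Hypothetical query parameter; Web.Contents appends it to the URL as ?title=...
    Response = Web.Contents(BaseUrl, [Query = [title = "The Return of the King"]]),
    // Parse the JSON body; this assumes the API returns a single JSON object.
    Movie = Json.Document(Response),
    // Hypothetical field name; yields null if the field is missing.
    PosterUrl = Record.FieldOrDefault(Movie, "poster_url", null),
    Result = #table({"Title", "ImageURL"}, {{"The Return of the King", PosterUrl}})
in
    Result
```

The appeal over scraping is that a JSON response gives you the poster link as a named field, so there is no HTML to split and filter.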

New Contributor
Posts: 460
Registered: ‎10-18-2016

Re: How to scrape image URL from a webpage?

@SonnyChilds

 

Yes, a little finicky, but awesome. I've been scraping data from IMDb since last night. Lol. Thanks for the help.
