danextian
Super User

How to scrape image URL from a webpage?

Hi All,

 

Is there a way to get the image URLs from a webpage? For example, the poster images from IMDb.

I don't want to manually right-click each of them and copy the image URL.










Did I answer your question? Mark my post as a solution!


Proud to be a Super User!









"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.
1 ACCEPTED SOLUTION
Anonymous
Not applicable

Just for fun, I scraped together some pretty abhorrent M using the GUI, pointed at a Wikipedia article. Here's what it looks like if you scrape all images from https://en.wikipedia.org/wiki/The_Lord_of_the_Rings:_The_Return_of_the_King

 

[Screenshot: PBIDesktop_2017-07-28_09-06-07.png]

 

After all the mangling and filtering, the M came out like this:

let
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://en.wikipedia.org/wiki/The_Lord_of_the_Rings:_The_Return_of_the_King"), null, null, 65001)}),
    #"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "src=""//upload")),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows", "Column1", Splitter.SplitTextByEachDelimiter({"src=""//"}, QuoteStyle.None, true), {"Column1.1", "Column1.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type", "Column1.2", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, false), {"Column1.2.1", "Column1.2.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column1.2.1", type text}, {"Column1.2.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Column1.1", "Column1.2.2"}),
    #"Filtered Rows1" = Table.SelectRows(#"Removed Columns", each Text.EndsWith([Column1.2.1], ".jpg") or Text.EndsWith([Column1.2.1], ".png") or Text.EndsWith([Column1.2.1], ".gif")),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows1", "https", each "https://"),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Custom",{"https", "Column1.2.1"}),
    #"Merged Columns" = Table.CombineColumns(#"Reordered Columns",{"https", "Column1.2.1"},Combiner.CombineTextByDelimiter("", QuoteStyle.None),"Merged"),
    #"Renamed Columns" = Table.RenameColumns(#"Merged Columns",{{"Merged", "Images"}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Renamed Columns", "Images", "Images - Copy"),
    #"Renamed Columns1" = Table.RenameColumns(#"Duplicated Column",{{"Images - Copy", "ImageURLs"}})
in
    #"Renamed Columns1"

There are definitely a few things I could do to clean it up, but the point is that you have to load the webpage as a text file and filter down to the context around the links you need. Once you isolate the links as full URLs, you can set the column's data category to "Image URL" and use those images in your Power BI report.
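
For anyone who wants a tidier starting point, here is a minimal sketch of the same steps condensed into a handful of named steps. It is not the query the GUI generated. The step names (ImageLines, WithPath and so on) are my own, it returns a single ImageURLs column instead of the duplicated Images/ImageURLs pair, and it still assumes the page writes its images as src="//upload..." the way the Wikipedia article did.

```powerquery
let
    // URL from the post above; the filter text below is specific to Wikipedia's markup
    Url = "https://en.wikipedia.org/wiki/The_Lord_of_the_Rings:_The_Return_of_the_King",
    // Read the page as UTF-8 text (code page 65001), one row per line of HTML
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents(Url), null, null, 65001)}, {"Line"}),
    // Keep only lines that reference an uploaded image
    ImageLines = Table.SelectRows(Source, each Text.Contains([Line], "src=""//upload")),
    // Take the text after the last src="// on the line, up to the next double quote
    WithPath = Table.AddColumn(
        ImageLines,
        "Path",
        each Text.BeforeDelimiter(
            Text.AfterDelimiter([Line], "src=""//", {0, RelativePosition.FromEnd}),
            """"
        ),
        type text
    ),
    // Keep common image extensions only
    ImagesOnly = Table.SelectRows(
        WithPath,
        each Text.EndsWith([Path], ".jpg")
            or Text.EndsWith([Path], ".png")
            or Text.EndsWith([Path], ".gif")
    ),
    // Prefix the scheme so the protocol-relative path becomes a full URL
    WithUrl = Table.AddColumn(ImagesOnly, "ImageURLs", each "https://" & [Path], type text),
    Result = Table.SelectColumns(WithUrl, {"ImageURLs"})
in
    Result
```

Like the original, this only picks up one image per line of HTML (the last one), so a page that puts several images on one line will need a different split.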

 


7 REPLIES
@Anonymous

 

Yes, a little finicky but awesome. Been scraping data from IMDb since last night. Lol. Thanks for the help.










Greg_Deckler
Super User

You could do a web query


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Greg_Deckler

 

All I know is how to get table or text data. Is there a link to a tutorial on how to do this?












@danextian wrote:

@Greg_Deckler

 

All I know is how to get table or text data. Is there a link to a tutorial on how to do this?


@danextian

I'd say it is outside the scope of Power BI. As for a tutorial, you can find various blogs with a Google search.

@Eric_Zhang,

 

So it is not possible?










Anonymous
Not applicable

It's possible, but it's a bit finicky at best. You can get the page as a text document and strip out the URLs you're looking for.
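
As a rough sketch of what "get the page as text and strip out the URL" could look like, here is a small function query. The parameter name pageUrl and the step names are made up for illustration, and it assumes the page writes its attributes as src="..." with double quotes.

```powerquery
(pageUrl as text) as table =>
let
    // Load the whole page as a single UTF-8 text value
    Html = Text.FromBinary(Web.Contents(pageUrl), TextEncoding.Utf8),
    // Every chunk after the first begins right after a src=" attribute
    Chunks = List.Skip(Text.Split(Html, "src="""), 1),
    // The attribute value ends at the next double quote
    Sources = List.Transform(Chunks, each Text.BeforeDelimiter(_, """")),
    // Protocol-relative values (//host/path) need a scheme to be usable as links
    FullUrls = List.Transform(Sources, each if Text.StartsWith(_, "//") then "https:" & _ else _),
    // One row per distinct URL
    Result = Table.FromList(List.Distinct(FullUrls), Splitter.SplitByNothing(), {"ImageURLs"})
in
    Result
```

This matches every src attribute, including scripts and iframes, and leaves site-relative paths such as /images/x.png untouched, so expect to add a filter on the file extension or on some text that is specific to the images you want.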

 

If it's specifically IMDb or movie information you're looking for, I'd suggest finding an API rather than attempting to scrape.
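
If you do find an API, the query usually ends up much shorter than a scrape. The sketch below is purely illustrative: api.example.com, the movies path, the title parameter, and the title and posterUrl fields are all placeholders I made up, not a real service. Only the shape (Web.Contents feeding Json.Document) is the point.

```powerquery
let
    // Hypothetical endpoint; replace with the base URL of the API you choose
    BaseUrl = "https://api.example.com",
    // Hypothetical path and query parameter
    Response = Web.Contents(BaseUrl, [RelativePath = "movies", Query = [title = "The Return of the King"]]),
    // Parse the JSON body
    Json = Json.Document(Response),
    // Assumes the response is a list of records that all carry "title" and "posterUrl" fields
    Movies = Table.FromRecords(Json),
    Result = Table.SelectColumns(Movies, {"title", "posterUrl"})
in
    Result
```

Most real APIs also require a key and wrap the results in an outer record, so check the provider's documentation and adjust the steps after Json.Document accordingly.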
