raymond
Post Patron

Web scraping with Power BI: get the image URL from data-original or data-srcset instead of img src

Hello everyone,

 

(Once again I am trying to write this post.)

I want to extract an image from a website. However, the img src attribute does not contain a usable image URL, so I would have to use something else. I was thinking of retrieving the value of data-original, but I have no clue how to do it. Maybe data-srcset is another option, or perhaps there is a way to extract the entire HTML content of the img tag. Do you have a solution?

 

<div class="article-image-container">
<div class="content">
<img
src="data&colon;image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
data-original="https://cdn2.chrono24.com/images/uhren/15832324-tbzie1krd5li6y9gt3q55f6r-Square210.jpg"
data-srcset="https://cdn2.chrono24.com/images/uhren/15832324-tbzie1krd5li6y9gt3q55f6r-Square420.jpg 2x"
alt="" class="js-lazy">
</div>
</div>

Thanks in advance.

3 REPLIES
raymond
Post Patron

@Admin: I can't read my post. Did it get translated to Spanish?

I will go nuts here, everything is in Spanish 😄

@PhilipTreacy: thanks for your support. But with that approach I cannot match an image to the other pieces of information. I can download all the images now, but since I want not only the images but also the names, prices and so forth, I would need a reference to tie them together.

The actual code looks more like this and, to be frank, is even more complicated. From it I would like to extract the name, link, image and price.

<div class="article-item-container">
      <a href="/rolex/rolex-datejust-turn-o-graph--id16906358.htm">

<div class="article-image-container">
<img src="data&colon;image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-original="https://cdn2.chrono24.com/images/uhren/15429755-jawuuoydt9e7qbg2x62c3m8u-Square210.jpg" data-srcset="https://cdn2.chrono24.com/images/uhren/15429755-jawuuoydt9e7qbg2x62c3m8u-Square420.jpg 2x">
</div>

<div class="article-title">
        Rolex Datejust Turn-O-Graph
</div>

<div class="article-price-container">
         9.500
</div>

</div>
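
One possible way to keep the image together with the name, link and price is to treat the page as a single text value, split it on the opening tag of each article-item-container, and read all four fields from the same chunk. The query below is only a sketch under assumptions: it uses the sample markup from this post (with the src placeholder left out for brevity) instead of the live page, the step names and the ToRecord helper are made up for illustration, and the real page may be structured differently enough that the delimiters need adjusting.

```powerquery
let
    // Sample markup from the post above. In a real query this text would
    // instead come from Text.FromBinary(Web.Contents(...)) for the page in question.
    Html = Text.Combine({
        "<div class=""article-item-container"">",
        "      <a href=""/rolex/rolex-datejust-turn-o-graph--id16906358.htm"">",
        "<div class=""article-image-container"">",
        "<img data-original=""https://cdn2.chrono24.com/images/uhren/15429755-jawuuoydt9e7qbg2x62c3m8u-Square210.jpg"" data-srcset=""https://cdn2.chrono24.com/images/uhren/15429755-jawuuoydt9e7qbg2x62c3m8u-Square420.jpg 2x"">",
        "</div>",
        "<div class=""article-title"">",
        "        Rolex Datejust Turn-O-Graph",
        "</div>",
        "<div class=""article-price-container"">",
        "         9.500",
        "</div>",
        "</div>"
    }, "#(lf)"),

    // One chunk per item: split on the container's opening tag and
    // drop whatever precedes the first item.
    Chunks = List.Skip(Text.Split(Html, "<div class=""article-item-container"">"), 1),

    // Strip surrounding spaces, tabs and line breaks from an extracted value.
    Clean = (value as text) as text =>
        Text.Trim(value, {" ", "#(tab)", "#(cr)", "#(lf)"}),

    // Hypothetical helper: read all four fields from the same chunk,
    // so they stay matched to each other.
    ToRecord = (chunk as text) as record => [
        Link  = Text.BetweenDelimiters(chunk, "<a href=""", """"),
        Image = Text.BetweenDelimiters(chunk, "data-original=""", """"),
        Name  = Clean(Text.BetweenDelimiters(chunk, "<div class=""article-title"">", "</div>")),
        Price = Clean(Text.BetweenDelimiters(chunk, "<div class=""article-price-container"">", "</div>"))
    ],

    Result = Table.FromRecords(List.Transform(Chunks, ToRecord))
in
    Result
```

Because every field is read from the same chunk, each row of the result describes exactly one item. Link is the relative href as it appears in the markup, and Price is left as text, since how "9.500" should be converted to a number depends on the locale.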

PhilipTreacy
Super User

Hi @raymond 

If you open the web page as text, you will see all the HTML, 1 line per row, and can then filter for what you want.

In this example I've filtered for the line containing data-original.

You can then extract the image URL using Transform -> Extract -> Text Between Delimiters.

You can use this query to test on a file I've placed on Amazon S3 that contains the HTML you posted.

 

let
    // Load the page as text, one HTML line per row
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://d2cgdza3nuf1jv.cloudfront.net/img.htm"))}),
    // Keep only the rows that contain a data-original attribute
    #"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "data-original=""")),
    // Take the attribute value: the text between data-original=" and the next quote
    #"Extracted Text Between Delimiters" = Table.TransformColumns(#"Filtered Rows", {{"Column1", each Text.BetweenDelimiters(_, "data-original=""", """"), type text}})
in
    #"Extracted Text Between Delimiters"
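
The question also mentions data-srcset as an alternative. Its value is a URL followed by a descriptor such as 2x, so the URL has to be cut out of the attribute value. A minimal sketch, assuming the attribute looks like the one in the posted HTML; FirstSrcsetUrl is a hypothetical helper name, not a built-in function:

```powerquery
let
    // Hypothetical helper: return the first URL found in a data-srcset attribute
    // on the given line of HTML.
    FirstSrcsetUrl = (line as text) as text =>
        let
            // Attribute value, e.g. "https://.../image.jpg 2x"
            Value = Text.BetweenDelimiters(line, "data-srcset=""", """"),
            // A srcset may list several comma-separated candidates; take the first
            FirstCandidate = Text.Trim(List.First(Text.Split(Value, ","), "")),
            // The URL is the part before the space that precedes the descriptor
            Url = Text.BeforeDelimiter(FirstCandidate, " ")
        in
            Url,

    // Try it on the data-srcset line from the posted HTML
    Example = FirstSrcsetUrl("data-srcset=""https://cdn2.chrono24.com/images/uhren/15832324-tbzie1krd5li6y9gt3q55f6r-Square420.jpg 2x""")
in
    Example
```

The same function could be used inside Table.TransformColumns in place of the Text.BetweenDelimiters call, after filtering for rows that contain data-srcset.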

 

 Phil


If I answered your question please mark my post as the solution.
If my answer helped solve your problem, give it a kudos by clicking on the Thumbs Up.




