Hello everyone,
(I'm trying to write this post once again.)
I want to extract an image from a website. However, the img src attribute does not contain a usable image URL, so I would have to use something else. I was thinking of retrieving the value of data-original, but I have no clue how to do it. data-srcset might be another option, or perhaps there is a way to extract the entire HTML content of the img tag. Do you have a solution?
<div class="article-image-container">
<div class="content">
<img
src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
data-original="https://cdn2.chrono24.com/images/uhren/15832324-tbzie1krd5li6y9gt3q55f6r-Square210.jpg"
data-srcset="https://cdn2.chrono24.com/images/uhren/15832324-tbzie1krd5li6y9gt3q55f6r-Square420.jpg 2x"
alt="" class="js-lazy">
</div>
</div>
Thanks in advance
I will go nuts here, everything is in Spanish 😄
@PhilipTreacy: thanks for your support. But with that approach I cannot match an image to the other pieces of information. I can download all the images now, but since I want not just the images but also the names, prices and so forth, I need a reference that ties them together.
The actual code looks more like this and, to be frank, is even more complicated. From it I would like to extract the name, link, image and price.
<div class="article-item-container">
<a href="/rolex/rolex-datejust-turn-o-graph--id16906358.htm">
<div class="article-image-container">
<img src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-original="https://cdn2.chrono24.com/images/uhren/15429755-jawuuoydt9e7qbg2x62c3m8u-Square210.jpg" data-srcset="https://cdn2.chrono24.com/images/uhren/15429755-jawuuoydt9e7qbg2x62c3m8u-Square420.jpg 2x">
</div>
<div class="article-title">
Rolex Datejust Turn-O-Graph
</div>
<div class="article-price-container">
9.500
</div>
</a>
</div>
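One possible way to keep name, link, image and price together is to read each article-item-container as one row with Html.Table, which is available in Power BI Desktop. The query below is only a sketch: it runs against the snippet above pasted in as text (SampleHtml is a made-up name), and it assumes the real page uses the same class names for every item. For the live page, SampleHtml would instead come from something like Text.FromBinary(Web.Contents(...)) with the actual address.

```powerquery
let
    // The snippet from the post, inlined as text so the query can be tried without a web call
    SampleHtml = "<div class=""article-item-container""><a href=""/rolex/rolex-datejust-turn-o-graph--id16906358.htm""><div class=""article-image-container""><img src=""data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"" data-original=""https://cdn2.chrono24.com/images/uhren/15429755-jawuuoydt9e7qbg2x62c3m8u-Square210.jpg"" data-srcset=""https://cdn2.chrono24.com/images/uhren/15429755-jawuuoydt9e7qbg2x62c3m8u-Square420.jpg 2x""></div><div class=""article-title"">Rolex Datejust Turn-O-Graph</div><div class=""article-price-container"">9.500</div></a></div>",
    Items = Html.Table(
        SampleHtml,
        {
            // Two-element pairs return the text content of the matched element
            {"Name", ".article-title"},
            // A third element is a function that picks an attribute instead of the text
            {"Link", "a", each [Attributes][href]},
            {"Image", "img", each [Attributes][#"data-original"]},
            {"Price", ".article-price-container"}
        },
        // One output row per item container, so the four values stay matched
        [RowSelector = ".article-item-container"]
    )
in
    Items
```

Because every column selector is evaluated inside the row's container, the image URL ends up on the same row as its name and price, which gives the reference asked for. The link comes back as a relative path, and the text columns may need a Text.Trim step on the real page.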
Hi @raymond
If you open the web page as text, you get all the HTML, one line per row, and can then filter for what you want.
In this example I've filtered for the line containing data-original.
You can then extract the image URL using Transform -> Extract -> Text Between Delimiters.
You can use this query to test on a file I've placed on Amazon S3 that contains the HTML you posted.
let
    // Load the page as text, one HTML line per row
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("https://d2cgdza3nuf1jv.cloudfront.net/img.htm"))}),
    // Keep only the line that contains the data-original attribute
    #"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "data-original=")),
    // The image URL is the text between the first pair of double quotes on that line
    #"Extracted Text Between Delimiters" = Table.TransformColumns(#"Filtered Rows", {{"Column1", each Text.BetweenDelimiters(_, """", """"), type text}})
in
    #"Extracted Text Between Delimiters"
Phil
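A note for the case where the whole img tag sits on a single line, as in the second snippet in this thread: the first quoted value on that line is the src placeholder, so taking the text between the first pair of quotes would return the base64 GIF instead of the URL. A small sketch of one way around that is to use the attribute name itself as the start delimiter. The Line value here is just the tag from the post pasted in as text.

```powerquery
let
    // The single-line img tag from the second snippet in the thread
    Line = "<img src=""data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"" data-original=""https://cdn2.chrono24.com/images/uhren/15429755-jawuuoydt9e7qbg2x62c3m8u-Square210.jpg"" data-srcset=""https://cdn2.chrono24.com/images/uhren/15429755-jawuuoydt9e7qbg2x62c3m8u-Square420.jpg 2x"">",
    // Start right after data-original=" and stop at the next double quote
    ImageUrl = Text.BetweenDelimiters(Line, "data-original=""", """")
in
    ImageUrl
```

The same expression can replace the one in the Table.TransformColumns step above, so the query works whether the attributes are spread over several lines or packed into one.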