Hello - can anyone find a way of scraping Bet365 without it timing out? https://www.bet365.com/#/AC/B1/C1/D8/E100842460/F3/I8/
I had code that worked for this URL previously, but several months later it now times out instead of returning the data from that page. Does anyone know whether the site deliberately blocks web scraping somehow? That is all I can think of as to why the initial load would stop working when it previously did.
Also this appears to connect within Excel Power Query, but not Power BI Desktop - and I need it connecting in the latter given Power BI Desktop's extra functionality for manipulating the data afterwards.
Source = Web.BrowserContents("https://www.bet365.com/#/AC/B1/C1/D8/E100842460/F3/I8/", [WaitFor=[Timeout=#duration(0, 0, 0, 2)]])
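For context on the call above: the WaitFor option of Web.BrowserContents accepts a Selector as well as a Timeout, so the query can wait for a specific element instead of a fixed two seconds. A minimal sketch, assuming the rendered odds rows carry the class .srb-ParticipantLabelWithTeam (taken from the selectors quoted in this thread) and that 30 seconds is a sensible upper bound; both are assumptions, and the call will still error if the site never renders that element for the embedded browser.

```powerquery
let
    url = "https://www.bet365.com/#/AC/B1/C1/D8/E100842460/F3/I8/",
    // Assumption: this class marks a rendered odds row
    rowSelector = ".srb-ParticipantLabelWithTeam",
    // Wait until the selector exists on the page; an error is raised
    // if the timeout elapses first
    Source = Web.BrowserContents(
        url,
        [WaitFor = [Selector = rowSelector, Timeout = #duration(0, 0, 0, 30)]]
    )
in
    Source
```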
Thanks very much!
Hi, @jmillsjmills
I seem to have found a workaround: try changing Web.BrowserContents to Web.Contents.
I tried the following M code, and Power Query returned the correct HTML:
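let
    // Web.Contents downloads the raw response without rendering the page in a browser
    url = "https://www.bet365.com/#/AC/B1/C1/D8/E100842460/F3/I8/",
    // Decode the binary response into text
    web = Text.FromBinary(Web.Contents(url))
in
    web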
Hope this helps.
Best Regards,
Community Support Team _ Zeon Zheng
If this post helps, please consider accepting it as the solution to help other members find it more quickly.
Thank you so much for your reply! That's very useful to know and I appreciate the workaround, given I had given up!!
It looks like the HTML it is now pulling in is a little different to before. I was using CSS selectors to plot an HTML table that pulled in the odds (for example I was looking for .srb_ParticipantLabelWithTeam_Name, .srb_ParticipantLabelWithTeam_Team and .gl-MarketGroupButton_Text, all with a row selector set as .srb-ParticipantLabelWithTeam). You can see these CSS selectors if you inspect element on the odds in the website directly.
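The selector setup described above could be written roughly as follows. This is only a sketch: the column names Name, Team and Odds are made up for illustration, the selectors are copied as quoted in this post, and html stands for whatever text the web call returned. If that text does not contain any element matching the row selector, the resulting table would be expected to have no rows.

```powerquery
// Hypothetical helper: turns page HTML into a table of participants and odds
(html as text) as table =>
    Html.Table(
        html,
        {
            // {output column name, CSS selector evaluated per row}
            {"Name", ".srb_ParticipantLabelWithTeam_Name"},
            {"Team", ".srb_ParticipantLabelWithTeam_Team"},
            {"Odds", ".gl-MarketGroupButton_Text"}
        },
        // Each element matching this selector becomes one table row
        [RowSelector = ".srb-ParticipantLabelWithTeam"]
    )
```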
However the HTML it's pulling in for me (and not sure it's the same for you) appears to be one large booting javascript function of some sort? Not the best at diagnosing these things but there certainly doesn't seem to be the full page content in the same way as before. Is this the case for you? Do you have any more ideas?
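One way to check what actually came back is to test the returned text for the class names the selectors rely on. A rough sketch, assuming the same URL as earlier in the thread and assuming that rendered odds markup would contain the string ParticipantLabelWithTeam. Note that the part of the URL after # is a fragment, which is not sent to the server in the HTTP request, so Web.Contents can only receive the base page and its scripts, not the content that the scripts render later.

```powerquery
let
    url = "https://www.bet365.com/#/AC/B1/C1/D8/E100842460/F3/I8/",
    html = Text.FromBinary(Web.Contents(url)),
    // Assumption: rendered odds markup would contain this class-name fragment
    marker = "ParticipantLabelWithTeam",
    // Summarise the response: its size and whether the marker appears anywhere
    result = [
        Length = Text.Length(html),
        HasOddsMarkup = Text.Contains(html, marker)
    ]
in
    result
```

If HasOddsMarkup is false, the response is most likely just the script that boots the page, which would explain why the CSS selectors find nothing.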
Thank you so much! The link may now have expired but this URL is a new match:
https://www.bet365.com/#/AC/B1/C1/D8/E100693610/F3/I8/
Really appreciate all your efforts!
Hi, @jmillsjmills
I don't think the problem is on your side. Some websites put measures in place to prevent their pages from being crawled. I also tried the URL above and got the same timeout error, but other websites work normally. Getting this website to recognize Power BI as a browser and return data may require some web-crawling knowledge.
For references:
https://community.powerbi.com/t5/Desktop/Website-scraping-advice/m-p/762012
https://community.powerbi.com/t5/Desktop/Web-scraping-with-Power-Bi/m-p/927014
Hope this helps.
Best Regards,
Community Support Team _ Zeon Zheng
If this post helps, please consider accepting it as the solution to help other members find it more quickly.
Hi @v-angzheng-msft - thanks very much for your reply! I had a feeling it would be something along these lines and that they are just set up to resist web scraping. Thank you for clarifying.