nynni
Frequent Visitor

Website scraping advice

I'm trying to scrape a website with a fairly simple HTML table, but it uses JavaScript for pagination, and I can only get the first 25 results when using the Web connector. I've tried using

 [WaitFor = [Timeout = #duration(0,0,0,0)]])

to see if Power BI could pick up the table before the JavaScript loads. I'm not sure that's how it works, and it hasn't given me any results yet.

 

Is there anything I can do? This is the website and data in question: 

http://www.onequestionshootout.xyz/episodes/series_all.htm

1 ACCEPTED SOLUTION
v-yuta-msft
Community Support

@nynni ,

 

I would suggest using a Python script in Power BI to scrape the website. For how to configure the Python environment and run Python scripts in Power BI Desktop, please refer to the documentation below:

https://docs.microsoft.com/en-us/power-bi/desktop-python-scripts
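
For anyone wondering what such a script might look like, here is a minimal sketch rather than a tested solution for this site. It assumes the full table is already present in the page's HTML source and is only split into pages client-side by JavaScript, which has not been confirmed here. If the rows are fetched by separate requests after the page loads, this approach will not see them. The names `TableRowParser`, `parse_table_rows`, `fetch_rows` and `to_dataframe` are made up for illustration, and the first row is assumed to be the header.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableRowParser(HTMLParser):
    """Collects the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_table_rows(html):
    """Return all table rows found in an HTML string."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_rows(url):
    """Download the page and parse its table rows (no JavaScript is run)."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return parse_table_rows(html)


def to_dataframe(rows):
    """Assumption: the first row holds the column headers."""
    import pandas as pd  # Power BI's Python script source loads pandas DataFrames
    return pd.DataFrame(rows[1:], columns=rows[0])
```

In Power BI Desktop (Get Data > Python script), the script would end with something like `df = to_dataframe(fetch_rows("http://www.onequestionshootout.xyz/episodes/series_all.htm"))`, and `df` should then appear as a table to load.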

 

Community Support Team _ Jimmy Tao

If this post helps, please consider accepting it as the solution to help other members find it more quickly.


6 REPLIES
nynni
Frequent Visitor

I'm afraid my skills at this point won't allow for Python scripting, so in the meantime I've downloaded the page as .html and used the Text/CSV data connector to get the table in plain HTML. The downside, of course, is that I cannot get the latest updates to my report over the internet.
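
If the saved-page workaround is good enough, the download step itself could be scripted so the snapshot can be refreshed on demand. A minimal sketch, assuming the saved HTML really does contain the whole table; `save_snapshot` and the output file name are hypothetical:

```python
from urllib.request import urlopen


def save_snapshot(url, path, opener=urlopen):
    """Download the raw page bytes and write them to a local file.

    `opener` is only a parameter so the function can be exercised
    without network access; by default it is urllib's urlopen.
    """
    with opener(url) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
    return len(data)  # number of bytes written
```

Calling `save_snapshot("http://www.onequestionshootout.xyz/episodes/series_all.htm", "series_all.html")` before a refresh would overwrite the local file that the existing query already points at.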

@nynni 

Wow! Thank you for sharing the idea of downloading the page as an HTML file! I was having the same problem as you and was completely stuck. With your solution, I have at least succeeded in extracting a "snapshot" of the data as it currently stands, which is better than no data at all...

I would have never thought of downloading the actual page!

 

Thanks!!






Proud to be a Super User!
Paul on Linkedin.






v-yuta-msft
Community Support

@nynni ,

 

Power Query only supports simple web scraping. If the website requires dynamic scraping (for example, content that is loaded or paginated by JavaScript), I'm afraid Power Query won't work.

 

Community Support Team _ Jimmy Tao


kcantor
Community Champion

@nynni 

Perhaps this resource will help.

https://datachant.com/2017/03/30/web-scraping-power-bi-excel-power-query/









nynni
Frequent Visitor

It started out promising, but unfortunately I can't get any parameters from the URL, as it doesn't produce any when you navigate through the pages... Tricky!
