maracles
Resolver II

Importing data from multiple pages of a single website.

For some research I'm doing, I want to import data from 20-30 IMDb pages. Each one is the same kind of page, just for a different film; for example:


http://www.imdb.com/title/tt2488496/fullcredits/  (Star Wars)

I'm not looking to import all the data, only the 'Camera Department' table.

I've tested Excel's "From Web" import feature and it works; however, creating 20-30 individual worksheets with 20-30 separate connections seems inefficient. Does anyone know of a better way, perhaps adding all of the pages to a single query, to streamline this process?

I have looked into this (http://www.omdbapi.com/); however, the API only lets you query one movie at a time and does not return the data I need.

Likewise, this API tool (https://www.npmjs.com/package/imdb-api) requires Node.js and programming skills I don't have. Any help is much appreciated.


The general approach in these situations is to look at the site's URL and see what changes between films. For example, the page for Dunkirk is:

http://www.imdb.com/title/tt5013056/fullcredits

Notice that the only part that changes is the ID between "title/" and "/fullcredits".
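As a quick illustration (in Python rather than Power Query M, which is what Power BI actually uses), the URL can be treated as a template whose only variable is the title ID. The helper name `full_credits_url` is hypothetical, and the pattern is only assumed from the two example links in this thread.

```python
def full_credits_url(title_id: str) -> str:
    # Only the title ID varies; the rest of the address is fixed
    # (pattern assumed from the example links in this thread).
    return f"http://www.imdb.com/title/{title_id}/fullcredits"

for title_id in ["tt2488496", "tt5013056"]:
    print(full_credits_url(title_id))
# http://www.imdb.com/title/tt2488496/fullcredits
# http://www.imdb.com/title/tt5013056/fullcredits
```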

So you need to create a table containing all the IDs you want to pull, turn your query into a function that takes an ID as its parameter, and then invoke that function for every row in the table.
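A minimal sketch of that pattern in Python, assuming the page download and parsing are handled elsewhere: `load_film` plays the role of the query turned into a function, and `load_all` invokes it once per ID and appends the results, roughly like invoking a custom function per row and expanding the result in Power Query. All names here, including the `fetch_rows` parameter and the placeholder data, are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]
# Hypothetical: downloads and parses one page, returning its Camera Department rows.
Fetcher = Callable[[str], List[Row]]

def load_film(title_id: str, fetch_rows: Fetcher) -> List[Row]:
    """The per-film query, turned into a function of the title ID."""
    url = f"http://www.imdb.com/title/{title_id}/fullcredits"
    # Tag each row with its film so the combined table stays traceable.
    return [{"title_id": title_id, **row} for row in fetch_rows(url)]

def load_all(title_ids: List[str], fetch_rows: Fetcher) -> List[Row]:
    """Invoke load_film for every ID in the table and append the results."""
    combined: List[Row] = []
    for title_id in title_ids:
        combined.extend(load_film(title_id, fetch_rows))
    return combined

# Stand-in fetcher with placeholder data, so the sketch runs offline.
def fake_fetch(url: str) -> List[Row]:
    return [{"name": "Placeholder Person", "job": "camera operator"}]

result = load_all(["tt2488496", "tt5013056"], fake_fetch)
print(len(result))  # 2
```

Tagging each row with `title_id` matters once 20-30 films are stacked into one table; without it there is no way to tell which film a crew member came from.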

@dkay84_PowerBI thank you for the recommendation, that sounds like a better method.

Do you have an example of how such a function would work? I haven't done something like that before.

I have also realised that when you import the page it gives you a list of tables. Unfortunately, the tables are not always in the same order: for example, 'Camera Department' is sometimes table 6 and sometimes table 9. Is there a way around this too, e.g. selecting the table by its name, which is always the same?
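One way around the changing table order is to locate the table by the heading that precedes it rather than by its position. The Python sketch below uses the standard library's `html.parser` and assumes (not verified here) that each credits table is introduced by a heading element whose text is the department name, such as "Camera Department". The class and function names and the sample HTML are hypothetical, and nested tables are not handled.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional

class CreditsTables(HTMLParser):
    """Collect every <table>, keyed by the text of the heading before it.
    A later table with the same heading replaces an earlier one."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__()
        self.tables: Dict[str, List[List[str]]] = {}
        self._heading: Optional[List[str]] = None  # text of the open heading
        self._last_heading = ""
        self._rows: Optional[List[List[str]]] = None  # rows of the open table
        self._cell: Optional[List[str]] = None  # text of the open cell

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._heading = []
        elif tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._rows.append([])
        elif tag in ("td", "th") and self._rows is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._heading is not None:
            # Normalise whitespace so "Camera  Department\n" still matches.
            self._last_heading = " ".join("".join(self._heading).split())
            self._heading = None
        elif tag in ("td", "th") and self._cell is not None:
            if self._rows:
                self._rows[-1].append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.tables[self._last_heading] = [r for r in self._rows if r]
            self._rows = None

    def handle_data(self, data):
        if self._heading is not None:
            self._heading.append(data)
        if self._cell is not None:
            self._cell.append(data)

def table_by_name(html: str, name: str) -> Optional[List[List[str]]]:
    """Return the rows of the table headed `name`, or None if absent."""
    parser = CreditsTables()
    parser.feed(html)
    parser.close()
    return parser.tables.get(name)

sample = """
<h4>Cast</h4><table><tr><td>Actor A</td></tr></table>
<h4>Camera Department</h4>
<table><tr><td>Person X</td><td>camera operator</td></tr></table>
"""
print(table_by_name(sample, "Camera Department"))
# [['Person X', 'camera operator']]
```

Because the lookup is by name, it no longer matters whether the department ends up as the sixth or the ninth table on a given page.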
