Anonymous

Web scraping from URL

Dear Power BI community, I need help scraping data from a website. I have tried this method:

I followed the steps shown in a video. My URL is https://www.reliancedigital.in/laptops/c/S101210?searchQuery=:relevance&page=1. In the editor I changed the URL to :searchQuery=:relevance&page=&PageStart& and added (PageStart as text) => before the URL, but the same 10 rows repeat after every 10th row, so my entire scraped data is copies of those 10 rows.

jaweher899
Super User

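It seems you're trying to scrape data from a website with Power BI's Web connector (Get Data > Web). The website paginates its results: only 10 rows are displayed at a time, and you need to click a button to view the next set. Getting the same 10 rows for every page suggests that each request is effectively returning page 1. Either the page number is not being inserted into the URL, or the site loads its product list with JavaScript after the page opens, in which case the HTML that Power Query downloads may look the same no matter which page you request.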
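Because the symptom is the same 10 rows on every page, a quick first check is that each page number really produces a different request URL. Below is a minimal Python sketch using only the standard library (`page_url` is a hypothetical helper name) that rewrites the `page` query parameter of the URL from the question:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(url: str, page: int) -> str:
    """Return `url` with its `page` query parameter set to `page`."""
    parts = urlsplit(url)
    # Keep every existing parameter except `page`, then append the new page number
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "page"]
    query.append(("page", str(page)))
    # safe=":" keeps ":relevance" readable instead of encoding it as %3A
    return urlunsplit(parts._replace(query=urlencode(query, safe=":")))


BASE = "https://www.reliancedigital.in/laptops/c/S101210?searchQuery=:relevance&page=1"
for n in range(1, 4):
    print(page_url(BASE, n))
```

If the URLs differ but the downloaded pages still contain the same rows, the problem lies in what the site returns, not in how the URL is built.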

To scrape a site like this, you need code that requests each page of data and combines the results. You can do this with the M language in Power Query, with a third-party scraping tool, or with a script in a language such as R or Python.
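For the R or Python route, the core loop is the same whatever you use to fetch and parse a page: request page 1, 2, 3, ... and stop when a page comes back empty. The Python sketch below also stops when a page repeats one it has already seen, which is exactly the symptom in the question. `fetch_page` is a hypothetical callable you would supply (for example, one that downloads a page and extracts its rows); it is not shown because it depends on the site's markup.

```python
from typing import Callable, Hashable, List, Sequence, Set, Tuple


def scrape_all_pages(
    fetch_page: Callable[[int], Sequence[Hashable]], max_pages: int = 50
) -> List[Hashable]:
    """Collect rows page by page until a page is empty or repeats an earlier one."""
    rows: List[Hashable] = []
    seen: Set[Tuple[Hashable, ...]] = set()
    for page in range(1, max_pages + 1):
        page_rows = tuple(fetch_page(page))
        if not page_rows:
            break  # ran past the last page of results
        if page_rows in seen:
            # Same rows as an earlier page: the page number is probably being
            # ignored, so stop rather than collect copies
            break
        seen.add(page_rows)
        rows.extend(page_rows)
    return rows


# Usage with a stand-in fetcher that serves a 25-item catalogue, 10 per page
catalogue = [f"laptop-{i}" for i in range(25)]
print(len(scrape_all_pages(lambda p: catalogue[(p - 1) * 10 : p * 10])))  # 25
# A fetcher that ignores the page number, like the behaviour in the question
print(len(scrape_all_pages(lambda p: catalogue[:10])))  # 10
```

Stopping silently is one design choice; raising an error on a repeated page would make the problem more visible.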

Here's an example of how you could do this using the M language in Power Query:

  1. Load the first page of data into Power Query: in the Power Query Editor, on the "Home" tab select "New Source" > "Web". Enter the URL for the first page of data, click "OK", and pick the table you need in the Navigator.

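  2. Remove the duplicates: in the Power Query Editor, on the "Home" tab select "Remove Rows" > "Remove Duplicates". Note that this only tidies the result: if every page returns the same rows, removing duplicates leaves you with just those 10 rows.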

  3. Write a function to paginate through the data: In the Power Query Editor, go to the "View" tab and select the "Advanced Editor" option. In the Advanced Editor, write a function to extract the data from each page and combine the data into a single table.

Here's an example of what the M code for the function might look like:

let
    // Returns the first table Web.Page detects on one page of results.
    // Assumption: the product list is that first table; change {0} if the Navigator shows it elsewhere.
    GetPage = (PageNumber as number) as table =>
        let
            // The Query option builds "?searchQuery=:relevance&page=N" for each page number
            Html = Web.Contents(
                "https://www.reliancedigital.in/laptops/c/S101210",
                [Query = [searchQuery = ":relevance", page = Text.From(PageNumber)]]
            ),
            Tables = Web.Page(Html),
            Products = Tables{0}[Data]
        in
            Products,
    // Call the function for pages 1 to 10 and append the results
    AllPages = Table.Combine(List.Transform({1..10}, GetPage)),
    // Drop exact duplicate rows across pages
    #"Removed Duplicates" = Table.Distinct(AllPages)
in
    #"Removed Duplicates"

  4. Load the data into Power BI: on the "Home" tab of the Power Query Editor, select "Close & Apply".
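Step 3 relies on Web.Page to find the tables in each page's HTML. If you take the Python route instead, the standard library's `html.parser` can do a rough version of that table detection. The sketch below (`TableRows` and `table_rows` are hypothetical names) collects the text of every `<td>`/`<th>` cell, row by row. It assumes reasonably well-formed markup with explicit closing tags, and it only sees the HTML you pass it, not content a page adds later with JavaScript.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None   # cells of the row being read
        self._cell: Optional[List[str]] = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> or <a> also lands here
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop an unclosed cell rather than crash


def table_rows(html: str) -> List[List[str]]:
    """Return every table row in `html` as a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = "<table><tr><th>Name</th><th>Price</th></tr><tr><td><b>Laptop A</b></td><td> 50,000 </td></tr></table>"
print(table_rows(sample))  # [['Name', 'Price'], ['Laptop A', '50,000']]
```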

Note that this is only an example: you may need to modify the code to match the structure of the website you're scraping. Also be aware that scraping data from a website is subject to that site's terms of use and may be prohibited by them.
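Terms of use have to be read by a person, but a site's robots.txt file, which lists the paths automated clients are asked not to fetch, can be checked in code. Below is a minimal sketch with Python's `urllib.robotparser`; the rules string is a made-up example, not the site's actual robots.txt, and `allowed` is a hypothetical helper name. Against a live site you would instead call `set_url()` with the site's robots.txt address and then `read()`. Passing this check does not mean the terms of use permit scraping.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the robots.txt rules let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only
rules = "User-agent: *\nDisallow: /checkout/\n"
print(allowed(rules, "https://www.reliancedigital.in/laptops/c/S101210"))  # True
print(allowed(rules, "https://www.reliancedigital.in/checkout/cart"))      # False
```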
