Dear Power BI community, I need help with scraping data from a website. I have tried this method:
I followed the steps shown in a video. This is my URL: https://www.reliancedigital.in/laptops/c/S101210?searchQuery=:relevance&page=1. In the editor I changed the URL to searchQuery=:relevance&page=&PageStart& and added (PageStart as text) => before the URL, but the same 10 rows are repeated after every 10th row, so my entire scraped dataset consists of copies of those 10 rows.
It looks like you're scraping the site with Power BI's Web connector. The website appears to be paginated: only 10 rows are shown at a time, and the next set of rows is loaded only when you move to the next page. If every request returns the first page, you get the same 10 rows over and over.
To scrape data from a website like this, you'll need to write some code to extract the data and paginate through the pages of data. There are several methods you can use, including using the M language in Power Query, using a third-party scraping tool, or using a programming language such as R or Python to write a script to scrape the data.
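As a rough illustration of the Python route, here is a minimal sketch of the pagination logic only. The names `page_url`, `scrape_pages` and `fetch_rows` are hypothetical and not part of any library; `fetch_rows` stands for whatever code you write to download one page and return its rows as tuples. The loop stops as soon as a page adds no new rows, which is the symptom described in the question.

```python
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urlencode

BASE_URL = "https://www.reliancedigital.in/laptops/c/S101210"

Row = Tuple[str, ...]


def page_url(page: int) -> str:
    """Build the listing URL for one page number."""
    # safe=":" keeps ":relevance" unescaped, matching the URL in the question.
    query = urlencode({"searchQuery": ":relevance", "page": page}, safe=":")
    return f"{BASE_URL}?{query}"


def scrape_pages(
    fetch_rows: Callable[[str], List[Row]],  # hypothetical: returns rows for one URL
    pages: Iterable[int],
) -> List[Row]:
    """Collect rows page by page, skipping rows that were already seen."""
    seen = set()
    rows: List[Row] = []
    for page in pages:
        added = 0
        for row in fetch_rows(page_url(page)):
            if row not in seen:
                seen.add(row)
                rows.append(row)
                added += 1
        if added == 0:
            # The page contained nothing new: either the listing has ended
            # or the site keeps serving the same page for every request.
            break
    return rows
```

If the loop stops after the first page, the page number in the URL is probably not what drives the listing (for example, the rows may be loaded by JavaScript), and a plain HTTP request will not be enough.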
Here's an example of how you could do this using the M language in Power Query:
Load the first page of data into Power Query: In the Power Query Editor, go to the "Home" tab, select "New Source" and then "Web". Enter the URL for the first page of data and confirm.
Remove the duplicates: In the Power Query Editor, go to the "Home" tab, open "Remove Rows" and select "Remove Duplicates".
Write a function to paginate through the data: In the Power Query Editor, go to the "View" tab and select "Advanced Editor". In the Advanced Editor, write a function that extracts the data from one page, call it for each page number, and combine the results into a single table.
Here's an example of what the M code for the function might look like:
let
    // Fetch one results page; PageStart is the page number as text
    GetPage = (PageStart as text) as table =>
        let
            PageUrl = "https://www.reliancedigital.in/laptops/c/S101210?searchQuery=:relevance&page=" & PageStart,
            Data = Web.Page(Web.Contents(PageUrl)),
            // Assumes the first table detected on the page holds the listing
            PageTable = Data{0}[Data]
        in
            PageTable,
    // Call the function for pages 1 to 10
    Pages = List.Transform({1..10}, each GetPage(Text.From(_))),
    // Append all pages into one table, then drop repeated rows
    Combined = Table.Combine(Pages),
    #"Removed Duplicates" = Table.Distinct(Combined)
in
    #"Removed Duplicates"
Note that this is just an example, and you may need to modify the code to match the structure of the website you're scraping. Also, be aware that scraping data from websites is subject to the website's terms of use and can be against their policy.