Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
inzaghi1230
Frequent Visitor

Power BI Screen Scraping

Hi I am in my second week of working in Power BI. I am wanting to retrieve all listing data from RealEstate.co.nz and I have got to the following stage (please see code below), however I am having questions for the further step. Please help. Thank you.

 

  1. What shall I do to transform the current result to a meaning full table eg. Make the column “FindListingHeader” the table header however remove the duplication and make the “FindListingDetail” the data in the table. I have tried to use Transpose, but it doesn’t get me a proper table.
  2. How do I make the query to go to Next Page on the website. There are roughly about 50 pages.
  3. How do I get rid of “</p” in the description field, my Replace Value didn’t seem to work very well. Also the field doesn’t display the full text although I have already change the type to text.

 

let
    Source =Table.FromColumns({Lines.FromBinary(Web.Contents("http://www.realestate.co.nz/residential/search/districts/237/property_types/1%2C2%2C3%2C4%2C5%2C6%2C7%2C8%2C9%2C50"),null,null,1252)}),
    #"Renamed Columns to HTML" = Table.RenameColumns(Source,{{"Column1", "HTML"}}),
    #"Add Conditional Column Store Header" = Table.AddColumn(#"Renamed Columns to HTML", "FindListingHeader", each if Text.Contains([HTML], "id=""listing-") then "ListingNo" else if Text.Contains([HTML], "itemprop=""name""") then "Slogan" else if Text.Contains([HTML], "itemprop=""description""") then "Description" else if Text.Contains([HTML], "itemprop=""streetAddress""") then "StreetAddress" else if Text.Contains([HTML], "itemprop=""addressLocality""") then "Suburb" else if Text.Contains([HTML], "class=""price""") then "Price" else if Text.Contains([HTML], "Bedrooms") then "Bedrooms" else if Text.Contains([HTML], "Bathrooms") then "Bathrooms" else null ),
    #"Add Conditional Column Store Details" = Table.AddColumn(#"Add Conditional Column Store Header", "FindListingDetails", each if Text.Contains([HTML], "id=""listing-") then [HTML] else if Text.Contains([HTML], "itemprop=""name""") then [HTML] else if Text.Contains([HTML], "itemprop=""description""") then [HTML] else if Text.Contains([HTML], "itemprop=""addressLocality""") then [HTML] else if Text.Contains([HTML], "itemprop=""streetAddress""") then [HTML] else if Text.Contains([HTML], "class=""price""") then [HTML] else if Text.Contains([HTML], "Bedrooms</h6></li>") then [HTML] else if Text.Contains([HTML], "Bathrooms</h6></li>") then [HTML] else null ),
    #"Changed Data Type to Text" = Table.TransformColumnTypes(#"Add Conditional Column Store Details",{{"FindListingDetails", type text}, {"FindListingHeader", type text}}),
    #"Trimmed Text" = Table.TransformColumns(#"Changed Data Type to Text",{{"FindListingDetails", Text.Trim}}),
    #"Remove <span itemprop=""name"">" = Table.ReplaceValue(#"Trimmed Text","<span itemprop=""name"">","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove <p itemprop=""description"">" = Table.ReplaceValue(#"Remove <span itemprop=""name"">","<p itemprop=""description"">","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove itemprop=""addressLocality"">" = Table.ReplaceValue(#"Remove <p itemprop=""description"">","<span class=""location"" itemprop=""addressLocality"">","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove <span itemprop=""streetAddress"">" = Table.ReplaceValue(#"Remove itemprop=""addressLocality"">","<span itemprop=""streetAddress"">","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove <div class=""price"">" = Table.ReplaceValue(#"Remove <span itemprop=""streetAddress"">","<div class=""price"">","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove <li><h6>" = Table.ReplaceValue(#"Remove <div class=""price"">","<li><h6>","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove </h6></li>" = Table.ReplaceValue(#"Remove <li><h6>","</h6></li>","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove </span>" = Table.ReplaceValue(#"Remove </h6></li>","</span>","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove </div>" = Table.ReplaceValue(#"Remove </span>","</div>","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove </p" = Table.ReplaceValue(#"Remove </div>","</p","",Replacer.ReplaceText,{"HTML"}),
    #"Remove """ = Table.ReplaceValue(#"Remove </p","""","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove <div id=listing-" = Table.ReplaceValue(#"Remove ""","<div id=listing-","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove class=listing" = Table.ReplaceValue(#"Remove <div id=listing-"," class=listing featuredListing itemscope itemtype=http://schema.org/Residence data-gtm={","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove ," = Table.ReplaceValue(#"Remove class=listing",",","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove }" = Table.ReplaceValue(#"Remove ,","}","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Remove >" = Table.ReplaceValue(#"Remove }",">","",Replacer.ReplaceText,{"FindListingDetails"}),
    #"Filtered Out Null and Blank on Details" = Table.SelectRows(#"Remove >", each [FindListingDetails] <> null and [FindListingDetails] <> ""),
    #"Removed Columns HTML" = Table.RemoveColumns(#"Filtered Out Null and Blank on Details",{"HTML"})
in
    #"Removed Columns HTML"

 

1 ACCEPTED SOLUTION

6 REPLIES 6
Phil_Seamark
Employee
Employee

Hi @inzaghi1230,

 

This should be achieveable.  Any chance you can share your PBIX file to fine tune?

 

PM me a link if that suits.


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Hi Phil,

 

Thanks for your reply. However I can not seem to attach the pbix file here. Actually if you start a new report and paste my code into the Advance Editor, you will get exactly what I have here.

 

The website I am wanting to get data is www.realestate.co.nz . At our workplace, we want to get an idea on the housing supply by taking snapshot on the property listings on a daily basis. Once we build up our data history, then we could look into the average asking price or property type on the supply side.

 

Really new to Power BI, but it's really powerful and I really want to dive into it.

 

Thanks for the help.

Hi @inzaghi1230,

 

If you upload to a One Drive, or Drop Box you can probably generate a link to share.


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Hi Phil,

 

Here is the link to one drive. https://1drv.ms/u/s!Au_vRMalV-Pujmc4UtPjcsO0LNmn

 

Please let me know if any problems. Once again, thank you for your help. Cheers.

 

 

Hi @inzaghi1230

 

Have you read these walkthroughs?

 

http://blogs.adatis.co.uk/callumgreen/post/Loop-through-Multiple-Web-Pages-using-Power-Query

 

and about URL Parameterization 

 

http://radacad.com/custom-functions-made-easy-in-power-bi-desktop 

 


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Thanks Phil. Exactly what I want! Cheers.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.