Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Alex_Ooi
Helper IV
Helper IV

Web scraping to show all rows in Power Query

Hi all,

 

I have a task of scraping the composite index historical available in Yahoo Finance. Let's take an example of NASDAQ (^IXIC) with the following URL: https://finance.yahoo.com/quote/%5EIXIC/history?period1=1570147200&period2=1601769600&interval=1d&fi...

 

Start date: 4/10/2019

End date: 4/10/2020

 

However, I noticed a problem where by default, the URL only displays the first 100 rows. It will only be expanded when you scroll down in the website. As such, the result of my code only returns the earliest date of 13/5/2020 (while obviously expecting the earliest date in the table to be 4/10/2019).

Alex_Ooi_0-1601897154205.png


I am not familiar with inline CSS and unfortunately I was not able to understand most of the articles that I could find. I suspect that I could use the second optional argument in Web.BrowserContents( ) by using the WaitFor function. I have inspected the CSS element of the website and I noticed the only thing that changes before the table is fully displayed on web is the class element. I don't know how to add that into my code.

<html id="atomic" class="chrome desktop JsEnabled themelight layoutEnhance(TwoColumnLayout) CollapsibleUh onDemandFocusSupport hasScrolled scrollDown HideNavrail" lang="en-US">

 

I hope that someone here could help me. Below is my full M code.

let
    Source = Web.BrowserContents("https://finance.yahoo.com/quote/%5EIXIC/history?period1=1577836800&period2=1609372800&interval=1d&filter=history&frequency=1d"),
    #"Extracted Table From Html" = Html.Table(Source, {
        {"Column1",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column2",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column3",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column4",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column5",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TD:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column6",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3)
            + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(2),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TD:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3)
            + TD:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(2),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"},
        
        {"Column7",
        "TABLE.W\(100\%\).M\(0\) > * > TR > TH:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TH:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TH:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TH:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TH:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3)
            + TH:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(2)
            + TH:not([colspan]):not([rowspan]):nth-child(7):nth-last-child(1),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(7)
            + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(6)
            + TD:not([colspan]):not([rowspan]):nth-child(3):nth-last-child(5)
            + TD:not([colspan]):not([rowspan]):nth-child(4):nth-last-child(4)
            + TD:not([colspan]):not([rowspan]):nth-child(5):nth-last-child(3)
            + TD:not([colspan]):not([rowspan]):nth-child(6):nth-last-child(2)
            + TD:not([colspan]):not([rowspan]):nth-child(7):nth-last-child(1),
        TABLE.W\(100\%\).M\(0\) > * > TR > TD[colspan=""7""]:not([rowspan]):nth-child(1):nth-last-child(1)"}},
    [RowSelector="TABLE.W\(100\%\).M\(0\) > * > TR"]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Extracted Table From Html", [PromoteAllScalars=true]),
    #"Removed Columns" = Table.RemoveColumns(#"Promoted Headers",{"High", "Low", "Adj Close**"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Close*", "Close"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{"Date", type date}, {"Open", type number}, {"Close", type number}, {"Volume", type number}}),
    #"Removed Errors" = Table.RemoveRowsWithErrors(#"Changed Type", {"Date"}),
    #"Sorted Rows" = Table.Sort(#"Removed Errors",{{"Date", Order.Ascending}})
in
    #"Sorted Rows"

 Thank you

 

Regards,

Alex

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi Alex, notice the download link on the right!

 

= Csv.Document(Web.Contents("https://query1.finance.yahoo.com/v7/finance/download/%5EIXIC?period1=1570147200&period2=1601769600&i..."),[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None])

 

Give the above a shot and that should pull everything in!

View solution in original post

2 REPLIES 2
LiorRahav
Regular Visitor

Hi,  what if the page number is hidden in the metadata, for example: https://dailymed.nlm.nih.gov/dailymed/services/v2/ndcs

 , this will only give me the first 100 records... any thoughts?

Anonymous
Not applicable

Hi Alex, notice the download link on the right!

 

= Csv.Document(Web.Contents("https://query1.finance.yahoo.com/v7/finance/download/%5EIXIC?period1=1570147200&period2=1601769600&i..."),[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None])

 

Give the above a shot and that should pull everything in!

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.