Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
bleow
Frequent Visitor

Retrieve most updated excel file from a website without API support

I need to pull the latest excel data from https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/datasets/si...

bleow_2-1625756917215.png

 

Clicking the topmost button (red arrowed) gives me this link: https://www.ons.gov.uk/file?uri=%2femploymentandlabourmarket%2fpeopleinwork%2femploymentandemployeet...

 

But I cannot just use the above url because I need the latest data, so if 2021 data comes out for example, I will need to pull the 2021 data and not the 2020final1.xlsx data (as hardcoded into the above url)

 

I know that every dataset url starts with "/file?uri=%" and ends with ".xls" or ".xlsx", so I did the following:

1. Input the website url and obtain the html code

2. Text.Split the html code by "/file?uri=%"

3. Use Text.BeforeDelimiter to remove text after .xls

4. Add back the /file?=uri=% and .xls to "restore" the url

5. Select the first url (which will have the latest data)

bleow_1-1625756631719.png

Advanced editor code as below:

let
    Source = Web.BrowserContents("https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/datasets/sicknessabsenceinthelabourmarket"),
    #"Split by /file?uri=%" = Text.Split(Source, "/file?uri=%"),
    #"Converted to Table" = Table.FromList(#"Split by /file?uri=%", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Discard text after .xls" = Table.TransformColumns(#"Converted to Table", {{"Column1", each Text.BeforeDelimiter(_, ".xls"), type text}}),
    #"Removed Top Row" = Table.Skip(#"Discard text after .xls",1),
    #"Added back www.ons.gov.uk[...]" = Table.ReplaceValue(#"Removed Top Row","2femploymentandlabourmarket","https://www.ons.gov.uk/file?uri=%2femploymentandlabourmarket",Replacer.ReplaceText,{"Column1"}),
    #"Added .xlsx" = Table.AddColumn(#"Added back www.ons.gov.uk[...]", ".xlsx", each ".xlsx"),
    #"Added website_url" = Table.AddColumn(#"Added .xlsx", "website_url", each [Column1]&[#".xlsx"]),
    #"Kept First Row" = Table.FirstN(#"Added website_url",1),
    #"Kept website_url" = Table.SelectColumns(#"Kept First Row",{"website_url"}),
    #"Convert table into text" = Table.ToList(#"Kept website_url"){0}
in
    #"Convert table into text"

 

Is there a better way of parsing html code in Power BI?

I want to read the entire text and extract all strings that start with /file and end with .xls or .xlsx and store the results in a list. Something similar can be achieved with Python and Regex, but how would I do this in Power Query/DAX?

 

Note: I checked their API, and it does not have the above dataset that I need. https://api.beta.ons.gov.uk/v1/datasets

 

 

 

 

2 REPLIES 2
v-yingjl
Community Support
Community Support

Hi @bleow ,

To get the contents in the download file link from web link, you can try:

1. Right click the download button, choose 'Inspect'

inspect.png

 

2. In the Inspect page, find the 'real' link of this dataset:

link.png

 

3. Copy this link and use the Web connector in power bi to connect it, you can get the file contents:

web.png

 

For more details, you can refer this article: Tips to download files from webpages in Power Query and Power BI 

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Thanks for the reply. Is there any way to do this url retrieval programmatically? My client wants as much automation as possible and doesn't want to manually check the website for dataset updates.

 

And in that regard, I do already have a script in the original post that does this, albeit in a hacked-together manner and I'm wondering if there's a better way to do it, or if say python/pandas integration is possible for my use case.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.