Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and a 50 percent discount on exams.
Get startedEarn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hi PBi Users,
I, on a daily basis, scrape a web site for fund prices but the website has been updated over the weekend and my query (below) and table below, no longer works. Please may you help me out.
Much appreciate any help you can provide.
Thank you!
the new website is ... https://www.hl.co.uk/funds/fund-discounts,-prices--and--factsheets/search-results?sectorid=121&start...
let
// Function for each Sector-Information
WebCall = (SectorID as text) =>
let
Source = Web.Page(Web.Contents("https://www.hl.co.uk/funds/fund-discounts,-prices--and--factsheets/search-results?sectorid="&SectorI..., 0, 0, 60)])),
Data = Source{[ClassName="sortableTable"]}[Data],
ListOfAllPages = if List.Count(List.Skip(Text.Split(Data{0}[Name], "#(lf)")))=0 then {"1"} else List.Skip(Text.Split(Data{0}[Name], "#(lf)")),
#"Converted to Table" = Table.FromList(ListOfAllPages, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
RearragnedURL = Table.AddColumn(#"Converted to Table", "Custom", each Web.Page(Web.Contents("https://www.hl.co.uk/funds/fund-discounts,-prices--and--factsheets/search-results?tab=prices&filters...]),
#"Expanded Custom" = Table.ExpandTableColumn(RearragnedURL, "Custom", {"Name", "Current price Sell (p)", "Current price Change (p)", "More information"}, {"Name", "Sell (p)", "Change (p)", "More information"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Expanded Custom", "More information", Splitter.SplitTextByDelimiter("#(lf)", QuoteStyle.Csv), {"More information.1", "More information.2", "More information.3", "More information.4", "More information.5", "More information.6"}),
#"Renamed Columns" = Table.RenameColumns(#"Split Column by Delimiter",{{"More information.2", "Unit type"}, {"More information.4", "Sector"}, {"More information.6", "Valuation date"}}),
#"Removed Columns" = Table.RemoveColumns(#"Renamed Columns",{"More information.1", "More information.3", "More information.5", "Column1"}),
#"Added Custom" = Table.AddColumn(#"Removed Columns", "Name + type", each [Name]&" - "&[Unit type]),
#"Reordered Columns" = Table.ReorderColumns(#"Added Custom",{"Name", "Name + type", "Sell (p)", "Change (p)", "Unit type", "Sector", "Valuation date"}),
#"Filtered Rows" = Table.SelectRows(#"Reordered Columns", each not Text.StartsWith([Name], "Page:")),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows",{"Name", "Unit type"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Columns1",{{"Name + type", type text}, {"Sell (p)", type number}, {"Change (p)", type number}, {"Sector", type text}, {"Valuation date", type date}})
in
#"Changed Type",
// Start of main query: Dig into Html to retrieve a table with all Sectors
Source = #"SectorIDs - 1",
// Call function and cleanup
CallEachFunction = Table.AddColumn(Source, "WebCall", each WebCall([SectorID])),
FilterOutEmpties = Table.SelectRows(CallEachFunction, each [WebCall] <> null),
#"Expanded WebCall" = Table.ExpandTableColumn(FilterOutEmpties, "WebCall", {"Name + type", "Sell (p)", "Change (p)", "Sector", "Valuation date"}, {"Name + type", "Sell (p)", "Change (p)", "Sector", "Valuation date"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded WebCall",{"SectorID", "SectorName"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Name + type", type text}, {"Sell (p)", type number}, {"Change (p)", type number}, {"Sector", type text}, {"Valuation date", type text}}),
#"Filtered Rows" = Table.SelectRows(#"Changed Type", each (not Number.IsNaN([#"Sell (p)"]))),
#"Changed Type1" = Table.TransformColumnTypes(#"Filtered Rows",{{"Change (p)", Currency.Type}})
in
#"Changed Type1"
Table "SectorIDs - 1" is below
121 | Asia Pacific Ex Japan |
114 | Asia Pacific Inc Japan |
155 | China/ Greater China |
111 | Europe Excluding UK |
102 | Europe Including UK |
117 | European Smaller Companies |
115 | Flexible Investment |
150 | GBP Corporate Bond |
151 | GBP High Yield |
152 | GBP Strategic Bond |
109 | Global |
106 | Global Bonds |
112 | Global Emerging Markets |
156 | Global Emerging Markets Bond |
139 | Global Equity Income |
113 | Japan |
132 | Japanese Smaller Companies |
133 | Latin America |
137 | Mixed Investment 0-35% Shares |
131 | Mixed Investment 20-60% Shares |
107 | Mixed Investment 40-85% Shares |
123 | Money Market |
999 | N/A |
110 | North America |
130 | North American Smaller Cos |
71 | Offshore |
400 | Pension Trusts |
108 | Property |
138 | Short Term Money Market |
129 | Specialist |
136 | Targeted Absolute Return |
134 | Technology & Telecoms |
101 | UK All Companies |
104 | UK Equity & Bond Income |
103 | UK Equity Income |
127 | UK Gilt |
135 | UK Index Linked Gilt |
118 | UK Smaller Companies |
48 | Unclassified |
157 | Volatility Managed |
I think it might be an anti-spam thing on this forum but they shorten any URLs in posted stuff so we don't know what the 2 lines with web links are.
I had a brief try with the Query. The table that gets returned with fund names is results__table not sortableTable.
The second web call (RearragnedURL ) presumably constructs each url with positions in the list e.g. You can see that the &start= parameter controls which fund is displayed on the page
https://www.hl.co.uk/funds/fund-discounts,-prices--and--factsheets/search-results?sectorid=121&start=0&rpp=20&lo=0%2C1&sort=fd.full_description&sort_dir=asc
https://www.hl.co.uk/funds/fund-discounts,-prices--and--factsheets/search-results?sectorid=121&start=20&rpp=20&lo=0%2C1&sort=fd.full_description&sort_dir=asc
Overall, I think the work involved is too complex for a forum issue. It needs time and effort.
I appreciate you having a go @HotChilli
Sadly, yes it is quite complex. I have spent most of the day manually scraping the website .... eeeek!
Yes, the table name has changed, apologies for not stating that in my original post. There are 39 "sectorid" and each sectorid, could possibly, have more than 1 page and with the change in the website you can't put page numbers in the URL. That's just another headache I have come across.
Anyway I appreciate you taking the time to look. I'll keep scratching my head and hopefully reach a solution
Thanks again
Another headache I faced was not able to navigate to the tab "Prices & Yields". This doesn't appear in the URL so I don't know how to access this tab via the query.
Many thanks
User | Count |
---|---|
93 | |
86 | |
78 | |
72 | |
66 |
User | Count |
---|---|
115 | |
106 | |
85 | |
65 | |
64 |