Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
dax50
New Member

Webscrape through a tbl column of links & retrieving data for multiple pages - pls help.

Hi, 

I build a basic webscraper that is running through a table of URLs for underlying stocks and loading the first page of the resulting derivatives data using the function below.
Since many of the underlying stocks have more than 100 results (= more than 1 result page), I want to load the data from the first 3 pages of each underlying stock (pages 0 to 2), which is what I am not getting to work.

I know that the topic with multiple pages came up before (reading posts here as well as watching YT videos like MyOnlineTrainingHub and The BIccountant ), but I can not get any of the solutions to work, because it seems to be a combination of 2 functions and my understanding of Power Query is just not good enough (I literally started 3 days ago) so any help would be really appreciated. 

This is what the table with the links looks like: 

table.png


This is the structure of the URL in the table column 'cURL'  (the page parameter is already there, but I cannot get it to loop through it and then go to the next link in the column):

https://www.onvista.de/derivate/Knock-Outs/Knock-Outs-auf-Covestro?sort=coverRatio&order=DESC&cols=instrument,strikeAbs,knockOutAbs,dateMaturity,issuer.name,coverRatio&idExerciseRight=2&knockOutAbsRange=19.6;25.6&page=


I built this function to run through the table column of links....

(URL) as table =>

let
    Quelle = Web.Page(Web.Contents(URL)),
    Data0 = Quelle{0}[Data],
    #"Geänderter Typ" = Table.TransformColumnTypes(Data0,{{"WKN", type text}, {"Basispreis", type text}, {"K.O.-Schwelle", type text}, {"Bewertungstag", type text}, {"Bezugsverhältnis", type text}, {"Emittent", type text}}),
    #"Gefilterte Zeilen" = Table.SelectRows(#"Geänderter Typ", each ([WKN] <> "Werbung" and [WKN] <> "Werbung von Société Générale"))
in
    #"Gefilterte Zeilen"


...and use the function to get the data in this query, adding a custom column:

let
    Quelle = Excel.CurrentWorkbook(){[Name="KO_URL"]}[Content],
    #"Geänderter Typ" = Table.TransformColumnTypes(Quelle,{{"ID_ISIN", type text}, {"SYMBOL", type text}, {"ROOT_LINK", type text}, {"CLASSE", Int64.Type}, {"C_LOW", type number}, {"C_HIGH", type number}, {"PX_LAST", type number}, {"P_LOW", type number}, {"P_HIGH", type number}, {"cURL", type text}, {"pURL", type any}, {"Hyperlinks", type text}}),
    #"Hinzugefügte benutzerdefinierte Spalte" = Table.AddColumn(#"Geänderter Typ", "fetch_cKOs", each fx_KOs([cURL])),
    #"Hinzugefügte benutzerdefinierte Spalte1" = Table.AddColumn(#"Hinzugefügte benutzerdefinierte Spalte", "PC", each "CALL"),
    #"Entfernte Fehler" = Table.RemoveRowsWithErrors(#"Hinzugefügte benutzerdefinierte Spalte1", {"fetch_cKOs"}),
    #"Erweiterte fetch_cKOs" = Table.ExpandTableColumn(#"Entfernte Fehler", "fetch_cKOs", {"WKN", "Basispreis", "K.O.-Schwelle", "Bewertungstag", "Bezugsverhältnis", "Emittent"}, {"WKN", "Basispreis", "K.O.-Schwelle", "Bewertungstag", "Bezugsverhältnis", "Emittent"})
in
    #"Erweiterte fetch_cKOs"

 

1 ACCEPTED SOLUTION
v-jingzhang
Community Support
Community Support

Hi @dax50 

 

Since you want to load the data from the first 3 pages of each underlying stock (pages 0 to 2), you can try something like below. First add a custom column to the Url table with a list value {0,1,2}.

vjingzhang_1-1661760532738.png

 

Expand the list column to New rows. Then add another custom column by concatenating the original cURL column and the new pageId column. Use Text.From() to surround pageId as it is numeric data type. Your will get a new cURL column which has links for the first 3 pages for every underlying stock. 

vjingzhang_2-1661761413933.png

 

After that, use the new cURL column to invoke your custom function fx_KOs([new cURL])

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("XY1BC4JAEIX/i2ddS0ioWCLMU0RgR+uw6VSDuSs7s1r/vhUv0u177zHflGXwYu5oE8fDMAijeyRWooa4Bou9YoiP2lRNdHZMM4yUe0SZ6YHYmh0Zy7LyyRaK0VzdYpGkxnqFPOSXbMqVeZNE7Q9cC5pDD9jA/k5hM2q9deTavzwpdhb5GyKRAyu0aiH812Odf8BWSFDg88UymeqZq1D6CXK5Fuk2WYl02jvlu+B2+wE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [cURL = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"cURL", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "pageId", each {0,1,2}),
    #"Expanded pageId" = Table.ExpandListColumn(#"Added Custom", "pageId"),
    #"Added Custom1" = Table.AddColumn(#"Expanded pageId", "new cURL", each [cURL] & Text.From([pageId])),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom", each fx_KOs([new cURL]))
in
    #"Added Custom2"

 

Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.

View solution in original post

2 REPLIES 2
v-jingzhang
Community Support
Community Support

Hi @dax50 

 

Since you want to load the data from the first 3 pages of each underlying stock (pages 0 to 2), you can try something like below. First add a custom column to the Url table with a list value {0,1,2}.

vjingzhang_1-1661760532738.png

 

Expand the list column to New rows. Then add another custom column by concatenating the original cURL column and the new pageId column. Use Text.From() to surround pageId as it is numeric data type. Your will get a new cURL column which has links for the first 3 pages for every underlying stock. 

vjingzhang_2-1661761413933.png

 

After that, use the new cURL column to invoke your custom function fx_KOs([new cURL])

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("XY1BC4JAEIX/i2ddS0ioWCLMU0RgR+uw6VSDuSs7s1r/vhUv0u177zHflGXwYu5oE8fDMAijeyRWooa4Bou9YoiP2lRNdHZMM4yUe0SZ6YHYmh0Zy7LyyRaK0VzdYpGkxnqFPOSXbMqVeZNE7Q9cC5pDD9jA/k5hM2q9deTavzwpdhb5GyKRAyu0aiH812Odf8BWSFDg88UymeqZq1D6CXK5Fuk2WYl02jvlu+B2+wE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [cURL = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"cURL", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "pageId", each {0,1,2}),
    #"Expanded pageId" = Table.ExpandListColumn(#"Added Custom", "pageId"),
    #"Added Custom1" = Table.AddColumn(#"Expanded pageId", "new cURL", each [cURL] & Text.From([pageId])),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom", each fx_KOs([new cURL]))
in
    #"Added Custom2"

 

Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.

dax50
New Member

sorry to bump this up, but Is this too trivial of a question or too hard?


Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors