Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hello Experts,
I was following the famous BoxOfficeMojo video for do/while loop YoutubeLink. It is great and I have modified the code a little bit as following than the solution shown in the video.
(page as number, year as number) as table => let Source = Web.Page(Web.Contents("http://boxofficemojo.com/yearly/chart/?page=" & Number.ToText(page) & "&view=releasedate&view2=domestic&yr=" & Number.ToText(year) & "&p=.htm")), Data1 = Source{1}[Data], RemoveBottom = Table.RemoveLastN(Data1,3) in RemoveBottom
GD2
(x as number)=> let Source = List.Generate( ()=> [Page=1, Result=try GD2(1,x) otherwise null], each [Result]<>null, each [Page=[Page]+1, Result=try GD2([Page]+1,x) otherwise null ] ) in Source
PG2
let Source = {1980..2019}, #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error), #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", Int64.Type}}), #"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "PG2", each PG2([Column1])), #"Expanded PG2" = Table.ExpandListColumn(#"Invoked Custom Function", "PG2"), #"Expanded PG1" = Table.ExpandRecordColumn(#"Expanded PG2", "PG2", {"Page", "Result"}, {"Page", "Result"}), #"Expanded Result" = Table.ExpandTableColumn(#"Expanded PG1", "Result", {"Rank", "Movie Title (click to view)", "Studio", "Total Gross /Theaters", "Total Gross /Theaters2", "Opening /Theaters", "Opening /Theaters2", "Open"}, {"Rank", "Movie Title (click to view)", "Studio", "Total Gross /Theaters", "Total Gross /Theaters2", "Opening /Theaters", "Opening /Theaters2", "Open"}) in #"Expanded Result"
Result
Is there a way to generate an output similar to the result table by invoking the PG2 function with a starting year 1980
and
it would search all the pages from 1980 and generates a table by the corresponding page number on recursion
until
it exhausts all the pages within that year (1980 starting year) and generates an error
then
move on to next year
and
search all the pages by that year till exhuasted, gets an error and move on to next year and so on
until finally
hits the last page of the last year and the loop finishes
Currently PG2 ensures that all the pages are read within a particular year (generated by brute force). If one can replicate what I have in mind from passing only the first value of the starting year in PG2 one can generate the whole output only from passing the first value without needing to write the Result Table code in my example. It can also ensure that once a starting year value is mentioned, data for no years are missed from thereron without needing the coder to mention them at all.
Not sure if this has been asked beofre. if yes, apologies and if someone can please point me to the right direction would be great.
Thank you in adavance.
Solved! Go to Solution.
@d_gosbellI picked up the cue from your suggestion "what PG2 does to loop through all the pages in order to loop through all the years" and many thanks to Gil Raviv for teaching the nested loop through - NestedLoop-GilRaviv
(page as number, year as number) as table => let Source = Web.Page(Web.Contents("http://boxofficemojo.com/yearly/chart/?page=" & Number.ToText(page) & "&view=releasedate&view2=domestic&yr=" & Number.ToText(year) & "&p=.htm")), Data1 = Source{1}[Data], RemoveBottom = Table.RemoveLastN(Data1,3) in RemoveBottom
GD2
let maxYear = 3000, Source = Web.BrowserContents("https://www.boxofficemojo.com/yearly/chart/?page=1&view=releasedate&view2=domestic&yr=" & Number.ToText(maxYear) & "&p=.htm"), #"Extracted Table From Html" = Html.Table(Source, {{"Column1", "DIV[id='body'] > TABLE:nth-child(5) > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2), DIV[id='body'] > TABLE:nth-child(5) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2)"}, {"Column2", "DIV[id='body'] > TABLE:nth-child(5) > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1), DIV[id='body'] > TABLE:nth-child(5) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1)"}}, [RowSelector="DIV[id='body'] > TABLE:nth-child(5) > TR, DIV[id='body'] > TABLE:nth-child(5) > * > TR"]), #"Changed Type" = Table.TransformColumnTypes(#"Extracted Table From Html",{{"Column1", type text}, {"Column2", type text}}), #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Column1.1", "Column1.2"}), #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", Int64.Type}, {"Column1.2", type text}}), #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Column1.2", "Column2"}), pageYear = Table.RenameColumns(#"Removed Columns",{{"Column1.1", "Year"}}), Year = pageYear{0}[Year] in Year
Max_Year
(x as number, y as number)=> let Source = List.Generate( ()=> [Page=1, Result=try GD2(y,x) otherwise null, year=x], each [Result]<>null, each [Page=[Page]+1, Result=try GD2([Page]+1,x) otherwise null, year=x] ) in Source
fnInnerLoop
(year as number, fnXfunction)=> let Source = List.Generate( ()=> [Page=1, Year=year], each [Year]<=Max_Year, each [Page=1, Year=[Year]+1], each fnXfunction([Year],[Page]) ) in Source
fnOuterLoop
let Source = fnOuterLoop(1980,fnInnerLoop), #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error), #"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1"), #"Expanded Column2" = Table.ExpandRecordColumn(#"Expanded Column1", "Column1", {"Page", "Result", "year"}, {"Column1.Page", "Column1.Result", "Column1.year"}) in #"Expanded Column2"
Output
The main takeaway from this exercise is that if the Iteration Termination clause is not desired to be mentioned by brute force it must come dynamically from another query (Max_Year in this case, thanks to @d_gosbell ).
In theory, one possible approach would be to do a similar thing to what PG2 does to loop through all the pages in order to loop through all the years. But unfortunately that will not work with BoxOfficeMojo as if you ask for a future year BoxOfficeMojo will not return an empty result, instead it returns the latest year it has (currently 2019) so the loop will keep running indefinitely.
So specifically for BoxOfficeMojo you could exploit this by asking for a year far into the future and then reading the year off the page itself, then using this max year in the generation of your list of years.
eg.
let maxYear = 3000, Source = Web.BrowserContents("https://www.boxofficemojo.com/yearly/chart/?page=1&view=releasedate&view2=domestic&yr=" & Number.ToText(maxYear) & "&p=.htm"), #"Extracted Table From Html" = Html.Table(Source, {{"Column1", "DIV[id='body'] > TABLE:nth-child(5) > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2), DIV[id='body'] > TABLE:nth-child(5) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2)"}, {"Column2", "DIV[id='body'] > TABLE:nth-child(5) > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1), DIV[id='body'] > TABLE:nth-child(5) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1)"}}, [RowSelector="DIV[id='body'] > TABLE:nth-child(5) > TR, DIV[id='body'] > TABLE:nth-child(5) > * > TR"]), #"Changed Type" = Table.TransformColumnTypes(#"Extracted Table From Html",{{"Column1", type text}, {"Column2", type text}}), #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Column1.1", "Column1.2"}), #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", Int64.Type}, {"Column1.2", type text}}), #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Column1.2", "Column2"}), pageYear = Table.RenameColumns(#"Removed Columns",{{"Column1.1", "Year"}}), Year = pageYear{0}[Year] in Year
@d_gosbellI picked up the cue from your suggestion "what PG2 does to loop through all the pages in order to loop through all the years" and many thanks to Gil Raviv for teaching the nested loop through - NestedLoop-GilRaviv
(page as number, year as number) as table => let Source = Web.Page(Web.Contents("http://boxofficemojo.com/yearly/chart/?page=" & Number.ToText(page) & "&view=releasedate&view2=domestic&yr=" & Number.ToText(year) & "&p=.htm")), Data1 = Source{1}[Data], RemoveBottom = Table.RemoveLastN(Data1,3) in RemoveBottom
GD2
let maxYear = 3000, Source = Web.BrowserContents("https://www.boxofficemojo.com/yearly/chart/?page=1&view=releasedate&view2=domestic&yr=" & Number.ToText(maxYear) & "&p=.htm"), #"Extracted Table From Html" = Html.Table(Source, {{"Column1", "DIV[id='body'] > TABLE:nth-child(5) > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2), DIV[id='body'] > TABLE:nth-child(5) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2)"}, {"Column2", "DIV[id='body'] > TABLE:nth-child(5) > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1), DIV[id='body'] > TABLE:nth-child(5) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1)"}}, [RowSelector="DIV[id='body'] > TABLE:nth-child(5) > TR, DIV[id='body'] > TABLE:nth-child(5) > * > TR"]), #"Changed Type" = Table.TransformColumnTypes(#"Extracted Table From Html",{{"Column1", type text}, {"Column2", type text}}), #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Column1.1", "Column1.2"}), #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", Int64.Type}, {"Column1.2", type text}}), #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Column1.2", "Column2"}), pageYear = Table.RenameColumns(#"Removed Columns",{{"Column1.1", "Year"}}), Year = pageYear{0}[Year] in Year
Max_Year
(x as number, y as number)=> let Source = List.Generate( ()=> [Page=1, Result=try GD2(y,x) otherwise null, year=x], each [Result]<>null, each [Page=[Page]+1, Result=try GD2([Page]+1,x) otherwise null, year=x] ) in Source
fnInnerLoop
(year as number, fnXfunction)=> let Source = List.Generate( ()=> [Page=1, Year=year], each [Year]<=Max_Year, each [Page=1, Year=[Year]+1], each fnXfunction([Year],[Page]) ) in Source
fnOuterLoop
let Source = fnOuterLoop(1980,fnInnerLoop), #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error), #"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1"), #"Expanded Column2" = Table.ExpandRecordColumn(#"Expanded Column1", "Column1", {"Page", "Result", "year"}, {"Column1.Page", "Column1.Result", "Column1.year"}) in #"Expanded Column2"
Output
The main takeaway from this exercise is that if the Iteration Termination clause is not desired to be mentioned by brute force it must come dynamically from another query (Max_Year in this case, thanks to @d_gosbell ).
Yes, that fnInnerLoop and fnOuterLoop pattern was the sort of approach I was suggesting. 🙂
@d_gosbellthank you very much for looking into this and I adopted your solution to the "Result" which basically determines the max year for the list. I named your query as Query5 and fed into the code as following.
let Source = {1980..Query5}, #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error), #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", Int64.Type}}), #"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "PG2", each PG2([Column1])), #"Expanded PG2" = Table.ExpandListColumn(#"Invoked Custom Function", "PG2"), #"Expanded PG1" = Table.ExpandRecordColumn(#"Expanded PG2", "PG2", {"Page", "Result"}, {"Page", "Result"}), #"Expanded Result" = Table.ExpandTableColumn(#"Expanded PG1", "Result", {"Rank", "Movie Title (click to view)", "Studio", "Total Gross /Theaters", "Total Gross /Theaters2", "Opening /Theaters", "Opening /Theaters2", "Open"}, {"Rank", "Movie Title (click to view)", "Studio", "Total Gross /Theaters", "Total Gross /Theaters2", "Opening /Theaters", "Opening /Theaters2", "Open"}) in #"Expanded Result"
However, the objective of my question was to find out whether List.Generate is capable at all of what I want it to do. If yes, how to do that nesting. Before asking the question I did not realise that for any values beyond 2019 boxofficemojo URL returns 2019 values (instead of generating an error what I expected list.generate to run into and terminate looping). But if you can imagine that it does generate an error beyond 2019 values (assuming it can satisfy the terminate condition) my question is how to rewrite the code for PG2 which alone would be capable enough of producing what Result query can do once the first two values (Page=1, Year=1980) passes through PG2.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
110 | |
98 | |
78 | |
64 | |
55 |
User | Count |
---|---|
143 | |
109 | |
89 | |
84 | |
66 |