Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
smpa01
Super User
Super User

Dyanmic Looping in List.Generate (Example-BoxOfficeMojo)

Hello Experts,

I was following the famous BoxOfficeMojo video for do/while loop YoutubeLink. It is great and I have modified the code a little bit as following than the solution shown in the video.

 

 

(page as number, year as number) as table =>
let
    Source = Web.Page(Web.Contents("http://boxofficemojo.com/yearly/chart/?page=" & Number.ToText(page) & "&view=releasedate&view2=domestic&yr=" & Number.ToText(year) & "&p=.htm")),
    Data1 = Source{1}[Data],
    RemoveBottom = Table.RemoveLastN(Data1,3)
in
    RemoveBottom

GD2

 

(x as number)=>
              let
                 Source = List.Generate(
                             ()=>
                                 [Page=1, Result=try GD2(1,x) otherwise null],
                                each [Result]<>null,
                                each [Page=[Page]+1, Result=try GD2([Page]+1,x) otherwise null ]                                
                            )
in
    Source

PG2

 

 

let
    Source = {1980..2019},
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", Int64.Type}}),
    #"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "PG2", each PG2([Column1])),
    #"Expanded PG2" = Table.ExpandListColumn(#"Invoked Custom Function", "PG2"),
    #"Expanded PG1" = Table.ExpandRecordColumn(#"Expanded PG2", "PG2", {"Page", "Result"}, {"Page", "Result"}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Expanded PG1", "Result", {"Rank", "Movie Title (click to view)", "Studio", "Total Gross /Theaters", "Total Gross /Theaters2", "Opening /Theaters", "Opening /Theaters2", "Open"}, {"Rank", "Movie Title (click to view)", "Studio", "Total Gross /Theaters", "Total Gross /Theaters2", "Opening /Theaters", "Opening /Theaters2", "Open"})
in
    #"Expanded Result"

Result

 

 

Is there a way to generate an output similar to the result table by invoking the PG2 function with a starting year 1980

and

it would search all the pages from 1980 and generates a table by the corresponding page number on recursion

until

it exhausts all the pages within that year (1980 starting year) and generates an error

then

move on to next year

and

search all the pages by that year till exhuasted, gets an error and move on to next year and so on

until finally

hits the last page of the last year and the loop finishes

 

Currently PG2 ensures that all the pages are read within a particular year (generated by brute force). If one can replicate what I have in mind from passing only the first value of the starting year in PG2 one can generate the whole output only from passing the first value without needing to write the Result Table code in my example. It can also ensure that once a starting year value is mentioned, data for no years are missed from thereron without needing the coder to mention them at all.

 

Not sure if this has been asked beofre. if yes, apologies and if someone can please point me to the right direction would be great.

 

Thank you in adavance.

 

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs
1 ACCEPTED SOLUTION

@d_gosbellI picked up the cue from your suggestion "what PG2 does to loop through all the pages in order to loop through all the years" and many thanks to Gil Raviv for teaching the nested loop through - NestedLoop-GilRaviv

 

(page as number, year as number) as table =>
let
    Source = Web.Page(Web.Contents("http://boxofficemojo.com/yearly/chart/?page=" & Number.ToText(page) & "&view=releasedate&view2=domestic&yr=" & Number.ToText(year) & "&p=.htm")),
    Data1 = Source{1}[Data],
    RemoveBottom = Table.RemoveLastN(Data1,3)
in
    RemoveBottom

 

GD2

 

let
    maxYear = 3000,
    Source = Web.BrowserContents("https://www.boxofficemojo.com/yearly/chart/?page=1&view=releasedate&view2=domestic&yr=" & Number.ToText(maxYear) & "&p=.htm"),
    #"Extracted Table From Html" = Html.Table(Source, {{"Column1", "DIV[id='body'] > TABLE:nth-child(5) > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2), DIV[id='body'] > TABLE:nth-child(5) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2)"}, {"Column2", "DIV[id='body'] > TABLE:nth-child(5) > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1), DIV[id='body'] > TABLE:nth-child(5) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1)"}}, [RowSelector="DIV[id='body'] > TABLE:nth-child(5) > TR, DIV[id='body'] > TABLE:nth-child(5) > * > TR"]),
    #"Changed Type" = Table.TransformColumnTypes(#"Extracted Table From Html",{{"Column1", type text}, {"Column2", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Column1.1", "Column1.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", Int64.Type}, {"Column1.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Column1.2", "Column2"}),
    pageYear = Table.RenameColumns(#"Removed Columns",{{"Column1.1", "Year"}}),
    Year = pageYear{0}[Year]
in
    Year

Max_Year

 

 

 

(x as number, y as number)=>
              let
                 Source = List.Generate(
                             ()=>
                                 [Page=1, Result=try GD2(y,x) otherwise null, year=x],
                                each [Result]<>null,
                                each [Page=[Page]+1, Result=try GD2([Page]+1,x) otherwise null, year=x]                                
                            )
in
    Source

fnInnerLoop

 

 

(year as number, fnXfunction)=>
              let
                 Source = List.Generate(
                             ()=>
                                 [Page=1, Year=year],
                                each [Year]<=Max_Year,
                                each [Page=1, Year=[Year]+1],
                                each fnXfunction([Year],[Page])
                                                               
                            )
in
    Source

fnOuterLoop

 

 

let
    Source = fnOuterLoop(1980,fnInnerLoop),
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1"),
    #"Expanded Column2" = Table.ExpandRecordColumn(#"Expanded Column1", "Column1", {"Page", "Result", "year"}, {"Column1.Page", "Column1.Result", "Column1.year"})
in
    #"Expanded Column2"

Output

 

The main takeaway from this exercise is that if the Iteration Termination clause is not desired to be mentioned by brute force it must come dynamically from another query (Max_Year in this case, thanks to @d_gosbell ).

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

View solution in original post

4 REPLIES 4
d_gosbell
Super User
Super User

In theory, one possible approach would be to do a similar thing to what PG2 does to loop through all the pages in order to loop through all the years. But unfortunately that will not work with BoxOfficeMojo as if you ask for a future year BoxOfficeMojo will not return an empty result, instead it returns the latest year it has (currently 2019) so the loop will keep running indefinitely.

 

So specifically for BoxOfficeMojo you could exploit this by asking for a year far into the future and then reading the year off the page itself, then using this max year in the generation of your list of years.

 

eg.

 

let
    maxYear = 3000,
    Source = Web.BrowserContents("https://www.boxofficemojo.com/yearly/chart/?page=1&view=releasedate&view2=domestic&yr=" & Number.ToText(maxYear) & "&p=.htm"),
    #"Extracted Table From Html" = Html.Table(Source, {{"Column1", "DIV[id='body'] > TABLE:nth-child(5) > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2), DIV[id='body'] > TABLE:nth-child(5) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2)"}, {"Column2", "DIV[id='body'] > TABLE:nth-child(5) > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1), DIV[id='body'] > TABLE:nth-child(5) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1)"}}, [RowSelector="DIV[id='body'] > TABLE:nth-child(5) > TR, DIV[id='body'] > TABLE:nth-child(5) > * > TR"]),
    #"Changed Type" = Table.TransformColumnTypes(#"Extracted Table From Html",{{"Column1", type text}, {"Column2", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Column1.1", "Column1.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", Int64.Type}, {"Column1.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Column1.2", "Column2"}),
    pageYear = Table.RenameColumns(#"Removed Columns",{{"Column1.1", "Year"}}),
    Year = pageYear{0}[Year]
in
    Year

@d_gosbellI picked up the cue from your suggestion "what PG2 does to loop through all the pages in order to loop through all the years" and many thanks to Gil Raviv for teaching the nested loop through - NestedLoop-GilRaviv

 

(page as number, year as number) as table =>
let
    Source = Web.Page(Web.Contents("http://boxofficemojo.com/yearly/chart/?page=" & Number.ToText(page) & "&view=releasedate&view2=domestic&yr=" & Number.ToText(year) & "&p=.htm")),
    Data1 = Source{1}[Data],
    RemoveBottom = Table.RemoveLastN(Data1,3)
in
    RemoveBottom

 

GD2

 

let
    maxYear = 3000,
    Source = Web.BrowserContents("https://www.boxofficemojo.com/yearly/chart/?page=1&view=releasedate&view2=domestic&yr=" & Number.ToText(maxYear) & "&p=.htm"),
    #"Extracted Table From Html" = Html.Table(Source, {{"Column1", "DIV[id='body'] > TABLE:nth-child(5) > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2), DIV[id='body'] > TABLE:nth-child(5) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2)"}, {"Column2", "DIV[id='body'] > TABLE:nth-child(5) > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1), DIV[id='body'] > TABLE:nth-child(5) > * > TR > TD:not([colspan]):not([rowspan]):nth-child(1):nth-last-child(2) + TD:not([colspan]):not([rowspan]):nth-child(2):nth-last-child(1)"}}, [RowSelector="DIV[id='body'] > TABLE:nth-child(5) > TR, DIV[id='body'] > TABLE:nth-child(5) > * > TR"]),
    #"Changed Type" = Table.TransformColumnTypes(#"Extracted Table From Html",{{"Column1", type text}, {"Column2", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv, false), {"Column1.1", "Column1.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", Int64.Type}, {"Column1.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"Column1.2", "Column2"}),
    pageYear = Table.RenameColumns(#"Removed Columns",{{"Column1.1", "Year"}}),
    Year = pageYear{0}[Year]
in
    Year

Max_Year

 

 

 

(x as number, y as number)=>
              let
                 Source = List.Generate(
                             ()=>
                                 [Page=1, Result=try GD2(y,x) otherwise null, year=x],
                                each [Result]<>null,
                                each [Page=[Page]+1, Result=try GD2([Page]+1,x) otherwise null, year=x]                                
                            )
in
    Source

fnInnerLoop

 

 

(year as number, fnXfunction)=>
              let
                 Source = List.Generate(
                             ()=>
                                 [Page=1, Year=year],
                                each [Year]<=Max_Year,
                                each [Page=1, Year=[Year]+1],
                                each fnXfunction([Year],[Page])
                                                               
                            )
in
    Source

fnOuterLoop

 

 

let
    Source = fnOuterLoop(1980,fnInnerLoop),
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandListColumn(#"Converted to Table", "Column1"),
    #"Expanded Column2" = Table.ExpandRecordColumn(#"Expanded Column1", "Column1", {"Page", "Result", "year"}, {"Column1.Page", "Column1.Result", "Column1.year"})
in
    #"Expanded Column2"

Output

 

The main takeaway from this exercise is that if the Iteration Termination clause is not desired to be mentioned by brute force it must come dynamically from another query (Max_Year in this case, thanks to @d_gosbell ).

 

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

Yes, that fnInnerLoop and fnOuterLoop pattern was the sort of approach I was suggesting. 🙂

@d_gosbellthank you very much for looking into this and I adopted your solution to the "Result" which basically determines the max year for the list. I named your query as Query5 and fed into the code as following.

let
    Source = {1980..Query5},
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Changed Type" = Table.TransformColumnTypes(#"Converted to Table",{{"Column1", Int64.Type}}),
    #"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "PG2", each PG2([Column1])),
    #"Expanded PG2" = Table.ExpandListColumn(#"Invoked Custom Function", "PG2"),
    #"Expanded PG1" = Table.ExpandRecordColumn(#"Expanded PG2", "PG2", {"Page", "Result"}, {"Page", "Result"}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Expanded PG1", "Result", {"Rank", "Movie Title (click to view)", "Studio", "Total Gross /Theaters", "Total Gross /Theaters2", "Opening /Theaters", "Opening /Theaters2", "Open"}, {"Rank", "Movie Title (click to view)", "Studio", "Total Gross /Theaters", "Total Gross /Theaters2", "Opening /Theaters", "Opening /Theaters2", "Open"})
in
    #"Expanded Result"

However, the objective of my question was to find out whether List.Generate is capable at all of what I want it to do. If yes, how to do that nesting. Before asking the question I did not realise that for any values beyond 2019 boxofficemojo URL returns 2019 values (instead of generating an error what I expected list.generate to run into and terminate looping). But if you can imagine that it does generate an error beyond 2019 values (assuming it can satisfy the terminate condition) my question is how to rewrite the code for PG2 which alone would be capable enough of producing what Result query can do once the first two values (Page=1, Year=1980) passes through PG2.

 

What current GD2 doesWhat current GD2 does

What I want GD2 to do (if possible)What I want GD2 to do (if possible)

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.