Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi,
I'm trying to get 13 years of financial data for Apple from this page https://www.macrotrends.net/stocks/charts/AAPL/apple/income-statement .
When I'm accessing the web page in browser I'm able to see almost all data:
But when I'm trying to scrap de data using PowerBI I only get the first 6 years of data.
Although in browser I see details from 2021 back to 2009, in PowerBI I see only 2021 to 2016
Is there a solution to be able to take at least 10 years of data?
Solved! Go to Solution.
If you look in the HTML code, there's a section that looks like
var originalData = [{"field_name": ...bunch of data... }];
This contains all of the data in the table and is in a JSON format, so we can write the full query along these lines:
let
#"HTML Code" = Web.BrowserContents("https://www.macrotrends.net/stocks/charts/AAPL/apple/income-statement"),
#"Split Text" = "[" & Text.BetweenDelimiters(List.Single(List.Select(Text.Split(#"HTML Code", "var "), each Text.StartsWith(_, "originalData = "))),"[","]") & "]",
#"Parsed JSON" = Json.Document(#"Split Text"),
#"Converted to Table" = Table.FromList(#"Parsed JSON", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"field_name", "2021-09-30", "2020-09-30", "2019-09-30", "2018-09-30", "2017-09-30", "2016-09-30", "2015-09-30", "2014-09-30", "2013-09-30", "2012-09-30", "2011-09-30", "2010-09-30", "2009-09-30"}, {"field_name", "2021-09-30", "2020-09-30", "2019-09-30", "2018-09-30", "2017-09-30", "2016-09-30", "2015-09-30", "2014-09-30", "2013-09-30", "2012-09-30", "2011-09-30", "2010-09-30", "2009-09-30"}),
#"Extracted Text Between Delimiters" = Table.TransformColumns(#"Expanded Column1", {{"field_name", each Text.BetweenDelimiters(_, ">", "<"), type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Extracted Text Between Delimiters", {"field_name"}, "Date", "Value"),
#"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Date", type date}, {"Value", Currency.Type}})
in
#"Changed Type"
This results in a nice clean unpivoted tabular format.
If you look in the HTML code, there's a section that looks like
var originalData = [{"field_name": ...bunch of data... }];
This contains all of the data in the table and is in a JSON format, so we can write the full query along these lines:
let
#"HTML Code" = Web.BrowserContents("https://www.macrotrends.net/stocks/charts/AAPL/apple/income-statement"),
#"Split Text" = "[" & Text.BetweenDelimiters(List.Single(List.Select(Text.Split(#"HTML Code", "var "), each Text.StartsWith(_, "originalData = "))),"[","]") & "]",
#"Parsed JSON" = Json.Document(#"Split Text"),
#"Converted to Table" = Table.FromList(#"Parsed JSON", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"field_name", "2021-09-30", "2020-09-30", "2019-09-30", "2018-09-30", "2017-09-30", "2016-09-30", "2015-09-30", "2014-09-30", "2013-09-30", "2012-09-30", "2011-09-30", "2010-09-30", "2009-09-30"}, {"field_name", "2021-09-30", "2020-09-30", "2019-09-30", "2018-09-30", "2017-09-30", "2016-09-30", "2015-09-30", "2014-09-30", "2013-09-30", "2012-09-30", "2011-09-30", "2010-09-30", "2009-09-30"}),
#"Extracted Text Between Delimiters" = Table.TransformColumns(#"Expanded Column1", {{"field_name", each Text.BetweenDelimiters(_, ">", "<"), type text}}),
#"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Extracted Text Between Delimiters", {"field_name"}, "Date", "Value"),
#"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"Date", type date}, {"Value", Currency.Type}})
in
#"Changed Type"
This results in a nice clean unpivoted tabular format.
When you need to scrap data from tables which are constructed from html pages using <div> tags, then situation becomes challenging. Html.Table function is used by Power BI to extract data from children and it can go to a depth of 7 children only. That is the limit. Hence, only 7 columns are shown and you will not get columns beyond first 7 columns.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
102 | |
53 | |
21 | |
12 | |
12 |