Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
prat91
Frequent Visitor

Power BI

Hi, 

 

Just seeking some help getting data from multiple files on a HTML web page via Power BI. 

 

 

Any help could do.

 

Cheers

9 REPLIES 9
Greg_Deckler
Super User
Super User

Going to need some additional detail to help, are you referring to files linked from a web page? What kinds of files?


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

HI @Greg_Deckler,

 

Thank you for attempting to help me out. 

The files are Tables in xls format stored on a web pages. I need to get those files straight into power BI without downloading them manually. This is so data refresh can work. 

 

Look forward to your response. 

 

You would use the web connector and if they are true tables on the web page, they should show up in the navigation.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

I have tried that, but unfortunately it doesnt work, even once I have expanded or drilled down on the columns. From my understanding there is a need to create a parameter and function, but not sure on the process of doing so. 

Try this

 

let
    Source = Excel.Workbook(Web.Contents("go.microsoft.com/fwlink/?LinkID=521962"), null, true),
    financials_Table = Source{[Item="financials",Kind="Table"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(financials_Table,{{"Segment", type text}, {"Country", type text}, {"Product", type text}, {"Discount Band", type text}, {"Units Sold", type number}, {"Manufacturing Price", Int64.Type}, {"Sale Price", Int64.Type}, {"Gross Sales", type number}, {"Discounts", type number}, {" Sales", type number}, {"COGS", type number}, {"Profit", type number}, {"Date", type date}, {"Month Number", Int64.Type}, {"Month Name", type text}, {"Year", Int64.Type}})
in
    #"Changed Type"

I connected to the site using the web connector, then clicked right mouse menu over the document and Selected Excel.

 

If you have lots of documents you can use functions to loop over the documents. There are several posts that explain how.

e.g.

https://powerbi-pro.com/iterating-over-an-unknown-number-of-pages-in-power-query/

 

Phil

 

 

Hi Phil, 

 

Yes I have to loop over multiple files. 

 

How would I go about doing that?

Hi @prat91,

 

Here is a good article about how to iterate over multiple pages of web data using Power Query for your reference. Smiley Happy

  1. Creating the initial query to access a single page of data
  2. Turning the query into a parameterized function
  3. Invoking the function for each page of data you want to retrieve

Regards

HI @v-ljerr-msft,

 

Thank you for the source.

 

I had a look, however, I need to loop over multipe xls files located on a single web page.

 

Is the process the same?

 

Ive have tried the mentioned steps, but still no success. 

 

Cheers

Loading the excel worked fine, but the tricky bit is getting the list of excel links from the website you sent is not easy.

 

I've done it 2 ways one is treating the web page as text instead of html then stripping out the xls links.

I also put an r script version that uses a html scraping library.

 

Once you have the list you can use a function to process into a list.

I did this by doing a single excel weblink, then using the Create Function option from the Queries List.

The paramaterise the link by putting a name in the source brackets  Source = (weblink) and using this name in place of the excel weblink.

 

Finally in the list you add new column using invoke function.

 

It's a bit slow but does work. I've filtered to the top 4 files, so remove this for all excels.

 

fLoadExcelRaw

 

let
    Source = (weblink) => let
        Source = Excel.Workbook(Web.Contents(weblink), null, true),
    Data2 = Source{[Name="Data1"]}[Data]
in
    Data2
in
    Source

 

This is the R scraping version

 

R Code

library(rvest)
page=read_html("WEBSITE")
data <- as.data.frame(html_attr(html_nodes(page, "a"), "href"))

M Query

 

let
    Source = Table.FromColumns({Lines.FromBinary(Web.Contents("WEBSITE"))}),
    #"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "Time Series Spreadsheet") and Text.Contains([Column1], "xls")),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows", "Column1", Splitter.SplitTextByEachDelimiter({"a href=""/"}, QuoteStyle.None, false), {"Column1.1", "Column1.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}}),
    #"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type", "Column1.2", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, false), {"Column1.2.1", "Column1.2.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column1.2.1", type text}, {"Column1.2.2", type text}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type1","WEBSITE","WEBSITE",Replacer.ReplaceText,{"Column1.2.1"}),
    #"Renamed Columns" = Table.RenameColumns(#"Replaced Value",{{"Column1.2.1", "ExcelLink"}}),
    #"Removed Columns" = Table.RemoveColumns(#"Renamed Columns",{"Column1.1", "Column1.2.2"}),
    #"Kept First Rows" = Table.FirstN(#"Removed Columns",4),
    #"Invoked Custom Function" = Table.AddColumn(#"Kept First Rows", "ExcelData", each fLoadExcelRaw([ExcelLink])),
    #"Expanded ExcelData" = Table.ExpandTableColumn(#"Invoked Custom Function", "ExcelData", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21", "Column22", "Column23", "Column24", "Column25", "Column26", "Column27", "Column28", "Column29", "Column30", "Column31", "Column32", "Column33", "Column34", "Column35", "Column36", "Column37", "Column38", "Column39", "Column40", "Column41", "Column42", "Column43", "Column44", "Column45", "Column46", "Column47", "Column48", "Column49", "Column50", "Column51", "Column52", "Column53", "Column54", "Column55", "Column56", "Column57", "Column58", "Column59", "Column60", "Column61", "Column62", "Column63", "Column64", "Column65", "Column66", "Column67", "Column68", "Column69", "Column70", "Column71", "Column72", "Column73", "Column74", "Column75", "Column76", "Column77", "Column78", "Column79", "Column80", "Column81", "Column82", "Column83", "Column84", "Column85", "Column86", "Column87", "Column88", "Column89", "Column90", "Column91", "Column92", "Column93", "Column94", "Column95", "Column96", "Column97", "Column98", "Column99", "Column100", "Column101", "Column102", "Column103", "Column104", "Column105", "Column106", "Column107", "Column108", "Column109", "Column110", "Column111", "Column112", "Column113", "Column114", "Column115"}, {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21", "Column22", "Column23", "Column24", "Column25", "Column26", "Column27", "Column28", "Column29", "Column30", "Column31", "Column32", "Column33", "Column34", "Column35", "Column36", "Column37", "Column38", "Column39", "Column40", "Column41", "Column42", "Column43", "Column44", "Column45", "Column46", "Column47", "Column48", "Column49", "Column50", "Column51", "Column52", "Column53", "Column54", "Column55", "Column56", "Column57", "Column58", "Column59", "Column60", "Column61", "Column62", "Column63", "Column64", "Column65", "Column66", "Column67", "Column68", "Column69", "Column70", "Column71", "Column72", "Column73", "Column74", "Column75", "Column76", "Column77", "Column78", "Column79", "Column80", "Column81", "Column82", "Column83", "Column84", "Column85", "Column86", "Column87", "Column88", "Column89", "Column90", "Column91", "Column92", "Column93", "Column94", "Column95", "Column96", "Column97", "Column98", "Column99", "Column100", "Column101", "Column102", "Column103", "Column104", "Column105", "Column106", "Column107", "Column108", "Column109", "Column110", "Column111", "Column112", "Column113", "Column114", "Column115"})
in
    #"Expanded ExcelData"

 

 

 

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.