Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Data importing issue / approach for hyperlinks

Hi All, sorry I am new to all of this and to Power BI so apologies if I am asking very basic questions.

 

Objective:

  • I am trying to create a database which has a list of companies registered in a country with fiels like name of the company, address, ... and accounts information (e.g.: shareholders funds, creditors, debitors, profit, .... - the usual financial info)

 

Where I have got to:

  • The information I need is made available by an official website in two different type of files:
  1.  One massive csv file that include 4.5mil companies (this includes all information except for the financials - each company has an ID)
  2. Zipped folders (so 12 zip files) that include between 178K to 850K html files. Each file represent the financial accounts of each company within the 4.5mil csv file above

 - I have uploaded file (1) into Power BI and discovered that it does have a handy column which shows the hyperlinks to retrieve for each row (each company) related to their financial accounts potentially making useless downloading file (2) as well....

 

Issues:

  1. I don't know if / how to use those hyperlinks to import financials. Given the number (c.4.5mil) I suspect there may be a much easier / more efficient way to do so
  2. I have played with the importing of one of those HTML files to see how Power BI reads it. It looks like the HTML includes some table and with some effort I can isolate those tables that have relevant financials. The point though is going one by one will take probably a year... 😁

 

Thanks

Patrick

Annotation 2020-05-16 172135.jpg

1 ACCEPTED SOLUTION

Create an custome column and in the custom column formula use

 

= Csv.Document(Web.Contents([URI]),[Delimiter=",", Columns=18, Encoding=65001, QuoteStyle=QuoteStyle.None])

 

Reading from a website can take long so I would first try it with a few rows only. Again, if this doesn't work do share some sampel data so that I can be more specific in my answers

Please mark the question solved when done and consider giving kudos if posts are helpful.

Contact me privately for support with any larger-scale BI needs

Cheers 

SU18_powerbi_badge

View solution in original post

3 REPLIES 3
AlB
Super User
Super User

Hi @Anonymous 

Based on the table you are showing, with the URLs in the forst column, you could create a custom column in which you use a function to extract the info from each URL/Company. I've seen that the site offers the info in different file formats (json, html, csv...)

You could use something like:

 

Source = Csv.Document(Web.Contents( [FirstColumnNameHere_with_the_URLS]),[Delimiter=",", Columns=18, Encoding=65001, QuoteStyle=QuoteStyle.None])

 

and then add the transformations thta you need to get to the table format you want. After that you would have a table for each URL and just need to combine them all in one final table with all the finacial information. If you share a sample of the data (or point me to where I can download it) I'd be able to be more specific.

Please mark the question solved when done and consider giving kudos if posts are helpful.

Contact me privately for support with any larger-scale BI needs

Cheers 

SU18_powerbi_badge

Anonymous
Not applicable

Hi as requested I am posting through initial thread.

 

Firstly thanks for your help.   I have tried to use your 'Source' code for the custom column and typed it as per image but doesn't seem to work. It may be as very new to Power BI and its functions so would welcome if you can shed further lights on the why. Thanks in advance for your further assistance.

 

Annotation 2020-05-17 155625.jpg

Create an custome column and in the custom column formula use

 

= Csv.Document(Web.Contents([URI]),[Delimiter=",", Columns=18, Encoding=65001, QuoteStyle=QuoteStyle.None])

 

Reading from a website can take long so I would first try it with a few rows only. Again, if this doesn't work do share some sampel data so that I can be more specific in my answers

Please mark the question solved when done and consider giving kudos if posts are helpful.

Contact me privately for support with any larger-scale BI needs

Cheers 

SU18_powerbi_badge

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors