Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
pat_energetics
Advocate II
Advocate II

Download and unzip multiple files from URL then query csv file/s within

Hi all,

 

I am trying to automate a process that involves;

1 - File/s download from given fixed URL,

2 - Unzip contents,

3 - Query contained CSV file/s.

 

The URL remains fixed, but files available for download are updated at regular frequency and so do not parmeterize easily using - for example - steps discussed here by Chris Webb and others. Ideally I would like to have a table of downloaded file names to avoid downloading the same file multiple times. Having a local copy of the files downloaded would be an advantage in this case.

 

If zip files were downloaded to a local directory, there seem to be a few different approaches to unzipping contents using either R script here from @prathy or a Power Query M language approach discussed here at Mark White's sql10 blogspot.

 

The last step has been completed using power query within Power BI, but currently relies on a manual process to download and unzip contents to a local directory.

 

I would be very interested to know anyone's experience with this.

 

Questions

Is the R script or the Power Query M the best approach for now given the rate at which Power BI data connectors are being developed ?

Is a seamless download , unzip and query all within Power BI too far away ?

5 REPLIES 5

Is the URL a folder that contains the zipped files?

Yes - an example of one URL is here.

Anonymous
Not applicable

If you open that URL as a web page in Power Query then drill down through the tables, you can get a list of ZIP files:

let
    Source = Web.Page(Web.Contents("http://www.nemweb.com.au/REPORTS/CURRENT/Daily_Reports/")),
    Data0 = Source{0}[Data],
    Children = Data0{0}[Children],
    Children1 = Children{1}[Children],
    Children2 = Children1{2}[Children],
    #"Removed Other Columns" = Table.SelectColumns(Children2,{"Children"}),
    #"Expanded Children" = Table.ExpandTableColumn(#"Removed Other Columns", "Children", {"Text"}, {"Children.Text"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Children", each ([Children.Text] <> null and [Children.Text] <> "[To Parent Directory]"))
in
    #"Filtered Rows"

You could save that as a CSV each day with the R script you mention, then join with the result above the following day to find any new files, and pass those as parameters to a function to Unzip-process-save as CSV etc.

 

Is that the sort of thing you're after?

Yes @Anonymous - your suggested process would seem to fit, however I am concerned about the maintainability and portability given the number of steps and reliance on coming in and out of R scripts.

 

One of the original intents of the question was to draw an indication from the Power BI  / Power Query development team if this something that is on the radar. 

 

As public data sets are becoming increasing large and available on line as zip files, the ability to connect to a web source, download, unzip and then process without specialist coding in Power BI would be a distinct product advantage. There appear to be many pieces of the puzzle already, so joining the dots would be a fantastic improvement !

 

Maybe this could go to the ideas forum / group ??

Anonymous
Not applicable

I too would be intrigued to see what feedback you get from the Dev team on this.  There are at least 3 entries on this already in the ideas forum - see https://ideas.powerbi.com/search?filter=ideas&query=zip%20file

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors