Helper I

Failure extracting files from large zip archives in ZIP64 format

Issue: I'm unable to extract a CSV file from a large zip file (1.52 GB compressed, >10 GB uncompressed). The zip file is in ZIP64 format, which I suspect is the cause, since I can successfully extract files from smaller zip files like this one using Mark White's custom function. When I try to extract the CSV file from the large zip file mentioned above, Power Query gives me the following error:

 

Rob_B_0-1600900610934.png

 

Details: As mentioned above, Mark White's custom function works for other zip files on the same website (using the Web.Contents function as the source), but it doesn't work on the zip file linked above. I've also tried @lbendlin's solution outlined in his blog post, but I haven't been able to make that work either.

 

Has anyone else been able to successfully extract files from large zip files which use the ZIP64 file format?

 

1 ACCEPTED SOLUTION

Super User V

Re: Failure extracting files from large zip archives in ZIP64 format

Here's what the central directory looks like:

 

lbendlin_0-1601946101021.png

It starts at offset 61c63f46 (hex):

lbendlin_1-1601946185193.png

Note that we are still talking about unsigned 32-bit integers here. The only indication that this may be ZIP64 seems to be the FF bytes where 00 would normally be. The bad news is that all these FFFFFFFF values sit exactly where the file sizes are supposed to be. The actual file sizes (compressed and uncompressed) are nowhere to be seen.
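
If you want to check this on your own file outside Power Query, here is a minimal Python sketch (not the M function from the blog post) that unpacks the fixed 46-byte part of a central directory file header and flags the FFFFFFFF marker. The ZIP specification reserves 0xFFFFFFFF in the 32-bit size fields as a sign that the real value did not fit. The name `read_cd_sizes` is made up for illustration, and the sketch assumes you have already located the header bytes.

```python
import struct

# Fixed part of a central directory file header (46 bytes, little-endian):
# signature, 6 x uint16, 3 x uint32 (crc, compressed, uncompressed),
# 5 x uint16, 2 x uint32 (external attrs, local header offset).
CD_HEADER = struct.Struct("<4s6H3L5H2L")
ZIP64_SENTINEL = 0xFFFFFFFF


def read_cd_sizes(buf, offset=0):
    """Return the 32-bit size fields and whether they carry the ZIP64 marker."""
    fields = CD_HEADER.unpack_from(buf, offset)
    if fields[0] != b"PK\x01\x02":
        raise ValueError("not a central directory file header")
    compressed, uncompressed = fields[8], fields[9]
    return {
        "compressed": compressed,
        "uncompressed": uncompressed,
        # True when either size field is the placeholder, not a real size.
        "zip64_sizes": ZIP64_SENTINEL in (compressed, uncompressed),
    }
```

Calling `read_cd_sizes(data, 0x61c63f46)` on the full archive bytes would be the equivalent of the hex dump above, assuming the whole file fits in memory.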

 

Now, the compressed file size can be calculated under the assumption that the ZIP contains only one file. This would make the extraction possible as long as the ZIP file stays within the 32-bit limit (4 GiB), but it would likely break down for files larger than that.
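
To illustrate the idea, here is a rough Python sketch of that calculation (again, not the M code from the blog post). It assumes a single entry whose local header sits at offset 0, no trailing data descriptor, Deflate compression, and a central directory offset that still fits in 32 bits. The function names are hypothetical. With those assumptions, the compressed data is simply everything between the end of the local header and the start of the central directory.

```python
import struct
import zlib


def single_entry_payload(archive: bytes) -> bytes:
    """Slice out the compressed bytes of a one-file ZIP without trusting size fields."""
    # End of central directory record; a sketch-level search that ignores
    # the chance of the signature also appearing inside an archive comment.
    eocd = archive.rfind(b"PK\x05\x06")
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    (cd_offset,) = struct.unpack_from("<L", archive, eocd + 16)
    if cd_offset == 0xFFFFFFFF:
        raise ValueError("central directory offset exceeds 32 bits")

    if archive[:4] != b"PK\x03\x04":
        raise ValueError("no local file header at offset 0")
    (flags,) = struct.unpack_from("<H", archive, 6)
    if flags & 0x08:
        raise ValueError("data descriptor present; not handled in this sketch")
    name_len, extra_len = struct.unpack_from("<2H", archive, 26)

    data_start = 30 + name_len + extra_len  # 30 = fixed local header length
    return archive[data_start:cd_offset]


def inflate_single_entry(archive: bytes) -> bytes:
    """Decompress the single entry, assuming method 8 (Deflate)."""
    (method,) = struct.unpack_from("<H", archive, 8)
    if method != 8:
        raise ValueError(f"unsupported compression method {method}")
    # wbits=-15 selects a raw Deflate stream, which is what ZIP stores.
    return zlib.decompress(single_entry_payload(archive), -15)
```

The length of the returned payload is the compressed size that the FFFFFFFF fields no longer provide. For a multi-gigabyte archive you would read only the header and the tail of the file rather than loading everything into memory.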

 

I updated my blog entry with a modified function that can read your sample file successfully.

 

https://community.powerbi.com/t5/Community-Blog/Working-With-Zip-Files-in-Power-Query/bc-p/1414143#M...

 

 


4 REPLIES
Super User V

Re: Failure extracting files from large zip archives in ZIP64 format

Do we know which compression method is used in your ZIP64 files? You may need to raise an idea to add new decompression methods to Binary.Decompress:

 

https://docs.microsoft.com/en-us/powerquery-m/binary-decompress
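
One way to answer the compression question without Power Query is a short Python check like the sketch below. It only reads the directory of the archive and reports the method number stored for each entry. The helper name `compression_methods` is invented for this example. Method 9 (Deflate64) is listed because some archivers use it for very large files, and Python can report it even though it cannot decompress it.

```python
import zipfile

# Method numbers as defined by the ZIP format; 9 is Deflate64.
METHOD_NAMES = {
    zipfile.ZIP_STORED: "stored",
    zipfile.ZIP_DEFLATED: "deflate",
    9: "deflate64",
    zipfile.ZIP_BZIP2: "bzip2",
    zipfile.ZIP_LZMA: "lzma",
}


def compression_methods(source):
    """Map each entry name to a readable compression method.

    `source` may be a path or a seekable binary file object.
    """
    with zipfile.ZipFile(source) as zf:
        return {
            info.filename: METHOD_NAMES.get(
                info.compress_type, f"method {info.compress_type}"
            )
            for info in zf.infolist()
        }
```

If the result is anything other than "deflate" or "stored", a custom unzip function built on Binary.Decompress with Deflate is unlikely to work as-is.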

 

According to Wikipedia (https://en.wikipedia.org/wiki/Zip_(file_format)#ZIP64), there are extra directory entries in a ZIP64 file. Can you post a sample file in ZIP64 format or explain how to create one?

 

Last but not least: how much memory does your local Power BI Desktop have allocated, and will it be enough?

Helper I

Re: Failure extracting files from large zip archives in ZIP64 format

Great idea, @lbendlin! Unfortunately, since the Power BI ideas portal started linking idea submission accounts with work accounts, I'm not able to submit ideas or up-vote other ideas. I'm working with our admins to get access, but in the meantime, would you feel comfortable adding an idea?

 

As for the example, were you able to access the file in the link in my original post? My understanding is that file archivers will automatically use ZIP64 for files that exceed the size limits for ZIP. If you have a large .csv file handy, you can try compressing it and see if the properties show ZIP64. Here's an example of the properties from the file I shared the link to:

 

ZIP64 properites.png

 

My machine has 8 GB of RAM, which hasn't been an issue for Power BI Desktop, even for datasets with multiple large .csv files, so I expect/hope it will be sufficient. Ultimately I'm building a dataflow with this query, so I'm not sure whether I'll run into any issues there.

 

Thanks,

Rob

Helper I

Re: Failure extracting files from large zip archives in ZIP64 format

It worked, @lbendlin. You're a genius! Thanks so much for your help!
