Rob_B
Helper I

Failure extracting files from large zip archives in ZIP64 format

Issue: I'm unable to extract a csv file from a large zip file (1.52 GB compressed, >10 GB uncompressed). The zip file format is ZIP64, which I suspect is the issue, since I'm able to successfully extract files from smaller zip files like this one using Mark White's custom function. When I try to extract the csv file from the large zip file mentioned above, Power Query gives me the following error:

 

Rob_B_0-1600900610934.png

 

Details: As I mentioned above, I'm able to use Mark White's custom function for other zip files on the same website (using the Web.Contents function as the source); however, it doesn't work on the zip file linked above. I've also tried @lbendlin's solution outlined in his blog post, but I haven't been able to make that work either.

 

Has anyone else been able to successfully extract files from large zip files which use the ZIP64 file format?

 

1 ACCEPTED SOLUTION

Here's what the central directory looks like:

 

lbendlin_0-1601946101021.png

It starts at offset 61c63f46 (hex).

lbendlin_1-1601946185193.png

Note that we are still talking about unsigned 32-bit integers here. The only indication that this may be ZIP64 seems to be the FF instead of 00. The bad news is that all these FFFFFFFF values sit where the file sizes are supposed to be. The actual file sizes (compressed and uncompressed) are nowhere to be seen in these fields.
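
For reference, the ZIP specification treats FFFFFFFF in a size field as a sentinel: the real 64-bit value is expected in the "ZIP64 extended information" extra field (header ID 0x0001) that follows the file name. Below is a minimal Python sketch of that lookup, not the M function from the blog. The name `resolve_zip64_sizes` is made up for illustration, and it assumes you have already sliced the extra-field bytes out of the header.

```python
import struct

SENTINEL32 = 0xFFFFFFFF  # value stored in a 32-bit field that has overflowed


def resolve_zip64_sizes(uncompressed, compressed, extra):
    """Return (uncompressed, compressed) sizes for one entry.

    Any size equal to the sentinel is replaced by the 64-bit value found in
    the ZIP64 extended information extra field (header ID 0x0001). Values are
    stored there in a fixed order, and only for fields that hold the sentinel.
    """
    pos = 0
    while pos + 4 <= len(extra):
        # Every extra-field block starts with a 2-byte ID and a 2-byte length.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        body = extra[pos + 4 : pos + 4 + size]
        if header_id == 0x0001:
            cursor = 0
            if uncompressed == SENTINEL32:
                (uncompressed,) = struct.unpack_from("<Q", body, cursor)
                cursor += 8
            if compressed == SENTINEL32:
                (compressed,) = struct.unpack_from("<Q", body, cursor)
            return uncompressed, compressed
        pos += 4 + size  # skip blocks we do not care about
    return uncompressed, compressed
```

If the extra field is missing or does not contain a 0x0001 block, the function simply hands back whatever it was given, so the caller can still detect an unresolved FFFFFFFF.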

 

Now, the compressed file size can be calculated under the assumption that the ZIP only includes one file. This would make the extraction possible if the ZIP file stays within the 32-bit limits (4 GiB), but it would likely break down for files larger than that.
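
The single-file calculation can be sketched roughly like this in Python. This is an illustration of the idea only, not the M function from the blog: `extract_single_entry` is a made-up name, and the sketch assumes the archive holds exactly one entry starting at offset 0, stored or deflated, with a central directory offset that still fits in 32 bits.

```python
import struct
import zlib


def extract_single_entry(archive: bytes) -> bytes:
    """Extract the only entry of a ZIP without trusting its size fields."""
    # The end-of-central-directory record sits at the tail of the file.
    eocd = archive.rfind(b"PK\x05\x06")
    if eocd < 0:
        raise ValueError("end of central directory record not found")
    # Bytes 16..19 of that record hold the central directory offset.
    (cd_offset,) = struct.unpack_from("<I", archive, eocd + 16)
    if cd_offset == 0xFFFFFFFF:
        raise ValueError("offset exceeds 32 bits; the ZIP64 end record is needed")

    if archive[:4] != b"PK\x03\x04":
        raise ValueError("no local file header at offset 0")
    (method,) = struct.unpack_from("<H", archive, 8)
    name_len, extra_len = struct.unpack_from("<HH", archive, 26)

    # The fixed part of a local header is 30 bytes, then name and extra field.
    data_start = 30 + name_len + extra_len
    # With a single entry, everything up to the central directory is its data.
    compressed = archive[data_start:cd_offset]

    if method == 0:  # stored, no compression
        return compressed
    if method != 8:  # 8 = deflate
        raise ValueError(f"unsupported compression method {method}")
    # ZIP uses raw deflate streams, hence the negative window size.
    return zlib.decompressobj(-15).decompress(compressed)
```

The compressed size never appears explicitly: it is just `cd_offset - data_start`. As far as I can tell, the raw deflate step is the counterpart of Binary.Decompress with Compression.Deflate in M. Note that this loads the whole archive into memory, which is fine for a sketch but not for a 10 GB payload.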

 

I updated my blog entry with a modified function that can read your sample file successfully.

 

https://community.powerbi.com/t5/Community-Blog/Working-With-Zip-Files-in-Power-Query/bc-p/1414143#M...

 

 


4 REPLIES
lbendlin
Super User

Do we know which compression is used in your ZIP64 files? You may need to raise an idea to add new decompression methods to https://docs.microsoft.com/en-us/powerquery-m/binary-decompress

 

According to Wikipedia (https://en.wikipedia.org/wiki/Zip_(file_format)#ZIP64), there are extra directory entries in a ZIP64 file. Can you post a sample file in ZIP64 format or explain how to create one?

 

Last but not least: how much memory does your local Power BI Desktop have allocated, and will it be enough?

Great idea @lbendlin! Unfortunately, since the Power BI ideas portal started linking idea submission accounts with work accounts, I'm not able to vote on ideas and up-vote other ideas. I'm working with our admins to get access, but in the meantime, would you feel comfortable adding an idea?

 

As for the example, were you able to access the file in the link in my original post? My understanding is that file archivers will automatically use ZIP64 for files that exceed the size limits for ZIP. If you have a large .csv file handy, you can try compressing it and see if the properties show ZIP64. Here's an example of the properties from the file I shared the link to:

 

ZIP64 properites.png

 

My machine has 8 GB of RAM, which hasn't been an issue for Power BI Desktop, even for datasets with multiple large .csv files, so I expect/hope it will be sufficient. Ultimately I'm building a dataflow with this query, so I'm not sure if I'll run into any issues there.

 

Thanks,

Rob


 

 

It worked, @lbendlin, you're a genius! Thanks so much for your help!
