Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I know there has been several attempts to address this previously, but so far without resolution (as far as I am aware).
So I am providing more detailed information into the zip file properties, in the hope that people much smarter than me, will be able to identify the root-cause of the problem.
Background
New Information
This indicates that the Power Query error is being triggered by either the method used to compress the CSV file or potentially the code for the custom function not handling all compression formats (??)
ZIP File Details
I'm hoping this can shed some light on this problem & maybe generate a solution.
Thanks in advance for anyone spending time looking into this, and if it's useful here are the query and custom function in excel
Solved! Go to Solution.
Here is a version that should work with all of your ZIP files. It ignores the local file entries and grabs the data from the central directory instead. Please test it out.
// expects full path to the ZIP file, only extracts the first data file after getting its size from the central directory
// https://en.wikipedia.org/wiki/Zip_(file_format)#Structure
(ZIPFile) =>
let
//read the entire ZIP file into memory - we'll use it often so this is worth it
Source = Binary.Buffer(File.Contents(ZIPFile)),
// get the full size of the ZIP file
Size = Binary.Length(Source),
//Find the start of the central directory at the sixth to last byte
Directory = BinaryFormat.Record([
MiscHeader=BinaryFormat.Binary(Size-6),
Start=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian)
]) ,
Start = Directory(Source)[Start],
//find the first entry in the directory and get the compressed file size
FirstDirectoryEntry = BinaryFormat.Record([
MiscHeader=BinaryFormat.Binary(Start+20),
FileSize=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
UnCompressedFileSize=BinaryFormat.Binary(4),
FileNameLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
ExtrasLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)
]) ,
//figure ou where the raw data starts
Offset = 30+FirstDirectoryEntry(Source)[FileNameLen]+FirstDirectoryEntry(Source)[ExtrasLen],
Compressed = FirstDirectoryEntry(Source)[FileSize]+1,
//get the raw data of the compressed file
Raw = BinaryFormat.Record([
Header=BinaryFormat.Binary(Offset),
Data=BinaryFormat.Binary(Compressed)
])
// unzip it
in
Binary.Decompress(Raw(Source)[Data], Compression.Deflate)
Name it unzip and and then call it like this
let
Source = unzip("C:\downloads\doesnotwork.zip"),
#"Imported CSV" = Csv.Document(Source,[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None])
in
#"Imported CSV"
Your doesnotwork.zip file does not adhere to the "old" ZIP standard. Both the compressed file size and the uncompressed file size of the "Report 1.csv" file are zero in the local header. That behavior is normally reserved for files over 4GB.
That can be handled by the unzip tools (which use the central directory at the end), but your M code (which doesn't use the directory - it uses local file headers ) expects actual data there. You need to modify the M code to handle this different situation.
https://en.wikipedia.org/wiki/Zip_(file_format)#Structure
This gets even more exciting:
Here is the hex dump of the end of your zip file.
0x02014b50 marks the beginning of the official copy of the central directory, as indicated by the last few bytes 0x00761ba9 (which point to the start of the directory)
then at 0x00761bbd you can see the compressed size of the first file (0x00761b6f) and the uncompressed size (0x04eddec7) which translates to 82697927 bytes - exactly the uncompressed size of report 1.csv
That means you need to provide the 0x00761b6f value to the decompression function.
For reference here is my current code, with the hardcoded size of that one ZIP file so you can see that it works. Obviously this code needs work to properly handle cases where the file size is reported as zero
// credits: http://www.excelandpowerbi.com/?p=155
// http://sql10.blogspot.com/2016/06/reading-zip-files-in-powerquery-m.html
let
Source = File.Contents("C:\downloads\doesnotwork.zip"),
//define function
Decompress = (ZIPFile) =>
let
//describe ZIP file format
MyBinaryFormat = BinaryFormat.Record([
MiscHeader=BinaryFormat.Binary(18),
FileSize=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
UnCompressedFileSize=BinaryFormat.Binary(4),
FileNameLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
ExtrasLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)
]) ,
// how many bytes to get
MyCompressedFileSizer = MyBinaryFormat(ZIPFile)[FileSize]+1 ,
MyCompressedFileSize = 7740272 ,
// impacts offset
MyFileNameLen = MyBinaryFormat(ZIPFile)[FileNameLen],
MyExtrasLen = MyBinaryFormat(ZIPFile)[ExtrasLen] ,
// describe where the actual data starts and how long it is
MyBinaryFormat2 = BinaryFormat.Record([
Header=BinaryFormat.Binary(30+MyFileNameLen+MyExtrasLen),
Data=BinaryFormat.Binary(MyCompressedFileSize)
]) ,
// grab the data and deflate it
DecompressData = Binary.Decompress(MyBinaryFormat2(ZIPFile)[Data], Compression.Deflate)
in
//return deflated binary data
DecompressData,
// call the function with the path to the ZIP file. Only extracts the first file in the ZIP, regardless of name
MyData = Decompress(Source)
in MyData
Here is a version that should work with all of your ZIP files. It ignores the local file entries and grabs the data from the central directory instead. Please test it out.
// expects full path to the ZIP file, only extracts the first data file after getting its size from the central directory
// https://en.wikipedia.org/wiki/Zip_(file_format)#Structure
(ZIPFile) =>
let
//read the entire ZIP file into memory - we'll use it often so this is worth it
Source = Binary.Buffer(File.Contents(ZIPFile)),
// get the full size of the ZIP file
Size = Binary.Length(Source),
//Find the start of the central directory at the sixth to last byte
Directory = BinaryFormat.Record([
MiscHeader=BinaryFormat.Binary(Size-6),
Start=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian)
]) ,
Start = Directory(Source)[Start],
//find the first entry in the directory and get the compressed file size
FirstDirectoryEntry = BinaryFormat.Record([
MiscHeader=BinaryFormat.Binary(Start+20),
FileSize=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
UnCompressedFileSize=BinaryFormat.Binary(4),
FileNameLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
ExtrasLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)
]) ,
//figure ou where the raw data starts
Offset = 30+FirstDirectoryEntry(Source)[FileNameLen]+FirstDirectoryEntry(Source)[ExtrasLen],
Compressed = FirstDirectoryEntry(Source)[FileSize]+1,
//get the raw data of the compressed file
Raw = BinaryFormat.Record([
Header=BinaryFormat.Binary(Offset),
Data=BinaryFormat.Binary(Compressed)
])
// unzip it
in
Binary.Decompress(Raw(Source)[Data], Compression.Deflate)
Name it unzip and and then call it like this
let
Source = unzip("C:\downloads\doesnotwork.zip"),
#"Imported CSV" = Csv.Document(Source,[Delimiter=",", Encoding=1252, QuoteStyle=QuoteStyle.None])
in
#"Imported CSV"
@lbendlin firstly thanks for your efforts & excellent analysis, I really appreacite it.
I have tried your code, but at the moment it is only returning the first column from the zip file:
Remove and reapply the csv import step.
Sorry, but I don't understand??
the .Buffer functions load the data once into memory and avoid reloading it over and over again (as is done in the original code)
that part I understand.
What I don't understand is what you meant by "Remove and reapply the csv import step" or how to get all the columns from the zip
I meant you should remove the CSV step from my last instructions. Try this instead. It will result in a CSV icon that you can then double click.
let
Source = unzip("C:\downloads\doesnotwork.zip")
in
Source
@lbendlin sorry, I'm back on this topic again 🙄
How would you modify your code to handle mutilple CSV files inside a ZIP?
Currently I always get the first CSV.
Thank you
@lbendlin thanks for providing a solution to this problem. I'm running into a problem and I can't figure out what's going on. When I try to invoke the custom Unzip function, I receive the following error:
An error occurred in the " query. Expression.Error: We cannot convert a value of type Binary to type Text.
Details:
Value=[Binary]
Type=[Type]
Any ideas? I should note that I tried Mark White's solution first (http://sql10.blogspot.com/2016/06/reading-zip-files-in-powerquery-m.html) but I ran into issues with that solution as well.
Thanks in advance for your help!
You may want to share some more details. I also wrote a blog entry on this very topic if you are interested.
Thanks @lbendlin, I'd love to see your blog. Can you please share the link?
I tried using Mark White's solution to access this zip file but I receive the following error:
His custom function works for other zip files on the same website (using the Web.Contents function as the source) however it doesn't work on the zip file linked above. My guess is that the file size is too large (1.52 GB compressed, >10 GB uncompressed). The .csv in the .zip folder is so large that Zip64 is used. Perhaps that's why Mark's solution doesn't work.
I tried your function but when I invoke it, Power BI gives me the following error message:
I'm not familiar enough with M code to find out what's going wrong in your function. I should note that I receive the same error for all zip files on the website linked above, including the smaller zip files not using Zip64.
Thanks!
Rob
Here's the blog
https://community.powerbi.com/t5/Community-Blog/Working-With-Zip-Files-in-Power-Query/ba-p/1190186
but your file sizes may be the actual issue. I only tested with files up to 600 MB.
Thanks @lbendlin, your blog post is really helpful!
Next step is to figure out how to modify the function to accomodate larger file sizes using Zip64 😀
Ah ok, apologies.
Confirm all is working as expected.
Thanks again for your time & effort on this 👍
@artemus thank you for providing the complete structural solution. I have added a sprinkle of Binary.Buffer to it to make it a bit faster.
@artemus I hadn't seen that post, but you are right, it does work! 🙂
Thanks for your help
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.