Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Failure to extract file from ZIP

I know there has been several attempts to address this previously, but so far without resolution (as far as I am aware).

So I am providing more detailed information into the zip file properties, in the hope that people much smarter than me, will be able to identify the root-cause of the problem. 

 

Background

  1. Custom function to extract files from Zip, successfully created in line with Mark White's BI Blog
  2. When trying to access the binary data of the CSV file, Power Query throws the error "We didn’t recognize the format of your first file (). Please filter the list of files so it contains only supported types (Text, .csv, Excel workbooks etc.) and try again"

 

New Information

  • The error mentioned above in point #2, only occurs when I am using ZIP files which havw been generated by SAP BI.
  • If I extract the CSV file from the zip, and manually create a new zip file containing the same CSV, then Power Query is able to successfully access the binary data using the same custom function created above in point #1.

This indicates that the Power Query error is being triggered by either the method used to compress the CSV file or potentially the code for the custom function not handling all compression formats (??)

 

ZIP File Details

  • The zip file without the problem has been named works.zip and the one which causes the error is doesnotwork.zip
  • Here you can view the properties of the compressed CSV files inside their respective ZIPs (works and doesnotwork), the differences are highlighted in yellow on the "doesnotwork" image
  • Here are the properties for the zip files themselves, again with the differences highlighted in yellow

I'm hoping this can shed some light on this problem & maybe generate a solution.

Thanks in advance for anyone spending time looking into this, and if it's useful here are the query and custom function in excel

2 ACCEPTED SOLUTIONS
artemus
Employee
Employee

Here is a version that should work with all of your ZIP files. It ignores the local file entries and grabs the data from the central directory instead.  Please test it out.

 

// expects full path to the ZIP file, only extracts the first data file after getting its size from the central directory
// https://en.wikipedia.org/wiki/Zip_(file_format)#Structure
(ZIPFile) =>
let
    //read the entire ZIP file into memory - we'll use it often so this is worth it
    Source = Binary.Buffer(File.Contents(ZIPFile)),
    // get the full size of the ZIP file
    Size = Binary.Length(Source),
    //Find the start of the central directory at the sixth to last byte
    Directory = BinaryFormat.Record([ 
                    MiscHeader=BinaryFormat.Binary(Size-6), 
                    Start=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian)
            ]) ,
    Start = Directory(Source)[Start],
    //find the first entry in the directory and get the compressed file size
    FirstDirectoryEntry = BinaryFormat.Record([ 
                    MiscHeader=BinaryFormat.Binary(Start+20), 
                    FileSize=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
                    UnCompressedFileSize=BinaryFormat.Binary(4),
                    FileNameLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
                    ExtrasLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)
            ]) ,
    //figure ou where the raw data starts            
    Offset = 30+FirstDirectoryEntry(Source)[FileNameLen]+FirstDirectoryEntry(Source)[ExtrasLen],
    Compressed = FirstDirectoryEntry(Source)[FileSize]+1,
    //get the raw data of the compressed file
    Raw = BinaryFormat.Record([
                    Header=BinaryFormat.Binary(Offset), 
                    Data=BinaryFormat.Binary(Compressed)
            ]) 
    // unzip it
in 
    Binary.Decompress(Raw(Source)[Data], Compression.Deflate)

 

Name it  unzip and and then call it like this

 

let
    Source = unzip("C:\downloads\doesnotwork.zip"),
    #"Imported CSV" = Csv.Document(Source,[Delimiter=",",  Encoding=1252, QuoteStyle=QuoteStyle.None])
in
    #"Imported CSV"

View solution in original post

20 REPLIES 20

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors