I'm trying to decompress a zip file that can contain one or more txt files. For this example I created two files with different names but identical content (for testing). My problem is that the first file always shows up as null, whereas the other shows as Binary. When the zip contains a single txt file, that file shows up as null. How do I get it to import the data instead of showing null? I'm using the code from Mark White's blog.
To fix the null issue, use this code:
// Function built to decompress a ZIP file even if the file length is missing from the local header
// More info on ZIP files here: https://en.wikipedia.org/wiki/ZIP_(file_format)
// Known limitation: if a comment is appended after the end of central directory record, this function will fail, because it assumes that record is the last 22 bytes of the file. You can use a hex editor to find such comments at the end of the file, where they appear as plain readable text
//Credit - https://github.com/Michael19842/PowerBiFunctions/blob/main/ZipFile/Unzip.m
let
Source = File.Contents("C:\Users\Downloads\fo16AUG2022bhav.csv.zip"),
UnzipContents=(ZipFile as binary) =>
let
//Load the file into a buffer
ZipFileBuffer = Binary.Buffer(ZipFile),
ZipFileSize = Binary.Length(ZipFileBuffer),
//Constant values used in the query
CentralHeaderSignature = 0x02014b50,
CentralHeaderSize = 42,
LocalHeaderSize = 30,
// Predefined byteformats that are used many times over
Unsigned16BitLittleIEndian =
BinaryFormat.ByteOrder(
BinaryFormat.UnsignedInteger16,
ByteOrder.LittleEndian
),
Unsigned32BitLittleIEndian =
BinaryFormat.ByteOrder(
BinaryFormat.UnsignedInteger32,
ByteOrder.LittleEndian
),
// Definition of central directory header
CentralDirectoryHeader =
BinaryFormat.Record(
[
Version = Unsigned16BitLittleIEndian,
VersionNeeded = Unsigned16BitLittleIEndian,
GeneralPurposeFlag = Unsigned16BitLittleIEndian,
CompressionMethod = Unsigned16BitLittleIEndian,
LastModifiedTime = Unsigned16BitLittleIEndian,
LastModifiedDate = Unsigned16BitLittleIEndian,
CRC32 = Unsigned32BitLittleIEndian,
CompressedSize = Unsigned32BitLittleIEndian,
UncompressedSize = Unsigned32BitLittleIEndian,
FileNameLength = Unsigned16BitLittleIEndian,
ExtrasLength = Unsigned16BitLittleIEndian,
FileCommentLength = Unsigned16BitLittleIEndian,
DiskNumberStarts = Unsigned16BitLittleIEndian,
InternalFileAttributes = Unsigned16BitLittleIEndian,
ExternalFileAttributes = Unsigned32BitLittleIEndian,
LocalHeaderOffset = Unsigned32BitLittleIEndian
]
),
// Definition of the end of central directory record
EndOfCentralDirectoryRecord =
BinaryFormat.Record(
[
RestOfFile = BinaryFormat.Binary(ZipFileSize - 22),
EOCDsignature = Unsigned32BitLittleIEndian,
NumberOfThisDisk = Unsigned16BitLittleIEndian,
DiskWhereCentralDirectoryStarts = Unsigned16BitLittleIEndian,
NumberOfRecordsOnThisDisk = Unsigned16BitLittleIEndian,
TotalNumberOfRecords = Unsigned16BitLittleIEndian,
CentralDirectorySize = Unsigned32BitLittleIEndian,
OffsetToStart = Unsigned32BitLittleIEndian
]
),
//Formatter used for building a table of all files in the central directory
CentralHeaderFormatter =
BinaryFormat.Choice(
Unsigned32BitLittleIEndian,
// Should contain the signature
each
if _ <> CentralHeaderSignature // Signature not found: create a dummy entry, which ends the list
then
BinaryFormat.Record(
[
LocalHeaderOffset = null,
CompressedSize = null,
FileNameLength = null,
HeaderSize = null,
IsValid = false,
Filename = null
]
)
else
BinaryFormat.Choice(
//Read the statically sized part of the central header
BinaryFormat.Binary(CentralHeaderSize),
//Create a record containing the file's size, offset (of the local header), name, etc.
each
BinaryFormat.Record(
[
LocalHeaderOffset = CentralDirectoryHeader(_)[LocalHeaderOffset],
CompressedSize = CentralDirectoryHeader(_)[CompressedSize],
FileNameLength = CentralDirectoryHeader(_)[FileNameLength],
HeaderSize =
LocalHeaderSize
+ CentralDirectoryHeader(_)[FileNameLength]
+ CentralDirectoryHeader(_)[ExtrasLength],
IsValid = true,
Filename = BinaryFormat.Text(CentralDirectoryHeader(_)[FileNameLength])
]
),
type binary
)
),
//Get a record of the end of central directory; it contains the offset of the central directory, so we can iterate from that position
EOCDR = EndOfCentralDirectoryRecord(ZipFileBuffer),
//Get the central directory as a binary extract
CentralDirectory =
Binary.Range(
ZipFileBuffer,
EOCDR[OffsetToStart]
),
//A list formatter for the central directory
CentralDirectoryFormatter =
BinaryFormat.List(
CentralHeaderFormatter,
each _[IsValid] = true
),
//Get a Table from Records containing the file info extracted from the central directory
FilesTable =
Table.FromRecords(
List.RemoveLastN(
CentralDirectoryFormatter(CentralDirectory),
1
)
),
//Add the binary to the table and decompress it
ReturnValue =
Table.AddColumn(
FilesTable,
"Content",
each
Binary.Decompress(
Binary.Range(
ZipFileBuffer,
[LocalHeaderOffset] + [HeaderSize],
[CompressedSize]
),
Compression.Deflate
)
)
in
ReturnValue,
Files = UnzipContents(Source),
Content = Files{0}[Content]
in
Content
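Note that the last two steps return only the first entry (`Files{0}`). Since the zip in the question can hold several txt files, here is a minimal sketch of an alternative ending for the outer `let`, assuming `UnzipContents` and `Source` are defined exactly as above. The `.txt` filter and the new column name `Line` are my own assumptions, not part of the original function.

```powerquery
let
    // Assumes UnzipContents and Source are the steps defined in the query above
    Files = UnzipContents(Source),
    // Assumption: only entries ending in .txt are wanted
    TxtFiles = Table.SelectRows(
        Files,
        each Text.EndsWith([Filename], ".txt", Comparer.OrdinalIgnoreCase)
    ),
    // Turn each entry's binary content into a list of text lines
    WithLines = Table.AddColumn(TxtFiles, "Line", each Lines.FromBinary([Content])),
    // One row per line, keeping the file name next to it
    Expanded = Table.ExpandListColumn(
        Table.SelectColumns(WithLines, {"Filename", "Line"}),
        "Line"
    )
in
    Expanded
```

This gives one row per text line with the source file name alongside, so the files can be split into columns afterwards in whatever way suits the data.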
If the file is on a portal like SharePoint, how would I go about updating the first line of the source? When I change the link, either it is too long or the error says it must be a valid absolute path.
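`File.Contents` only accepts a local or network file path, which is likely why a URL produces the "valid absolute path" error. A sketch of one way to get the same binary from SharePoint is below. The site URL and file name are hypothetical placeholders, and it assumes the file name is unique within the site.

```powerquery
let
    // Hypothetical values: replace with your own site root and file name
    SiteUrl = "https://contoso.sharepoint.com/sites/YourSite",
    FileName = "archive.zip",
    // List every file in the site (pass the site root, not the folder or file link)
    AllFiles = SharePoint.Files(SiteUrl, [ApiVersion = 15]),
    // Keep the row for the zip file; assumes exactly one file has this name
    Match = Table.SelectRows(AllFiles, each [Name] = FileName),
    // The Content column holds the file as binary, just like File.Contents would
    Source = Match{0}[Content]
in
    Source
```

The resulting `Source` can be passed to `UnzipContents` unchanged. `Web.Contents` with the direct file URL is another option if you prefer a single step.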
Can you first try this with a local zip file and see whether the code is working or not for you?
I get: "DataFormat.Error: Failed to construct a huffman tree using the length array. The stream might be corrupted."
For what it's worth, the data is double-pipe delimited.
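I can't say for certain what causes the Huffman error for this file, but two possibilities fit the code above: the entry may be stored without compression (method 0), or the extra field in the local header may have a different length than the one in the central directory, so the data offset is off by a few bytes. A sketch of a helper that reads both values from the local header itself is below. `ReadEntry` and its parameter names are hypothetical and not part of the original function.

```powerquery
let
    // Hypothetical helper: returns the uncompressed bytes of one zip entry
    ReadEntry = (zip as binary, localHeaderOffset as number, compressedSize as number) as binary =>
        let
            UInt16 = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
            UInt32 = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
            // Fixed 30-byte part of the local file header
            LocalHeader = BinaryFormat.Record(
                [
                    Signature = UInt32,
                    VersionNeeded = UInt16,
                    GeneralPurposeFlag = UInt16,
                    CompressionMethod = UInt16,
                    LastModifiedTime = UInt16,
                    LastModifiedDate = UInt16,
                    CRC32 = UInt32,
                    CompressedSize = UInt32,
                    UncompressedSize = UInt32,
                    FileNameLength = UInt16,
                    ExtrasLength = UInt16
                ]
            ),
            Header = LocalHeader(Binary.Range(zip, localHeaderOffset, 30)),
            // Data starts after the fixed header, the file name and the LOCAL extra field
            DataStart = localHeaderOffset + 30 + Header[FileNameLength] + Header[ExtrasLength],
            Raw = Binary.Range(zip, DataStart, compressedSize),
            Result =
                if Header[CompressionMethod] = 8 then
                    Binary.Decompress(Raw, Compression.Deflate) // 8 = Deflate
                else if Header[CompressionMethod] = 0 then
                    Raw // 0 = stored, nothing to decompress
                else
                    error "Unsupported compression method: " & Text.From(Header[CompressionMethod])
        in
            Result
in
    ReadEntry
```

To try it, the `Content` column in the function above could be built with `each ReadEntry(ZipFileBuffer, [LocalHeaderOffset], [CompressedSize])` in place of the `Binary.Decompress(...)` call. The compressed size still comes from the central directory, since the local header may leave it empty.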