Different binary produced when producing ZIP to Azure Blob Stream #744

@carlreid

We found that our ZIP files were not being opened correctly on Macs, producing an error with ditto:
ditto: Couldn't read pkzip signature.

After further investigation, it seems that a ZIP written to a local file (which opens fine on a Mac) differs from one written to a stream that persists to Azure Blob Storage. However, I also tested opening a file stream on an already produced ZIP and copying it to the blob stream, which resulted in an identical binary. I think this indicates that the upload itself is fine and that something in the zipping process is causing the problem.

From a binary comparison, the differences appear to be at the start and end of the files, though I am not sure which parts are significant or what they show about what is going wrong.
[Images: binary comparison of the start and the end of the two files. Left: file persisted via the blob stream. Right: file written locally to disk.]
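To make the difference at the start of the files easier to read than a raw hex diff, the fixed-size fields of the first local file header can be decoded. This is a minimal sketch, not part of the original repro: the `ZipHeaderDump` class and its method names are hypothetical, and the field layout follows the public ZIP format (signature `0x04034b50`, then version needed, general-purpose flags, compression method, time/date, CRC-32 and the two 32-bit sizes, all little-endian).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of SharpZipLib): decodes the fixed part of the
// first local file header so two archives can be compared field by field.
internal static class ZipHeaderDump
{
    private const uint LocalFileHeaderSignature = 0x04034b50; // bytes "PK\x03\x04"

    public static (ushort VersionNeeded, ushort Flags, ushort Method,
                   uint CompressedSize, uint UncompressedSize) ReadFirstLocalHeader(Stream stream)
    {
        // BinaryReader always reads little-endian, which matches the ZIP format.
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);

        if (reader.ReadUInt32() != LocalFileHeaderSignature)
            throw new InvalidDataException("Stream does not start with a ZIP local file header.");

        ushort versionNeeded = reader.ReadUInt16();
        ushort flags = reader.ReadUInt16();
        ushort method = reader.ReadUInt16();
        reader.ReadUInt32();                        // skip DOS time and date
        reader.ReadUInt32();                        // skip CRC-32
        uint compressedSize = reader.ReadUInt32();
        uint uncompressedSize = reader.ReadUInt32();

        return (versionNeeded, flags, method, compressedSize, uncompressedSize);
    }

    // Bit 3 of the general-purpose flags means the CRC and sizes are stored in a
    // data descriptor after the compressed data instead of in this header.
    public static bool HasDataDescriptor(ushort flags) => (flags & 0x0008) != 0;
}
```

Calling `ZipHeaderDump.ReadFirstLocalHeader(File.OpenRead("remote.zip"))` and the same for `local.zip`, then printing the tuples side by side, should show whether the two files differ in version needed, flags, or the size fields. A version needed of 45 corresponds to ZIP64 extensions in the format specification.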

You should be able to reproduce the issue with the following code. Uncomment the relevant section to write a local ZIP instead, or to upload an already produced local file to the blob store. As written, the code produces a ZIP that fails to open on a Mac.

using System;
using System.IO; // needed for MemoryStream if the commented-out upload section is enabled
using System.Net.Http;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;
using ICSharpCode.SharpZipLib.Zip;

internal class CreateZipFile
{
	public static void Main(string[] args)
	{
		MainAsync(args).GetAwaiter().GetResult();
	}

	private static async Task MainAsync(string[] args)
	{
		using var httpClient = new HttpClient();

		var _blobServiceClient = new BlobServiceClient("DefaultEndpointsProtocol=https;AccountName=<replace_with_account_name>;AccountKey=<replace_with_a_key>;EndpointSuffix=core.windows.net");
		var _containerClient = _blobServiceClient.GetBlobContainerClient("artifacts");
		var blobName = System.IO.Path.ChangeExtension(Guid.NewGuid().ToString(), "zip");
		var blockClient = _containerClient.GetBlockBlobClient(blobName);
		var artifactUploadStream = blockClient.OpenWrite(overwrite: true);

		try
		{
			string[] filenames = new[] {
				"https://peach.blender.org/wp-content/uploads/bbb-splash.png"
			};

			// Writing an already produced ZIP to blob store results in matching binary.

			//var localFileMs = new MemoryStream(await System.IO.File.ReadAllBytesAsync("C:\\DevTools\\Debug\\o-local.zip"));
			//localFileMs.Position = 0;
			//await localFileMs.CopyToAsync(artifactUploadStream);
			//await artifactUploadStream.FlushAsync();
			//await artifactUploadStream.DisposeAsync();
			//await localFileMs.DisposeAsync();
			//return;

			// Local file writing produces expected ZIP
			//using var zipStream = new ZipOutputStream(System.IO.File.Create("C:\\DevTools\\Debug\\o-local.zip"))
			using var zipStream = new ZipOutputStream(artifactUploadStream)
			{
				IsStreamOwner = true
			};

			var downloadStream = await httpClient.GetStreamAsync("https://peach.blender.org/wp-content/uploads/bbb-splash.png");

			var zipFileEntry = new ZipEntry("big-buck-bunny.png")
			{
				CompressionMethod = CompressionMethod.Deflated,
				DateTime = new DateTime(2021, 7, 13, 4, 28, 44),
			};

			zipStream.PutNextEntry(zipFileEntry);
			await downloadStream.CopyToAsync(zipStream);
			zipStream.CloseEntry();
			zipStream.Finish();
			zipStream.Close();
		}
		catch (Exception ex)
		{
			Console.WriteLine("Exception during processing {0}", ex);
		}
	}
}

I'm not sure where else to investigate, or whether this is an issue with the blob block stream, with the zipping library, or a mix of both.
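One way to narrow this down without involving Azure at all might be to zip to a local file through a wrapper that refuses to seek. This is only a sketch under an assumption I have not confirmed: that the blob upload stream reports `CanSeek == false` while a `FileStream` reports `true`, and that this is what changes the output. The `NonSeekableWriteStream` class is hypothetical and not part of either library.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: forwards writes to an inner stream but reports itself
// as write-only and non-seekable, mimicking a forward-only upload stream.
internal sealed class NonSeekableWriteStream : Stream
{
    private readonly Stream _inner;
    private long _written; // bytes written so far, exposed through Position

    public NonSeekableWriteStream(Stream inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;

    public override long Length => throw new NotSupportedException();

    public override long Position
    {
        get => _written;
        set => throw new NotSupportedException();
    }

    public override void Flush() => _inner.Flush();

    public override int Read(byte[] buffer, int offset, int count) =>
        throw new NotSupportedException();

    public override long Seek(long offset, SeekOrigin origin) =>
        throw new NotSupportedException();

    public override void SetLength(long value) =>
        throw new NotSupportedException();

    public override void Write(byte[] buffer, int offset, int count)
    {
        _inner.Write(buffer, offset, count);
        _written += count;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose(); // the wrapper owns the inner stream
        base.Dispose(disposing);
    }
}
```

In the repro above, `new ZipOutputStream(new NonSeekableWriteStream(System.IO.File.Create("C:\\DevTools\\Debug\\o-nonseekable.zip")))` (the file name is just an example) would write the archive to disk through the wrapper. If that file matches the one from the blob store rather than the plain local one, the difference would come from how the ZIP is written to non-seekable streams and not from the blob client.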

Edit:
Here are the files to compare:
remote.zip
local.zip
