fix: Make reproducable zips in exports#3491

Merged
another-rex merged 3 commits into master from reproducable-zip on May 26, 2025

Conversation

@another-rex
Contributor

Set the modification date of each OSV record written to disk to the record's own modified time.

Partially resolves: #3365

@jess-lowe
Contributor

Just to confirm - does this mean that the modified date on the OSV final record is the last-modified of the Datastore entry (when we update it) or does it mean the modified date of the original record (when the data source updates it), as with this issue: #3451?

@another-rex
Contributor Author

Modified is the last-modified time of the Datastore entry.

hogo6002 previously approved these changes May 26, 2025
Contributor

@hogo6002 hogo6002 left a comment

nice!

@another-rex another-rex merged commit 8519756 into master May 26, 2025
13 checks passed
@another-rex another-rex deleted the reproducable-zip branch May 26, 2025 05:12
SanskaarUndale21 added a commit to SanskaarUndale21/osv.dev that referenced this pull request May 24, 2026
Before this change, the exporter uploaded every output file on every
run regardless of whether the content had changed. Since all.zip and
other outputs are now reproducible (google#3491), unchanged files would
accumulate redundant object generations in the bucket, making it
harder for downstream consumers to detect real updates.

The writer now calls ReadObjectAttrs before each GCS write and
computes the CRC32C of the outgoing data using the Castagnoli
polynomial (the same algorithm GCS uses for its stored checksums).
If the checksums match, the upload is skipped and an info log is
emitted. New objects (ErrNotFound) and any transient attr-read
errors fall through to the normal upload path so the exporter
remains correct under all conditions.

Tests verify the three cases: same content is skipped, changed
content is uploaded, and brand-new objects are always created.

Fixes google#3513
Development

Successfully merging this pull request may close these issues.

Avoid download archive if there is no updates

3 participants