Implement streaming HDiffPatch-style binary differ with memory limit #261

@JusterZhu

Summary

Implement a new streaming binary differ inspired by HDiffPatch concepts, replacing BSDIFF's full-memory approach with sliding-window streaming.

Scope

  • `StreamingHdiffDiffer` implementing `IBinaryDiffer`
  • Content-defined chunking (CDC) with rolling hash for block-level matching
  • Sliding window with configurable memory limit (default 64MB)
  • Two-level matching: fast block match + fine-grained byte adjustment
  • Separate chunk stream for parallelism-ready output
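To make the block-level part of the scope concrete, here is a minimal sketch of content-defined chunking with a gear-style rolling hash, followed by a hash-index lookup for block matching. It is written in Python for brevity and is not the proposed `StreamingHdiffDiffer`; every name and constant in it (`MIN_CHUNK`, `MAX_CHUNK`, `BOUNDARY_MASK`, `chunk_boundaries`, `match_blocks`, `apply_ops`) is hypothetical, and it leaves out the sliding window and the fine-grained byte adjustment.

```python
import hashlib
import random

# Hypothetical tuning values for illustration only, not taken from the issue.
MIN_CHUNK = 64               # never cut a chunk shorter than this
MAX_CHUNK = 1024             # force a cut at this length
BOUNDARY_MASK = 0x7F << 20   # 7 mask bits: a cut roughly every 128 bytes past MIN_CHUNK

# Fixed pseudo-random table so chunking is deterministic across runs.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def chunk_boundaries(data):
    """Return (start, end) offsets of content-defined chunks covering data."""
    bounds = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear rolling hash: bytes older than 32 positions shift out of the state.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0) or length >= MAX_CHUNK:
            bounds.append((start, i + 1))
            start = i + 1
            h = 0
    if start < len(data):
        bounds.append((start, len(data)))
    return bounds


def match_blocks(old, new):
    """Describe new as ('copy', old_offset, length) and ('add', bytes) operations."""
    index = {}
    for s, e in chunk_boundaries(old):
        index.setdefault(hashlib.sha1(old[s:e]).digest(), s)
    ops = []
    for s, e in chunk_boundaries(new):
        old_offset = index.get(hashlib.sha1(new[s:e]).digest())
        if old_offset is not None:
            ops.append(("copy", old_offset, e - s))
        else:
            ops.append(("add", new[s:e]))
    return ops


def apply_ops(old, ops):
    """Rebuild the new file from the old file plus the operation list."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)
```

Because cut points depend only on nearby bytes, an insertion in the middle of a file shifts later chunks without changing their content, so they can still be found through the hash index. A real differ would then refine the unmatched `add` regions at byte granularity.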

Expected Improvements

  • Memory: bounded by the window size, O(windowSize), instead of roughly 17 × the file size as with BSDIFF
  • Supports arbitrarily large files (GB+)
  • Patch size: 30-50% smaller than BSDIFF for binary files (executables, DLLs)
  • Diff speed: 2-5x faster with block-level pre-filtering
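As a rough worked example of the memory claim, the sketch below compares the two bounds for a 1 GiB input. It assumes the 17 × fileSize factor quoted above for BSDIFF and treats the 64MB default as 64 MiB; the function names are hypothetical, and real peak usage would include additional overhead.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def bsdiff_peak_bytes(file_size, factor=17):
    """Estimate for the full-memory approach: factor x file size."""
    return file_size * factor


def streaming_peak_bytes(window_size=64 * MIB):
    """Estimate for the streaming approach: bounded by the window size."""
    return window_size


file_size = 1 * GIB
full = bsdiff_peak_bytes(file_size)
windowed = streaming_peak_bytes()
print(f"full-memory: {full / GIB:.0f} GiB, streaming: {windowed / MIB:.0f} MiB")
print(f"ratio: {full // windowed}x")
```

Under these assumptions the full-memory estimate is 17 GiB against a 64 MiB window, a factor of 272, and the gap grows linearly with file size.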

Dependencies
