20240020201. GENERATING DIFFS BETWEEN ARCHIVES USING A GENERIC GRAMMAR simplified abstract (MICROSOFT TECHNOLOGY LICENSING, LLC)

From WikiPatents

GENERATING DIFFS BETWEEN ARCHIVES USING A GENERIC GRAMMAR

Organization Name

MICROSOFT TECHNOLOGY LICENSING, LLC

Inventor(s)

Mark W. Zagorski of Redmond WA (US)

Mario Henrique Santos Da Silva of Redmond WA (US)

Elijah Wigmore of Seattle WA (US)

GENERATING DIFFS BETWEEN ARCHIVES USING A GENERIC GRAMMAR - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240020201, titled 'GENERATING DIFFS BETWEEN ARCHIVES USING A GENERIC GRAMMAR'.

Simplified Explanation

The techniques disclosed in this patent application generate minimally sized diff archives, which reduces the bandwidth, storage, and processing costs of storing or transmitting an archive. In some configurations, the diff archive is generated using specific knowledge of the structure of the source and target archives it is derived from.

  • The patent application describes the use of an archive-specific tokenization engine to identify data chunks and payload files within each archive.
  • Recipes for generating payload files from data chunks and data chunks from payload files are identified and stored in a manifest file.
  • The manifest file also includes recipes for decompressing files, concatenating data chunks, and generating binary deltas to convert older versions of a file into a newer version.
  • These recipes are composed by replacing recipe inputs with the outputs of other recipes, allowing for efficient generation of the target archive.
  • Composite recipes use inline data and data obtained from a copy of the source archive to reconstitute the target archive.
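
The recipe composition described in the bullets above can be sketched in Python. This is a minimal illustration under stated assumptions, not the format used in the patent application: the class names (Inline, CopyFromSource, Ref, Concat, Decompress), the dictionary-based manifest, and the choice of zlib for decompression are all hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# Hypothetical recipe types; the abstract does not define a concrete format.

@dataclass(frozen=True)
class Inline:
    data: bytes              # bytes carried inside the diff archive itself

@dataclass(frozen=True)
class CopyFromSource:
    offset: int              # byte range read from a copy of the source archive
    length: int

@dataclass(frozen=True)
class Ref:
    name: str                # an input that is the output of another recipe

@dataclass(frozen=True)
class Concat:
    parts: Tuple["Recipe", ...]

@dataclass(frozen=True)
class Decompress:
    inner: "Recipe"          # zlib is used purely for illustration

Recipe = Union[Inline, CopyFromSource, Ref, Concat, Decompress]


def compose(recipe: Recipe, manifest: Dict[str, Recipe]) -> Recipe:
    """Replace every Ref input with the recipe that produces it.

    Assumes the manifest contains no reference cycles.
    """
    if isinstance(recipe, Ref):
        return compose(manifest[recipe.name], manifest)
    if isinstance(recipe, Concat):
        return Concat(tuple(compose(p, manifest) for p in recipe.parts))
    if isinstance(recipe, Decompress):
        return Decompress(compose(recipe.inner, manifest))
    return recipe            # Inline and CopyFromSource have no inputs


def evaluate(recipe: Recipe, source: bytes) -> bytes:
    """Run a composite recipe against a copy of the source archive."""
    if isinstance(recipe, Inline):
        return recipe.data
    if isinstance(recipe, CopyFromSource):
        return source[recipe.offset:recipe.offset + recipe.length]
    if isinstance(recipe, Concat):
        return b"".join(evaluate(p, source) for p in recipe.parts)
    if isinstance(recipe, Decompress):
        return zlib.decompress(evaluate(recipe.inner, source))
    raise ValueError(f"recipe was not composed first: {recipe!r}")


# Toy source archive: a 6-byte header followed by a 14-byte payload.
source = b"HEADER" + b"shared-payload" + b"TRAILER"
manifest = {
    "header": Inline(b"NEWHDR"),
    "payload": CopyFromSource(offset=6, length=14),
    "target": Concat((Ref("header"), Ref("payload"))),
}
composite = compose(manifest["target"], manifest)
target = evaluate(composite, source)
```

After composition, the "target" recipe depends only on inline data and byte ranges of the source archive, which is the property the last bullet describes.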

Potential applications of this technology:

  • Data storage and archiving systems can benefit from the reduced size of diff archives, saving storage space and reducing the need for bandwidth when transmitting archives.
  • Software version control systems can utilize this technology to efficiently store and transmit updates and patches, reducing the time and resources required for software updates.

Problems solved by this technology:

  • Large archives can be challenging to store and transmit due to their size, requiring significant bandwidth and storage resources. The techniques described in this patent application address this problem by generating smaller diff archives.
  • Updating files or software versions can be time-consuming and resource-intensive. The use of binary deltas and efficient recipes allows for the generation of newer versions from older versions, reducing the need to transmit or store complete files.
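
As a rough illustration of how a binary delta converts an older version of a file into a newer one, the sketch below encodes the new file as "copy from old" and "insert literal bytes" operations. It uses Python's difflib.SequenceMatcher as a stand-in matcher; the patent application does not say which delta algorithm is used, and the function names make_delta and apply_delta are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple

def make_delta(old: bytes, new: bytes) -> List[Tuple]:
    """Describe `new` as copies from `old` plus literal inserts."""
    ops: List[Tuple] = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Bytes already present in the old version: store only a reference.
            ops.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Bytes that exist only in the new version: store them inline.
            ops.append(("insert", new[j1:j2]))
        # "delete" needs no operation: the old bytes are simply not copied.
    return ops

def apply_delta(old: bytes, ops: List[Tuple]) -> bytes:
    """Rebuild the new version from the old version and the delta."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += old[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)

old = b"version=1.0;payload=AAAABBBBCCCC"
new = b"version=1.1;payload=AAAABBBBCCCCDDDD"
delta = make_delta(old, new)
assert apply_delta(old, delta) == new

# Only the inserted bytes have to be stored or transmitted with the delta.
inline_bytes = sum(len(op[1]) for op in delta if op[0] == "insert")
```

SequenceMatcher is quadratic in the worst case, so this is suitable only for small inputs; it is here to show the copy/insert idea, not as a practical delta encoder.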

Benefits of this technology:

  • Reduced bandwidth and storage requirements: The generation of smaller diff archives reduces the amount of data that needs to be transmitted or stored, saving resources.
  • Cost savings: By reducing the need for bandwidth and storage, the costs associated with storing or transmitting archives are reduced.
  • Efficient updates and patches: The ability to generate newer versions from older versions using binary deltas and recipes allows for efficient updates and patches, saving time and resources in software version control systems.


Original Abstract Submitted

The techniques disclosed herein generate minimally sized diff archives. As a result, bandwidth, storage, and processing costs of storing or transmitting an archive are reduced. In some configurations, a diff archive is generated utilizing specific knowledge of the structure of the source and target archives it is derived from. Specifically, an archive-specific tokenization engine identifies data chunks and payload files within each archive. Recipes for generating payload files from data chunks and data chunks from payload files are identified and stored in a manifest file, as are recipes for decompressing files, concatenating data chunks, and generating binary deltas that convert older versions of a file into a newer version. These recipes are composed by replacing recipe inputs with the outputs of other recipes. Composite recipes use inline data and data obtained from a copy of the source archive to reconstitute the target archive.
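
To make the tokenization step more concrete, the following sketch splits an uncompressed tar archive into payload tokens (file contents) and chunk tokens (headers, padding, and end-of-archive blocks) using Python's standard tarfile module. The choice of tar, the Token type, and the function names are assumptions made for illustration; the abstract does not name an archive format or a token layout, and sparse tar members are not handled here.

```python
import io
import tarfile
from typing import Dict, List, NamedTuple

class Token(NamedTuple):
    kind: str        # "chunk" (archive metadata) or "payload" (file contents)
    offset: int      # position of the token inside the archive
    length: int
    name: str = ""   # member name, set for payload tokens only

def tokenize_tar(archive: bytes) -> List[Token]:
    """Cover an uncompressed tar archive with chunk and payload tokens."""
    tokens: List[Token] = []
    cursor = 0
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        for member in tar:
            if not member.isreg():
                continue  # non-file members end up in the next chunk token
            # Everything between the previous payload and this file's data
            # (padding plus headers) becomes one chunk token.
            if member.offset_data > cursor:
                tokens.append(Token("chunk", cursor, member.offset_data - cursor))
            tokens.append(
                Token("payload", member.offset_data, member.size, member.name)
            )
            cursor = member.offset_data + member.size
    if cursor < len(archive):
        # Trailing padding and end-of-archive blocks.
        tokens.append(Token("chunk", cursor, len(archive) - cursor))
    return tokens

def build_tar(files: Dict[str, bytes]) -> bytes:
    """Helper for the example: build a small tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

files = {"a.txt": b"hello", "b.bin": b"\x00" * 700}
archive = build_tar(files)
tokens = tokenize_tar(archive)

# The tokens cover the archive exactly, so concatenating them reproduces it.
reassembled = b"".join(archive[t.offset:t.offset + t.length] for t in tokens)
assert reassembled == archive
```

Because each payload token carries the member name, payloads from a source and a target archive could be matched by name before deciding whether to copy them, delta them, or store them inline.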