US Patent Application 17827795. DATA LAKE WITH TRANSACTIONAL SEMANTICS simplified abstract

From WikiPatents

DATA LAKE WITH TRANSACTIONAL SEMANTICS

Organization Name

VMware, Inc.

Inventor(s)

Christos Karamanolis of Los Gatos CA (US)

Abhishek Gupta of San Jose CA (US)

Richard P. Spillane of Palo Alto CA (US)

Marin Nozhchev of Sofia (BG)

DATA LAKE WITH TRANSACTIONAL SEMANTICS - A simplified explanation of the abstract

This abstract first appeared for US patent application 17827795, titled 'DATA LAKE WITH TRANSACTIONAL SEMANTICS'.

Simplified Explanation

The abstract describes a version control interface that allows access to a data lake with transactional capabilities.

  • The interface generates multiple tables for data objects stored in the data lake.
  • Each table comprises a set of name fields and maps a space of columns or rows to a set of the data objects.
  • Transactions can read and write data objects across multiple tables, ensuring atomicity, consistency, isolation, and durability.
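  • While a transaction is in progress, transaction-incomplete messages are accumulated until a transaction-complete message is received.
  • Upon receiving the transaction-complete message, the master branch is updated to reference the data objects according to the accumulated messages.
  • Tables can be grouped into data groups that act as atomicity boundaries, so different groups can be served by different master branches, which improves the speed of master branch updates.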
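
The accumulate-then-commit flow described in the bullets above can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the names `MasterBranch` and `Transaction`, the message fields (`type`, `table`, `key`, `object_id`), and the in-memory dictionary standing in for the master branch are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MasterBranch:
    """Hypothetical master branch: maps (table, key) to a data object reference."""
    refs: dict = field(default_factory=dict)
    version: int = 0


class Transaction:
    """Hypothetical transaction that buffers writes until it is marked complete."""

    def __init__(self, master: MasterBranch):
        self.master = master
        self.pending = []  # accumulated transaction-incomplete messages
        self.done = False

    def on_message(self, msg: dict) -> None:
        if self.done:
            raise RuntimeError("transaction already complete")
        if msg["type"] == "incomplete":
            # Buffered only: readers of the master branch cannot see this yet.
            self.pending.append(msg)
        elif msg["type"] == "complete":
            self._commit()
        else:
            raise ValueError(f"unknown message type: {msg['type']}")

    def _commit(self) -> None:
        # Build the new reference map off to the side, then swap it in one step,
        # so readers see either none or all of the transaction's writes.
        new_refs = dict(self.master.refs)
        for msg in self.pending:
            new_refs[(msg["table"], msg["key"])] = msg["object_id"]
        self.master.refs = new_refs
        self.master.version += 1
        self.done = True


master = MasterBranch()
txn = Transaction(master)
txn.on_message({"type": "incomplete", "table": "orders", "key": "row-1", "object_id": "obj-a"})
txn.on_message({"type": "incomplete", "table": "customers", "key": "row-7", "object_id": "obj-b"})
visible_before_commit = dict(master.refs)  # nothing published yet
txn.on_message({"type": "complete"})
visible_after_commit = dict(master.refs)   # both tables updated together
```

In this sketch, writes to two different tables become visible in a single master branch update, which is one simple way to picture a transaction that spans multiple tables.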


Original Abstract Submitted

A version control interface provides for accessing a data lake with transactional semantics. Examples generate a plurality of tables for data objects stored in the data lake. The tables each comprise a set of name fields and map a space of columns or rows to a set of the data objects. Transactions read and write data objects and may span a plurality of tables with properties of atomicity, consistency, isolation, durability (ACID). Performing the transaction comprises: accumulating transaction-incomplete messages, indicating that the transaction is incomplete, until a transaction-complete message is received, indicating that the transaction is complete. Upon this occurring, a master branch is updated to reference the data objects according to the transaction-incomplete messages and the transaction-complete message. Tables may be grouped into data groups that provide atomicity boundaries so that different groups may be served by different master branches, thereby improving the speed of master branch updates.
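
The data-group idea at the end of the abstract can be sketched in the same hedged way. The `DataGroup` class, its per-group lock, and the rule that a commit is rejected when it touches tables outside the group are assumptions chosen to illustrate an atomicity boundary; the application itself does not specify these details.

```python
import threading


class DataGroup:
    """Hypothetical data group: a set of tables sharing one master branch."""

    def __init__(self, name, tables):
        self.name = name
        self.tables = set(tables)
        self.master_refs = {}          # this group's own master branch
        self._lock = threading.Lock()  # serializes updates within this group only

    def commit(self, writes):
        """Apply writes {(table, key): object_id} atomically within the group."""
        outside = {table for table, _ in writes} - self.tables
        if outside:
            # Assumed rule: a transaction may not cross the atomicity boundary.
            raise ValueError(f"tables outside group {self.name!r}: {sorted(outside)}")
        with self._lock:
            new_refs = dict(self.master_refs)
            new_refs.update(writes)
            self.master_refs = new_refs


sales = DataGroup("sales", ["orders", "customers"])
telemetry = DataGroup("telemetry", ["metrics"])

sales.commit({("orders", "row-1"): "obj-a", ("customers", "row-7"): "obj-b"})
telemetry.commit({("metrics", "row-3"): "obj-c"})  # uses a different lock than sales

try:
    sales.commit({("metrics", "row-9"): "obj-d"})
    crossed_boundary = True
except ValueError:
    crossed_boundary = False
```

Because each group holds its own lock and its own reference map, a commit to `telemetry` never waits on a commit to `sales`, which is the intuition behind the claimed speedup in master branch updates.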