Patent Applications by Databricks, Inc. on January 9th, 2025

Databricks, Inc.: 3 patent applications

Databricks, Inc. has applied for patents in the areas of G06F16/22 (2), G06F16/2453 (2), G06F16/28 (2), G06F16/16 (1), G06F16/13 (1), G06F16/2246 (1), and G06F16/24544 (1).

Keywords in the patent application abstracts include: data, files, node, tree, table, performing, nodes, resulting, rows, and matching.

Patent Applications by Databricks, Inc.

20250013606. DATA FILE CLUSTERING WITH KD-CLASSIFIER TREES

Inventor(s): Prakhar Jain of Sunnyvale CA (US) for Databricks, Inc., Frederick Ryan Johnson of Orem UT (US) for Databricks, Inc., Terry Kim of Bellevue WA (US) for Databricks, Inc., Vijayan Prabhakaran of Los Gatos CA (US) for Databricks, Inc., Bart Samwel of Oegstgeest (NL) for Databricks, Inc.

IPC Code(s): G06F16/16, G06F16/13

CPC Code(s): G06F16/16

Abstract: A data processing service generates a data classifier tree for managing data files of a data table. The data classifier tree may be configured as a kd-classifier tree and includes a plurality of nodes and edges. A node of the data classifier tree may represent a splitting condition with respect to key-values for a respective key. A node of the data classifier tree may be associated with one or more data files assigned to the node. The data files assigned to the node each include a subset of records having key-values that satisfy the conditions represented by the node and by its parent nodes. The data processing service may efficiently cluster the data in the data table while reducing the number of data files that are rewritten when data is modified or added to the data table.
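The structure described in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the `Node` class, the `candidate_files` helper, the numeric `threshold` form of the splitting condition, and the file names are all hypothetical. The sketch also assumes that files assigned to a node satisfy the conditions on the path leading to that node, which is one reading of the abstract.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Hypothetical splitting condition: records with record[key] < threshold
    # belong to the left subtree, all others to the right subtree.
    # Leaf nodes carry no condition (key is None).
    key: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    # Data files assigned to this node. Assumption: their records satisfy
    # every condition on the path from the root down to this node.
    files: list = field(default_factory=list)


def candidate_files(node, key, lo, hi):
    """Collect files that may hold records with lo <= record[key] <= hi."""
    if node is None:
        return []
    found = list(node.files)
    if node.key is None:      # leaf: nothing further to split on
        return found
    if node.key != key:       # split is on a different key, so no pruning
        return (found
                + candidate_files(node.left, key, lo, hi)
                + candidate_files(node.right, key, lo, hi))
    if lo < node.threshold:   # range overlaps the left side of the split
        found += candidate_files(node.left, key, lo, hi)
    if hi >= node.threshold:  # range overlaps the right side of the split
        found += candidate_files(node.right, key, lo, hi)
    return found


tree = Node(
    key="age", threshold=40, files=["root.parquet"],
    left=Node(key="salary", threshold=50000, files=["young.parquet"],
              left=Node(files=["young_low.parquet"]),
              right=Node(files=["young_high.parquet"])),
    right=Node(files=["old.parquet"]),
)

print(candidate_files(tree, "age", 50, 60))
# prints ['root.parquet', 'old.parquet']
```

Because files may sit at interior nodes, newly written data can be attached high in the tree without rewriting the files below it; a query then reads the files on every path it cannot prune.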

20250013619. DATA FILE CLUSTERING WITH KD-EPSILON TREES

Inventor(s): Prakhar Jain of Sunnyvale CA (US) for Databricks, Inc., Frederick Ryan Johnson of Orem UT (US) for Databricks, Inc., Bart Samwel of Oegstgeest (NL) for Databricks, Inc.

IPC Code(s): G06F16/22, G06F16/2453, G06F16/28

CPC Code(s): G06F16/2246

Abstract: A data tree for managing data files of a data table and performing one or more transaction operations on the data table is described. The data tree is configured as a kd-epsilon tree and includes a plurality of nodes and edges. A node of the data tree may represent a splitting condition with respect to key-values for a respective key. A leaf node of the data tree may correspond to a data file for a data table that includes a subset of records having key-values that satisfy the condition for the node and the conditions associated with its parent nodes. A parent node may correspond to a file that includes a buffer storing changes to the data files reachable from that parent node, along with dedicated storage for pointers to its child nodes. By using the data tree, the data processing system may efficiently cluster the data in the data table while reducing the number of data files that are rewritten.
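The buffering idea in this abstract can be sketched as follows. This is an illustrative toy under stated assumptions, not the patented design: the `Leaf` and `Parent` classes, the `BUFFER_CAPACITY` limit, the `scan` helper, and the flush-when-full policy are hypothetical, and only inserts are modeled.

```python
from dataclasses import dataclass, field
from typing import Union

BUFFER_CAPACITY = 3  # hypothetical limit, kept tiny so a flush is visible


@dataclass
class Leaf:
    records: list = field(default_factory=list)  # stands in for one data file
    rewrites: int = 0                            # how often the file was rewritten

    def apply(self, changes):
        self.records.extend(changes)
        self.rewrites += 1


@dataclass
class Parent:
    key: str
    threshold: float
    left: Union[Leaf, "Parent"]    # pointers to the child nodes
    right: Union[Leaf, "Parent"]
    buffer: list = field(default_factory=list)   # pending changes for the subtree

    def apply(self, changes):
        self.buffer.extend(changes)
        if len(self.buffer) > BUFFER_CAPACITY:
            self.flush()

    def flush(self):
        # Push buffered changes one level down, split by this node's condition.
        lows = [r for r in self.buffer if r[self.key] < self.threshold]
        highs = [r for r in self.buffer if r[self.key] >= self.threshold]
        self.buffer.clear()
        if lows:
            self.left.apply(lows)
        if highs:
            self.right.apply(highs)


def scan(node):
    """Read all records: leaf contents plus changes still held in buffers."""
    if isinstance(node, Leaf):
        return list(node.records)
    return scan(node.left) + scan(node.right) + list(node.buffer)


root = Parent(key="id", threshold=100, left=Leaf(), right=Leaf())
for i in (1, 2, 3, 4):
    root.apply([{"id": i}])      # the fourth insert overflows the buffer
root.apply([{"id": 150}])        # stays buffered at the root

print(root.left.rewrites, root.right.rewrites, len(root.buffer))
# prints 1 0 1
```

In this run, five inserts cause a single leaf rewrite, because changes accumulate in the parent's buffer and reach a data file in one batch. The trade-off is that reads must merge buffered changes with leaf contents, as `scan` does.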

20250013644. Efficient Merging of Tabular Data with Post-Processing Compaction

Inventor(s): Bart Samwel of Oegstgeest (NL) for Databricks, Inc., Tathagata Das of New Haven CT (US) for Databricks, Inc., Lars Kroll of Almere (NL) for Databricks, Inc., Yijia Cui of Sunnyvale CA (US) for Databricks, Inc., Juliusz Sompolski of Amsterdam (NL) for Databricks, Inc., Tom Van Bussel of Amsterdam (NL) for Databricks, Inc., Prakhar Jain of Sunnyvale CA (US) for Databricks, Inc.

IPC Code(s): G06F16/2453, G06F11/34, G06F16/22, G06F16/28

CPC Code(s): G06F16/24544

Abstract: A method, system, and computer system for performing an operation with respect to a target table are disclosed. The method includes performing first and second jobs, obtaining one or more other resulting files based at least in part on unmatched rows, and obtaining a set of processed files based at least in part on performing a post-processing operation with respect to the set of resulting files. The set of processed files has fewer files than the set of resulting files. Performing the first job includes determining a set of matching target table files and storing target table information that indicates, for each of the matching target table files, the particular set of rows that matched. Performing the second job includes performing a matching action based on the matched rows and obtaining the second job's resulting file(s).
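The two jobs and the post-processing step described in this abstract can be sketched in plain Python. This is a hedged illustration only: the function names `find_matches`, `apply_merge`, and `compact`, the choice of "update" as the matching action, the one-file-per-unmatched-row output, and the `max_rows` parameter are assumptions, and files are modeled as in-memory lists of rows.

```python
def find_matches(target_files, source_rows, key):
    """Job 1: record, per matching target file, the indexes of its matched rows."""
    source_keys = {row[key] for row in source_rows}
    matches = {}
    for name, rows in target_files.items():
        hit = [i for i, row in enumerate(rows) if row[key] in source_keys]
        if hit:
            matches[name] = hit
    return matches


def apply_merge(target_files, source_rows, key, matches):
    """Job 2: update matched rows, then emit unmatched source rows as new files."""
    updates = {row[key]: row for row in source_rows}
    resulting, matched_keys = [], set()
    for name, indexes in matches.items():
        rows = list(target_files[name])       # copy, then rewrite this file
        for i in indexes:
            matched_keys.add(rows[i][key])
            rows[i] = updates[rows[i][key]]   # assumed matching action: update
        resulting.append(rows)
    for row in source_rows:
        if row[key] not in matched_keys:
            resulting.append([row])           # assumed: many small output files
    return resulting


def compact(files, max_rows):
    """Post-processing: pack rows from many small files into fewer files."""
    all_rows = [row for f in files for row in f]
    return [all_rows[i:i + max_rows] for i in range(0, len(all_rows), max_rows)]


target_files = {
    "a": [{"id": 1, "v": "old"}, {"id": 2, "v": "old"}],
    "b": [{"id": 3, "v": "old"}],
}
source_rows = [{"id": 2, "v": "new"}, {"id": 4, "v": "new"}, {"id": 5, "v": "new"}]

matches = find_matches(target_files, source_rows, "id")
resulting = apply_merge(target_files, source_rows, "id", matches)
processed = compact(resulting, max_rows=4)

print(matches, len(resulting), len(processed))
# prints {'a': [1]} 3 1
```

File "b" has no matching rows, so the first job leaves it out of the match information and the second job never rewrites it. Compaction then reduces the three resulting files to one.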
