20240012827. CLEANING AND ORGANIZING SCHEMALESS SEMI-STRUCTURED DATA FOR EXTRACT, TRANSFORM, AND LOAD PROCESSING simplified abstract (Capital One Services, LLC)


CLEANING AND ORGANIZING SCHEMALESS SEMI-STRUCTURED DATA FOR EXTRACT, TRANSFORM, AND LOAD PROCESSING

Organization Name

Capital One Services, LLC

Inventor(s)

Venkateshwara Mudumba of Glen Allen VA (US)

Govind Pande of Chantilly VA (US)

Angshuman Bhattacharya of Glen Allen VA (US)

CLEANING AND ORGANIZING SCHEMALESS SEMI-STRUCTURED DATA FOR EXTRACT, TRANSFORM, AND LOAD PROCESSING - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240012827 titled 'CLEANING AND ORGANIZING SCHEMALESS SEMI-STRUCTURED DATA FOR EXTRACT, TRANSFORM, AND LOAD PROCESSING'.

Simplified Explanation

The abstract of the patent application describes a system that obtains event data, stored under a generic schema, from a first data repository. Using one or more data analytics functions, the system infers an event-specific schema from the attributes that the events have in common. It then stores the event data in a second data repository, partitioned according to the organizational structure that the event-specific schema defines. From that partitioned data, the system generates a third dataset containing only the subset of event data that satisfies the registration parameters for an ETL use case, and provides this third dataset to an ETL system for processing.

  • The system obtains event data, associated with a generic schema, from a first data repository.
  • It infers an event-specific schema based on common attributes among the events.
  • The event data is stored in a second data repository, partitioned based on the event-specific schema.
  • A third dataset is generated, which includes a subset of the event data satisfying registration parameters for an ETL use case.
  • The third dataset is provided to an ETL system for processing.
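The schema-inference step above can be illustrated with a minimal sketch. The abstract does not say which data analytics functions are used, so this example assumes the simplest possible rule: the schema for an event type is the set of attributes present in every event of that type. The record layout (`event_type`, `payload`) and the function name `infer_event_schemas` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Any


def infer_event_schemas(
    events: list[dict[str, Any]], type_key: str = "event_type"
) -> dict[str, dict[str, str]]:
    """Infer one schema per event type from attributes shared by all its events.

    Assumption (not from the patent): each record carries its event type under
    `type_key` and its semi-structured attributes under "payload".
    """
    grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for event in events:
        grouped[event[type_key]].append(event.get("payload", {}))

    schemas: dict[str, dict[str, str]] = {}
    for event_type, payloads in grouped.items():
        # Start from the first payload's keys and keep only those seen in all others.
        common = set(payloads[0])
        for payload in payloads[1:]:
            common &= set(payload)
        # Record the Python type name observed in the first payload for each attribute.
        schemas[event_type] = {
            name: type(payloads[0][name]).__name__ for name in sorted(common)
        }
    return schemas


# Hypothetical events stored under a generic (type + free-form payload) schema.
events = [
    {"event_type": "login", "payload": {"user_id": 1, "ip": "10.0.0.1"}},
    {"event_type": "login", "payload": {"user_id": 2, "ip": "10.0.0.2", "device": "ios"}},
    {"event_type": "purchase", "payload": {"user_id": 1, "amount": 9.99}},
]
schemas = infer_event_schemas(events)
```

Here the `device` attribute is left out of the `login` schema because it appears in only one of the two login events. A real system might instead keep attributes that appear above some frequency threshold; that choice is outside what the abstract specifies.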

Potential applications of this technology:

  • Data analytics: Once event data is organized under an event-specific schema, it can be queried and aggregated by event type and attribute.
  • Data organization: The event-specific schema helps in organizing and structuring the event data for efficient storage and retrieval.
  • ETL processing: The system facilitates the extraction, transformation, and loading of the event data for various use cases.

Problems solved by this technology:

  • Schema inference: The system infers an event-specific schema automatically from attributes the events share, rather than requiring a schema to be defined by hand.
  • Data partitioning: The system partitions the event data based on the organizational structure defined by the schema, making it easier to manage and process.
  • Subset selection: The system generates a subset of the event data that satisfies specific registration parameters, allowing for targeted processing.
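The partitioning and subset-selection steps can be sketched in the same spirit. The abstract does not define what a registration parameter contains, so this example assumes a hypothetical registration made of an event type, a list of columns, and optional equality filters. The function names and record layout are likewise illustrative assumptions, not the patent's method.

```python
from collections import defaultdict
from typing import Any


def partition_events(
    events: list[dict[str, Any]],
    schemas: dict[str, dict[str, str]],
    type_key: str = "event_type",
) -> dict[str, list[dict[str, Any]]]:
    """Group events by type, keeping only the attributes named in that type's schema."""
    partitions: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for event in events:
        schema = schemas.get(event[type_key])
        if schema is None:
            continue  # no schema for this event type; skipped in this sketch
        payload = event.get("payload", {})
        partitions[event[type_key]].append({name: payload.get(name) for name in schema})
    return dict(partitions)


def select_for_use_case(
    partitions: dict[str, list[dict[str, Any]]], registration: dict[str, Any]
) -> list[dict[str, Any]]:
    """Return rows of the registered partition that pass the registered filters,
    restricted to the registered columns (hypothetical registration format)."""
    rows = partitions.get(registration["event_type"], [])
    filters = registration.get("filters", {})
    return [
        {column: row.get(column) for column in registration["columns"]}
        for row in rows
        if all(row.get(key) == value for key, value in filters.items())
    ]


# Hypothetical inputs: raw events plus schemas assumed to have been inferred already.
events = [
    {"event_type": "login", "payload": {"user_id": 1, "ip": "10.0.0.1", "region": "us"}},
    {"event_type": "login", "payload": {"user_id": 2, "ip": "10.0.0.2"}},
    {"event_type": "purchase", "payload": {"user_id": 1, "amount": 9.99}},
]
schemas = {
    "login": {"user_id": "int", "ip": "str"},
    "purchase": {"user_id": "int", "amount": "float"},
}

partitions = partition_events(events, schemas)  # stands in for the second dataset
registration = {"event_type": "login", "columns": ["user_id"], "filters": {"ip": "10.0.0.1"}}
third_dataset = select_for_use_case(partitions, registration)
```

In this sketch the ETL system would receive only `third_dataset`, so it never has to scan events or attributes that its use case did not register for.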

Benefits of this technology:

  • Improved data analysis: The event-specific schema makes the attributes shared among events explicit, so they can be analyzed directly.
  • Efficient data storage: Because the event data is partitioned according to the schema, a consumer may be able to read only the partitions relevant to it.
  • Streamlined ETL processing: The system hands the ETL system a curated dataset limited to what the use case registered for, which may reduce the amount of data the ETL system has to process.


Original Abstract Submitted

In some implementations, a system may obtain, from a first data repository, a first dataset that includes event data associated with a generic schema. The system may infer an event-specific schema that defines an organizational structure for the event data based on common attributes identified among a plurality of events included in the event data using one or more data analytics functions. The system may store, in a second data repository, a second dataset in which the event data is partitioned based on the organizational structure defined by the event-specific schema. The system may generate a third dataset that includes a subset of the event data included in the second dataset that satisfies one or more registration parameters related to an extract, transform, load (ETL) use case. The system may provide the third dataset to an ETL system configured to process the third dataset based on the ETL use case.