17457301. Data Merging in Distributed Computing System simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)

Data Merging in Distributed Computing System

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

Xing Wei of Xi'an (CN)

Xiao Bin Sun of Xi'an (CN)

Zhe Shao of Xi'an (CN)

Dong Hai Yu of Xi'an (CN)

Liu Zhen Duo of Xi'an (CN)

Chun Lei Xu of Xi'an (CN)

Data Merging in Distributed Computing System - A simplified explanation of the abstract

This abstract first appeared for US patent application 17457301, titled 'Data Merging in Distributed Computing System'.

Simplified Explanation

The abstract describes a computer-implemented method for managing datasets for a histogram. A number of processor units determine a span (the bin width) for the bins that hold the datapoints of a first dataset. That span is chosen from the distribution of the datapoints and a desired number of bins. The processor units then adjust the span of the bins in a second dataset so that it matches the span of the bins in the first dataset. Once both datasets use the same span, their datapoints are merged to form a single merged dataset for the histogram.

  • The method uses multiple processor units to manage datasets for a histogram.
  • The processor units determine the appropriate span for bins containing datapoints in different datasets.
  • The span is determined based on the distribution of datapoints and a desired number of bins.
  • The processor units adjust the span for the bins in one dataset to match the span of the bins in another dataset.
  • The datapoints from the two datasets are merged to form a merged dataset for the histogram.
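
The steps listed above can be sketched in Python. This is only an illustration under stated assumptions, not the method claimed in the application: the function names `choose_span`, `bin_datapoints`, and `adjust_span` are hypothetical, bins are anchored at zero, and the span is rounded up to a power of two so that two independently chosen spans are always integer multiples of each other. The abstract does not say how the span is actually computed or adjusted.

```python
import math
from collections import Counter


def choose_span(datapoints, desired_bins):
    """Choose a bin width from the data range and the desired bin count.

    Assumption (not from the abstract): round the width up to a power of two,
    so any two spans chosen this way are integer multiples of each other.
    """
    raw = (max(datapoints) - min(datapoints)) / desired_bins
    if raw <= 0:  # all datapoints identical
        return 1.0
    return 2.0 ** math.ceil(math.log2(raw))


def bin_datapoints(datapoints, span):
    """Count datapoints per bin; bin k covers [k * span, (k + 1) * span)."""
    return Counter(math.floor(x / span) for x in datapoints)


def adjust_span(counts, span, target_span):
    """Coarsen a histogram from `span` to `target_span` by merging adjacent bins."""
    factor = int(target_span / span)
    if factor < 1 or factor * span != target_span:
        raise ValueError("target_span must be an integer multiple of span")
    adjusted = Counter()
    for index, n in counts.items():
        adjusted[index // factor] += n  # floor division also handles negative bins
    return adjusted


first = [0, 5, 12, 31, 47, 63]
second = [1, 3, 6, 9, 14, 15]

first_span = choose_span(first, desired_bins=4)    # 16.0
second_span = choose_span(second, desired_bins=4)  # 4.0

first_bins = bin_datapoints(first, first_span)
second_bins = adjust_span(bin_datapoints(second, second_span), second_span, first_span)

merged = first_bins + second_bins
# merged == Counter({0: 9, 1: 1, 2: 1, 3: 1})
```

In this sketch the second dataset's four narrow bins all fall inside the first dataset's bin 0, so adjusting the span folds them into one bin before the counts are added. The sketch only handles the case where the first span is at least as wide as the second; the reverse case would require re-binning from the raw datapoints.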

Potential Applications

This technology can be applied in various fields where histograms are used for data analysis, such as:

  • Data visualization and analysis tools
  • Statistical analysis software
  • Machine learning algorithms
  • Data mining applications

Problems Solved

The method solves the following problems:

  • Efficient management of datasets for histograms
  • Ensuring consistent bin spans across different datasets
  • Merging datasets for accurate histogram representation

Benefits

The benefits of this technology include:

  • Improved accuracy and consistency in histogram representation
  • Efficient utilization of processor units for managing datasets
  • Enhanced data analysis capabilities in various applications


Original Abstract Submitted

A computer implemented method for managing datasets for a histogram. The method uses a number of processor units to determine a first span for first bins containing first datapoints in a first dataset in the datasets. The first span is determined based on a distribution of the first datapoints in the first dataset and a desired number of bins. The number of processor units adjusts a second span for second bins containing second datapoints in a second dataset in the datasets to form an adjusted span that matches the first span for the first bins. The number of processor units merges the first datapoints in the first bins having the first span with the second datapoints in the second bins having the adjusted span to form a merged dataset for the histogram.
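
To illustrate how "a number of processor units" might share this work, the following Python sketch has each worker build a local histogram for its own partition and then reduces the partial results pairwise. Everything here is an assumption made for illustration: the names `local_histogram`, `merge_pair`, and `distributed_histogram` are hypothetical, a thread pool stands in for the processor units, spans are powers of two, and the coarser span of each pair is used as the target. The abstract does not specify any of these choices.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def local_histogram(partition, desired_bins=4):
    """Worker step: pick a power-of-two span for this partition and bin it."""
    raw = (max(partition) - min(partition)) / desired_bins
    span = 2.0 ** math.ceil(math.log2(raw)) if raw > 0 else 1.0
    return span, Counter(math.floor(x / span) for x in partition)


def merge_pair(a, b):
    """Reduce step: coarsen the finer histogram to the coarser span, then add."""
    (span_a, counts_a), (span_b, counts_b) = a, b
    target = max(span_a, span_b)
    merged = Counter()
    for span, counts in ((span_a, counts_a), (span_b, counts_b)):
        factor = int(target / span)  # exact, because both spans are powers of two
        for index, n in counts.items():
            merged[index // factor] += n
    return target, merged


def distributed_histogram(partitions, desired_bins=4, workers=4):
    """Bin every partition concurrently, then fold the partial histograms together."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda p: local_histogram(p, desired_bins), partitions))
    return reduce(merge_pair, partials)


partitions = [
    [0, 5, 12, 31, 47, 63],
    [1, 3, 6, 9, 14, 15],
    [100, 120, 127],
]
span, counts = distributed_histogram(partitions)
# span == 16.0
# counts == Counter({0: 9, 7: 2, 1: 1, 2: 1, 3: 1, 6: 1})
```

Because only bin counts and a span travel between workers, the raw datapoints never need to be gathered in one place. One consequence of this sketch is that the merged histogram can contain more bins than the desired number when partitions cover different ranges; coarsening it further would be a separate step.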