17457301. Data Merging in Distributed Computing System simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)
Data Merging in Distributed Computing System
Organization Name
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor(s)
Data Merging in Distributed Computing System - A simplified explanation of the abstract
This abstract first appeared in US patent application 17457301, titled 'Data Merging in Distributed Computing System'.
Simplified Explanation
The abstract describes a computer-implemented method for managing the datasets that feed a single histogram. One or more processor units first determine a span (the width of each bin) for the bins holding the datapoints of a first dataset, based on how those datapoints are distributed and on a desired number of bins. The processor units then adjust the span of the bins in a second dataset so that it matches the first span. Finally, the datapoints from the two datasets are merged into one dataset for the histogram.
- The method uses one or more processor units to manage the datasets behind a histogram.
- The processor units determine a span for the bins that hold the datapoints of a first dataset.
- That span is based on the distribution of the first dataset's datapoints and on a desired number of bins.
- The processor units adjust the span of the bins in a second dataset to match the first span.
- The datapoints from both datasets are then merged to form a merged dataset for the histogram.
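The steps above can be sketched in Python. This is a minimal, hypothetical illustration, not IBM's implementation: the abstract does not say how the span is derived from the distribution, so `determine_first_span` assumes the first dataset's range is simply divided evenly into the desired number of bins. The sketch also assumes the second dataset still holds its datapoints, so "adjusting its span" means re-binning those datapoints on the first dataset's grid. All names (`BinGrid`, `determine_first_span`, `bin_points`, `adjust_to_grid`, `merge_bins`) are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class BinGrid:
    origin: float  # left edge of bin 0
    span: float    # width shared by every bin
    last: int      # index of the last bin inside the first dataset's range
    upper: float   # right edge of that range, treated as closed

    def index(self, x: float) -> int:
        i = math.floor((x - self.origin) / self.span)
        # Close the right edge so the first dataset's maximum stays in its last bin;
        # values beyond the range fall into extra bins instead.
        return min(i, self.last) if x <= self.upper else i


def determine_first_span(points, desired_bins):
    """Assumed rule: divide the first dataset's range evenly into desired_bins."""
    lo, hi = min(points), max(points)
    span = (hi - lo) / desired_bins if hi > lo else 1.0
    return BinGrid(origin=lo, span=span, last=desired_bins - 1, upper=hi)


def bin_points(points, grid):
    """Place each datapoint in the bin of `grid` that contains it."""
    bins = {}
    for x in points:
        bins.setdefault(grid.index(x), []).append(x)
    return bins


def adjust_to_grid(second_bins, grid):
    """Re-bin the second dataset's datapoints so their span matches grid.span."""
    points = [x for bucket in second_bins.values() for x in bucket]
    return bin_points(points, grid)


def merge_bins(first_bins, second_bins):
    """Combine datapoints bin by bin; both inputs must share the same grid."""
    merged = {}
    for bins in (first_bins, second_bins):
        for i, bucket in bins.items():
            merged.setdefault(i, []).extend(bucket)
    return dict(sorted(merged.items()))


first = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10]
grid = determine_first_span(first, desired_bins=5)  # span 2.0, origin 0
first_bins = bin_points(first, grid)
second_bins = {0: [1.5], 1: [4.5], 3: [9.0, 11.0]}  # originally binned with span 3
merged = merge_bins(first_bins, adjust_to_grid(second_bins, grid))
print({i: len(b) for i, b in merged.items()})  # {0: 3, 1: 2, 2: 3, 3: 2, 4: 3, 5: 1}
```

The sixth bin appears because 11.0 lies beyond the first dataset's range. Aligning both datasets on the same origin, not just the same span, is a deliberate choice in this sketch: equal spans alone would not guarantee that bin boundaries coincide.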
Potential Applications
This technology can be applied in various fields where histograms are used for data analysis, such as:
- Data visualization and analysis tools
- Statistical analysis software
- Machine learning algorithms
- Data mining applications
Problems Solved
The method solves the following problems:
- Managing multiple datasets that contribute to a single histogram
- Keeping bin spans consistent across datasets, so that bins from different datasets cover matching value ranges
- Merging the datasets into one dataset that the histogram can represent accurately
Benefits
The benefits of this technology include:
- Improved accuracy and consistency in histogram representation
- Efficient utilization of processor units for managing datasets
- Enhanced data analysis capabilities in various applications
Original Abstract Submitted
A computer implemented method for managing datasets for a histogram. The method uses a number of processor units to determine a first span for first bins containing first datapoints in a first dataset in the datasets. The first span is determined based on a distribution of the first datapoints in the first dataset and a desired number of bins. The number of processor units adjusts a second span for second bins containing second datapoints in a second dataset in the datasets to form an adjusted span that matches the first span for the first bins. The number of processor units merges the first datapoints in the first bins having the first span with the second datapoints in the second bins having the adjusted span to form a merged dataset for the histogram.
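Once every dataset shares the same span and bin origin, combining histograms reduces to adding per-bin counts, and that addition can be split across processor units. The sketch below is a hypothetical illustration under stated assumptions: the shared `origin` and `span` are taken as already determined, bins are half-open intervals `[origin + i*span, origin + (i+1)*span)`, a thread pool stands in for the separate processor units or nodes of a distributed system, and it merges bin counts rather than the datapoints themselves. The names `count_on_grid` and `merge_counts_in_parallel` are illustrative.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_on_grid(points, origin, span):
    """Work done by one processor unit: per-bin counts for one dataset."""
    return Counter(math.floor((x - origin) / span) for x in points)


def merge_counts_in_parallel(datasets, origin, span, workers=4):
    """Count every dataset on the shared grid concurrently, then sum the counts."""
    if span <= 0:
        raise ValueError("span must be positive")
    n = len(datasets)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_on_grid, datasets, [origin] * n, [span] * n))
    total = Counter()
    for partial in partials:
        total.update(partial)  # adding counts bin by bin is order-independent
    return dict(sorted(total.items()))


datasets = [[0, 1, 2, 3], [1.5, 4.5], [2.5, 3.5, 5.0]]
print(merge_counts_in_parallel(datasets, origin=0, span=2.0))  # {0: 3, 1: 4, 2: 2}
```

Because summing counts is associative and commutative, the partial histograms in this sketch could be combined in any order, for example pairwise across nodes; that only works because the spans were made to match first.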