US Patent Application 18345657. Systems and Methods for Anonymizing Large Scale Datasets simplified abstract

From WikiPatents

Systems and Methods for Anonymizing Large Scale Datasets

Organization Name

Google LLC


Inventor(s)

Alessandro Epasto of New York NY (US)

Hossein Esfandiari of Jersey City NJ (US)

Vahab Seyed Mirrokni of Hoboken NJ (US)

Andres Munoz Medina of Brooklyn NY (US)

Umar Syed of Rahway NJ (US)

Sergei Vassilvitskii of New York NY (US)

Systems and Methods for Anonymizing Large Scale Datasets - A simplified explanation of the abstract

This abstract first appeared for US patent application 18345657 titled 'Systems and Methods for Anonymizing Large Scale Datasets'.

Simplified Explanation

The abstract describes a computer-implemented method for k-anonymizing a dataset to protect the privacy of the entities it describes.

  • The method obtains a dataset containing data about multiple entities, along with at least one data item associated with at least one of those entities.
  • The entities are clustered into groups called entity clusters.
  • A majority condition is determined for each entity cluster, indicating that a data item is associated with a majority of the entities in the cluster.
  • The data item is then assigned to the entities in an anonymized dataset based on the majority condition.
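The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each entity's record is modeled as a set of binary data items (one column per item), and the clustering step uses a hypothetical stand-in, `cluster_entities`, that sorts entities by their item sets and cuts them into groups of at least `k`. The abstract does not say how clusters are formed. Within each cluster, an item is assigned to every member when more than half of the members have it, and dropped otherwise. All function names here (`cluster_entities`, `majority_items`, `anonymize`) are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Set


def cluster_entities(data: Dict[str, Set[str]], k: int) -> List[List[str]]:
    """Hypothetical stand-in for the clustering step (not the patent's method).

    Sorts entities by their item sets so similar records sit next to each
    other, then cuts that order into consecutive groups of k. A short final
    group is merged into the previous one, so every cluster has >= k members.
    """
    if k < 1 or len(data) < k:
        raise ValueError("need k >= 1 and at least k entities")
    order = sorted(data, key=lambda e: (sorted(data[e]), e))
    clusters = [order[i:i + k] for i in range(0, len(order), k)]
    if len(clusters) > 1 and len(clusters[-1]) < k:
        tail = clusters.pop()
        clusters[-1].extend(tail)
    return clusters


def majority_items(data: Dict[str, Set[str]], cluster: List[str]) -> Set[str]:
    """Items held by more than half of the cluster (the majority condition)."""
    counts = Counter(item for entity in cluster for item in data[entity])
    return {item for item, c in counts.items() if 2 * c > len(cluster)}


def anonymize(data: Dict[str, Set[str]], k: int) -> Dict[str, FrozenSet[str]]:
    """Give every entity in a cluster the same set of majority items."""
    released: Dict[str, FrozenSet[str]] = {}
    for cluster in cluster_entities(data, k):
        shared = frozenset(majority_items(data, cluster))
        for entity in cluster:
            released[entity] = shared
    return released


data = {
    "u1": {"a", "b"},
    "u2": {"a"},
    "u3": {"a", "b", "c"},
    "u4": {"c"},
    "u5": {"c", "d"},
}
print(anonymize(data, 2))
# u1 and u2 get frozenset({'a'}); u3, u4 and u5 get frozenset({'c'})
```

With `k=2`, the leftover entity `u5` is merged into the cluster with `u3` and `u4`, so that cluster releases only `c`, the one item held by a majority of its three members. Sorting by item sets is just a cheap way to place similar entities together; a practical system would likely use a real clustering algorithm, and that choice determines how much information the majority step discards.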

The method aims to provide privacy guarantees for all columns in the dataset. Instead of releasing each entity's own data items, the anonymized dataset assigns items cluster by cluster according to the majority condition, so entities in the same cluster are harder to tell apart.
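One way to check a guarantee of this kind is the standard k-anonymity test: every distinct released record must be shared by at least `k` entities. The sketch below assumes each entity's released record is a `frozenset` of data items; it compares whole records rather than a chosen subset of quasi-identifier columns, which mirrors the stated goal of covering all columns. The function name `is_k_anonymous` and the example data are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet


def is_k_anonymous(released: Dict[str, FrozenSet[str]], k: int) -> bool:
    """Return True if every distinct released record (the full set of data
    items, i.e. all columns at once) is shared by at least k entities."""
    record_counts = Counter(released.values())
    return all(count >= k for count in record_counts.values())


# Example: two entities share {"a"} and three share {"c"}.
release = {
    "u1": frozenset({"a"}),
    "u2": frozenset({"a"}),
    "u3": frozenset({"c"}),
    "u4": frozenset({"c"}),
    "u5": frozenset({"c"}),
}
print(is_k_anonymous(release, 2))  # True
print(is_k_anonymous(release, 3))  # False: {"a"} is shared by only 2 entities
```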


Original Abstract Submitted

A computer-implemented method for k-anonymizing a dataset to provide privacy guarantees for all columns in the dataset can include obtaining, by a computing system including one or more computing devices, a dataset comprising data indicative of a plurality of entities and at least one data item respective to at least one of the plurality of entities. The computer-implemented method can include clustering, by the computing system, the plurality of entities into at least one entity cluster. The computer-implemented method can include determining, by the computing system, a majority condition for the at least one entity cluster, the majority condition indicating that the at least one data item is respective to at least a majority of the plurality of entities. The computer-implemented method can include assigning, by the computing system, the at least one data item to the plurality of entities in an anonymized dataset based at least in part on the majority condition.