Patent Application 17805674 - Hierarchical Gradient Averaging For Enforcing - Rejection
Title: Hierarchical Gradient Averaging For Enforcing Subject Level Privacy
Application Information
- Invention Title: Hierarchical Gradient Averaging For Enforcing Subject Level Privacy
- Application Number: 17805674
- Submission Date: 2025-05-14T00:00:00.000Z
- Effective Filing Date: 2022-06-06T00:00:00.000Z
- Filing Date: 2022-06-06T00:00:00.000Z
- National Class: 706
- National Sub-Class: 012000
- Examiner Employee Number: 88484
- Art Unit: 2124
- Tech Center: 2100
Rejection Summary
- 102 Rejections: 0
- 103 Rejections: 2
Cited Patents
No patents were cited in this rejection.
Office Action Text
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to the abstract idea of a mathematical relationship without significantly more. The claims recite training a machine learning algorithm with gradient descent. This judicial exception is not integrated into a practical application because the additional limitations of a processor and memory are generic computer parts. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because, outside of the mathematical relationship, there are no other meaningful limitations.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-5, 8-11 and 14-18 are rejected under 35 U.S.C. 103 as being unpatentable over US 20230351042 A1 to De et al. and https://pair-code.github.io/saliency/#home as archived 4/18/2022 (PAIR).
Claims 6-7, 12-13 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 20230351042 A1 to De et al., https://pair-code.github.io/saliency/#home as archived 4/18/2022 (PAIR), and Advances and Open Problems in Federated Learning by Kairouz et al.

De teaches claims 1, 8 and 15. A system, comprising: at least one processor; a memory, (De para 135 "a central processing unit will receive instructions and data from a read-only memory or a random access memory or both.") comprising program instructions that when executed by the at least one processor cause the at least one processor to implement a machine learning system, the machine learning system configured to: (De Fig. 2 and 3) train a machine learning model using gradient descent on a data set comprising a plurality of subjects, wherein individual ones of the plurality of subjects comprise one or more data items, (De para 113 "Training system samples a batch of network inputs from the set of training data (220)…") and wherein to train the machine learning model, the machine learning system is configured to: identify a sample of data items from the data set; (De para 113 "Training system samples a batch of network inputs from the set of training data (220)…") determine respective gradients for individual data items in the sample of data items; (De fig. 2 [230] determine clipped gradient for each network input in batch of network inputs.) clip the respective gradients for the individual data items in the sample of data items according to a threshold; (De fig. 2 [230] determine clipped gradient for each network input in batch of network inputs. De para 124 "generates the clipped gradient for the network input by scaling the combined gradient for the network input to cause a norm of the combined gradient for the network input to satisfy a clipping threshold.") average the clipped gradients of individual ones of the subjects with the individual data items in the sample of data items; (De para 123 "training system determines the combined gradient for the network input by averaging the gradients determined for the multiple augmented versions of the network input.") add a noise value to a sum of the averaged gradients for the individual ones of the subjects; and (De para 115-116 "randomly sampling the noise parameters from a noise distribution (240). In some implementations, the noise distribution includes a Gaussian noise distribution. [0116] Training system applies the noise parameters to the clipped gradients for the network inputs in the batch of network inputs (250).")

De doesn't average the noisy averaged gradients. However, PAIR teaches determine a sample average gradient for the sample of data items from the sum of the noisy averaged gradients with the added noise value divided by a number of data items in the sample of data items. (PAIR "SmoothGrad creates noisy copies of an input image then averages gradients (or another saliency method) with respect to these copies.") PAIR, De and the claims are all averaging gradients. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to average the noisy gradients because it "sharpens the resulting saliency map, and removes irrelevant noisy regions." Id.

De teaches claims 2, 9 and 16.
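The claim 1 limitations mapped above describe a differentially private, subject-level gradient step: clip each data item's gradient to a norm threshold, average the clipped gradients per subject, add Gaussian noise to the sum of the subject averages, and divide by the number of items in the sample. A minimal NumPy sketch of that pipeline (all names, such as `noisy_subject_averaged_gradient` and `subject_ids`, are hypothetical and not from De or the application):

```python
import numpy as np

def noisy_subject_averaged_gradient(per_item_grads, subject_ids,
                                    clip_threshold=1.0, noise_std=0.1,
                                    rng=np.random.default_rng(0)):
    """Clip per-item gradients, average them per subject, add Gaussian
    noise to the sum of the subject averages, divide by sample size."""
    clipped = []
    for g in per_item_grads:
        norm = np.linalg.norm(g)
        # Scale the gradient so its norm satisfies the clipping threshold.
        clipped.append(g * min(1.0, clip_threshold / max(norm, 1e-12)))
    clipped = np.stack(clipped)
    ids = np.asarray(subject_ids)

    # Average the clipped gradients of each subject's data items.
    subject_avgs = [clipped[ids == s].mean(axis=0) for s in sorted(set(subject_ids))]

    # Add a Gaussian noise value to the sum of the subject averages ...
    noisy_sum = np.sum(subject_avgs, axis=0) + rng.normal(0.0, noise_std,
                                                          size=clipped.shape[1])

    # ... and divide by the number of data items in the sample.
    return noisy_sum / len(per_item_grads)
```

Per the claims 2, 9 and 16 discussion below in the record, a step like this would repeat across mini-batches for a number of training rounds calibrated to a fixed (ε, δ) privacy budget.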
The system of claim 1, wherein the identification of the sample of data items, the determination of the respective gradients, the clip of the respective gradients, the average of the clipped gradients, the addition of the noise value, and the determination of the sample average gradient for the sample of data items is performed as part of one training round, and wherein a number of other training rounds in addition to the one training round are performed as determined according to a privacy budget. (De para 107 "In cases when training system 100 implements a DP learning algorithm, a privacy budget (ε, δ) for the DP learning algorithm may be fixed, e.g., defining a particular target privacy protection for a training dataset 120. In these cases, training system 100 can calibrate one or more of the hyper-parameters within this privacy budget.")

De teaches claims 3, 10 and 17. The system of claim 1, wherein the noise is Gaussian noise determined for the machine learning system. (De para 115 "randomly sampling the noise parameters from a noise distribution (240). In some implementations, the noise distribution includes a Gaussian noise distribution.")

De teaches claims 4, 11 and 18. The system of claim 1, wherein the sample is one of a plurality of mini-batches taken from the data set as part of the training, and wherein the identification of the sample of data items, the determination of the respective gradients, the clip of the respective gradients, the average of the clipped gradients, the addition of the noise value, and the determination of the sample average gradient for the sample of data items are performed for other ones of the plurality of mini-batches. (De para 127 "FIG. 5A shows a plot of training and validation dataset accuracy of a Wide-ResNet neural network model (WRN-16-4) versus batch size of batches of training examples.")

De teaches claims 5 and 14. The system of claim 1, wherein the machine learning model is a non-federated machine learning model.
(De doesn't teach the models are federated, so they are not federated.)

De teaches claims 6, 12 and 19. The system of claim 1, wherein the machine learning system… De doesn't teach federated learning. However, Kairouz is a federated model user system, and machine learning system is further configured to: receive the machine learning model from a federation server; and (Kairouz fig. 1 model deployment, below.)

[media_image1.png: Kairouz Fig. 1, model deployment, greyscale]

return parameter updates to the machine learning model determined from performing the training to the federation server. (Kairouz p. 8 sec. 1.1.2 point 5 "5. Model update: The server locally updates the shared model based on the aggregated update computed from the clients that participated in the current round.") The claims, De and Kairouz all train machine learning models. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to federate learning to "mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches." Id. abs.

De teaches claims 7, 13 and 20. The system of claim 6, wherein the federated model user system is one of a plurality of federated model user systems that received the machine learning model from the federation server, and wherein the data set is one of a plurality of data sets respectively used at the plurality of federated model user systems, wherein at least one of the plurality of subjects has an associated data item at a different one of the plurality of data sets used at a different one of the plurality of federated model user systems. (Kairouz p. 87 sec. 7.5 first bullet point "How data is generated, pre-processed and labeled. Learning across silos will require data normalization which may be difficult when such data is collected and stored differently (e.g. use of different medical imaging systems, and inconsistencies in labeling procedures, annotations, and storage formats).") This normalization is the "associated data item at a different one of the plurality of data sets…" Because the data in different silos share type and format, they are associated. The silos are the respective data sets used at the respective user systems.

Notice of References Cited

US 20220083911 A1 to Flanagan et al.'s abstract teaches "a master machine learning model for generating a user recommendation related to use of an application of the user equipment, calculate a model update for the master machine learning model using the master machine learning model and data related to one or more of a user of the user equipment or a user interaction with the user equipment, encode the calculated model update using an ε-differential privacy mechanism and transmit the ε-differential privacy encoded model update." This is similar to the federated learning claimed by applicant. Federated Machine Learning: Concept and Applications by Yang et al. teaches federated learning with a privacy budget as well.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Austin Hicks whose telephone number is (571) 270-3377. The examiner can normally be reached Monday - Thursday 8-4 PST. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
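For context on the claims 6-7, 12-13 and 19-20 rejections above, the federated round the examiner maps to Kairouz — a federation server deploys the shared model, each federated model user system trains on its local data set and returns only parameter updates, and the server aggregates those updates into the shared model — can be sketched as follows. This is a toy linear-model illustration; `client_update` and `server_round` are hypothetical names, not from the application or Kairouz:

```python
import numpy as np

def client_update(global_model, local_data, lr=0.1):
    """Federated model user system: receive the model from the federation
    server, train locally, return the parameter update (not raw data)."""
    model = global_model.copy()
    for x, y in local_data:
        # Toy squared-error gradient step for a linear model y ~ model @ x.
        grad = 2 * (model @ x - y) * x
        model -= lr * grad
    return model - global_model

def server_round(global_model, client_datasets):
    """Federation server: deploy the model to each client, then locally
    update the shared model from the aggregated client updates
    (Kairouz sec. 1.1.2, point 5)."""
    updates = [client_update(global_model, d) for d in client_datasets]
    return global_model + np.mean(updates, axis=0)
```

In claims 7, 13 and 20, a subject's data items may be split across different clients' data sets, which is why the application's subject-level averaging would have to be enforced within each local training step.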
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/AUSTIN HICKS/
Primary Examiner, Art Unit 2124