Patent Application 17514698 - METHOD AND SYSTEM FOR LEARNING REPRESENTATIONS - Rejection

Application Information

  • Invention Title: METHOD AND SYSTEM FOR LEARNING REPRESENTATIONS LESS PRONE TO CATASTROPHIC FORGETTING
  • Application Number: 17514698
  • Submission Date: 2025-05-20
  • Effective Filing Date: 2021-10-29
  • Filing Date: 2021-10-29
  • National Class: 706
  • National Sub-Class: 025000
  • Examiner Employee Number: 99028
  • Art Unit: 2142
  • Tech Center: 2100

Rejection Summary

  • 102 Rejections: 0
  • 103 Rejections: 5

Cited Patents

No patents were cited in this rejection.

Office Action Text


    DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-24 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. This judicial exception is not integrated into a practical application for the reasons stated below for Step 2A, Prong Two. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception for the reasons stated below for Step 2B.

Step 1: Two Criteria For Subject Matter Eligibility
	First, the claimed invention must fall within one of the four statutory categories. 35 U.S.C. 101 defines the four categories of invention that Congress deemed to be the appropriate subject of a patent: processes, machines, manufactures, and compositions of matter.
	The claims fall into the category of a process in a computer system environment that is tangibly embodied in a manner so as to be executable.
	Second, the claimed invention must also qualify as patent-eligible subject matter, i.e., the claim must not be directed to a judicial exception unless the claim as a whole includes additional limitations amounting to significantly more than the exception. The judicial exceptions (also called “judicially recognized exceptions”) are subject matter that the courts have found to be outside of the four statutory categories of invention, and are limited to abstract ideas, laws of nature, and natural phenomena (including products of nature).

Step 2A: Prong One Recites Abstract Idea, Law Of Nature, Natural Phenomenon

Claims 1-20 are directed to an abstract idea, specifically a mental process: concepts performed in the human mind (including an observation, evaluation, judgment, or opinion).

Independent claim 1 recites in part:

	determining at least one set of auxiliary model parameters 
	determining a set of primary model parameters 

	The limitations above are broadly and reasonably interpreted as a mental process, as a form of mental evaluation or judgment. For example, one can mentally evaluate received data and, based on judgment and opinion, determine the most significant variables that will help accurately represent and predict outcomes in his or her mind.

Step 2A: Prong Two Does Not Integrate Into Practical Application

The judicial exception is not integrated into a practical application. The method of claim 1 does not explicitly include any computing components and is not articulated in a way that would be considered “significantly more” under the MPEP guidelines.

	The limitations of “determining at least one set of auxiliary model parameters by simulating at least one first optimization step based on a set of current model parameters and at least one auxiliary domain, wherein the at least one auxiliary domain is associated with a primary domain comprising one or more data points for training a neural network model;
determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the primary domain and based on the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain;
updating the neural network model with the set of primary model parameters”, as drafted, amount to insignificant extra-solution activity, as a form of gathering and analyzing information using conventional techniques and displaying the result. See TLI Communications, 823 F.3d at 612-13, 118 USPQ2d at 1747-48. This is a form of pre-solution data gathering that does not provide integration into a practical application.
Looking at the claim limitations as an ordered combination and taking the claim as a whole, there is still no integration into a practical application. The claims also do not appear to improve the functioning of a computer or to require the use of a specific machine. See MPEP 2106.04(d)(1) and 2106.05(a). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
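
For orientation, the two-step procedure recited in claim 1 can be illustrated with a minimal sketch. This is an illustrative reconstruction of the claim language, not the applicant's or any cited reference's implementation: it assumes plain first-order gradient-descent steps, a mean-squared-error loss, and hypothetical helper names (loss_fn, training_step); the claimed method may combine the domains and losses differently.

    import torch
    import torch.nn as nn

    def loss_fn(params, model, batch):
        # Functional loss: evaluate `model` using an explicit parameter set.
        x, y = batch
        out = torch.func.functional_call(model, params, (x,))
        return nn.functional.mse_loss(out, y)

    def training_step(model, primary_batch, auxiliary_batch, lr=0.01):
        theta = dict(model.named_parameters())  # set of current model parameters

        # (1) Determine auxiliary model parameters by SIMULATING a first
        #     optimization step on the auxiliary domain (lookahead only;
        #     the model itself is not updated here).
        aux_loss = loss_fn(theta, model, auxiliary_batch)
        grads = torch.autograd.grad(aux_loss, list(theta.values()))
        phi = {k: v - lr * g for (k, v), g in zip(theta.items(), grads)}

        # (2) Determine primary model parameters via a second optimization
        #     step based on the current parameters and the primary domain,
        #     and on the simulated auxiliary parameters evaluated on the
        #     primary domain.
        total = loss_fn(theta, model, primary_batch) + loss_fn(phi, model, primary_batch)
        grads = torch.autograd.grad(total, list(theta.values()))
        theta_new = {k: (v - lr * g).detach() for (k, v), g in zip(theta.items(), grads)}

        # Magnitude of the update; used as a stopping criterion in a later sketch.
        step_size = max((theta_new[k] - theta[k]).abs().max().item() for k in theta)

        # (3) Update the neural network model with the set of primary model parameters.
        with torch.no_grad():
            for k, p in model.named_parameters():
                p.copy_(theta_new[k])
        return step_size

    # Hypothetical toy usage:
    model = nn.Linear(4, 1)
    primary = (torch.randn(8, 4), torch.randn(8, 1))
    auxiliary = (torch.randn(8, 4), torch.randn(8, 1))
    training_step(model, primary, auxiliary)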

Step 2B: Does Not Amount To Significantly More

As per MPEP 2106.05(II), the considerations discussed above for mere instructions to apply the exception and mere linking to a field of use are carried over to Step 2B.

Independent claim 21 recites in part:


	determining at least one set of auxiliary model parameters 

	determining a set of primary model parameters 

	The limitations above are broadly and reasonably interpreted as a mental process, as a form of mental evaluation or judgment. For example, one can mentally evaluate received data and, based on judgment and opinion, determine the most significant variables that will help accurately represent and predict outcomes in his or her mind.

Step 2A: Prong Two Does Not Integrate Into Practical Application

The judicial exception is not integrated into a practical application. The method of claim 21 does not explicitly include any computing components and is not articulated in a way that would be considered “significantly more” under the MPEP guidelines.

	The limitations of “a method for performing a task in at least a first primary domain, the method comprising: 
performing, by a neural network model trained on the first primary domain, the task in the first primary domain; 
and performing, by the trained neural network model trained on the first primary domain and fine-tuned to a second primary domain, the task in the first primary domain or the second primary domain;
determining at least one set of auxiliary model parameters by simulating at least one first optimization step based on a set of current model parameters and at least one auxiliary domain, wherein the at least one auxiliary domain is associated with a primary domain comprising one or more data points for training a neural network model;
determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the primary domain and based on the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain;
updating the neural network model with the set of primary model parameters”, as drafted, amount to insignificant extra-solution activity, as a form of gathering and analyzing information using conventional techniques and displaying the result. See TLI Communications, 823 F.3d at 612-13, 118 USPQ2d at 1747-48. This is a form of pre-solution data gathering that does not provide integration into a practical application.
Looking at the claim limitations as an ordered combination and taking the claim as a whole, there is still no integration into a practical application. The claims also do not appear to improve the functioning of a computer or to require the use of a specific machine. See MPEP 2106.04(d)(1) and 2106.05(a). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: Does Not Amount To Significantly More

As per MPEP 2106.05(II), the considerations discussed above for mere instructions to apply the exception and mere linking to a field of use are carried over to Step 2B.

Independent claim 23 recites in part:

	determining at least one set of auxiliary model parameters by simulating at least one first optimization step based on a set of current model parameters and at least one auxiliary domain, wherein the at least one auxiliary domain is associated with a primary domain comprising one or more data points for training a neural network model
	determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the primary domain and based on the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain

	The limitations above are broadly and reasonably interpreted as a mental process, as a form of mental evaluation or judgment. For example, one can mentally evaluate received data and, based on judgment and opinion, determine the most significant variables that will help accurately represent and predict outcomes in his or her mind.

Step 2A: Prong Two Does Not Integrate Into Practical Application

The judicial exception is not integrated into a practical application. The method of claim 23 does not explicitly include any computing components and is not articulated in a way that would be considered “significantly more” under the MPEP guidelines.

	The limitations of “an apparatus for training a neural network model comprising: a non-transitory computer-readable medium having executable instructions stored thereon for causing a processor and a memory to perform a method comprising:
determining at least one set of auxiliary model parameters by simulating at least one first optimization step based on a set of current model parameters and at least one auxiliary domain, wherein the at least one auxiliary domain is associated with a primary domain comprising one or more data points for training a neural network model;
determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the primary domain and based on the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain;
updating the neural network model with the set of primary model parameters”, as drafted, amount to insignificant extra-solution activity, as a form of gathering and analyzing information using conventional techniques and displaying the result. See TLI Communications, 823 F.3d at 612-13, 118 USPQ2d at 1747-48. This is a form of pre-solution data gathering that does not provide integration into a practical application.
Looking at the claim limitations as an ordered combination and taking the claim as a whole, there is still no integration into a practical application. The claims also do not appear to improve the functioning of a computer or to require the use of a specific machine. See MPEP 2106.04(d)(1) and 2106.05(a). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: Does Not Amount To Significantly More

As per MPEP 2106.05(II), the considerations discussed above for mere instructions to apply the exception and mere linking to a field of use are carried over to Step 2B.


Independent claim 24 recites in part:

	determining at least one set of auxiliary model parameters 
	determining a set of primary model parameters by 

	The limitations above are broadly and reasonably interpreted as a mental process, as a form of mental evaluation or judgment. For example, one can mentally evaluate received data and, based on judgment and opinion, determine the most significant variables that will help accurately represent and predict outcomes in his or her mind.

Step 2A: Prong Two Does Not Integrate Into Practical Application

The judicial exception is not integrated into a practical application. The method of claim 24 does not explicitly include any computing components and is not articulated in a way that would be considered “significantly more” under the MPEP guidelines.

	The limitations of “a processor; a memory; and computer-executable instructions stored on a non-transitory computer-readable medium for causing the processor to perform a method comprising:
determining at least one set of auxiliary model parameters by simulating at least one first optimization step based on a set of current model parameters and at least one auxiliary domain, wherein the at least one auxiliary domain is associated with a primary domain comprising one or more data points for training a neural network model;
determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the primary domain and based on the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain;
updating the neural network model with the set of primary model parameters”, as drafted, amount to insignificant extra-solution activity, as a form of gathering and analyzing information using conventional techniques and displaying the result. See TLI Communications, 823 F.3d at 612-13, 118 USPQ2d at 1747-48. This is a form of pre-solution data gathering that does not provide integration into a practical application.
Looking at the claim limitations as an ordered combination and taking the claim as a whole, there is still no integration into a practical application. The claims also do not appear to improve the functioning of a computer or to require the use of a specific machine. See MPEP 2106.04(d)(1) and 2106.05(a). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.

Step 2B: Does Not Amount To Significantly More

As per MPEP 2106.05(II), the considerations discussed above for mere instructions to apply the exception and mere linking to a field of use are carried over to Step 2B.

Even considering these additional elements as a combination and taking the claim as a whole, they do not amount to significantly more. Accordingly, the claim recites an abstract idea. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception, and the judicial exception is not integrated into a practical application. Therefore, the claim is not patent eligible.

Claim 2 is dependent on claim 1 and includes a mental concept. A human mind can generate auxiliary domains from a primary domain through processes like generalization, abstraction, and analogy. This involves creating new concepts, frameworks, or perspectives based on a core understanding. Humans can extend a primary understanding (the primary domain) to encompass similar but distinct areas. For example, understanding the concept of "bird" might lead to a broader domain of "avian life" encompassing all birds and related concepts.

Claim 3 is dependent on claim 1 and merely recites the words "apply it" (or an equivalent) with the judicial exception, merely includes instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea.

Claim 4 is dependent on claim 1 and includes a mental concept. A human mind can generate auxiliary domains from a primary domain through processes like generalization, abstraction, and analogy. This involves creating new concepts, frameworks, or perspectives based on a core understanding.

Claim 5 is dependent on claim 1 and includes outside organizations, which does not provide integration into a practical application or add significantly more to the abstract idea, because an outside organization does not add significantly more to the mental activity; it is a general inclusion of a new organization associated with a mental activity and therefore does not overcome the reasons for the identified abstract idea.

Claim 6 is dependent on claim 1 and includes outside organizations, which does not provide integration into a practical application or add significantly more to the abstract idea, because an outside organization does not add significantly more to the mental activity; it is a general inclusion of a new organization associated with a mental activity and therefore does not overcome the reasons for the identified abstract idea.

Claim 7 is dependent on claim 1 and includes outside organizations, which does not provide integration into a practical application or add significantly more to the abstract idea, because an outside organization does not add significantly more to the mental activity; it is a general inclusion of a new organization associated with a mental activity and therefore does not overcome the reasons for the identified abstract idea.

Claim 8 is dependent on claim 1 and includes a mathematical concept: a quantifiable measure of how well the model is performing.

Claim 9 is dependent on claim 1 and includes a mathematical concept: a quantifiable measure of how well the model is performing.

Claim 10 is dependent on claim 1 and includes a mathematical concept: a quantifiable measure of how well the model is performing.

Claim 11 is dependent on claim 1 and includes outside organizations, which does not provide integration into a practical application or add significantly more to the abstract idea, because an outside organization does not add significantly more to the mental activity; it is a general inclusion of a new organization associated with a mental activity and therefore does not overcome the reasons for the identified abstract idea.

Claim 12 is dependent on claim 1 and includes mental activity performed in the human mind (including an observation, evaluation, judgment, or opinion), and therefore does not overcome the reasons for the identified abstract idea.

Claim 13 is dependent on claim 1 and includes outside organizations, which does not provide integration into a practical application or add significantly more to the abstract idea, because an outside organization does not add significantly more to the mental activity; it is a general inclusion of a new organization associated with a mental activity and therefore does not overcome the reasons for the identified abstract idea.

Claim 14 is dependent on claim 1 and includes outside organizations, which does not provide integration into a practical application or add significantly more to the abstract idea, because an outside organization does not add significantly more to the mental activity; it is a general inclusion of a new organization associated with a mental activity and therefore does not overcome the reasons for the identified abstract idea.

Claim 15 is dependent on claim 1 and includes a mental concept. A human mind can generate auxiliary domains from a primary domain through processes like generalization, abstraction, and analogy. This involves creating new concepts, frameworks, or perspectives based on a core understanding.

Claim 16 is dependent on claim 1 and includes outside organizations, which does not provide integration into a practical application or add significantly more to the abstract idea, because an outside organization does not add significantly more to the mental activity; it is a general inclusion of a new organization associated with a mental activity and therefore does not overcome the reasons for the identified abstract idea.

Claim 17 is dependent on claim 1 and includes outside organizations, which does not provide integration into a practical application or add significantly more to the abstract idea, because an outside organization does not add significantly more to the mental activity; it is a general inclusion of a new organization associated with a mental activity and therefore does not overcome the reasons for the identified abstract idea.

Claim 18 is dependent on claim 1 and includes outside organizations, which does not provide integration into a practical application or add significantly more to the abstract idea, because an outside organization does not add significantly more to the mental activity; it is a general inclusion of a new organization associated with a mental activity and therefore does not overcome the reasons for the identified abstract idea.

Claim 19 is dependent on claim 1 and includes outside organizations, which does not provide integration into a practical application or add significantly more to the abstract idea, because an outside organization does not add significantly more to the mental activity; it is a general inclusion of a new organization associated with a mental activity and therefore does not overcome the reasons for the identified abstract idea.

Claim 20 is dependent on claim 1 and includes outside organizations, which does not provide integration into a practical application or add significantly more to the abstract idea, because an outside organization does not add significantly more to the mental activity; it is a general inclusion of a new organization associated with a mental activity and therefore does not overcome the reasons for the identified abstract idea.

Claim 22 is dependent on claim 21 and includes outside organizations, which does not provide integration into a practical application or add significantly more to the abstract idea, because an outside organization does not add significantly more to the mental activity; it is a general inclusion of a new organization associated with a mental activity and therefore does not overcome the reasons for the identified abstract idea.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8, 10-15, 17-18, 20-21 and 23-24 are rejected under 35 U.S.C. 103 as being unpatentable over ZHANG et al. (Pub No.: 20220147818 A1), hereinafter referred to as ZHANG, in view of Vahdat et al. (Pub No.: 20200257984 A1), hereinafter referred to as Vahdat.

With respect to claim 1, ZHANG discloses:
A computer-implemented method for training a neural network model for sequentially learning a plurality of domains associated with a task, the computer- implemented method comprising: determining at least one set of auxiliary model parameters by simulating at least one first optimization step based on a set of current model parameters and at least one auxiliary domain, wherein the at least one auxiliary domain is associated with a primary domain comprising one or more data points for training a neural network model (In the Abstract, ZHANG outlines a method where an auxiliary model is developed to improve the efficiency and accuracy of predictions made by a primary model by predicting new parameters based on contextual representations of data features. In paragraph [0064], ZHANG discloses machine learning where two models are involved: a primary model and an auxiliary model (referred to as model 700). The primary model is trained first, and during this training, it develops a set of representation vectors. These vectors summarize important information from the data used to train the primary model. Once the primary model is trained and the representation vectors are available, the auxiliary model is then trained to predict new parameters based on these vectors. The prediction task of the auxiliary model focuses on creating new parameters that should enhance or refine the primary model's performance)
Determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the primary domain and based on the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain (In paragraph [0063], ZHANG discloses the second neural network receiving the context vector from the first neural network and transforms it into a set of new parameters. These parameters are expected to aid in predicting values for the new features identified earlier. In paragraph [0064], the auxiliary model 700 is designed to predict new parameters that accurately generate new feature values)
With respect to claim 1, ZHANG does not explicitly disclose:
Updating the neural network model with the set of primary model parameters
However, Vahdat discloses:
Updating the neural network model with the set of primary model parameters (In paragraph [0036], Vahdat discloses updating the neural network or suitable machine learning models with one or more parameters of the primary model)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the method for “updating the neural network model with the set of primary model parameters” as taught by Vahdat with the method for “training a neural network model for sequentially learning a plurality of domains associated with a task, the computer- implemented method comprising: determining at least one set of auxiliary model parameters by simulating at least one first optimization step based on a set of current model parameters and at least one auxiliary domain, wherein the at least one auxiliary domain is associated with a primary domain comprising one or more data points for training a neural network model” and “determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the primary domain and based on the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain” as taught by ZHANG, so that the primary model can enhance the machine's performance by predicting which settings to adjust. It can also be used to improve the machine's condition by recommending actions to extend its lifespan, as taught by ZHANG (see [0072]).
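
As context for the mapping above, the mechanism ZHANG describes (an auxiliary model that predicts new parameters from representation vectors produced by a trained primary model) has the general shape sketched below. The dimensions, layer sizes, and names are assumptions for illustration only; this is not code from ZHANG.

    import torch
    import torch.nn as nn

    class AuxiliaryModel(nn.Module):
        # Maps per-data-point representation vectors from the trained primary
        # model to a set of new parameters intended to refine that model.
        def __init__(self, repr_dim=64, n_new_params=128):
            super().__init__()
            self.predictor = nn.Sequential(
                nn.Linear(repr_dim, 256), nn.ReLU(),
                nn.Linear(256, n_new_params),
            )

        def forward(self, representation_vectors):
            # Pool the per-data-point representations into one context vector,
            # then predict the new parameters from it.
            context = representation_vectors.mean(dim=0)
            return self.predictor(context)

    # Hypothetical usage: 32 representation vectors of dimension 64.
    new_params = AuxiliaryModel()(torch.randn(32, 64))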

Regarding claim 2, ZHANG in view of Vahdat discloses the elements of claim 1. In addition, Vahdat discloses:
The computer-implemented method of claim 1 further comprising: generating the at least one auxiliary domain from the primary domain (In Fig. 3 & paragraph [0036], Vahdat discloses the ancillary model generated from the primary model)
Wherein the generating the at least one auxiliary domain from the primary domain comprises modifying the one or more data points of the primary domain via data manipulation (In paragraph [0036], Vahdat discloses updating parameters via a training module (optimizing the cross-entropy loss term with respect to the parameters of the primary model (model 232)))
Wherein the at least one auxiliary domain comprises the one or more modified data points (In paragraph [0039], Vahdat discloses the ancillary model fine-tuned via relevant data (e.g., a validation set in the target domain custom-character with ground-truth labels, public datasets with suitable annotations, etc.))

Regarding claim 3, ZHANG in view of Vahdat discloses the elements of claim 2. In addition, ZHANG discloses:
The computer-implemented method of claim 2, wherein the data manipulation is performed automatically (In paragraph [0102], ZHANG discloses continual learning where a model seeks to adapt to new tasks or a shifting data distribution while avoiding catastrophic forgetting)

Regarding claim 4, ZHANG in view of Vahdat discloses the elements of claim 2. In addition, ZHANG discloses:
The computer-implemented method of claim 2, wherein the generating the at least one auxiliary domain from the primary domain comprises selecting the one or more data points from the primary domain (In paragraph [0202], ZHANG discloses that the auxiliary machine learning model predicts a set of new parameters of a primary model)

Regarding claim 5, ZHANG in view of Vahdat discloses the elements of claim 2. In addition, ZHANG discloses:
The computer-implemented method of claim 2, wherein the modifying the one or more data points of the primary domain via data manipulation comprises automatically and/or randomly selecting one or more transformations from a set of transformations (In paragraph [0202], ZHANG discloses training a secondary machine learning model to predict new parameters for a primary machine learning model. The primary model takes a specific set of real-world features and produces a predicted version of those features)
Wherein each auxiliary domain of the at least one auxiliary domain is defined by one or more respective transformations of the set of transformations (In paragraph [0203], ZHANG further discloses that the auxiliary model consists of a neural network that processes multiple input vectors—each linked to a separate data point in the primary model. Each input vector includes a representation of the current real-world features for that data point, along with the corresponding value of the new feature for that same data point)
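
To illustrate the kind of data manipulation recited in claims 2 and 5 (randomly selecting transformations from a set to define auxiliary domains), here is a minimal sketch. The transformation pool and helper names are illustrative assumptions, not taken from the application or the cited references.

    import random
    import torchvision.transforms as T

    # Illustrative pool of photometric and geometric transformations (claim 6).
    TRANSFORM_SET = [
        T.ColorJitter(brightness=0.5),    # photometric
        T.GaussianBlur(kernel_size=3),    # photometric
        T.RandomRotation(degrees=30),     # geometric
        T.RandomHorizontalFlip(p=1.0),    # geometric
    ]

    def make_auxiliary_domain(primary_domain, k=2):
        # Define an auxiliary domain by randomly selecting k transformations
        # and applying them to the data points of the primary domain.
        pipeline = T.Compose(random.sample(TRANSFORM_SET, k))
        return [(pipeline(x), y) for x, y in primary_domain]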

Regarding claim 6, ZHANG in view of Vahdat discloses the elements of claim 2. In addition, Vahdat discloses:
The computer-implemented method of claim 2, wherein the data manipulation comprises at least one image transformation (In paragraph [0045], Vahdat discloses the dataset comprising the distribution of custom-character and custom-character.sub.)
Wherein the at least one image transformation comprises at least one of a photometric and a geometric transformation (In paragraph [0045], Vahdat further discloses the geometric means of those distributions)


Regarding claim 10, ZHANG in view of Vahdat discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, wherein a loss function associated with the second optimization step comprises: (ii) a second loss function associated with the at least one set of auxiliary model parameters and the primary domain (In paragraph [0220], ZHANG discloses that a loss function minimizes the difference between its predicted values and the actual observed values to improve the accuracy of the model)
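
Read on the claim language, the loss for the second optimization step in claim 10 is a sum that includes a term evaluating the simulated auxiliary parameters on the primary domain. In illustrative notation (an assumed reading of the claim, consistent with the sketch after the Step 2A analysis above, and not a formula from ZHANG):

    \mathcal{L}(\theta) = \ell(\theta;\, \mathcal{D}_{\mathrm{primary}}) + \ell(\phi;\, \mathcal{D}_{\mathrm{primary}})

where \theta is the set of current model parameters, \phi is the set of auxiliary model parameters obtained from the simulated first optimization step, and \ell is a per-domain loss such as cross-entropy; term (ii) of the claim corresponds to \ell(\phi;\, \mathcal{D}_{\mathrm{primary}}).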

Regarding claim 11, ZHANG in view of Vahdat discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, further comprising: initializing the neural network model, wherein initializing the neural network model comprises setting model parameters of a pre-trained neural network model as initial model parameters for the neural network model to fine-tune the pre-trained neural network model (In paragraph [0101], ZHANG discloses an auxiliary model to help initialize the parameters (weights) of an existing neural network)

Regarding claim 12, ZHANG in view of Vahdat discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, further comprising: selecting a first sample or a first batch of samples from the auxiliary domain for the determining at least one set of auxiliary model parameters and selecting a second sample or a second batch of samples from the primary domain and at least one of selecting a third sample or a third batch of samples from the primary domain and selecting a fourth sample or a fourth batch of samples from the at least one auxiliary domain for the determining a set of primary model parameters (In paragraph [0220], ZHANG discloses that the auxiliary model only has access to certain features during training. From these features, the model uses data points while keeping other data points hidden. This means the model can see some values for a feature but not all. The model then predicts the hidden values based on the new parameters it has generated. To improve its predictions, the model is trained using a loss function that adjusts the predicted parameters so that the predicted values align better with the actual values)

Regarding claim 13, ZHANG in view of Vahdat discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, wherein a set of auxiliary model parameters of the at least one set of auxiliary model parameters minimizes a respective loss associated with a respective auxiliary domain of the at least one auxiliary domain with respect to the set of current model parameters (In paragraph [0220], ZHANG discloses that a loss function minimizes the difference between its predicted values and the actual observed values to improve the accuracy of the model)

Regarding claim 14, ZHANG in view of Vahdat discloses the elements of claim 1. In addition, Vahdat discloses:
The computer-implemented method of claim 1, wherein the set of primary model parameters minimizes a loss associated with the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain with respect to the current model parameters (In paragraph [0036], Vahdat discloses optimizing a cross-entropy loss term over the parameters of the primary model)

Regarding claim 15, ZHANG in view of Vahdat discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, wherein the steps of determining at least one set of auxiliary model parameters, determining a set of primary model parameters, and updating the neural network model are repeated until at least one of a gradient descent step size for the second optimization is below a threshold and a maximum number of gradient descent steps is reached (In paragraph [0148], ZHANG discloses that minimizing the KL divergence can be done using gradient descent, a common optimization technique used to update the model parameters incrementally to minimize the loss or cost function)
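
The stopping criteria recited in claim 15 can be illustrated by wrapping the training_step sketch shown earlier in a loop; the threshold and iteration cap below are hypothetical values.

    import itertools

    def train(model, primary_batches, auxiliary_batches,
              lr=0.01, min_step_size=1e-5, max_steps=1000):
        # Repeat determining auxiliary parameters, determining primary
        # parameters, and updating the model until the gradient descent step
        # size falls below a threshold or a maximum number of steps is reached.
        primary_iter = itertools.cycle(primary_batches)
        auxiliary_iter = itertools.cycle(auxiliary_batches)
        for _ in range(max_steps):
            step_size = training_step(model, next(primary_iter),
                                      next(auxiliary_iter), lr=lr)
            if step_size < min_step_size:
                break
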
Regarding claim 17, ZHANG in view of Vahdat discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, wherein the one or more data points of the primary domain include or are divided into a first set of data points for training the neural network model, a second set of data points for validating the neural network model and a third set of data points for testing the neural network model (In paragraph [0068], ZHANG discloses neural networks, specifically an auxiliary model that includes a third neural network (referred to as 801). The main function of the third neural network is to transform the metadata into a standardized format called a metadata vector, which serves as a representation of the metadata values. This metadata vector is then supplied back to a second neural network)

Regarding claim 18, ZHANG in view of Vahdat discloses the elements of claim 1. In addition, ZHANG discloses:
The computer-implemented method of claim 1, wherein the neural network model is trained on the one or more data points of the primary domain being a first primary domain in a first step, and wherein the trained neural network model is subsequently trained on data points of a second primary domain in a second step without accessing data points of the first primary domain in the second step (In paragraph [0063], ZHANG discloses the second neural network receiving the context vector from the first neural network and transforms it into a set of new parameters. These parameters are expected to aid in predicting values for the new features identified earlier. In paragraph [0064], the auxiliary model 700 is designed to predict new parameters that accurately generate new feature values)

Regarding claim 20, ZHANG in view of Vahdat discloses the elements of claim 18. In addition, ZHANG discloses:
A neural network trained in accordance with the method of claim 18 to perform the task in the first primary domain and the second primary domain (In paragraph [0059], ZHANG discloses a first neural network 701 and a second neural network 702 to perform a task)

With respect to claim 21, ZHANG discloses:
A method for performing a task in at least a first primary domain, the method comprising: performing, by a neural network model trained on the first primary domain, the task in the first primary domain (In paragraph [0063], ZHANG discloses a neural network trained on the primary model)
Performing, by the trained neural network model trained on the first primary domain and fine-tuned to a second primary domain, the task in the first primary domain or the second primary domain (In paragraph [0064], ZHANG discloses the trained neural network trained on the primary model and auxiliary model. The prediction task of the auxiliary model focuses on creating new parameters that should enhance or refine the primary model's performance)
Wherein the neural network model is fine-tuned by: determining at least one set of auxiliary model parameters by simulating at least one first optimization step based on a set of current model parameters and at least one auxiliary domain, wherein the at least one auxiliary domain is associated with the second primary domain, wherein the second primary domain comprises one or more data points for training the neural network model (In the Abstract, ZHANG outlines a method where an auxiliary model is developed to improve the efficiency and accuracy of predictions made by a primary model by predicting new parameters based on contextual representations of data features. In paragraph [0064], ZHANG discloses machine learning where two models are involved: a primary model and an auxiliary model (referred to as model 700). The primary model is trained first, and during this training, it develops a set of representation vectors. These vectors summarize important information from the data used to train the primary model. Once the primary model is trained and the representation vectors are available, the auxiliary model is then trained to predict new parameters based on these vectors. The prediction task of the auxiliary model focuses on creating new parameters that should enhance or refine the primary model's performance)
Determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the primary domain and based on the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain (In paragraph [0063], ZHANG discloses the second neural network receiving the context vector from the first neural network and transforms it into a set of new parameters. These parameters are expected to aid in predicting values for the new features identified earlier. In paragraph [0064], the auxiliary model 700 is designed to predict new parameters that accurately generate new feature values)
With respect to claim 21, ZHANG does not explicitly disclose:
Updating the neural network model with the set of primary model parameters
However, Vahdat discloses:
Updating the neural network model with the set of primary model parameters (In paragraph [0036], Vahdat discloses updating the neural network or suitable machine learning models with one or more parameters of the primary model)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the method for “updating the neural network model with the set of primary model parameters” as taught by Vahdat with the method for “performing a task in at least a first primary domain, the method comprising: performing, by a neural network model trained on the first primary domain, the task in the first primary domain”, “performing, by the trained neural network model trained on the first primary domain and fine-tuned to a second primary domain, the task in the first primary domain or the second primary domain”, “wherein the neural network model is fine-tuned by: determining at least one set of auxiliary model parameters by simulating at least one first optimization step based on a set of current model parameters and at least one auxiliary domain, wherein the at least one auxiliary domain is associated with the second primary domain, wherein the second primary domain comprises one or more data points for training the neural network model” and “determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the primary domain and based on the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain” as taught by ZHANG, so that the primary model can enhance the machine's performance by predicting which settings to adjust. It can also be used to improve the machine's condition by recommending actions to extend its lifespan, as taught by ZHANG (see [0072]).

Regarding claim 22, ZHANG in view of Vahdat discloses the elements of claim 21. In addition, Vahdat discloses:
The method of claim 21, wherein the neural network model is fine-tuned to perform the task in the second primary domain without accessing data points of the first primary domain (In paragraph [0039], Vahdat discloses that ancillary model 234 can be trained separately from primary model 232, using a different objective function if needed. It may also have a different architecture than primary model 232).

With respect to claim 23, ZHANG discloses:
An apparatus for training a neural network model comprising: a non-transitory computer-readable medium having executable instructions stored thereon for causing a processor and a memory to perform a method comprising: determining at least one set of auxiliary model parameters by simulating at least one first optimization step based on a set of current model parameters and at least one auxiliary domain, wherein the at least one auxiliary domain is associated with a primary domain comprising one or more data points for training a neural network model (In the Abstract, ZHANG outlines a method where an auxiliary model is developed to improve the efficiency and accuracy of predictions made by a primary model by predicting new parameters based on contextual representations of data features. In paragraph [0064], ZHANG discloses machine learning where two models are involved: a primary model and an auxiliary model (referred to as model 700). The primary model is trained first, and during this training, it develops a set of representation vectors. These vectors summarize important information from the data used to train the primary model. Once the primary model is trained and the representation vectors are available, the auxiliary model is then trained to predict new parameters based on these vectors. The prediction task of the auxiliary model focuses on creating new parameters that should enhance or refine the primary model's performance)
Determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the primary domain and based on the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain (In paragraph [0063], ZHANG discloses the second neural network receiving the context vector from the first neural network and transforms it into a set of new parameters. These parameters are expected to aid in predicting values for the new features identified earlier. In paragraph [0064], the auxiliary model 700 is designed to predict new parameters that accurately generate new feature values)
With respect to claim 23, ZHANG does not explicitly disclose:
Updating the neural network model with the set of primary model parameters
However, Vahdat discloses:
Updating the neural network model with the set of primary model parameters (In paragraph [0036], Vahdat discloses updating the neural network or suitable machine learning models with one or more parameters of the primary model)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the apparatus for “updating the neural network model with the set of primary model parameters” as taught by Vahdat with the apparatus for “training a neural network model for sequentially learning a plurality of domains associated with a task, the computer- implemented method comprising: determining at least one set of auxiliary model parameters by simulating at least one first optimization step based on a set of current model parameters and at least one auxiliary domain, wherein the at least one auxiliary domain is associated with a primary domain comprising one or more data points for training a neural network model” and “determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the primary domain and based on the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain” as taught by ZHANG, so that the primary model can enhance the machine's performance by predicting which settings to adjust. It can also be used to improve the machine's condition by recommending actions to extend its lifespan, as taught by ZHANG (see [0072]).


With respect to claim 24, ZHANG discloses:
A system for training a neural network model comprising: a processor; a memory; and computer-executable instructions stored on a non-transitory computer- readable medium for causing the processor to perform a method comprising: determining at least one set of auxiliary model parameters by simulating at least one first optimization step based on a set of current model parameters and at least one auxiliary domain, wherein the at least one auxiliary domain is associated with a primary domain comprising one or more data points for training a neural network model (In the Abstract, ZHANG outlines a method where an auxiliary model is developed to improve the efficiency and accuracy of predictions made by a primary model by predicting new parameters based on contextual representations of data features. In paragraph [0064], ZHANG discloses machine learning where two models are involved: a primary model and an auxiliary model (referred to as model 700). The primary model is trained first, and during this training, it develops a set of representation vectors. These vectors summarize important information from the data used to train the primary model. Once the primary model is trained and the representation vectors are available, the auxiliary model is then trained to predict new parameters based on these vectors. The prediction task of the auxiliary model focuses on creating new parameters that should enhance or refine the primary model's performance)
Determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the primary domain and based on the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain (In paragraph [0063], ZHANG discloses the second neural network receiving the context vector from the first neural network and transforms it into a set of new parameters. These parameters are expected to aid in predicting values for the new features identified earlier. In paragraph [0064], the auxiliary model 700 is designed to predict new parameters that accurately generate new feature values)
With respect to claim 24, ZHANG does not explicitly disclose:
Updating the neural network model with the set of primary model parameters
However, Vahdat discloses:
Updating the neural network model with the set of primary model parameters (In paragraph [0036], Vahdat discloses updating the neural network or suitable machine learning models with one or more parameters of the primary model)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to incorporate the system for “updating the neural network model with the set of primary model parameters” as taught by Vahdat with the system for “training a neural network model for sequentially learning a plurality of domains associated with a task, the computer- implemented method comprising: determining at least one set of auxiliary model parameters by simulating at least one first optimization step based on a set of current model parameters and at least one auxiliary domain, wherein the at least one auxiliary domain is associated with a primary domain comprising one or more data points for training a neural network model” and “determining a set of primary model parameters by performing a second optimization step based on the set of current model parameters and the primary domain and based on the at least one set of auxiliary model parameters and at least one of the primary domain and the at least one auxiliary domain” as taught by ZHANG, so that the primary model can enhance the machine's performance by predicting which settings to adjust. It can also be used to improve the machine's condition by recommending actions to extend its lifespan, as taught by ZHANG (see [0072]).

Claim(s) 7-8 are rejected under 35 U.S.C. 103 as being unpatentable over ZHANG in view of Vahdat and further in view of Kolouri et al. (US Patent No.: 11,210,559 B1), hereinafter referred to as Kolouri.

Regarding claim 7, ZHANG in view of Vahdat discloses the elements of claim 1. ZHANG in view of Vahdat does not explicitly disclose:
The computer-implemented method of claim 1, wherein the second optimization step employs a regularizer having a first objective of avoiding catastrophic forgetting and a second objective of encouraging domain adaptation
However, Kolouri discloses the limitation (In Col. 8, lines 20-35, Kolouri discloses that Task A and Task B exhibit selective plasticity without catastrophic forgetting. A regularizing loss function is used for learning a second task (task B) that is different from the first task (task A), and the method can adaptively learn the changes in the training data.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of ZHANG and Vahdat before them, to include Kolouri’s selective plasticity using an artificial neural network to achieve selective plasticity of the artificial neural network, as taught by Kolouri (see Col. 2, lines 55-58).


Regarding claim 8, ZHANG in view of Vahdat discloses the elements of claim 1. ZHANG in view of Vahdat does not explicitly disclose:
The computer-implemented method of claim 1, wherein the second optimization step employs a loss function having terms associated with task learning, avoiding catastrophic forgetting, and encouraging domain adaptation
However, Kolouri discloses the limitation (In Col. 8, lines 20-35, Kolouri discloses an original loss function for learning a second task (task B) different from the first task (task A) (i.e., the cross-entropy loss), where λ is the regularization coefficient and γ.sub.k is the synaptic importance parameter defined therein; the importance parameters may be calculated in an online manner such that there is no need for a definition of tasks, and the method can adaptively learn the changes in the training data.)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of ZHANG and Vahdat before them, to include Kolouri’s selective plasticity using an artificial neural network to achieve selective plasticity of the artificial neural network, as taught by Kolouri (see Col. 2, lines 55-58).
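
The loss Kolouri describes (a task-B loss plus a synaptic-importance penalty weighted by the regularization coefficient λ) has the general shape sketched below. This is a generic sketch of importance-weighted regularization with assumed names; it is not code from Kolouri.

    import torch

    def regularized_loss(task_b_loss, params, task_a_params, importance, lam=0.1):
        # L = L_B + lambda * sum_k gamma_k * (theta_k - theta_k_A)^2
        # `importance` plays the role of the synaptic importance parameters
        # gamma_k; `task_a_params` are the weights learned on the first task.
        penalty = sum((g * (p - a).pow(2)).sum()
                      for p, a, g in zip(params, task_a_params, importance))
        return task_b_loss + lam * penalty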


Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over ZHANG in view of Vahdat and Kolouri, and further in view of Tomasev et al. (Pub No.: 20200152333 A1), hereinafter referred to as Tomasev.

Regarding claim 9, ZHANG in view of Vahdat and Kolouri discloses the elements of claim 8 but does not explicitly disclose:
The computer-implemented method of claim 8, wherein the loss function is used for optimization of the model via gradient descent
However, Tomasev discloses the limitation (In paragraph [0071], Tomasev discloses the loss function using an optimizer (e.g., gradient descent optimizer))
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of ZHANG, Vahdat, and Kolouri before them, to include Tomasev’s auxiliary predictions using a neural network, which improve the performance of the neural network on the main task but are not used to make predictions after training, as taught by Tomasev (see [0069]).

Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over ZHANG in view of Vahdat and further in view of Ceccaldi et al. (Pub No.: 20210042930 A1), hereinafter referred to as Ceccaldi.

Regarding claim 16, ZHANG in view of Vahdat discloses the elements of claim 1. ZHANG in view of Vahdat does not explicitly disclose:
The computer-implemented method of claim 1, wherein at least one of the at least one first optimization step comprises at least one gradient descent step and the second optimization step comprises a gradient descent step
However, Ceccaldi discloses the limitation (In paragraph [0036], Ceccaldi discloses that one of the initial optimization steps includes a gradient descent step, and the second optimization step also features a gradient descent step)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of ZHANG and Vahdat before them, to include Ceccaldi’s image analysis to determine the suitability of an image for a trained machine-learning model, as taught by Ceccaldi (see [0001]).

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over ZHANG in view of Vahdat and further in view of Wang et al. (Pub No.: 20200293032 A1), hereinafter referred to as Wang.

Regarding claim 19, ZHANG in view of Vahdat discloses the elements of claim 18. ZHANG in view of Vahdat does not explicitly disclose:
The computer-implemented method of claim 18, wherein the neural network model is trained by empirical risk minimization (ERM)
However, Wang discloses the limitation (In paragraph [0057], Wang discloses that the neural network is trained by empirical risk minimization or structural risk minimization)
Accordingly, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention, having the teachings of ZHANG and Vahdat before them, to include Wang’s substation-asset classification to enhance classification accuracy under a small sample size and unbalanced data, as taught by Wang (see [0033]).
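
For reference, empirical risk minimization conventionally denotes choosing the model that minimizes the average loss over the training set; in standard textbook notation (not a quotation from Wang):

    \hat{f} = \arg\min_{f} \frac{1}{n} \sum_{i=1}^{n} L\bigl(f(x_i),\, y_i\bigr)

where L is a loss function and (x_i, y_i), i = 1, ..., n, are the training examples.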



Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to EVEL HONORE, whose telephone number is (703) 756-1179. The examiner can normally be reached Monday through Friday from 8am to 5pm. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas, can be reached at (571) 272-0631. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/EVEL HONORE/Examiner, Art Unit 2142

/ASHISH THOMAS/Supervisory Patent Examiner, Art Unit 2142