Patent Application 17287879 - Systems and Methods for Active Transfer Learning - Rejection
Title: Systems and Methods for Active Transfer Learning with Deep Featurization
Application Information
- Invention Title: Systems and Methods for Active Transfer Learning with Deep Featurization
- Application Number: 17287879
- Submission Date: 2025-05-14T00:00:00.000Z
- Effective Filing Date: 2021-04-22T00:00:00.000Z
- Filing Date: 2021-04-22T00:00:00.000Z
- National Class: 702
- National Sub-Class: 019000
- Examiner Employee Number: 98488
- Art Unit: 1685
- Tech Center: 1600
Rejection Summary
- 102 Rejections: 0
- 103 Rejections: 1
Cited Patents
No patents were cited in this rejection.
Office Action Text
DETAILED ACTION

Applicant's Remarks, filed 12/30/2024, have been fully considered. The following rejections and/or objections are either reiterated or newly applied in view of the instant application amendments. They constitute the complete set presently being applied to the instant application. Herein, "the previous Office action" refers to the Non-Final rejection of 09/28/2024.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Status

Claims 67-70, 72-75, 77-82, and 84-89 are currently pending and under examination herein. Claims 71, 76, and 83 are newly canceled. Claims 87-89 are newly added. Claims 67-70, 72-75, 77-82, and 84-89 are rejected.

Withdrawn Rejections/Objections

Rejections and/or objections not reiterated from previous Office actions are hereby withdrawn in view of the 12/30/2024 amendments. All rejections of claims 71, 76, and 83 are hereby withdrawn; their cancellation moots those rejections. The drawing objection regarding FIG. 1 is hereby withdrawn in view of the replacement drawing sheet and amendments. The claim interpretation of "secondary model" is hereby withdrawn in view of the claim amendments, in which "secondary model" limitations are now recited as "orthogonal machine learning models." The rejection under 35 U.S.C. 112(b) of independent claims 67, 85, and 86, and their dependent claims, regarding the limitations "the drug candidate" and "secondary models," is hereby withdrawn. Upon further consideration, newly applied rejections/portions are necessitated by the instant application amendments, as discussed below.

Priority

In view of the previously discussed claim for the benefit of priority, all claims are examined with an effective filing date of 10/23/2018 in this action. This application is a 371 of PCT/US19/57468, filed 10/22/2019, which claims benefit of 62/749,653, filed 10/23/2018.
In future actions, the effective filing date of one or more claims may change due to amendments to the claims or further analysis of the disclosure(s) of the priority application(s).

Drawings

The replacement drawing of FIG. 1, submitted 12/30/2024, is accepted.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 67-70, 72-75, 77-82, and 84-89 are rejected under 35 U.S.C. 101 because the claimed invention is directed to abstract ideas without significantly more. Any newly applied rejection/portion is necessitated by instant application amendment. The instant rejection reflects the framework outlined in MPEP 2106.04:

Framework with which to Evaluate Subject Matter Eligibility: (1) Are the claims directed to a process, machine, manufacture, or composition of matter; (2A) Prong One: Do the claims recite a judicially recognized exception, i.e., a law of nature, a natural phenomenon, or an abstract idea; Prong Two: If the claims recite a judicial exception under Prong One, is the judicial exception integrated into a practical application; and (2B) If the claims do not integrate the judicial exception, do the claims provide an inventive concept.

Framework Analysis as Pertains to the Instant Claims: With respect to step (1): the claims are directed to a computational method (claims 67-84), system (claim 85), and non-transitory computer storage medium (claims 86-89) for generating drug candidate property predictions through active transfer learning of featurization with a deep featurizer neural network (herein, dfNN) and orthogonal models (herein, oMLM); therefore, the answer is "yes."
With respect to step (2A), under the broadest reasonable interpretation (BRI), the claims recite a method, system, and non-transitory computer storage medium for generating drug candidate property predictions through active transfer learning of featurization with deep neural networks and orthogonal models. The instant claims are therefore directed to the judicial exceptions of the abstract groupings, both mathematical concepts (training and using the trained orthogonal model… random forest, a support vector machine, XGBoost, linear regression… computing an out of bag score/validation score…) and mental processes (training… perform… process… validating… compositing… to classify… predicts… identify…), which can be performed in the human mind with pen and paper, albeit more slowly.

With respect to step (2A)(1), the claims recite abstract ideas. To determine if the claims recite any concepts that equate to an abstract idea, law of nature, or natural phenomenon, MPEP 2106.03 teaches that abstract ideas include mathematical concepts (mathematical formulas or equations, mathematical relationships, and mathematical calculations), certain methods of organizing human activity, and mental processes (including procedures for collecting, observing, evaluating, and organizing information; see MPEP 2106.04(a)(2)). In the instant application, the claims recite the following limitations that equate to an abstract idea with mental steps and mathematical concepts.
The groupings of mental processes (in particular, steps for predicting and analyzing molecules which can be performed in the mind of a chemist with pen and paper) and mathematical concepts (in particular, training and using algorithms of a deep featurizer neural network/dfNN and an orthogonal machine learning model/oMLM to select and predict molecules based on mathematical relationships) are directed to abstract ideas, as follows:

Mental processes:

Claims 67, 85, and 86: to perform operations for reducing over-fitting when training machine learning models to predict properties of molecules… processing a representation of the training molecule using the deep featurizer neural network to generate a set of one or more intermediate outputs from an intermediate layer of the deep featurizer neural network which is between an input layer and an output layer of the deep featurizer neural network.

Claim 73: validating the dfNN and the orthogonal model.

Claim 78: compositing the dfNN and the orthogonal model as a composite model to classify a new set of inputs.

Claim 79: wherein the trained orthogonal model predicts a property of a drug candidate.

Claim 80: the property of the drug candidate comprises at least one of the group consisting of absorption, distribution, metabolism, elimination, toxicity, solubility, metabolic stability, in vivo endpoints, ex vivo endpoints, molecular weight, potency, lipophilicity, hydrogen bonding, permeability, selectivity, pKa, clearance, half-life, volume of distribution, plasma concentration, and stability.

Claim 84: using the trained orthogonal model to identify a drug candidate.
Mathematical concepts:

Claims 67, 85, and 86: training a deep featurizer neural network… training a deep featurizer neural network on the training data… perform a plurality of different molecule property prediction tasks; training an orthogonal machine learning model to perform an orthogonal molecule property prediction task using the deep featurizer neural network… the orthogonal machine learning model is a non-differentiable model that is separate from the deep featurizer neural network and has a lower variance than the deep featurizer neural network… training the orthogonal machine learning model to process a model input that comprises the intermediate outputs generated by the intermediate layer of the deep featurizer neural network to generate a predicted molecule property value of the training molecule.

Claims 69 and 88: training the deep featurizer neural network for one or more epochs.

Claims 70 and 89: wherein each epoch comprises training the deep featurizer neural network on the one or more datasets of molecules and associated molecule property values.

Claims 68 and 87: freezing weights of the dfNN.

Claim 74: computing an out of bag score for the orthogonal model.

Claim 75: training a deep featurizer neural network on a master data set comprising a training data set and a validation data set; training the orthogonal model on the training data set; and computing a validation score for the orthogonal model based on the validation data set.

Claim 77: wherein the orthogonal model comprises at least one of: a random forest, a support vector machine, XGBoost, linear regression, nearest neighbor, naive Bayes, decision trees, or k-means clustering.

Hence, the claims explicitly recite elements that, individually and in combination, constitute abstract ideas.
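For context, the two-stage procedure recited above (train a differentiable featurizer, freeze its weights, fit a separate low-variance model on frozen intermediate outputs, and score it on held-out data) can be sketched in code. The sketch below is purely illustrative: the data, layer sizes, and the use of ordinary least squares as a stand-in for the low-variance "orthogonal" model are assumptions, not taken from the application as filed.

```python
import numpy as np

# Illustrative toy of the recited two-stage procedure (assumed details
# throughout; nothing here is the application's actual implementation).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))        # toy "molecule representations"
y = X[:, 0] - X[:, 1] ** 2           # an assumed scalar property value

# Stage 1: train a one-hidden-layer "deep featurizer" by gradient descent.
W1 = rng.normal(scale=0.5, size=(8, 16)); b1 = np.zeros(16)
w2 = rng.normal(scale=0.5, size=16);      b2 = 0.0
for _ in range(500):
    H = np.maximum(X @ W1 + b1, 0.0)     # ReLU intermediate layer
    err = H @ w2 + b2 - y
    grad_H = np.outer(err, w2) * (H > 0)
    W1 -= 0.05 * X.T @ grad_H / len(y);  b1 -= 0.05 * grad_H.mean(0)
    w2 -= 0.05 * H.T @ err / len(y);     b2 -= 0.05 * err.mean()

# Stage 2: freeze W1/b1 (cf. claims 68/87) and fit the separate low-variance
# model on the frozen intermediate outputs, then compute a validation score
# on held-out rows (in the spirit of claim 75).
train, val = slice(0, 150), slice(150, 200)
H_frozen = np.maximum(X @ W1 + b1, 0.0)
theta, *_ = np.linalg.lstsq(H_frozen[train], y[train], rcond=None)
orthogonal_pred = H_frozen @ theta
val_mse = float(np.mean((H_frozen[val] @ theta - y[val]) ** 2))
```

A least-squares fit is used only because it is the simplest non-iterative, low-variance learner; the claims instead name models such as a random forest or support vector machine.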
Because the claims do recite judicial exceptions, step (2A)(2) directs that the claims be examined further to determine whether they integrate the abstract ideas into a practical application (MPEP 2106.04(d)). A claim integrates a judicial exception into a practical application when it applies, relies on, or uses the judicial exception in a manner that imposes a meaningful limit on the judicial exception. This is performed by analyzing the additional elements of the claim to determine if the abstract idea is integrated into a practical application (MPEP 2106.04(d)(I); MPEP 2106.05(a)-(h)). If the claim contains no additional elements beyond the judicial exception, the claim fails to integrate into a practical application (MPEP 2106.04(d)(III)). With respect to the instant recitations, the claims recite the following additional elements considered for practical application:

Claims 67, 85, and 86: computer, processors, non-transitory computer storage media, instructions… system… one or more computers; and one or more storage devices communicatively coupled to the one or more computers, wherein the one or more storage devices store instructions that, when executed by the one or more computers, cause the one or more computers…

Claims 67, 85, and 86: obtaining training data that comprises one or more datasets of molecules and associated molecule property values.

Claim 72: each dataset has labels for a different characteristic of inputs of the dataset.

Claim 81: wherein the drug candidate is a ligand molecule.

Claim 82: wherein the ligand molecule targets a protein.
The steps that are "in addition" to the recited judicial exception in the instant claims represent mere instructions or field-of-use limitations (datasets used for the dfNN/oMLM… process a model input that comprises the intermediate outputs generated by the intermediate layer… drug candidate, ligand molecule… protein) to implement the recited judicial exception, and they do not impart meaning to the recited judicial exception such that it is applied in a practical manner. Further, these additional elements direct to mere data gathering and handling with a database (datasets of molecules and associated molecule property values… property of the drug candidate… datasets with labels for a different characteristic of inputs of the dataset…) to carry out the abstract idea without imposing any meaningful limitation on it. These steps are therefore insignificant extra-solution activity and are insufficient to integrate an abstract idea into a practical application (MPEP 2106.05(g)). Further steps directed to additional non-abstract elements of computer components (computer-implemented, system, processors, non-transitory computer storage media) do not describe any specific computer components or architecture to perform or carry out the abstract idea. The steps are able to be performed in the mind but for the recitation of the computer system. Other than reciting "computer-implemented," nothing in the claim elements precludes the steps from practically being performed in the human mind. There are no specifics in the claims indicating that the above steps are rooted only in computer technology and are not able to be performed in the human mind. The claims state nothing more than accessing generic computer elements (computer, processors, non-transitory computer storage media) or collecting data sets from an online computer database, e.g.,
Tox21 (datasets of molecules and associated molecule property values), used as a tool in general processes of a generic neural network with generic input/intermediate/output layers [0081: "specific processes for active transfer learning in accordance with embodiments of the invention are described above; however, one skilled in the art will recognize that any number of processes can be utilized as appropriate to the requirements of specific applications…"] to handle data and perform the functions (perform… processing… generating…) that constitute the abstract idea. Hence, these are mere instructions to apply the abstract idea using computer elements as a tool, and therefore the claims do not integrate the abstract idea into a practical application. The courts have weighed in and consistently maintained that when, for example, a memory, display, processor, machine, etc. are recited so generically (FIG. 4, [0084-0085]) that they represent no more than mere instructions to apply the judicial exception on a computer as a tool, and not any particular improvement to the computer, these limitations may be viewed as nothing more than generally linking the use of the judicial exception to the technological environment of a computer (see MPEP 2106.05(f)). None of the recited dependent claims recites additional elements which would integrate a judicial exception into a practical application. As such, the claims are lastly evaluated under the step (2B) analysis, wherein it is determined that, because the claims recite abstract ideas and do not integrate those abstract ideas into a practical application, the claims also lack a specific inventive concept.
The judicial exception alone cannot provide the inventive concept or the practical application, and identifying whether the additional elements amount to such an inventive concept requires considering the additional elements individually and in combination to determine if they provide significantly more than the judicial exception (MPEP 2106.05). With respect to the instant claims, the additional elements of data gathering, instructions, and field-of-use limitations described above do not rise to the level of significantly more than the judicial exception. As directed in the Berkheimer memorandum of 19 April 2018 and set forth in the MPEP, the determination of whether additional elements (or a combination of additional elements) provide significantly more and/or an inventive concept rests on whether the additional elements represent well-understood, routine, conventional activity. That assessment is a factual determination that an element (or combination of elements) is widely prevalent or in common use in the relevant industry, supported by one or more of: a citation to an express statement in the specification, or to a statement made by the applicant during prosecution, that demonstrates the well-understood, routine, or conventional nature of the additional element(s); a citation to one or more of the court decisions discussed in MPEP 2106.05(d)(II) as noting the well-understood, routine, conventional nature of the additional element(s); a citation to a publication that demonstrates the well-understood, routine, conventional nature of the additional element(s); and/or a statement that the examiner is taking official notice with respect to the well-understood, routine, conventional nature of the additional element(s).
With respect to the instant recitations, the claims recite the following additional elements considered for an inventive concept:

Claims 67, 85, and 86: computer, processors, non-transitory computer storage media, instructions… system… one or more computers; and one or more storage devices communicatively coupled to the one or more computers, wherein the one or more storage devices store instructions that, when executed by the one or more computers, cause the one or more computers…

Claims 67, 85, and 86: obtaining training data that comprises one or more datasets of molecules and associated molecule property values.

Claim 72: each dataset has labels for a different characteristic of inputs of the dataset.

Claim 81: wherein the drug candidate is a ligand molecule.

Claim 82: wherein the ligand molecule targets a protein.

These additional elements do not contribute significantly more to the well-known and conventional steps of obtaining compound structure or chemical data, performed with routine laboratory equipment and analyzed by one of ordinary skill in the art as of the effective filing date. The instant claims recite methods (active transfer learning, multitask dfNN) known in the art to chemoinformaticians, with active transfer learning taught by Reker et al. (Reker D & Schneider G (2015) Active-learning strategies in computer-assisted drug discovery. Drug Discovery Today, 20(4), 458-465; PTO-892 cited, herein Reker) and multitask neural network (MTNN) featurization taught by Ramsundar et al. (Ramsundar B et al. (2015) Massively multitask networks for drug discovery. arXiv preprint arXiv:1502.02072). Reker teaches active transfer learning in drug candidate selection with machine learning algorithms which adapt the structure–activity landscapes and select data points for testing and feeding back into the model [Reker at Abstract].
The further step of deep featurization, with repeated linear and nonlinear transformations in multitask neural networks operating on multiple chemical dataset inputs as fixed training, validation, and test sets, is well-understood, routine, and conventional activity, as taught by Ramsundar [Abstract, p2 (3.4)]. There are no active steps of drug candidate discovery by active transfer learning and MTNN deep featurization that are unconventional, as reviewed by Reker and Ramsundar. The data (molecular datasets collected from open databases, associated molecule property values, property of the drug candidate) are merely manipulated data used in the judicial exception. The additional elements do not comprise an inventive concept, when considered individually or as an ordered combination, that transforms the claimed judicial exception into a patent-eligible application of the judicial exception, as evidenced by the cited references teaching the combination of elements as well as the individual elements themselves. With respect to the instant claims, the steps (analyzing for drug candidates) and additional elements (dfNN/orthogonal models… ligand/protein) are mere instructions or conventional field-of-use limitations for processing input (training a deep featurizer neural network/orthogonal model oMLM on datasets…) to the judicial exceptions (mathematical concepts such as freezing weights of the dfNN, computing out of bag scores/validation scores, and machine learning algorithms such as a random forest or a support vector machine…) and mental processes of drug candidate prediction (training… perform… process… validating… predicts… identify…), and so do not comprise an inventive concept, when considered individually or as an ordered combination, that transforms the claimed judicial exception into a patent-eligible application of the judicial exception. Therefore, the claims do not amount to significantly more than the judicial exception itself (Step 2B: No).
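The active-learning cycle Reker reviews (a model selects the candidate it is least certain about, that candidate is tested, and the result is fed back into the model) can be sketched as a toy loop. Everything here is an illustrative assumption: the synthetic data, the nearest-centroid classifier, and the ambiguity criterion are stand-ins, not Reker's actual methods.

```python
import numpy as np

# Hedged toy of an active-learning cycle: pick the most ambiguous unlabeled
# point, "assay" it, feed the label back, and refit.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))                    # toy compound descriptors
y_true = (X[:, 0] + X[:, 1] > 0).astype(int)     # assumed assay outcome

def centroids(idx):
    # Mean descriptor vector of each class among the labeled points.
    return [X[[i for i in idx if y_true[i] == c]].mean(axis=0) for c in (0, 1)]

idx0 = np.where(y_true == 0)[0][:2]              # small seed set with
idx1 = np.where(y_true == 1)[0][:2]              # both classes present
labeled = [int(i) for i in idx0] + [int(i) for i in idx1]
pool = [i for i in range(100) if i not in labeled]

for _ in range(5):                               # five acquisition rounds
    c0, c1 = centroids(labeled)
    ambiguity = np.abs(np.linalg.norm(X[pool] - c0, axis=1)
                       - np.linalg.norm(X[pool] - c1, axis=1))
    pick = pool[int(np.argmin(ambiguity))]       # most ambiguous candidate
    labeled.append(pick)                         # "test" it and feed back
    pool.remove(pick)

c0, c1 = centroids(labeled)
pred = (np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)).astype(int)
accuracy = float((pred == y_true).mean())
```

The loop's structure (select, test, feed back, refit) is the point; any uncertainty-aware learner could replace the nearest-centroid model.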
As such, claims 67-70, 72-75, 77-82, and 84-89 are not patent eligible.

Response to Remarks: 101

The Applicant's remarks (12/30/2024 Remarks, pp. 2-4) have been fully considered and are not persuasive, as discussed below. Any newly applied rejection/portion is necessitated by instant application amendment. Applicant explains that "the claims recite a two-stage machine learning training procedure. The first stage involves training a deep featurizer neural network to perform a plurality of molecule property prediction tasks. The second stage involves training an orthogonal machine learning model to process intermediate outputs generated by the deep featurizer neural network to perform an orthogonal molecule property prediction task." Under the Step 2A, Prong 1 analysis of recitation of judicial exceptions, Applicant asserts that although "machine learning training may be based on mathematical concepts, none of those mathematical concepts are recited in the claims." Further, Applicant asserts that "the claim does not recite a mental process because the human mind cannot practically train a machine learning model," similar to Example 39 of the 2019 USPTO Subject Matter Eligibility Examples. However, it is respectfully submitted that Applicant's assertion is not persuasive because the claims do recite mathematical concepts and computations, such as freezing weights of the dfNN (claims 68 and 87), computing out of bag scores/validation scores (claims 74-75), and machine learning algorithms such as a random forest or a support vector machine (claim 77). The instant application is based on machine learning models and a neural network. A neural network is, in the simplest terms, a series of mathematical functions used to sequentially transform input values. Training a neural network for a given task comprises sequentially evaluating said functions for given data and adjusting function parameters accordingly, until error values (i.e., calculated differences between produced outputs and target outputs) are minimized.
These steps are repeated until the arrangement of functions produces output with sufficient correspondence to the target output (i.e., error values below a given threshold). Hence, the instant claims are directed to and recite mathematical concepts. Further, the instant claims are directed to mental processes and are not analogous to the fact pattern of Example 39. Example 39 considers a hypothetical method for a facial detection neural network, in which a neural network is trained with digital facial images, resulting in a facial recognition system. The exemplified claim comprises modifying a set of digital facial images by performing digital transformation steps (e.g., smoothing and contrast reduction) and training a neural network in two stages using unmodified facial images, modified facial images, and non-facial images. The instant claims recite "obtaining training data that comprises one or more datasets of molecules and associated molecule property values… [and] training the deep featurizer neural network to perform a plurality of different molecule property prediction tasks… training an orthogonal machine learning model to perform an orthogonal molecule property prediction task… processing a representation of the training molecule using the deep featurizer neural network to generate a set of one or more intermediate outputs from an intermediate layer of the deep featurizer neural network to generate a predicted molecule property value of the training molecule" (claim 67). These features may comprise such elements as "absorption, distribution, metabolism, elimination, toxicity, solubility," i.e., molecular feature values [claim 80]. Given numeric input (e.g., derived feature values), as evidenced by Crisp et al. (The Journal of Undergraduate Neuroscience Education 14(1): A13-A22; published Fall 2015), "construct[ing], train[ing] and test[ing] artificial neural networks by hand on paper" (pg. A13, Abstract) is indeed possible.
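The evaluate-compare-adjust cycle described above can be made concrete with a single-neuron example whose every number is recomputable on paper (all values are invented for illustration):

```python
# Hand-checkable toy: one linear neuron, one gradient step on squared error.
w = [0.5, -0.25]; b = 0.0; lr = 0.1
x = [1.0, 2.0]; target = 1.0

out = w[0] * x[0] + w[1] * x[1] + b     # 0.5 - 0.5 + 0 = 0.0
error = out - target                     # 0.0 - 1.0 = -1.0
w = [wi - lr * error * xi for wi, xi in zip(w, x)]   # [0.6, -0.05]
b = b - lr * error                       # 0.0 + 0.1 = 0.1
new_out = w[0] * x[0] + w[1] * x[1] + b  # 0.6 - 0.1 + 0.1 = 0.6, closer to 1.0
```

One evaluation, one subtraction, and two multiply-and-adjust steps move the output from 0.0 to 0.6, toward the target of 1.0; repeating the cycle drives the error below any chosen threshold.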
Training of a neural network, then, is both an act of calculation (i.e., a mathematical concept) and a mental process which can be performed by the human mind. Under the Step 2A, Prong 2 eligibility analysis for practical integration, Applicant asserts an improvement to the functioning of a computer (by training a machine learning model for a molecule property prediction task on a small amount of training data without over-fitting) and an improvement to the field of computational prediction of molecule properties with a technical solution (the two-stage machine learning training procedure). However, it is respectfully submitted that Applicant's assertion of an improvement to the functioning of a computer is not persuasive because the instant claims are not analogous to the fact pattern of Example 39, in which a neural network (with a particular computer architecture and environment) is trained (modified or transformed) with digital facial images, which could not be performed by the human mind with pen and paper, resulting in a particularized facial recognition system. The instant claims to a deep featurizer neural network and an orthogonal machine learning model do not recite any particularity to the computer architecture or environment aside from an input/intermediate/output layer of the dfNN and an oMLM sequentially performed with products of the dfNN. The instant claims do not provide a particular machine or a particular improvement to the functioning of a computer. "Reducing overfitting" and use of a smaller training dataset are intended-use or field-of-use limitations, and can be interpreted as improvements to the judicial exceptions themselves, i.e., a better mental and mathematical process. In contrast to the instant claims, Example 39 recites training steps of a neural network on input digital images which are specifically not possible by hand, nor are the digital transformation steps possible in the human mind.
The eligibility in Example 39 arises from the particular features of the claimed process and should not be generalized as an axiom regarding the eligibility of claims that recite training of neural networks. Moreover, the prosecutorial guidance provided in the MPEP incorporates, and supersedes, that provided in the 2019 PEG. Furthermore, according to the updated July 2024 guidance on AI models, an improvement in the judicial exception itself (a mathematical machine learning model used to determine molecular features) is not an improvement in AI technology (the processor or the neural network). For example, in MPEP 2106.05(a), subsection II, in In re Board of Trustees of Leland Stanford Junior University, 989 F.3d 1367, 1370, 1373 (Fed. Cir. 2021) (Stanford I), the applicant claimed methods of resolving a haplotype phase involving steps of determining an inheritance state based on received allele data using a Hidden Markov Model (i.e., a machine learning model). The applicant further claimed determining a haplotype phase (i.e., a molecule property prediction task) based on the pedigree data, the earlier-calculated inheritance state, transition probability data, and population linkage disequilibrium data (i.e., datasets of molecules and associated molecule property values) using a computer system (i.e., processor/system). The applicant argued that the claimed process was an improvement over prior processes because it "yields a greater number of haplotype phase predictions" (i.e., generates a predicted molecule property value/drug candidate), but the court found it was not "an improved technological process" and instead was an improved "mathematical process" (cf. reduced overfitting, use of a smaller training dataset).
The court explained that such claims were directed to an abstract idea because they describe "mathematically calculating alleles' haplotype phase," like the "mathematical algorithms for performing calculations" (here, a neural-network-implemented machine learning model) in prior cases. Notably, the Federal Circuit found that the claims did not reflect an improvement to a technological process, which would have rendered the claims eligible. As such, instant claims 67-70, 72-75, 77-82, and 84-89 are not patent eligible.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. § 102 and § 103 (or as subject to pre-AIA 35 U.S.C. § 102 and § 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in § 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. § 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4.
Considering objective evidence present in the application indicating obviousness or nonobviousness. Note: citations from the instant application are italicized in the following section.

A. Claims 67-70, 72-75, 77-82, and 84-89 are rejected under 35 U.S.C. 103 as being unpatentable over Ramsundar et al. (Ramsundar B et al. (2015) Massively multitask networks for drug discovery. arXiv preprint arXiv:1502.02072; PTO-892 document, herein Ramsundar), in view of Baldi et al. (Baldi P, Sadowski P, & Whiteson D. (2014). Deep learning in high-energy physics: Improving the search for exotic particles. arXiv preprint arXiv, 1402: 9 pages; PTO-892 cited, herein Baldi). Any newly applied rejection/portion is necessitated by instant application amendment.

Regarding instant independent claims 67 (method), 85 (system), and 86 (CRM), the instant application recites: a computer-implemented method / a system / a non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations: obtaining training data that comprises one or more datasets of one or more molecules and associated molecule property values; training a deep featurizer neural network on the training data, wherein training the deep featurizer neural network comprises training the deep featurizer neural network to perform a plurality of different molecule property prediction tasks; training an orthogonal machine learning model to perform an orthogonal molecule property prediction task using the deep featurizer neural network, wherein: the orthogonal machine learning model is a non-differentiable model that is separate from the deep featurizer neural network and has a lower variance than the deep featurizer neural network; and wherein training the orthogonal machine learning model comprises, for each of a plurality of training molecules: processing a representation of
the training molecule using the deep featurizer neural network to generate a set of one or more intermediate outputs from an intermediate layer of the deep featurizer neural network which is between an input layer and an output layer of the deep featurizer neural network.

The prior art to Ramsundar discloses: multitask neural networks/MTNN (deep featurizer neural network, herein dfNN, with orthogonal machine learning model, herein oMLM) for drug candidate [Abstract] virtual screening [p2 Col 1] with multiple chemical datasets [Appendix A and p2 (3.1: "models were trained on datasets gathered from publicly available data… datasets were divided into four groups: PCBA, MUV, DUD-E, and Tox21")]. The dataset for PubChem Assays (PCBA) includes dose-response features (drug candidate properties from assayed chemical biology [Appendix A]), alongside MUV, DUD-E, and Tox21 (toxicity) (obtaining training data that comprises one or more datasets of one or more molecules and associated molecule property values). A neural network is a nonlinear classifier that performs repeated linear and nonlinear transformations on its input (dfNN). Let xi represent the input to the i-th layer (multiple layers) of the network (where x0 is simply the feature vector). The transformation performed is xi+1 = σ(Wi·xi + bi), where Wi and bi are respectively the weight matrix and bias for the i-th layer, and the nonlinearity σ is a rectified linear unit [p2: 3.1] (training a deep featurizer neural network on the training data, wherein training the deep featurizer neural network comprises training the deep featurizer neural network to perform a plurality of different molecule property prediction tasks).
Ramsundar utilized pyramidal single-task models, for example RF (random forest) (the oMLM/orthogonal model based on RF is analogous to the single-task models, which are non-differentiable, low-variance models), as well as multitask networks/MTNN (the dfNN is analogous to the pyramidal multitask networks) [p4: 4.1 and 4.4, and p25 C: “the pyramidal single-task networks were trained with the same settings, but for 100K steps. The vanilla single-task networks were trained with learning rate .001 for 100K steps”] (training an orthogonal machine learning model to perform an orthogonal molecule property prediction task using the deep featurizer neural network, wherein: the orthogonal machine learning model is a non-differentiable model that is separate from the deep featurizer neural network and has a lower variance than the deep featurizer neural network; and wherein training the orthogonal machine learning model comprises, for each of a plurality of training molecules). After L such transformations, the final layer of the network xL is then fed to a simple linear classifier, such as the softmax (orthogonal [machine learning] model/linear regression), which predicts the probability that the input x0 has label j. Reference label predictions of inputs are analogous to predicting associated molecular values (as in instant claim 70: labels for a different characteristic of inputs of the dataset) (training a deep featurizer neural network on the training data, wherein training the deep featurizer neural network comprises training the deep featurizer neural network to perform a plurality of different molecule property prediction tasks; training an orthogonal machine learning model to perform an orthogonal molecule property prediction task using the deep featurizer neural network) [p2: 3.1 and FIG 1]. 
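For illustration only (this sketch is not part of the prosecution record and does not reproduce Ramsundar's or the applicant's actual code; all weights, dimensions, and data below are synthetic placeholders), the claimed arrangement — a fixed-weight neural featurizer whose intermediate-layer outputs are used to train a separate non-differentiable, lower-variance model such as a random forest — can be sketched as:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy "deep featurizer": two fixed-weight ReLU layers. In the claims,
# these weights would come from prior multitask training; here they are
# random placeholders purely for illustration.
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)

def intermediate_features(x):
    """Forward pass up to an intermediate layer that sits between the
    input layer and the output layer of the featurizer."""
    h1 = np.maximum(0.0, W1 @ x + b1)      # hidden layer 1 (ReLU)
    return np.maximum(0.0, W2 @ h1 + b2)   # intermediate outputs

# Hypothetical molecule descriptors and associated property values.
X = rng.normal(size=(100, 8))
y = rng.normal(size=100)

# Orthogonal, non-differentiable model trained on the featurizer's
# intermediate outputs rather than on the raw inputs.
features = np.array([intermediate_features(x) for x in X])
orthogonal_model = RandomForestRegressor(n_estimators=50, random_state=0)
orthogonal_model.fit(features, y)
pred = orthogonal_model.predict(features[:1])
```

A random forest is used here only as one example of a non-differentiable, lower-variance learner; the same pattern would apply to the other model types recited in claim 77.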
Massively multitask neural architectures/MTNN (composed of the trained dfNN and orthogonal models) provide a learning framework for drug discovery that synthesizes information from many distinct biological sources [Abstract]. Nearly every target class (drug candidate group) realized gains, suggesting that the multitask framework is applicable to experimental data from multiple target classes (processing a representation of the training molecule using the deep featurizer neural network to generate a set of one or more intermediate outputs from an intermediate layer of the deep featurizer neural network which is between an input layer and an output layer of the deep featurizer neural network, training the orthogonal machine learning model to process a model input that comprises the intermediate outputs generated by the intermediate layer of the deep featurizer neural network to generate a predicted molecule property value of the training molecule) [Table A.3-4, and p7: 4.5.2]. However, Ramsundar is silent to explicitly teaching the computer processors, system, and medium used in neural network discovery. The prior art to Baldi discloses machines with 16 Intel Xeon cores, a graphics processor, and 64 GB memory [p8: Computation]. Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Ramsundar’s multitask neural networks/MTNN for drug candidate virtual screening by incorporating Baldi’s computer system, using graphics processors/GPUs to speed up stochastic gradient descent algorithms applied to large drug compound datasets (Baldi at [p2 Col 1 para 4]). One of ordinary skill in the art would be motivated to combine these prior art elements because Baldi teaches “the high dimensionality of data, referred to as the feature space, makes it intractable to generate enough simulated collisions (i.e. 
drug candidate property assays and searches) to describe the relative likelihood in the full feature space, and machine-learning tools are used for dimensionality reduction. Machine-learning classifiers such as neural networks provide a powerful way to solve this learning problem” (Baldi at [p2 Col 1 para 2]). One of ordinary skill in the art would predict that Baldi’s GPU-based systems could be readily added to Ramsundar’s MTNNs with a reasonable expectation of success, as both Baldi and Ramsundar teach neural network-based discovery (of target molecules and particles) in computational chemistry and physics. The invention is therefore prima facie obvious. Regarding instant claims 68 and 87, the instant application recites: prior to training the orthogonal machine learning model, freezing weights of the dfNN. The prior art to Ramsundar teaches controlling the weights of the networks before outputs are generated, setting them at “total weight equal to the number of inactives for that dataset” [p25 C]. Regarding instant claims 69 and 88, the instant application recites: wherein training the dfNN comprises training the dfNN for one or more epochs. The prior art to Ramsundar teaches “the networks used in Figure 3 and Figure 4 were trained with learning rate .003 for 500 epochs plus a constant 3 million steps. The constant factor was introduced after we observed that the smaller multitask networks required more epochs than the larger networks to stabilize” [p25 C]. Regarding instant claims 70 and 89, the instant application recites: wherein each epoch comprises training the deep featurizer neural network on the one or more datasets of molecules and associated molecule property values. The prior art to Ramsundar teaches the networks [FIGs 3-4] were trained with learning rate .003 for 500 epochs plus a constant 3 million steps, and epochs were analyzed for smaller multitask networks (oMLM) and larger networks (dfNN) [p25 C]. 
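As a purely illustrative sketch of the epoch and weight-freezing limitations of claims 68-70 and 87-89 (synthetic data and a toy one-layer model, not the applicant's implementation), each epoch below is one full pass over the dataset, after which the featurizer weights are frozen before any further model is trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for molecule descriptors and property values.
X = rng.normal(size=(64, 5))
true_w = np.array([1.0, 2.0, -1.0, 0.5, 0.0])
y = X @ true_w  # noiseless linear target, for a simple demonstration

# Train a toy one-layer "featurizer" by SGD; each epoch is one full
# pass over the dataset (claims 69/88 and 70/89).
w = np.zeros(5)
lr = 0.05
for epoch in range(100):
    for i in range(len(X)):
        grad = (X[i] @ w - y[i]) * X[i]
        w -= lr * grad

# Claims 68/87: freeze the weights prior to orthogonal-model training.
# Marking the array read-only makes any later write raise an error.
w.setflags(write=False)
try:
    w[0] = 0.0          # any attempted update now fails
except ValueError:
    frozen = True
```

The read-only flag is simply one way to make the "frozen" state concrete in numpy; in a deep learning framework the equivalent step would be excluding the featurizer's parameters from the optimizer.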
Ramsundar further discloses MTNNs for drug candidate [Abstract] virtual screening [p2 Col 1] with multiple chemical datasets [Appendix A and p2 (3.1: “models were trained on datasets gathered from publicly available data…datasets were divided into four groups: PCBA, MUV, DUD-E, and Tox21”)]. The dataset for PubChem Assays/PCBA includes dose-response features (drug candidates with associated molecule property values from assayed chemical biology [Appendix A]), along with MUV, DUD-E, and Tox21 (toxicity) (training data that comprises one or more datasets of one or more molecules and associated molecule property values). Regarding instant claim 72, the instant application recites: wherein each dataset of the one or more datasets has labels for a different characteristic of inputs of the dataset. The prior art to Ramsundar teaches the MTNN is a nonlinear classifier that performs repeated linear and nonlinear transformations on its input (deep featurizer/dfNN/oMLM model), and in the final layer the result is fed into a simple linear classifier to predict the probability that the input x0 has label j, where M is the number of possible labels (here M=2) [p2: 3.4 formula]. Overall, 259 datasets were utilized [Table A1], covering a wide range of target classes and assay types, including in MUV, Tox21, and PCBA [p11A: in PCBA, compounds had labels such as “Active” and “Inconclusive”]. Regarding instant claim 73, the instant application recites: validating the dfNN and the set of orthogonal models. The prior art to Ramsundar teaches validation with various schemes and metrics based on training/testing splits, including K-fold cross-validation and ROC/AUC [p3: 3.3]. Regarding instant claim 74, the instant application recites: computing an out of bag score for the orthogonal model. 
The prior art to Ramsundar teaches estimation of machine learning generalization with held-in/held-out randomly selected portions of the MUV, Tox21, and PCBA datasets (out of bag score) by computing the average AUC for the tasks/models [p4: 4.2, 4.4, and p5 Col 1, FIG 3, Table A.3-A.4: Held-out datasets]. Table 2 compares the AUCs for the various linear and ensemble machine learning methods, e.g. logistic regression and random forest/RF (orthogonal models). Regarding instant claim 75, the instant application recites: wherein validating the orthogonal model comprises: (a) training the dfNN on a master data set comprising a training data set and a validation data set; (b) training the orthogonal model on the training data set; and (c) computing a validation score for the orthogonal models based on the validation data set. The prior art to Ramsundar teaches estimation of machine learning generalization with held-in/held-out randomly selected portions of the MUV, Tox21, and PCBA datasets (out of bag score) by computing the average AUC for the tasks/models [p4: 4.2, 4.4, and p5 Col 1, FIG 3, Table A.3-A.4: Held-out datasets]. Table 2 compares the AUCs for the various linear and ensemble machine learning methods, e.g. logistic regression and random forest/RF (orthogonal models), against those of the NNs (single-task NNs and multitask MTNNs). Further, Ramsundar teaches validation with various schemes and metrics based on training/testing splits, including K-fold cross-validation and ROC/AUC [p3: 3.3]. Regarding instant claim 77, the instant application recites: wherein the orthogonal model comprises at least one of: a random forest, a support vector machine, XGBoost, linear regression, nearest neighbor, naive bayes, decision trees, or k-means clustering. The prior art to Ramsundar Table 2 compares the AUCs for the various linear and ensemble machine learning methods (orthogonal models), e.g. 
logistic regression and random forest/RF, utilized with the same datasets (PCBA, MUV, Tox21), against those of the neural networks/NNs (single-task NNs and multitask MTNNs) [p5]. Regarding instant claim 78, the instant application recites: compositing the dfNN and the orthogonal model as a composite model to classify a new set of inputs. The prior art to Ramsundar FIG 1 discloses a composited model incorporating an MTNN with a linear regression softmax model [p2: 3.1: “after L such transformations, the final layer of the network xL is then fed to a simple linear classifier, such as the softmax (orthogonal model/linear regression), which predicts the probability that the input x0 has label j”]. Regarding instant claim 79, the instant application recites: wherein the trained orthogonal model predicts a property of a drug candidate. The prior art to Ramsundar discloses multitask neural networks/MTNN (deep featurizer/dfNN/oMLM) for drug candidate [Abstract] virtual screening [p2 Col 1] with multiple chemical datasets [Appendix A and p2 (3.1: “models were trained on datasets gathered from publicly available data…datasets were divided into four groups: PubChem Assays/PCBA (drug candidate properties from assayed chemical biology, e.g. dose-response features [Appendix A]), MUV, DUD-E, and Tox21” (toxicity)] (the trained orthogonal model predicts a property of a drug candidate). 
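The out-of-bag score recited in instant claims 74-75 can be illustrated, again with purely synthetic data (this is not the applicant's code), using a bagged ensemble: because each bootstrapped tree omits roughly a third of the training samples, those held-out samples provide a built-in validation score without a separate validation split:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic features and a noisy linear property value.
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=200)

# With bootstrap=True, each tree is fit on a bootstrap sample and the
# remaining ("out-of-bag") samples score that tree; oob_score_ aggregates
# these into an R^2 estimate of generalization.
rf = RandomForestRegressor(n_estimators=200, oob_score=True,
                           bootstrap=True, random_state=0)
rf.fit(X, y)
print(round(rf.oob_score_, 3))  # R^2 on out-of-bag samples
```

This is analogous in purpose, though not identical in mechanism, to the held-in/held-out AUC estimation the rejection cites from Ramsundar.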
Regarding instant claim 80, the instant application recites: wherein the property of the drug candidate comprises at least one of the group consisting of absorption, distribution, metabolism, elimination, toxicity, solubility, metabolic stability, in vivo endpoints, ex vivo endpoints, molecular weight, potency, lipophilicity, hydrogen bonding, permeability, selectivity, pKa, clearance, half-life, volume of distribution, plasma concentration, and stability. The prior art to Ramsundar discloses multitask neural networks/MTNN (dfNN/oMLM) for drug candidate [Abstract] virtual screening [p2 Col 1] with multiple chemical datasets [Appendix A and p2 (3.1: “models were trained on datasets gathered from publicly available data…datasets were divided into four groups: PubChem Assays/PCBA (drug candidate properties from assayed chemical biology, e.g. dose-response features [Appendix A]), MUV, DUD-E, and Tox21” (toxicity)] (the property of the drug candidate comprises at least one of the group consisting of absorption, distribution, metabolism, elimination, toxicity, solubility…). Regarding instant claim 81, the instant application recites: wherein the drug candidate is a ligand molecule. The prior art to Ramsundar features target molecules including enzyme proteins [FIG A.3] from multiple chemical datasets [Appendix A and p2 (3.1: “models were trained on datasets gathered from publicly available data…datasets were divided into four groups: PCBA, MUV, DUD-E, and Tox21”)]. Regarding instant claim 82, the instant application recites: wherein the ligand molecule targets a protein. The prior art to Ramsundar discloses target molecules including enzyme proteins [FIG A.3] from multiple chemical datasets. Regarding instant claim 84, the instant application recites: using the trained orthogonal model to identify a drug candidate. The prior art to Ramsundar discloses target molecules including enzyme proteins [FIG A.3] from multiple chemical datasets [Appendix A and p2: 3.1]. 
Overall, 259 datasets were utilized [Table A1], covering a wide range of target classes (feature sets) and assay types in the MUV, Tox21, and PCBA databases [p11A: in PCBA, compounds had labels such as “Active,” “Inactives,” and “Inconclusive”], derived from NN and non-NN analyses, with evidence of multitask application improvements in AUC across different target classes (feature sets) [FIG. 7 and FIG A.3] and failed featurization data sets [Table A.2]. Response to Remarks: 102/103 The Applicant’s assertions (12/30/2024 Remarks p.5) in view of the instant application amendment have been fully considered. Applicant asserts “the claims have been amended, and Applicant respectfully submits that the cited references do not teach or suggest the combination of features of the claims as amended.” This is not persuasive, as no arguments were presented on the merits of the rejection, as discussed above. Any newly applied rejection/portion is necessitated by instant application amendment. Conclusion No claims are allowed. Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
E-mail Communications Authorization Per updated USPTO Internet usage policies, Applicant and/or applicant’s representative is encouraged to authorize the USPTO examiner to discuss any subject matter concerning the above application via Internet e-mail communications. See MPEP 502.03. To approve such communications, Applicant must provide written authorization for e-mail communication by submitting the following form via EFS-Web or Central Fax (571-273-8300): PTO/SB/439. Applicant is encouraged to do so as early in prosecution as possible, so as to facilitate communication during examination. Written authorizations submitted to the Examiner via e-mail are NOT proper. Written authorizations must be submitted via EFS-Web or Central Fax (571-273-8300). A paper copy of e-mail correspondence will be placed in the patent application when appropriate. E-mails from the USPTO are for the sole use of the intended recipient, and may contain information subject to the confidentiality requirement set forth in 35 USC § 122. See also MPEP 502.03. Inquiries Papers related to this application may be submitted to Technical Center 1600 by facsimile transmission. Papers should be faxed to Technical Center 1600 via the PTO Fax Center. The faxing of such papers must conform to the notices published in the Official Gazette, 1096 OG 30 (November 15, 1988), 1156 OG 61 (November 16, 1993), and 1157 OG 94 (December 28, 1993) (See 37 CFR § 1.6(d)). The Central Fax Center Number is (571) 273-8300. Any inquiry concerning this communication or earlier communications from the examiner should be directed to Vy Rossi, whose telephone number is (703) 756-4649. The examiner can normally be reached on Monday-Friday from 8:30AM to 5:30PM ET. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Olivia Wise, can be reached at (571) 272-2249. Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to (571) 272-0547. 
Patent applicants with problems or questions regarding electronic images that can be viewed in the Patent Application Information Retrieval system (PAIR) can now contact the USPTO’s Patent Electronic Business Center (Patent EBC) for assistance. Representatives are available to answer your questions daily from 6 am to midnight (EST). The toll free number is (866) 217-9197. When calling please have your application serial or patent number, the type of document you are having an image problem with, the number of pages and the specific nature of the problem. The Patent Electronic Business Center will notify applicants of the resolution of the problem within 5-7 business days. Applicants can also check PAIR to confirm that the problem has been corrected. The USPTO’s Patent Electronic Business Center is a complete service center supporting all patent business on the Internet. The USPTO’s PAIR system provides Internet-based access to patent application status and history information. It also enables applicants to view the scanned images of their own application file folder(s) as well as general patent information available to the public. /VR/ Examiner Art Unit 1685 /OLIVIA M. WISE/Supervisory Patent Examiner, Art Unit 1685