Patent Application 16523391 - CLASSIFICATION IN HIERARCHICAL PREDICTION DOMAINS - Rejection
Title: CLASSIFICATION IN HIERARCHICAL PREDICTION DOMAINS
Application Information
- Invention Title: CLASSIFICATION IN HIERARCHICAL PREDICTION DOMAINS
- Application Number: 16523391
- Submission Date: 2025-04-10
- Effective Filing Date: 2019-07-26
- Filing Date: 2019-07-26
- National Class: 706
- National Sub-Class: 012000
- Examiner Employee Number: 95804
- Art Unit: 2122
- Tech Center: 2100
Rejection Summary
- 102 Rejections: 0
- 103 Rejections: 3
Cited Patents
No patents were cited in this rejection.
Office Action Text
DETAILED ACTION

This action is in response to the claims filed 12/11/2024 for Application number 16/523,391. Claims 1, 8, 15, and 21 have been amended. Thus, claims 1-19 and 21 are currently pending.

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statements (IDSs) submitted on 10/10/2024 and 02/21/2025 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-19 and 21 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, Step 1 Analysis: Claim 1 is directed to a process, which falls within one of the four statutory categories.

Step 2A Prong 1 Analysis: Claim 1 recites, in part, the following limitations: (i) "the one or more structurally hierarchical predictions are based at least in part on a hierarchical prediction domain, comprising a plurality of nodes and defining one or more hierarchical relationships between each of the plurality of nodes," which can be considered to be an evaluation in the human mind; "a first node of the plurality of nodes is associated with a first hierarchical position within the plurality of nodes," which can be considered to be a mental process with the aid of pen and paper; "a second node of the plurality of nodes is associated with a second hierarchical position within the plurality of nodes," which can be considered to be a mental process with the aid of pen and paper; "the first hierarchical position is associated with a higher semantic detail relative to the second hierarchical position," which can be considered to be a mental process with the aid of pen and paper; "a first structurally hierarchical prediction of the one or more structurally hierarchical predictions is generated for the first node before the second node," which can be considered to be a mental process with the aid of pen and paper; and "generating a predictive output that identifies a subset of hierarchical nodes from the plurality of nodes," which can be considered to be an evaluation in the human mind. The claim further recites "generating a hierarchically-expanded training data object set [for the online machine learning model and from the single user selection input] by (i) assigning a first training predictive label to the prediction input that identifies the selected node to generate a first training data object of the hierarchically-expanded training data object set" (i.e., merely assigning a node with a label), which can be considered to be a mental process with the aid of pen and paper, and "(ii) assigning a second training predictive label to the prediction input that identifies an unselected parent node of the selected node to generate a second training data object of the hierarchically-expanded training data object set" (i.e., merely assigning each node with a label), which can be considered to be a mental process with the aid of pen and paper.
These limitations, as drafted, are processes that, under the broadest reasonable interpretation, cover performance of the limitations in the mind or using pen and paper, which falls within the "Mental Processes" grouping of abstract ideas. Additionally, the limitation of "wherein the first training data object is stored in the sparse vector by converting the first training data object to a numeric value and applying the hashing mechanism to the numeric value to determine an encoding location in the sparse vector" can be considered to be a mathematical concept. Accordingly, the claim recites an abstract idea.

Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. Additionally, the claim recites "inputting…a prediction input to an online machine learning model to receive…" and "one or more processors." Thus, the elements in the claim are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim further recites: receiving a single user selection input corresponding to a selected node from the subset of hierarchical nodes; storing, using a hashing mechanism, the first training data object and the second training data object in a sparse vector; and training, using a modified reward function and the sparse vector, one or more parameters of the online machine learning model based at least in part on the single user selection input, wherein the modified reward function is configured to reward the single user selection input corresponding to the selected node and not penalize a lack of selection of the unselected parent node. These limitations are insignificant extra-solution activities. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim as a whole is directed to an abstract idea.

Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing an online machine learning model and the one or more processors amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Furthermore, the limitation of receiving a single user selection input corresponding to a selected node from the subset of hierarchical nodes is well-understood, routine, and conventional as evidenced by the instant specification ("For example, online learning algorithms are typically utilized to generate recommendations for a user (e.g., promotional recommendations for a user), where the user reaction to the recommendation is in turn utilized to update a ML model…" [¶0046]).
Likewise, the limitation of training, using a modified reward function and the sparse vector, one or more parameters of the online machine learning model based at least in part on the single user selection input, wherein the modified reward function is configured to reward the single user selection input corresponding to the selected node and not penalize a lack of selection of the unselected parent node, is an insignificant extra-solution activity that is well-understood, routine, and conventional as evidenced by the instant specification ("In some online learning algorithms, a positive user reaction (e.g., a selection of a link corresponding to a recommendation) is used to change model parameters in a manner that increases a likelihood of future generation of the recommendation and decreases a likelihood of future generation of other recommendations, while a negative user reaction (e.g., lack of selection of a link corresponding to a recommendation) is used to change model parameters in a manner that decreases a likelihood of future generation of the recommendation and increases a likelihood of future generation of other recommendations…" [¶0046; see also ¶0136]). Finally, the limitation of storing, using a hashing mechanism, the first training data object and the second training data object in a sparse vector is well-understood, routine, and conventional as evidenced by MPEP §2106.05(d)(II)(iv), "Storing and retrieving information in memory." These limitations therefore remain insignificant extra-solution activities even upon reconsideration and do not amount to significantly more. Even when considered in combination, these additional elements amount to mere instructions to apply the exception using generic computer components and insignificant extra-solution activity, which cannot provide an inventive concept. The claim is not patent eligible.

Regarding claims 8 and 15, they recite features similar to claim 1 and are rejected for at least the same reasons therein. They additionally recite the additional elements of "one or more processors" and "non-transitory computer storage medium"; however, these elements are recited at a high level of generality (i.e., as a generic processor performing a generic computer function) such that they amount to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
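For clarity of the record, the claimed arrangement as characterized above can be pictured with a brief illustrative sketch. The Python sketch below is an illustration only, using hypothetical node names and helper functions (loosely modeled on the HPO chain quoted from Notaro Fig. 5 later in this action); it is not code from the application.

```python
# Illustrative sketch only (hypothetical node names and helpers): expanding a
# single user selection into multiple training data objects over a hierarchy.

PARENT = {
    # Hypothetical parent links, loosely modeled on the HPO-style chain quoted
    # from Notaro Fig. 5 in the 103 rejection below.
    "abnormality_of_immune_system_physiology": "abnormality_of_the_immune_system",
    "abnormality_of_the_immune_system": "phenotypic_abnormality",
    "phenotypic_abnormality": None,
}

def hierarchically_expand(prediction_input, selected_node):
    """Return one training data object for the selected node (first training
    predictive label) plus one for each unselected ancestor (second label, etc.)."""
    objects = [(prediction_input, selected_node, 1.0)]
    node = PARENT.get(selected_node)
    while node is not None:
        objects.append((prediction_input, node, 1.0))
        node = PARENT.get(node)
    return objects

def modified_reward(selected_node, labeled_node):
    """Reward the node the user actually selected; apply no penalty (no negative
    bias term) for the unselected parent nodes."""
    return 1.0 if labeled_node == selected_node else 0.0

expanded = hierarchically_expand({"feature_A": 1}, "abnormality_of_immune_system_physiology")
```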
Regarding dependent claims 2, 9, and 16, the rejection of their respective similar independent claims is incorporated, and further, the claims recite: generating, [using a co-occurrence analysis machine learning model], one or more structurally non-hierarchical predictions for the prediction input, wherein (i) the co-occurrence analysis machine learning model is configured to perform a [co-occurrence machine learning analysis] associated with the hierarchical prediction domain based at least in part on the prediction input to generate the one or more structurally non-hierarchical predictions, (ii) the co-occurrence analysis machine learning model is associated with a predictive co-occurrence score between a feature-node pair of (a) a predictive feature set of one or more predictive feature sets and (b) a particular node of the plurality of nodes, and (iii) a structurally non-hierarchical prediction of the one or more structurally non-hierarchical predictions is generated without regard to a hierarchical position of the particular node; and generating the predictive output based at least in part on the one or more structurally non-hierarchical predictions and the one or more structurally hierarchical predictions. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 1, and thus recites a judicial exception. The claim does recite the additional element of a "co-occurrence analysis machine learning model"; however, it does not amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 1 above. The claim is not patent eligible.

Regarding dependent claims 3, 10, and 17, the rejection of their respective similar independent claims is incorporated, and further, the claims recite: receiving one or more structurally non-hierarchical predictions, wherein a structurally non-hierarchical prediction of the one or more structurally non-hierarchical predictions is generated without regard to a hierarchical predictive position of a particular node; generating, using a structured fusion machine learning model, one or more structure-based predictions based at least in part on the one or more structurally hierarchical predictions and the one or more structurally non-hierarchical predictions; and generating the predictive output based at least in part on the one or more structure-based predictions. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 1, and thus recites a judicial exception. The claim does recite the additional element of a "structured fusion machine learning model"; however, it does not amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 1 above. The claim is not patent eligible.
Regarding dependent claims 4, 11, and 18, the rejection of their respective similar independent claims is incorporated, and further, the claims recite: the prediction input comprise one or more unstructured prediction inputs; and generating the predictive output further comprises: generating one or more non-structure-based predictions based at least in part on the one or more unstructured prediction inputs; generating, [using an unstructured fusion machine learning model], one or more unstructured-fused predictions, wherein: (i) [the unstructured fusion machine learning model is configured to perform an unstructured fusion machine learning analysis] based at least in part on the one or more structure-based predictions and the one or more non-structure-based predictions and (ii) [the unstructured fusion machine learning analysis comprises retraining the online machine learning model] based at least in part on the one or more non-structure-based predictions; and generating the predictive output based at least in part on the one or more unstructured-fused predictions. This claim recites additional mental steps in addition to the judicial exception identified in the rejection of claim 1, and thus recites a judicial exception. The claim does recite the additional elements of an "unstructured fusion machine learning model" and "retraining the online machine learning model" (i.e., analogous to training using a modified reward function); however, they do not amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception, for the reasons set forth in connection with the rejection of claim 1 above. The claim is not patent eligible.

Regarding dependent claims 5, 12, and 19, the rejection of their respective similar independent claims is incorporated, and further, the claims recite wherein: the prediction input comprise one or more medical feature inputs for a patient profile; the predictive output comprises at least one human phenotype ontology label prediction for the patient profile; and the one or more hierarchical relationships comprise one or more human phenotype ontology dependency relationships. This limitation amounts to indicating a field of use or technological environment in which the judicial exception is performed; please see MPEP 2106.05(h). The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Regarding dependent claims 6 and 13, the rejection of their respective similar independent claims is incorporated, and further, the claims recite wherein the online machine learning model is a follow-the-regularized-leader machine learning model. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above. The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Regarding claim 7, the rejection of claim 1 is further incorporated, and further, the claim recites: the at least one of the plurality of nodes comprises a threshold number of selected prediction nodes; and the threshold number of nodes is determined based at least in part on one or more online machine learning parameters of the online machine learning model.
This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above. The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Regarding claim 21, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein the reward function is configured to reward the selection of the at least one selected node and eliminate a bias term configured to penalize a lack of selection of the unselected parent node. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above. The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows: 1. Determining the scope and contents of the prior art. 2. Ascertaining the differences between the prior art and the claims at issue. 3. Resolving the level of ordinary skill in the pertinent art. 4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 6-10, 13, 15-17, and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Notaro et al. ("Prediction of Human Phenotype Ontology Terms by Means of Hierarchical Ensemble Methods", cited by Applicant in the IDS filed 10/14/2021, hereinafter "Notaro") in view of McMahan et al. ("A Survey of Algorithms and Analysis for Adaptive Online Learning", cited by Applicant in the IDS filed 04/25/2023, hereinafter "McMahan") in view of Wu et al. ("Returning is Believing: Optimizing Long-term User Engagement in Recommender Systems", hereinafter "Wu") and further in view of Dirac et al. (US 10970629 B1, hereinafter "Dirac").

Regarding claim 1, Notaro teaches A computer-implemented method comprising: inputting, by one or more processors (see pg. 13, right col, bottom para: "The empirical computational time of hierarchical ensemble methods is significantly lower than that of state-of-the-art joint kernel structured output methods…using an Intel Xeon CPU E5-2630 2.60GHz"),
a prediction input to [an online machine learning model] to receive one or more structurally hierarchical predictions based at least in part on one or more prediction inputs ("More precisely, the HTD-DAG algorithm modifies through a unique run across the nodes of the graph the flat scores according to the hierarchy of a DAG." [pg. 4, right col, ¶2; see further Fig. 3, pg. 6: "The bottom-up step introduces a correction of the flat scores by taking into account the scores of the children of each node. This procedure is methodically repeated from the bottom to the top nodes of the DAG"]), wherein: (i) the one or more structurally hierarchical predictions are based at least in part on a hierarchical prediction domain ("We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO" [Abstract; see pg. 2, left col, bottom para: "To properly handle the hierarchical relationships between terms that characterize the HPO, we can apply two main classes of structured output methods, i.e. methods able to exploit in the learning process the hierarchical structure of terms"]) comprising a plurality of nodes and defining one or more hierarchical relationships between each of the plurality of nodes (see Fig. 1 and Fig. 4, which show connections between a plurality of nodes), (ii) a first node of the plurality of nodes is associated with a first hierarchical position within the plurality of nodes (see Fig. 1 and Fig. 3 [Notaro Fig. 1 reproduced in the record as media_image1.png]; as evidenced by Fig. 1, which shows the hierarchical positions of a plurality of nodes), (iii) a second node of the plurality of nodes is associated with a second hierarchical position within the plurality of nodes (see Fig. 1 and Fig. 3, and further evidenced by: "The maximum path length from the root is used to define the node levels.", where the node level is the hierarchical position), (iv) the first hierarchical position is associated with a higher semantic detail relative to the second hierarchical position ("The proposed True Path Rule for DAG (TPR-DAG) adopts this bottom-up flow of information, to take into account the predictions of the most specific HPO terms, but also the opposite flow from top to bottom to consider the predictions of the least specific terms. Figure 3 provides a pictorial toy example of the operating mode of the TPR-DAG algorithm" [pg. 5, right col, ¶3; more specific HPO terms correspond to "higher semantic detail"]), and (v) a first structurally hierarchical prediction of the one or more structurally hierarchical predictions is generated for the first node before the second node ("By considering the opposite flow of information 'from bottom to top', we can construct the prediction of the ensemble by recursively propagating the predictions provided by the most specific nodes toward their parents and ancestors." [pg. 5, right col, ¶2]); and generating, by the one or more processors, based at least in part on the one or more structurally hierarchical predictions, a predictive output that identifies a subset of hierarchical nodes from the plurality of nodes ([ensemble prediction formula reproduced in the record as media_image2.png]; the predictive output ȳ is based on the inputs with a plurality of nodes shown in Fig. 4); and generating a hierarchically-expanded training data object set [for the online machine learning model and from the single user selection input] by (i) assigning a first training predictive label to the prediction input that identifies the selected node to generate a first training data object of the hierarchically-expanded training data object set and (ii) assigning a second training predictive label to the prediction output that identifies an unselected parent node of the selected node to generate a second training data object of the hierarchically-expanded training data object set ([Notaro Fig. 5, pg. 8, reproduced in the record as media_image3.png]; Notaro explicitly teaches assigning a first training predictive label to the prediction input that identifies the selected node to generate a first training data object of the hierarchically-expanded training data object set, as shown in Fig. 5 (see Phenotypic abnormality -> abnormality of the immune system -> abnormality of immune system physiology). Additionally, Notaro also teaches (ii) assigning a second training predictive label to the prediction output that identifies an unselected parent node of the selected node to generate a second training data object of the hierarchically-expanded training data object set, as shown in Fig. 5 (Glomerulonephritis (prediction output) -> abnormality of the glomerulus -> …abnormality of the genitourinary system; abnormality of the immune system would be considered an unselected parent node). The examiner is interpreting these limitations in light of Fig. 7-9 and ¶0134 of the instant specification; thus it appears the disclosure from Notaro's Fig. 5 corresponds to these limitations as noted above.)

However, Notaro fails to explicitly teach using an online machine learning model.

McMahan teaches using an online machine learning model ("We present tools for the analysis of Follow-The-Regularized-Leader (FTRL), Dual Averaging, and Mirror Descent algorithms when the regularizer (equivalently, prox-function or learning rate schedule) is chosen adaptively based on the data. Adaptivity can be used to prove regret bounds that hold on every round, and also allows for data-dependent regret bounds as in AdaGrad-style algorithms (e.g., Online Gradient Descent with adaptive per-coordinate learning rates)." [Abstract; the FTRL model corresponds to an online machine learning model]). Notaro and McMahan are both in the same field of endeavor of machine learning and thus are analogous. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Notaro's teachings in order to implement the online machine learning model as taught by McMahan. One would have been motivated to make this modification in order to use a machine learning model that can perform multiple prediction and learning tasks. [pg. 2, ¶1-2, McMahan]
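For reference, FTRL-style online learners of the kind McMahan surveys are commonly implemented with a per-coordinate FTRL-Proximal update along the following lines. This is a generic, simplified sketch of that well-known update (the hyperparameters alpha, beta, l1, l2 and the dimensionality are illustrative assumptions), not code from McMahan or from the application.

```python
import math

class FTRLProximal:
    """Generic per-coordinate FTRL-Proximal sketch with adaptive learning rates."""

    def __init__(self, alpha=0.1, beta=1.0, l1=1.0, l2=1.0, dim=2**20):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim   # accumulated adjusted gradients
        self.n = [0.0] * dim   # accumulated squared gradients

    def weight(self, i):
        """Closed-form coordinate weight; the L1 term induces sparsity."""
        z, n = self.z[i], self.n[i]
        if abs(z) <= self.l1:
            return 0.0
        sign = -1.0 if z < 0 else 1.0
        return -(z - sign * self.l1) / ((self.beta + math.sqrt(n)) / self.alpha + self.l2)

    def update(self, i, g):
        """Apply gradient g for coordinate i (one online round)."""
        w = self.weight(i)
        sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
        self.z[i] += g - sigma * w
        self.n[i] += g * g

model = FTRLProximal()
model.update(i=42, g=0.5)
print(model.weight(42))
```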
However, Notaro/McMahan fails to explicitly teach receiving a user selection input corresponding to a selected node from the subset of hierarchical nodes; and training, using a modified reward function and [the sparse vector], one or more parameters of the online machine learning model based at least in part on the single user selection input, wherein the modified reward function is configured to reward the single user selection input corresponding to the selected node and not penalize a lack of selection of the unselected parent node.

Wu teaches receiving a single user selection input corresponding to a selected node from the one or more nodes ("In this problem, the agent's goal is to maximize the cumulative reward it receives from users in a given period of time by making recommendations. Because a 'bad' recommendation might cause a user to return less often, or leave the system, the agent needs to balance the immediate reward of a recommendation (i.e., clicks) and expected reward from users' future interactions as a result of user return. Moreover, as the influence of a recommendation on a user's click and return decisions is unknown to the agent beforehand, the agent needs to maintain estimates of them on the fly" [pg. 1928, Methodology; a user clicking a recommendation corresponds to receiving a single user selection input corresponding to a selected node]); training, using a modified reward function (pg. 1928, left col, ¶3: "This choice of reward functions provides us a closed form assessment of model estimation confidence, which enables an efficient exploration strategy for our online model learning based on the Upper Confidence Bound principle"; Wu's reward function improves the algorithm estimation quality by calculating rewards over time, and thus corresponds to a modified reward function) and [the sparse vector], one or more parameters of the online machine learning model based at least in part on the single user selection input ("In a modern recommender system, recommendation candidates are usually described by a set of contextual features. And utilizing such contextual features for payoff estimation has been proved to be effective [11, 19]. In this work, we parameterize the estimation of click and return probabilities via generalized linear model" [pg. 1930, top left para]), wherein the modified reward function is configured to reward the single user selection input corresponding to the selected node ("We formalize the optimization of users' long-term engagement as a sequential decision making problem, in which an agent maximizes the reward collected from a set of users in a given period of time by making recommendations. In every round of interaction, the agent faces the risk of losing a user because of making a 'bad' recommendation, as the user's click and return depend on the recommendation; but the dependency is unknown to the agent apriori…Specifically, we consider user click as immediate reward to a recommendation" [pg. 1928, top left para]) and not penalize a lack of selection of the unselected parent node ("To optimize users' long-term engagement, a good recommender system should maximize the total number of clicks from a population of users in a given period of time. If a recommendation drives a user to leave the system early when alternative exists, regret will accumulate linearly over such users and time. To reduce the expected regret of a recommendation decision, one has to not only predict its influence on a user's immediate click, but also to project it onto future clicks if the recommendation would attract the user to return. This makes the recommendation decisions dependent over time." [pg. 1927, §1 Introduction, ¶3; a user leaving the system early would mean that there would be "one or more unselected nodes"]).

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Notaro's/McMahan's teachings by allowing a user to select a predictive output (i.e., a recommendation) and train a model using a reward function as taught by Wu. One would have been motivated to make this modification in order to capture user interaction towards recommendations and optimize a user's long-term engagement. [pg. 1936, §5 Conclusion, Wu]

However, Notaro/McMahan/Wu fails to explicitly teach storing, using a hashing mechanism, the first training data object and the second training data object in a sparse vector, wherein the first training data object is stored in the sparse vector by converting the first training data object to a numeric value and applying the hashing mechanism to the numeric value to determine an encoding location in the sparse vector.

Dirac teaches storing, using a hashing mechanism, the first training data object and the second training data object in a sparse vector, wherein the first training data object is stored in the sparse vector by converting the first training data object to a numeric value ("Briefly, a training data input vector 210 may be encoded (corresponds to 'storing') into an encoded training data input vector 214 that is M dimensional using a plurality of k hash functions (corresponds to 'hashing mechanism'), such as the hash functions shown in equation [2] above." [col 9, lines 33-37; see further: "Advantageously, by encoding training data input vectors and reference data output vectors into encoded training data input vectors and encoded reference data output vectors, the sparsity of the vectors used in training a machine learning model decreases." [col 13, lines 35-39]]) and applying the hashing mechanism to the numeric value to determine an encoding location in the sparse vector ("The position of each of the at most seven elements may be determined using one of the seven hash functions." [col 9, lines 42-43]).

It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Notaro's/McMahan's/Wu's teachings, specifically, storing Notaro's hierarchical training data in a sparse vector as taught by Dirac. One would have been motivated to make this modification as the sparsity of vectors used in training a machine learning model decreases, which consequently improves the accuracy of the output of the machine learning model. [col 13, lines 35-40, Dirac]
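By way of illustration, hash-based encoding of the kind Dirac describes (k hash functions determining positions in an M-dimensional encoded vector) is commonly implemented along the lines below. This is a generic hashing-trick sketch with hypothetical parameters M and K and hypothetical helper names; it is not Dirac's implementation.

```python
import hashlib

M = 2**18   # hypothetical dimensionality of the encoded (sparse) vector
K = 7       # hypothetical number of hash functions (cf. Dirac's "seven" example)

def encoding_locations(training_data_object):
    """Convert a training data object to a numeric value, then apply K hash
    functions to that value to determine encoding locations in the sparse vector."""
    numeric = int(hashlib.md5(repr(training_data_object).encode()).hexdigest(), 16)
    return [hash((k, numeric)) % M for k in range(K)]

def store_in_sparse_vector(sparse_vector, training_data_object, value=1.0):
    """Store a training data object in a dict-backed sparse vector."""
    for loc in encoding_locations(training_data_object):
        sparse_vector[loc] = sparse_vector.get(loc, 0.0) + value
    return sparse_vector

sparse = store_in_sparse_vector({}, ("feature_A", "selected_node", 1.0))
sparse = store_in_sparse_vector(sparse, ("feature_A", "unselected_parent_node", 1.0))
```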
Regarding claim 2, Notaro/McMahan/Wu/Dirac teaches The computer-implemented method of claim 1, where Notaro teaches wherein generating the predictive output further comprises: generating, using a co-occurrence analysis machine learning model, one or more structurally non-hierarchical predictions for the prediction input ("To this end any supervised or semi-supervised base predictor can be used, including also flat binary classifiers. Indeed both learners able to provide a probability or a score related to the likelihood that a gene is annotated with a HPO term" [pg. 3, right col, ¶4]; note: the base predictors of Notaro, including the DAG, are considered to be a machine learning model; furthermore, "co-occurrence analysis" corresponds to "a flat predictor provides a score ŷi ∈ [0, 1] that represents the likelihood that a given gene belongs to a given node/HPO term", and the prediction input corresponds to "for a given example x ∈ X, where X is a suitable input space for the predictor f" [pg. 2, ¶3]), wherein: (i) the co-occurrence analysis machine learning model is configured to perform a co-occurrence machine learning analysis associated with the hierarchical prediction domain based at least in part on the prediction input to generate the one or more structurally non-hierarchical predictions ("for a given example x ∈ X, where X is a suitable input space for the predictor f" [pg. 3, left col, bottom para]; this is used to generate structurally non-hierarchical predictions, and the machine learning analysis is taught by the citation above: "To this end any supervised or semi-supervised base predictor can be used…"), (ii) the co-occurrence analysis machine learning model is associated with a predictive co-occurrence score between a feature-node pair of (a) a predictive feature set of one or more predictive feature sets and (b) a particular node of the plurality of nodes ("In other words a flat predictor provides a score ŷi ∈ [0, 1] that represents the likelihood that a given gene belongs to a given node/HPO term i ∈ V of the DAG G, and ŷ = ⟨ŷ1, ŷ2, …, ŷ|V|⟩." The flat classifier is composed of a plurality of per-class predictors f1, f2, …, f|V| that are trained [right col, ¶4]. Therefore, either ŷi or the score ȳi, as shown in Fig. 2 and Fig. 4, is a co-occurrence score between each respective pair, wherein each pair includes a predictive feature set x and a per-class predictor that constitutes a prediction node of a plurality of nodes), and (iii) a structurally non-hierarchical prediction of the one or more structurally non-hierarchical predictions is generated without regard to a hierarchical position of the particular node ("Flat learning of the terms of the ontology: each base classifier learns a specific and individual class (HPO term) resulting in a set of dichotomic classification problems which are independent of each other." [pg. 2, left col, item 1; "independent" corresponds to without regard to hierarchical predictive position]); and generating the predictive output based at least in part on the one or more structurally non-hierarchical predictions and the one or more structurally hierarchical predictions ([ensemble prediction formula reproduced in the record as media_image2.png]; the predictive output ȳi is based on the components ȳ1, ȳ2, …, which are based on the original flat inputs).
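For orientation, the flat per-term scoring and hierarchy-aware bottom-up correction described in the Notaro citations above can be pictured with a simplified sketch. The DAG, scores, threshold, and averaging rule below are illustrative assumptions in the spirit of the quoted passages, not Notaro's TPR-DAG code.

```python
# Simplified sketch: flat per-term scores are corrected bottom-up so that each
# parent's score also reflects its "positive" (above-threshold) children.

DAG_CHILDREN = {               # hypothetical DAG: node -> children
    "root": ["A", "B"],
    "A": ["A1", "A2"],
    "B": [],
    "A1": [], "A2": [],
}

def bottom_up_correction(flat_scores, threshold=0.5):
    """Average each node's flat score with its above-threshold children's
    corrected scores, visiting children before parents (bottom-up)."""
    corrected = dict(flat_scores)

    def visit(node):
        positives = []
        for child in DAG_CHILDREN.get(node, []):
            visit(child)
            if corrected[child] > threshold:   # constant-threshold strategy
                positives.append(corrected[child])
        if positives:
            corrected[node] = (flat_scores[node] + sum(positives)) / (1 + len(positives))
        return corrected[node]

    visit("root")
    return corrected

flat = {"root": 0.2, "A": 0.3, "B": 0.1, "A1": 0.9, "A2": 0.4}
print(bottom_up_correction(flat))
```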
Regarding claim 3, Notaro/McMahan/Wu/Dirac teaches The computer-implemented method of claim 1, where Notaro teaches wherein generating the predictive output further comprises: receiving one or more structurally non-hierarchical predictions, wherein a structurally non-hierarchical prediction of the one or more structurally non-hierarchical predictions is generated without regard to a hierarchical predictive position of a particular node ("Flat learning of the terms of the ontology: each base classifier learns a specific and individual class (HPO term) resulting in a set of dichotomic classification problems which are independent of each other." [pg. 2, left col, item 1; "independent" corresponds to without regard to hierarchical predictive position]); generating, using a structured fusion machine learning model, one or more structure-based predictions based at least in part on the one or more structurally hierarchical predictions and the one or more structurally non-hierarchical predictions (Notaro, Fig. 4, line 10 teaches ȳi and ȳj, thus teaching an overall model (the algorithm disclosed in Fig. 4) in which the intermediate results ȳi and ȳj are fused. Note: the specification does not clearly define what a "structured fusion" model is; therefore the examiner interprets "fusion" as a combination/utilization of two components within a machine learning model, and the fused result is a combination of ȳi and ȳj); and generating the predictive output based at least in part on the one or more structure-based predictions (this limitation is met by the calculation of the ȳ values based on ȳi and ȳj, see Fig. 4).

Regarding claim 6, Notaro/McMahan/Wu/Dirac teaches The computer-implemented method of claim 1, where McMahan teaches wherein the online machine learning model is a follow-the-regularized-leader machine learning model ("We present tools for the analysis of Follow-The-Regularized-Leader (FTRL), Dual Averaging, and Mirror Descent algorithms when the regularizer (equivalently, prox-function or learning rate schedule) is chosen adaptively based on the data. Adaptivity can be used to prove regret bounds that hold on every round, and also allows for data-dependent regret bounds as in AdaGrad-style algorithms (e.g., Online Gradient Descent with adaptive per-coordinate learning rates)." [Abstract]). The same motivation to combine the teachings of Notaro, McMahan, Wu, and Dirac as in claim 1 applies.

Regarding claim 7, Notaro/McMahan/Wu/Dirac teaches The computer-implemented method of claim 1, where Notaro teaches wherein: the at least one of the plurality of nodes comprises a threshold number of selected prediction nodes ("For instance using the maximum, we could likely improve the sensitivity, but with a likely decrement of the precision. Different strategies to select the 'positive' children φi can be applied, according to the usage of a specific threshold to separate positive from negative examples: 1. Constant Threshold (T) strategy. For each node the same threshold t̄ is a priori selected" [pg. 6, left col, ¶1]); and the threshold number of nodes is determined based at least in part on one or more online machine learning parameters of the online machine learning model (pg. 11, bottom right col, through pg. 12, top right col, teaches "tuning of the learning parameters", which would thus be used to determine the threshold number of nodes; see also Fig. 4).

Claim 8 recites features similar to claim 1 and is rejected for at least the same reasons therein. Claim 8 additionally requires A computing system comprising at least one or more processors and at least one non-transitory memory comprising program code, wherein the at least one non-transitory memory and the program code are configured to, with the at least one or more processors, cause the computing system to perform operations configured to… (Notaro, see pg. 13, right col, bottom para: "The empirical computational time of hierarchical ensemble methods is significantly lower than that of state-of-the-art joint kernel structured output methods…using an Intel Xeon CPU E5-2630 2.60GHz").

Regarding claims 9-10 and 13, they are substantially similar to claims 1-2 and 6, respectively, and are rejected in the same manner, with the same art and reasoning applying. Regarding claim 15, it is substantially similar to claims 1 and 8, and is rejected in the same manner, with the same art and reasoning applying. Regarding claims 16-17, they are substantially similar to claims 1-2 and 9-10, respectively, and are rejected in the same manner, with the same art and reasoning applying.

Regarding claim 21, Notaro/McMahan/Wu/Dirac teaches The computer-implemented method of claim 1, where Wu teaches wherein the modified reward function is configured to eliminate a bias term configured to penalize the lack of selection of the unselected parent node ("To optimize users' long-term engagement, a good recommender system should maximize the total number of clicks from a population of users in a given period of time. If a recommendation drives a user to leave the system early when alternative exists, regret will accumulate linearly over such users and time. To reduce the expected regret of a recommendation decision, one has to not only predict its influence on a user's immediate click, but also to project it onto future clicks if the recommendation would attract the user to return. This makes the recommendation decisions dependent over time." [pg. 1927, §1 Introduction, ¶3; reducing regret of a recommendation decision corresponds to eliminating a bias term]). The same motivation to combine the teachings of Notaro, McMahan, Wu, and Dirac as in claim 1 applies.

Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Notaro in view of McMahan, Wu, and Dirac, and further in view of Scheurwegs et al. ("Data integration of structured and unstructured sources for assigning clinical codes to patient stays", cited by Applicant in the IDS filed 03/22/2023, hereinafter "Scheurwegs") and further in view of Pisani et al. ("Adaptive Biometric Systems Using Ensembles", cited in the IDS filed 03/22/2023, hereinafter "Pisani").

Regarding claim 4, Notaro/McMahan/Wu/Dirac teaches The computer-implemented method of claim 1, where Notaro further teaches generating one or more structure-based predictions based at least in part on the one or more structurally hierarchical predictions (in Notaro, generating ȳ1, ȳ2, etc. corresponds to generating a structurally hierarchical prediction because the algorithms in Fig. 2 and Fig. 4 rely on the flat predictions);
however, Notaro fails to explicitly teach wherein: the prediction input comprise one or more unstructured prediction inputs; and generating the predictive output further comprises: generating one or more non-structure-based predictions based at least in part on the one or more unstructured prediction inputs; generating, using an unstructured fusion machine learning model, one or more unstructured-fused predictions, wherein: (i) the unstructured fusion machine learning model is configured to perform an unstructured fusion machine learning analysis based at least in part on the one or more structure-based predictions and the one or more non-structure-based predictions and (ii) the unstructured fusion machine learning analysis comprises retraining the online machine learning model based at least in part on the one or more non-structure-based predictions; and generating the predictive output based at least in part on the one or more unstructured-fused predictions.

Scheurwegs teaches the prediction input comprise one or more unstructured prediction inputs (pg. e12, "Unstructured data sources", lists various types of unstructured prediction inputs; Scheurwegs further teaches, in a context analogous to "generating the predictive output", see Abstract: "combines these predictions with a meta-learner"); and generating the predictive output further comprises: generating one or more non-structure-based predictions based at least in part on the one or more unstructured prediction inputs (pg. e14, Data Integration: "Late data integration (Figure 4B) is an ensemble method in which the prediction results from separate models, trained on each distinct source, are used as input for a second (meta-) classifier or by means of composite methods such as voting, weighing, stacking, or averaging. We opted for training a meta-classifier that takes the predictions and class probabilities of the individual models as input for classification within the same fold. This second classifier is a Bayesian network, structured learning is performed with hill climbing. This approach proved to perform consistently better than Random Forests and Naive Bayes (results not shown)." As shown in Fig. 4B, a model is used for the "letter" unstructured prediction input to generate a respective prediction); generating, using an unstructured fusion machine learning model, one or more unstructured-fused predictions (the use of the meta-classifier to generate the prediction corresponds to "using an unstructured fusion machine learning model"), wherein: (i) the unstructured fusion machine learning model is configured to perform an unstructured fusion machine learning analysis based at least in part on the one or more structure-based predictions and the one or more non-structure-based predictions (the output of the meta-classifier is analogous to a predictive output; as such, this limitation is taught by the combination of references set forth below); and generating the predictive output based at least in part on the one or more unstructured-fused predictions (the use of the meta-classifier to generate the prediction corresponds to "generating the predictive output" based on unstructured-fused predictions).
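As a point of reference, the late data integration Scheurwegs describes (separate per-source models whose predictions and class probabilities feed a meta-classifier) is commonly structured as sketched below. This generic sketch uses scikit-learn logistic regressions and hypothetical inputs; Scheurwegs' own meta-classifier is a Bayesian network, so this is an illustration of the late-fusion pattern, not their pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Generic late-fusion sketch: one model per data source, a meta-classifier on top.
structured_model = LogisticRegression(max_iter=1000)
unstructured_model = LogisticRegression(max_iter=1000)
meta_classifier = LogisticRegression(max_iter=1000)   # stand-in for a Bayesian network

def fit_late_fusion(X_structured, X_unstructured, y):
    """Train per-source models, then train the meta-classifier on their outputs."""
    structured_model.fit(X_structured, y)
    unstructured_model.fit(X_unstructured, y)
    meta_features = np.column_stack([
        structured_model.predict_proba(X_structured)[:, 1],
        unstructured_model.predict_proba(X_unstructured)[:, 1],
    ])
    meta_classifier.fit(meta_features, y)

def predict_late_fusion(X_structured, X_unstructured):
    """Fuse the per-source class probabilities through the meta-classifier."""
    meta_features = np.column_stack([
        structured_model.predict_proba(X_structured)[:, 1],
        unstructured_model.predict_proba(X_unstructured)[:, 1],
    ])
    return meta_classifier.predict(meta_features)
```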
Notaro, McMahan, Wu, Dirac, and Scheurwegs are all in the same field of endeavor of machine learning. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Notaro's/McMahan's/Wu's/Dirac's teachings by implementing structured/unstructured-based prediction features as taught by Scheurwegs. One would have been motivated to make this modification in order to integrate heterogeneous data types to improve prediction strength. [Scheurwegs, Abstract]

The combination of references thus far does not teach, in particular, "retraining the online machine learning model." Pisani teaches (ii) the unstructured fusion machine learning analysis comprises retraining the online machine learning model based at least in part on the one or more non-structure-based predictions (pg. 22, bottom para: "The base classifiers are adaptive and therefore change over time, it is reasonable to also adapt the meta classifier to account for these changes. To implement the proposed approach, we proposed to retrain the classifier using a gallery, or set of samples"). Notaro, McMahan, Wu, Dirac, Scheurwegs, and Pisani are all in the same field of endeavor of machine learning. It would have been obvious to a person of ordinary skill in the art before the effective filing date to modify Notaro's/McMahan's/Wu's/Dirac's/Scheurwegs' teachings to implement retraining of the online machine learning model as taught by Pisani. One would have been motivated to make this modification in order to adapt the model to changes in other models. [pg. 22, bottom, Pisani]

Regarding claims 11 and 18, they are substantially similar to claim 4 and are rejected in the same manner, with the same art and reasoning applying.

Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Notaro in view of McMahan, Wu, and Dirac, and further in view of Wang et al. (US 20210375407 A1, cited by Applicant in the IDS filed 03/22/2023, hereinafter "Wang").

Regarding claim 5, Notaro/McMahan/Wu/Dirac teaches The computer-implemented method of claim 1, where Notaro teaches wherein: the prediction input comprise one or more medical feature inputs… (as noted in the rejection of claim 1, the input x is a "given example" such as a "given gene" [pg. 3, left col, bottom para], which corresponds to a medical feature input); the predictive output comprises at least one human phenotype ontology label prediction… (in Notaro, the predictive output ȳ comprises a set of scores that represents the likelihood that a given gene belongs to respective nodes); and the one or more hierarchical relationships comprise one or more human phenotype ontology dependency relationships (Abstract: "We present two hierarchical ensemble methods that we formally prove to provide biologically consistent predictions according to the hierarchical structure of the HPO." Note: "HPO" refers to the human phenotype ontology, which is an ontology of dependency relationships. See further pg. 1, right col: "The Human Phenotype Ontology (HPO) project provides a standard categorization of the human abnormal phenotypes and of their semantic relationships"; see also pg. 2, left col, bottom para: "To properly handle the hierarchical relationships between terms").

However, Notaro/McMahan/Wu/Dirac fails to explicitly teach that the predictions are "for a patient profile." Wang teaches the above limitation ("diagnostic genomic predictions based on electronic health record data" (see Title) involving machine learning models (see [¶0047])). Wang is in the same field of endeavor as the claimed invention, and is also pertinent to healthcare applications of machine learning models. In particular, Wang teaches input data "for a patient profile" ([¶0008]: "accessing electronic health record data for a patient"; [¶0064]: "The procedure 300 includes accessing 310 electronic health record data for a patient." As shown in FIG. 3, the end result is "identifying 350 based on the normalized phenotype terms one or more candidate genes" [¶0068], which is based on, and thus "for," the patient profile). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Notaro/McMahan/Wu/Dirac in order to implement the medical feature inputs and human phenotype ontology label prediction to be "for a patient profile" as taught by Wang. One would have been motivated to make this modification in order to perform predictions based on electronic health record data. [Wang, ¶0035]

Regarding claims 12 and 19, they are substantially similar to claim 5 and are rejected in the same manner, with the same art and reasoning applying.

Response to Arguments

Applicant's arguments filed 12/11/2024 have been fully considered but they are not persuasive.

Regarding the 35 U.S.C. §101 Rejection: In response to applicant's arguments regarding Step 2A, Prong One of the 101 rejection, applicant asserts the claim does not recite abstract ideas and the claims are directed to a training technique for a machine learning model, which would not be considered to be a mental process. Examiner respectfully disagrees. The claims as currently recited are not directed towards any specific training of a machine learning model. Rather, the claims recite several steps, such as generating a predictive output…(i.e., coming up with a prediction) and generating a hierarchically-expanded data object set by assigning a first/second predictive label (merely assigning each node with a label), that can be considered to be evaluations in the human mind. Thus, the claim does recite an abstract idea. In response to the comparison between the instant claims and Example 39, the only training step recited in the claims is in the last limitation; however, under BRI, this limitation does not provide any specific details of the actual training process of the machine learning model. Instead, the limitation broadly uses a modified reward function to train (which could also be interpreted as updating/modifying) one or more parameters of the machine learning model. The claim limitation fails to clearly and explicitly reflect any specific training of the machine learning model; rather, it merely modifies/changes the parameters of the model. Therefore, the examiner asserts the claims do recite an abstract idea and are directed to a judicial exception.

In response to applicant's arguments regarding Step 2A, Prong Two of the 101 rejection, applicant asserts the steps of "receiving a user selection," "training…the online machine learning model based on the user selection," and a "modified reward function" are the central focus of the claim and thus are not insignificant extra-solution activities. Examiner respectfully disagrees. The claim does not explicitly and clearly tie these specific limitations into any practical application of the judicial exception. These limitations do not reflect any improvement in the functioning of a computer or hardware processor, nor any specific training process of a machine learning model.
As noted in the 101 rejection, these steps are considered to be insignificant extra-solution activity (see MPEP 2106.05(g)) and are further analyzed under Step 2B to be well-understood, routine, and conventional steps. Applicant further asserts that the claims recite limitations similar to Hannun, No. 2018-003323, pgs. 10-11, and thus recite features that are designed to achieve an improved technological result and provide improvements to that technical field. Examiner respectfully disagrees. The claims as currently recited are directed towards an improvement to an abstract idea (i.e., improving hierarchical predictions). Improvements to an abstract idea are still considered to be abstract ideas. Applicant further asserts the claim is directed to a specific improvement to online machine learning models. Examiner respectfully disagrees. The claimed online machine learning model is merely used in the claim to receive an input, use a modified reward function as part of the updating/modification of the parameters of the model, and output a prediction. There is nothing to suggest that the training of the machine learning model is improved in any way; rather, the online machine learning model is just nominally claimed. Applicant further asserts that the instant specification as filed at ¶[0046-0050], [0125], [0136], [0138], [0155], [0159], and [0227] relates to the reliability of the predictive analysis using online learning and addresses the efficiency and reliability challenges related to utilizing online learning algorithms. Although the specification does mention several technological improvements, these technological improvements are not clearly reflected in the claims as currently recited. As noted above, the claims are directed to an improvement to an abstract idea (i.e., improving hierarchical predictions). Improvements to an abstract idea are still considered to be abstract ideas.

In response to applicant's arguments regarding Step 2B of the 101 rejection, applicant appears to assert that the added elements cannot be considered to be well-understood, routine, or known within the industry because they do not appear to be taught by the references of record. Examiner respectfully disagrees with this assertion. As discussed above, the additional elements are generic computer components merely used to apply the judicial exception. Furthermore, the specification appears to include evidence to support that these limitations are well-understood, routine, and conventional steps (see "For example, online learning algorithms are typically utilized to generate recommendations for a user (e.g., promotional recommendations for a user), where the user reaction to the recommendation is in turn utilized to update a ML model…" [¶0046], and ¶0136, "conventional FTRL ML models"). Please see the updated 101 rejection above.

Regarding the 35 U.S.C. §103 Rejection: First, applicant argues Notaro fails to teach expanding the training data by generating two training data objects from a single user selection input. Examiner respectfully disagrees. As noted in the 103 rejection above, the combination of Notaro/Wu explicitly teaches this concept. The disclosure from Notaro's Fig. 5 appears to teach generating a first training data object and a second training data object. As noted in the rejection, these limitations are interpreted in light of Fig. 7-9 and ¶0134 of the instant specification. As noted from ¶0134 of the specification, "a first hierarchically-expanded training data object includes a feature A and the following prediction labels:…" Notaro explicitly teaches this concept as shown in Fig. 5. Thus, applicant's arguments are not persuasive.

Second, applicant argues Wu does not address hierarchical data and is silent on a subset of hierarchical nodes as recited by claim 1. The prior art of Wu is relied upon to teach a "single user selection," as a "user click" on a recommendation would be considered a single user selection. This would fall within the scope and BRI of the limitation, as recommendation systems could be used in a hierarchical prediction domain as disclosed in ¶0120 of the specification (i.e., clicking on a link corresponding to a recommendation (corresponds to a selected node) would lead to future generation of the recommendation (corresponds to a subset of hierarchical nodes)). Furthermore, applicant argues Wu does not teach a modified reward function for hierarchically-expanded training data. Examiner respectfully disagrees. The claim does not explicitly require that the modified reward function is used for hierarchically-expanded training data; rather, the claim merely uses the modified reward function as part of the "training" of the machine learning model. Wu's disclosure explicitly teaches the use of a modified reward function (i.e., calculating rewards over time) in the context of training (see pg. 1928, left col, ¶3).

In response to applicant's argument that there is no teaching, suggestion, or motivation to combine the references, the examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007). In this case, the motivation to combine the teachings of the prior art can be found in the prior art of Wu. As noted above, obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found in the references themselves. Thus, the examiner asserts it would have been obvious to one of ordinary skill in the art to modify Notaro's/McMahan's teachings to allow a user to select a predictive output (i.e., a recommendation) and train a machine learning model using a reward function as taught by Wu.

Applicant further asserts Notaro and Wu are not analogous because they are in completely different fields of machine learning which are traditionally incompatible. Examiner respectfully disagrees. There is no evidence to suggest hierarchical machine learning and online learning are incompatible. ¶0048 of the specification merely says the present invention addresses efficiency and reliability challenges related to utilizing online learning algorithms to generate predictions related to hierarchical prediction domains. If anything, the specification further supports that online learning algorithms and hierarchical predictions are related fields, and ¶0048 merely discloses challenges related to efficiency and reliability.
Third, applicant argues Dzhulgakov is completely silent as to a hashing mechanism in general; however, this argument is moot because the newly applied prior art of Dirac teaches this hashing mechanism in addition to the newly amended features, which further narrow the application of a hashing mechanism. Please see the updated 103 rejection above. Applicant's arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive, as they rely upon the allowability of the independent claims.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571) 272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.H.H./
Examiner, Art Unit 2122

/KAKALI CHAKI/
Supervisory Patent Examiner, Art Unit 2122