Patent Application 17550551 - BOOSTING CLASSIFICATION AND REGRESSION TREE - Rejection

Title: BOOSTING CLASSIFICATION AND REGRESSION TREE PERFORMANCE WITH DIMENSION REDUCTION

Application Information

  • Invention Title: BOOSTING CLASSIFICATION AND REGRESSION TREE PERFORMANCE WITH DIMENSION REDUCTION
  • Application Number: 17550551
  • Submission Date: 2025-05-13
  • Effective Filing Date: 2021-12-14
  • Filing Date: 2021-12-14
  • National Class: 706
  • National Sub-Class: 045000
  • Examiner Employee Number: 100924
  • Art Unit: 2125
  • Tech Center: 2100

Rejection Summary

  • 102 Rejections: 1
  • 103 Rejections: 5

Cited Patents

No patents were cited in this rejection.

Office Action Text



    DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
1. 	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
Claims 1 – 20 are pending and have been examined.
Information Disclosure Statement
2. 	The information disclosure statement (IDS) submitted on 12/14/2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
3. 	The drawings are objected to because the element referred to as "storage device 18" in paragraph [0048] of the specification is labeled “STORAGE SYSTEM” in figure 6 of the drawings.  Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
4. 	The disclosure is objected to because of the following informalities:
Paragraph [0020]: "where there can be multi-modal type of data" – the word “type” should be plural (“types”).
Paragraph [0029]: “308, 310, 312, 314. 316” – the period should be a comma for clarity
Paragraph [0031]: “a dimesion reduction for a solver can be a funciton of” – the typographical errors for the words ‘dimension’ and ‘function’ should be corrected.
Paragraph [0048]: “storage device 18” – is labelled as “STORAGE SYSTEM” in the drawings on figure 6.
Appropriate correction is required.

Claim Objections
5. 	Claims 6, 10, 12, 18 and 19 are objected to because of the following informalities:
Claims 6 and 18 recite: “an orthogonlaity regularizer” – the correct spelling is orthogonality.
Claim 10 recites: “method of claim 1, wherien the” – the correct spelling is wherein.
Claim 12 recites: “a model accuracy performance neasurement includes” – the correct spelling is measurement.
Independent Claim 19 recites: “wherein the decision includes routing nodes and leaf nodes” – which is interpreted to be “wherein the decision tree includes routing nodes and leaf nodes”. 
Claim 20, which depends directly on claim 19, is also objected to for the reason stated above for claim 19.
Appropriate correction is required.

Claim Rejections - 35 USC § 112
6. 	The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


7. 	The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


8. 	Claim 8 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 8 recites the limitation "wherein the regularizer" in “The computer-implemented method of claim 1, wherein the regularizer includes a single routing regularizer.”  There is insufficient antecedent basis for this limitation in the claim. 
Specifically, claim 8 depends directly from independent claim 1, not from claim 5, and said “regularizer” was introduced in claim 5 and is not stated in independent claim 1.
 Accordingly, claim 8 is rejected.

9. 	The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

10. 	The following is a quotation of pre-AIA  35 U.S.C. 112, fourth paragraph:
Subject to the following paragraph [i.e., the fifth paragraph of pre-AIA  35 U.S.C. 112], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.

11. 	Claim 11 is rejected under 35 U.S.C. 112(d) or pre-AIA  35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends.  Claim 11 does not further limit claim 1, because the limitation: "wherein the nodes of the decision tree include at least routing nodes and leaf nodes, wherein the dimension reduction is performed with optimization at each of the routing nodes and leaf nodes" of claim 11, is essentially the same as the recited "wherein the decision tree includes routing nodes and leaf nodes;" and "and dimension reduction at every node of the decision tree simultaneously by an optimization algorithm" in Claim 1.  Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.


Claim Rejections - 35 USC § 101
12.	35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.



13. 	Claims 1-20 are rejected under 35 U.S.C. 101 because the claims are directed to an abstract idea without significantly more.

Regarding Independent Claims 1, 13 and 19:
Step 1: Claim 1 is directed to a method (process), claim 13 is directed to a computer program product comprising a computer readable storage medium, corresponding to an article of manufacture, and claim 19 is directed to a system comprising: a processor; and a memory device coupled with the processor, corresponding to a machine, which are each one of the statutory categories.
Step 2A, Prong 1: The following limitations are directed to the abstract idea of a mental process [see MPEP 2106.04(a)(2) III. C.]. In particular, the claims recite mental processes that are concepts performed in the human mind or with pen and paper (including an observation, evaluation, judgement, or opinion).
The claims recite using respective similar language:
constructing a decision tree in machine learning, comprising: 
 initializing the decision tree by constructing a root node 
and training a root solver with the training set;
growing the decision tree by iteratively splitting nodes of the decision tree,
and performing training for routing functions at the routing nodes, solvers at the leaf nodes, and dimension reduction at every node of the decision tree simultaneously by an optimization algorithm
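Purely for illustration of the kind of procedure the limitations above describe, a minimal sketch follows. Every name, the PCA-style projection, the median-threshold routing function, the mean-value leaf solver, and the greedy recursion (which stands in for the claimed simultaneous optimization) are assumptions made for illustration only and are not the application's actual method.

import numpy as np

class Node:
    def __init__(self, depth=0):
        self.depth = depth
        self.projection = None   # per-node dimension-reduction matrix (assumed PCA-style)
        self.threshold = None    # routing-function parameter for a routing node
        self.solver = None       # solver kept at the node (prediction value at a leaf)
        self.left = None
        self.right = None

def fit_projection(X, k=2):
    # Toy dimension reduction: top-k principal directions of the data at this node.
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k].T

def grow(X, y, depth=0, max_depth=3, min_samples=8):
    node = Node(depth)
    node.projection = fit_projection(X)              # dimension reduction at the node
    node.solver = float(y.mean())                    # trivial solver: mean target value
    if depth >= max_depth or len(y) < min_samples:
        return node                                  # leaf node
    z = (X @ node.projection)[:, 0]                  # data in reduced dimension
    node.threshold = float(np.median(z))             # routing function: route on z > threshold
    go_right = z > node.threshold
    if go_right.all() or not go_right.any():
        return node                                  # degenerate split; keep as a leaf
    node.left = grow(X[~go_right], y[~go_right], depth + 1, max_depth, min_samples)
    node.right = grow(X[go_right], y[go_right], depth + 1, max_depth, min_samples)
    return node

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                       # "received" training set (synthetic)
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
tree = grow(X, y)                                    # root node and root solver trained first
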
Step 2A, Prong 2: There are no additional elements in these claims that integrate the judicial exception into a practical application.
The following additional elements are directed to insignificant extra-solution activity to the judicial exception [see MPEP 2106.05(g)].
The claims recite using respective similar language:
 receiving a training set; 
for routing to another node of the decision tree,
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
wherein at a node of the decision tree, dimension reduction is performed on features of data of the training set received at the node, (Mathematical concept- dimension reduction involves mathematical calculations- see MPEP § 2106.04(a))
 wherein the dimension reduction and the split is performed together at the node, (Mathematical concept- dimension reduction involves mathematical calculations- see MPEP § 2106.04(a))
 and the data having reduced dimension is split based on a routing function
wherein the decision tree includes routing nodes and leaf nodes;
Step 2B: There are no additional elements in these claims that amount to significantly more than the judicial exception.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
wherein at a node of the decision tree, dimension reduction is performed on features of data of the training set received at the node, (Mathematical concept- dimension reduction involves mathematical calculations- see MPEP § 2106.04(a))
 wherein the dimension reduction and the split is performed together at the node, (Mathematical concept- dimension reduction involves mathematical calculations- see MPEP § 2106.04(a))
 and the data having reduced dimension is split based on a routing function
wherein the decision tree includes routing nodes and leaf nodes;
The following additional element is directed to receiving or transmitting data over a network. The courts (as per Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362) have recognized receiving or transmitting data over a network as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity to the judicial exception [see MPEP 2106.05(d) II.] and therefore fails to amount to significantly more than the judicial exception.
The claims recite using respective similar language:
receiving a training set; 
for routing to another node of the decision tree,

Regarding Claims 2 and 14:
Step 1: The claims are directed to a method and an article of manufacture, respectively.
Step 2A, Prong 1: The claims recite the same abstract ideas as in claims 1 and 13, respectively.
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional elements are directed to insignificant extra-solution activity to the judicial exception [see MPEP 2106.05(g)].
The claims recite using respective similar language:
 further including: receiving a predetermined topology for the decision tree; 
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
and wherein the nodes are iteratively split until the predetermined topology is obtained
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
	The following additional element is directed to receiving or transmitting data over a network. The courts (as per Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362) have recognized receiving or transmitting data over a network as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity to the judicial exception [see MPEP 2106.05(d) II.] and therefore fails to amount to significantly more than the judicial exception.
The claims recite using respective similar language:
further including: receiving a predetermined topology for the decision tree; 
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
and wherein the nodes are iteratively split until the predetermined topology is obtained

Regarding Claims 3 and 15:
Step 1: The claims are directed to a method and an article of manufacture, respectively.
Step 2A, Prong 1: The claims recite the same abstract ideas as in claims 1 and 13, respectively.
The following limitation remains directed to the abstract idea of a mathematical concept
[see MPEP 2106.04(a)(2) C.]. In particular, the claim recites mathematical calculations that are
mathematical operations (such as multiplication) or an act of calculating using mathematical
methods to determine a variable or number, e.g., performing an arithmetic operation. 
The claims recite using respective similar language:
wherein the leaf nodes of the decision tree include the solvers that return a predicted target value
The term solver implies using a mathematical concept to produce an output, see MPEP § 2106.04(a).
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
wherein the leaf nodes of the decision tree include the solvers that return a predicted target value
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
wherein the leaf nodes of the decision tree include the solvers that return a predicted target value

Regarding Claims 4 and 16:
Step 1: The claims are directed to a method and an article of manufacture, respectively.
Step 2A, Prong 1: The claims recite the same abstract ideas as in claims 1 and 13, respectively.
The following limitation remains directed to the abstract idea of a mathematical concept
[see MPEP 2106.04(a)(2) C.]. In particular, the claim recites mathematical calculations that are
mathematical operations (such as multiplication) or an act of calculating using mathematical
methods to determine a variable or number, e.g., performing an arithmetic operation. 
The claims recite using respective similar language:
wherein the leaf nodes of the decision tree include a regression model that returns a predicted target value
The term “regression model that returns a predicted target value”, implies using a mathematical concept to produce an output, see MPEP § 2106.04(a).
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
wherein the leaf nodes of the decision tree include a regression model that returns a predicted target value
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
	The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
wherein the leaf nodes of the decision tree include a regression model that returns a predicted target value

Regarding Claims 5 and 17:
Step 1: The claims are directed to a method and an article of manufacture, respectively.
Step 2A, Prong 1: The claims recite the same abstract ideas as in claims 1 and 13, respectively.
The following limitation remains directed to the abstract idea of a mathematical concept
[see MPEP 2106.04(a)(2) C.]. In particular, the claim recites mathematical calculations that are
mathematical operations (such as multiplication) or an act of calculating using mathematical
methods to determine a variable or number, e.g., performing an arithmetic operation. 
The claims recite using respective similar language:
further including optimizing the decision tree using a regularizer
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
further including optimizing the decision tree using a regularizer 
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
further including optimizing the decision tree using a regularizer 

Regarding Claims 6 and 18:
Step 1: The claims are directed to a method and an article of manufacture, respectively.
Step 2A, Prong 1: The claims recite the same abstract ideas as in claims 1 and 13, respectively.
The following limitation remains directed to the abstract idea of a mathematical concept
[see MPEP 2106.04(a)(2) C.]. In particular, the claim recites mathematical calculations that are
mathematical operations (such as multiplication) or an act of calculating using mathematical
methods to determine a variable or number, e.g., performing an arithmetic operation. 
The claims recite using respective similar language:
wherein the regularizer includes an orthogonlaity regularizer
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
wherein the regularizer includes an orthogonlaity regularizer 
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
wherein the regularizer includes an orthogonlaity regularizer

Regarding Claims 7 and 18:
Step 1: The claims are directed to a method and an article of manufacture, respectively.
Step 2A, Prong 1: The claims recite the same abstract ideas as in claims 1 and 13, respectively.
The following limitation remains directed to the abstract idea of a mathematical concept
[see MPEP 2106.04(a)(2) C.]. In particular, the claim recites mathematical calculations that are
mathematical operations (such as multiplication) or an act of calculating using mathematical
methods to determine a variable or number, e.g., performing an arithmetic operation. 
       The claims recite using respective similar language:
wherein the regularizer includes a diversification regularizer

Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
wherein the regularizer includes a diversification regularizer 
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
	The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
wherein the regularizer includes a diversification regularizer 

Regarding Claims 8 and 18:
Step 1: The claims are directed to a method and an article of manufacture, respectively.
Step 2A, Prong 1: The claims recite the same abstract ideas as in claims 1 and 13, respectively.
The following limitation remains directed to the abstract idea of a mathematical concept
[see MPEP 2106.04(a)(2) C.]. In particular, the claim recites mathematical calculations that are
mathematical operations (such as multiplication) or an act of calculating using mathematical
methods to determine a variable or number, e.g., performing an arithmetic operation. 
      The claims recite using respective similar language:
wherein the regularizer includes a single routing regularizer
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.	
The claims recite using respective similar language:
wherein the regularizer includes a single routing regularizer (Mathematical concept- using a regularizer involves a mathematical formula or calculation- see MPEP § 2106.04(a))
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
The claims recite using respective similar language:
wherein the regularizer includes a single routing regularizer 

Regarding Claim 9:
Step 1: The claim is directed to a method.
Step 2A, Prong 1: The claim recites the same abstract ideas as in claim 1.
The following limitation remains directed to the abstract idea of a mathematical concept
[see MPEP 2106.04(a)(2) C.]. In particular, the claim recites mathematical calculations that are
mathematical operations (such as multiplication) or an act of calculating using mathematical
methods to determine a variable or number, e.g., performing an arithmetic operation. 
      The claims recite using respective similar language:
wherein the decision tree is trained to solve a regression problem
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
wherein the decision tree is trained to solve a regression problem
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
wherein the decision tree is trained to solve a regression problem
	
Regarding Claim 10:
Step 1: The claim is directed to a method.
Step 2A, Prong 1: The claim recites the same abstract ideas as in claim 1.
The following limitation remains directed to the abstract idea of a mathematical concept
[see MPEP 2106.04(a)(2) C.]. In particular, the claim recites mathematical calculations that are
mathematical operations (such as multiplication) or an act of calculating using mathematical
methods to determine a variable or number, e.g., performing an arithmetic operation. 
      The claims recite using respective similar language:
wherein the decision tree is trained to solve a classification problem
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
wherein the decision tree is trained to solve a classification problem
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
wherein the decision tree is trained to solve a classification problem

Regarding Claim 11:
Step 1: The claim is directed to a method.
Step 2A, Prong 1: The claim recites the same abstract ideas as in claim 1.
The following limitation remains directed to the abstract idea of a mathematical concept
[see MPEP 2106.04(a)(2) C.]. In particular, the claim recites mathematical calculations that are
mathematical operations (such as multiplication) or an act of calculating using mathematical
methods to determine a variable or number, e.g., performing an arithmetic operation. 
      The claims recite using respective similar language:
wherein the dimension reduction is performed with optimization at each of the routing nodes and leaf nodes
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
wherein the nodes of the decision tree include at least routing nodes and leaf nodes,
wherein the dimension reduction is performed with optimization at each of the routing nodes and leaf nodes
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
wherein the nodes of the decision tree include at least routing nodes and leaf nodes,
wherein the dimension reduction is performed with optimization at each of the routing nodes and leaf nodes

Regarding Claim 12:
Step 1: The claim is directed to a method.
Step 2A, Prong 1: The claim recites the same abstract ideas as in claim 1.
The following limitation remains directed to the abstract idea of a mathematical concept
[see MPEP 2106.04(a)(2) C.]. In particular, the claim recites mathematical calculations that are
mathematical operations (such as multiplication) or an act of calculating using mathematical
methods to determine a variable or number, e.g., performing an arithmetic operation. 
      The claims recite using respective similar language:
wherein the leaf nodes of the decision tree include a regression model that returns a predicted target value
Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
wherein the training set includes imbalanced datasets 
 and a model accuracy performance neasurement includes nonlinear metrics.
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
wherein the training set includes imbalanced datasets 
 and a model accuracy performance neasurement includes nonlinear metrics.

Regarding Claim 20:
Step 1: The claim is directed to a system.
Step 2A, Prong 1: The claim recites the same abstract ideas as in claim 19.

Step 2A, Prong 2: There are no additional elements in this claim that integrate the judicial exception into a practical application.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
wherein the leaf nodes of the decision tree include a regression model that returns a predicted target value
Step 2B: There are no additional elements in this claim that amount to significantly more than the judicial exception.
The following additional element does not meaningfully limit the judicial exception [see MPEP 2106.05(e)]. The claim simply recites additional information regarding the characteristics of the decision tree construction. Therefore, the additional element does not integrate the abstract ideas into a practical application.
wherein the leaf nodes of the decision tree include a regression model that returns a predicted target value

Therefore, claims 1-20 are deemed ineligible.
	

Claim Rejections - 35 USC § 103
14. 	In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA  to pre-AIA ) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
15. 	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

16. 	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.


17. 	Claim(s) 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Carreira-Perpiñán (US20220318641A1; hereinafter Perpinan) in view of Passera (US5909681A; hereinafter Passera).
	Regarding Independent Claim 1, Perpinan teaches receiving a training set; (see, e.g., Perpinan paragraphs [0019] “Generally, the method comprises inputting an initial decision tree and a training set of instances” and [0218] “At step 606, a training set of instances is input [i.e. receiving training set as input]. The training set may be any conventional set of instances used to train decision trees, or may be less convention training sets, including but not limited to …”).
initializing the decision tree by constructing a root node (see, e.g., Perpinan paragraphs [0291-0293]: “Generally, we take the initial tree to be complete with random parameters, of large enough depth … We do this by starting from the root, setting its parameters, and recursively setting its children's parameters … It may also be a previous TAO tree, as in warm-start initialization in a regularization path or in an inner-loop optimization”). 
	and training a root solver with the training set; (see, e.g., Perpinan paragraphs [0087] “The tree computes the output for an input x by following a path from the root to exactly one leaf and outputting the latter's prediction for x. Crucially, note that the decision functions are crisp (not stochastic): exactly one child is chosen for a given input x (unlike for soft decision trees, which define a distribution over the children).” and [0240] “TRAIN - DECISION ( f , R ) trains the decision function model f ( with parameters ) on the training set instances indexed in the set R , with corresponding output labels ( pseudolabels ” ) computed internally , and returns the model's parameters 0. This is used for the decision function models at the decision nodes” [i.e., the decision function model is trained with the training set instances, and are used by root nodes]).
growing the decision tree by iteratively splitting nodes of the decision tree, (see, e.g., Perpinan paragraph [0320] “However, we can combine this with growing the tree structure itself. The basic idea is that we interleave growing the tree larger (by splitting each leaf) with optimizing (and pruning) the current tree with TAO. Growing the tree makes it …”).
wherein at a node of the decision tree dimension reduction is performed on features of data of the training set received at the node (see, e.g., Perpinan paragraphs [0326]: “The previous algorithm generates a sequence (collection) of trees of different structure, from the first one consisting of a single node to the last one, each optimized on the training set.” and [0385]: “A particularly useful and well-known case is that of sparsity regularizers [i.e., a known type of form of dimension reduction], which encourage the parameters of the node to contain few nonzero values (which in turn makes the model simpler and possibly more robust); it is also commonly used for feature selection. TAO can reuse existing sparse optimization algorithms to solve the optimization problem at a decision node or leaf …”).
 and the data having reduced dimension is split based on optimizing a routing function for routing to another node of the decision tree (see, e.g., Perpinan paragraphs [0335]: “Hence, and importantly, the objective function that TAO optimizes at each split-optimize-prune step still has the form (eq. 4), i.e., a loss and a regularization term. This makes it possible to learn sparse weight vectors at the nodes, and to prune the tree as we grow it.” and [339]: “at step 1405, the leaf does not achieve the minimal loss, then at step 1406 the leaf is split, becomes a decision node) and random parameters are assigned to the children of split leaf. The method then moves to step 1407 and determines if all current leaves have been grown.”).
wherein the dimension reduction and the split is performed together at the node, (see, e.g., Perpinan paragraphs [0335]: “Hence, and importantly, the objective function that TAO optimizes at each split-optimize-prune step still has the form (eq. 4), i.e., a loss and a regularization term. This makes it possible to learn sparse weight vectors at the nodes, and to prune the tree as we grow it”  and [0385]: “A particularly useful and well-known case is that of sparsity regularizers [i.e., a known type of form of dimension reduction], which encourage the parameters of the node to contain few nonzero values (which in turn makes the model simpler and possibly more robust); it is also commonly used for feature selection. TAO can reuse existing sparse optimization algorithms to solve the optimization problem at a decision node or leaf …”).
	wherein the decision tree includes routing nodes and leaf nodes; (see, e.g., Perpinan paragraphs [0079] “For inference, the role of the decision nodes is to route the instance, a high-dimensional vector, to its corresponding predictor, which then outputs the prediction [i.e., decision nodes work as routing nodes]”, and [0090] “we first route x to a leaf i=λ(x; Θ) via the decision nodes, and then the leaf makes the actual prediction g.sub.i(x; Ξ.sub.i).”).
and performing training for routing functions at the routing nodes, (see, e.g., Perpinan paragraphs [0019] “TAO applies to many different types of loss functions, regularization terms and constraints, and types of models at both the decision nodes [i.e., routing nodes] and the leaves, and makes it possible to learn better decision trees than with traditional algorithms, and to learn trees for problems where traditional algorithms do not apply” and [0085] “The role of a decision node is to route an input x to one of the node's children. A decision node i∈[symbol image] uses a decision function ƒi(x; Ξi): [symbol image]→Ci, with parameters ξi, to map an input x to one of its children” [i.e., the decision function works as a routing function at the node]).
solvers at the leaf nodes, (see, e.g., Perpinan paragraph [0090] “This notation makes it clear that, in order to compute the prediction for an input x, we first route x to a leaf i=λ(x; Θ) via the decision nodes, and then the leaf makes the actual prediction g.sub.i(x; Ξ.sub.i). The routing function partitions the input space custom-character into regions, one per leaf, where a specific predictor operates. FIGS. 1A-1C show an example of a decision tree and the partition of the space it defines.” [i.e., predictions are computed at the leaf nodes, acting as solvers]).

However, Perpinan fails to explicitly teach the limitation and dimension reduction at every node of the decision tree simultaneously by an optimization algorithm.
In the same field, analogous art Passera teaches and dimension reduction at every node of the decision tree simultaneously by an optimization algorithm (see, e.g., Passera, col. 13 lines 20-35: “ For example, the task of training non-terminal nodes and using them to partition data for the training of leaf node neural networks should be parallelized if it will significantly increase the speed with which the tree can be built and trained. This would be the case if the number of non-terminal nodes becomes very large, or if the amount of computation associated with training each of them becomes large. For example, when the non-terminal nodes have hidden layers, as in FIG. 20, parallelization will tend to be more appropriate.”). 
Perpinan and Passera are analogous art because they are each directed to Tree-organized classifiers (see, e.g., Perpinan, paragraph [0017] and Passera, column 2, lines 38-44). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Perpinan, which use sparse optimization algorithms to solve optimization problems at decision nodes or leaf nodes, with the teachings of Passera, which encourage the parallelization of data partitioning and the training of non-terminal nodes for improved efficiency in a decision tree model.  Doing so would have allowed Perpinan to incorporate Passera’s parallel training of nodes in order to "significantly increase the speed with which the tree can be built and trained” (see, e.g., Passera, col. 13 lines 20-35).
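As a rough, assumption-laden sketch of the combination characterized above (it is not the actual TAO algorithm of Perpinan or the parallel procedure of Passera), the following trains an L1-regularized (sparsity-inducing) linear model at several nodes in parallel; the sparsity penalty plays the feature-selection role described above as a form of dimension reduction, and the thread pool stands in for the parallel training of nodes.

import numpy as np
from concurrent.futures import ThreadPoolExecutor

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_node_weights(X, y, alpha=0.1, iters=300):
    # Minimize (1/2n)||Xw - y||^2 + alpha*||w||_1 by proximal gradient (ISTA);
    # the L1 term drives most weights to exactly zero (feature selection).
    n, d = X.shape
    w = np.zeros(d)
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 / n + 1e-12)   # 1 / Lipschitz constant
    for _ in range(iters):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * alpha)
    return w

rng = np.random.default_rng(1)
# Pretend each node has already received its own partition of the training set.
node_data = []
for _ in range(4):
    X = rng.normal(size=(100, 20))
    y = X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=100)
    node_data.append((X, y))

with ThreadPoolExecutor() as pool:                   # nodes trained in parallel
    weights = list(pool.map(lambda xy: sparse_node_weights(*xy), node_data))

for i, w in enumerate(weights):
    print(f"node {i}: kept {int(np.count_nonzero(w))} of {w.size} features")
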

Regarding claim 2, as discussed above the combination of Perpinan and Passera teaches the method of claim 1. Perpinan further teaches further including: receiving a predetermined topology for the decision tree; (see, e.g., Perpinan paragraph [0084] “The graph of the decision tree is a directed rooted tree, which defines the tree structure or topology as shown in FIG. 2. The tree graph will often be binary (i.e., each decision node has two children) and complete (i.e., the leaves can only appear in the deepest level of the tree, hence a complete binary tree of depth Δ has 2.sup.Δ−1 decision nodes and 2.sup.Δ leaves), but not necessarily … We take this structure to be fixed and determined ahead of time.” [i.e., the topology is predetermined]).
and wherein the nodes are iteratively split until the predetermined topology is obtained (see, e.g., Perpinan paragraph [0325]: “We repeat the above iteration until either we reach a set number of iterations or the tree after the iteration has the same structure as before the iteration [i.e., a predetermined topology] (which means every split node was pruned back) [i.e., iteratively splitting the nodes] and its parameters (throughout the tree) changed less than a set tolerance.”)
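By way of a minimal sketch only (names and structure assumed, not drawn from either reference), "splitting until a predetermined topology is obtained" can be as simple as growing a complete binary tree to a chosen depth, which then has 2 to the power of that depth leaves as in the passage quoted above.

def build_to_depth(depth, target_depth):
    # Split every node until the predetermined topology (a complete binary
    # tree of depth target_depth) is reached.
    node = {"depth": depth, "children": []}
    if depth < target_depth:
        node["children"] = [build_to_depth(depth + 1, target_depth),
                            build_to_depth(depth + 1, target_depth)]
    return node

def count_leaves(node):
    return 1 if not node["children"] else sum(count_leaves(c) for c in node["children"])

tree = build_to_depth(0, target_depth=3)
assert count_leaves(tree) == 2 ** 3          # a depth-3 complete binary tree has 8 leaves
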

Regarding claim 3, as discussed above the combination of Perpinan and Passera teaches the method of claim 1. Perpinan further teaches wherein the leaf nodes of the decision tree include the solvers that return a predicted target value (see, e.g., Perpinan paragraphs [0405]: “Since the tree prediction occurs at the leaves, then during training with TAO such constraints apply in the leaf optimization only. And, since the leaf predictor is typically a simple model, solving such constrained optimization is usually easy”, and [0433]: “which can be solved exactly in some cases by simply solving over z.sub.n for each leaf predictor model of the tree and picking the one having the lowest objective value (a “tree-leaves projection” algorithm) [i.e., the prediction is made by selecting the leaf node with the lowest objective value, which corresponds to the predicted target value]”).

Regarding claim 4, as discussed above the combination of Perpinan and Passera teaches the method of claim 1. Perpinan further teaches wherein the leaf nodes of the decision tree include a regression model that returns a predicted target value (see, e.g., Perpinan paragraphs [0088]: “Each leaf predictor can be a classifier (for classification trees), regressor (for regression trees), and so on depending on the machine learning task being solved.)” and [0405]: “And, since the leaf predictor is typically a simple model, solving such constrained optimization is usually easy. For example, if the leaf predictor is a constant value, then the least-squares regression solution is the largest of zero and the average of the data.”).
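A small worked illustration of the constant-leaf case quoted above (synthetic numbers, assumed purely for illustration): the unconstrained constant that minimizes the squared error is the mean of the targets routed to the leaf, and under a nonnegativity constraint it becomes the larger of zero and that mean.

import numpy as np

z = np.array([-1.0, 0.5, 2.0, 3.5])        # targets routed to one leaf
c_unconstrained = z.mean()                 # 1.25 minimizes sum((z - c)**2) over c
c_nonnegative = max(0.0, c_unconstrained)  # "the largest of zero and the average of the data"
print(c_unconstrained, c_nonnegative)
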

Regarding claim 5, as discussed above the combination of Perpinan and Passera teaches the method of claim 1. Perpinan further teaches further including optimizing the decision tree using a regularizer (see, e.g., Perpinan paragraph [0385]: “A particularly useful and well-known case is that of sparsity regularizers, which encourage the parameters of the node to contain few nonzero values (which in turn makes the model simpler and possibly more robust); it is also commonly used for feature selection. A large amount of theoretical and computational results have been developed in the last decades for sparse learning, mostly with linear models. TAO can reuse existing sparse optimization algorithms to solve the optimization problem at a decision node or leaf”).
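For orientation, the generic "loss plus regularization" node objective referenced above can be written as a small function; this is only the general form, stated as an assumption, and is not a reproduction of Perpinan's eq. 4.

import numpy as np

def node_objective(w, X, y, alpha=0.1):
    # Generic per-node objective: data-fit loss plus a regularization term.
    loss = np.mean((X @ w - y) ** 2)         # squared-error loss on the node's data
    penalty = alpha * np.sum(np.abs(w))      # sparsity (L1) regularizer, as discussed above
    return loss + penalty

print(node_objective(np.zeros(3), np.eye(3), np.ones(3)))   # 1.0: pure loss, zero penalty
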

Regarding claim 8, as discussed above the combination of Perpinan and Passera teaches the method of claim 1. Perpinan further teaches wherein the regularizer includes a single routing regularizer (see, e.g., Perpinan paragraphs [0090]: “The routing function of the tree, λ(x; Θ): [symbol image]→[symbol image], maps an input x to exactly one leaf of the tree, according to the decision functions. Strictly speaking, the parameters of λ are {Ξi [symbol image]⊂Θ, i.e., the decision functions' parameters”, and [0333]: “Choice of regularization terms in eq. 4: again we have a choice, for each new node created. This can be done by deciding ahead of time the type of regularization term for a decision node and leaf and the value of the hyperparameter α, valid for every node throughout the tree growth” [i.e., the regularization term is used in decision nodes, can influence the routing function used at each decision node which routes input to exactly one leaf]).

Regarding claim 9, as discussed above the combination of Perpinan and Passera teaches the method of claim 1. Perpinan further teaches wherein the decision tree is trained to solve a regression problem (see, e.g., Perpinan paragraph [0415]: “The step over Θ involves only the terms on α and μ and satisfies the assumptions of section 4.3, where the “loss” is the μ term, additively separable over training instances (effectively, a least-squares regression problem on a training set {(x.sub.n, z.sub.n)}.sub.n=1.sup.N). Hence it can be solved with TAO” [i.e., training a regression model to minimize loss, such as with least-squares regression]).
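As a tiny, assumption-only illustration of a least-squares regression fit of the kind referenced in the passage quoted above (the data points are invented for the example):

import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])   # design matrix with intercept column
z = np.array([1.0, 3.0, 5.0])                        # targets
w, *_ = np.linalg.lstsq(X, z, rcond=None)            # w is approximately [1, 2]: intercept 1, slope 2
print(w)
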

Regarding claim 10, as discussed above the combination of Perpinan and Passera teaches the method of claim 1. Perpinan further teaches wherien the decision tree is trained to solve a classification problem (see, e.g., Perpinan paragraphs [0439-440]: “Not only does TAO produce much better trees in traditional scenarios (axis-aligned or oblique trees for classification and regression), but it makes it possible to apply trees to many new scenarios. These can involve: [0440] Different loss functions for classification, regression and beyond, such as semi-supervised learning, dimensionality reduction or structured inputs” [i.e., TAO enable training of decision trees through different loss functions involving classification learning tasks]).

Regarding claim 11, as discussed above the combination of Perpinan and Passera teaches the method of claim 1. Perpinan further teaches wherein the nodes of the decision tree include at least routing nodes and leaf nodes, (see, e.g., Perpinan paragraphs [0079]: “For inference, the role of the decision nodes is to route the instance, a high-dimensional vector, to its corresponding predictor, which then outputs the prediction” and [0084]: “A decision tree with parameters Θ={ξ.sub.i: i∈N} (for the decision nodes and leaves) defines a tree predictive function T(x; Θ)”) [i.e., involving both routing nodes and nodes]).
Perpinan further teaches wherein the dimension reduction is performed with optimization at each of the routing nodes and leaf nodes (see, e.g., Perpinan paragraphs [0209]: “TAO applies to any type of loss and regularization functions [i.e., including dimension reduction referenced in paragraph [0385]], and to any type of model at each node (both in the decision nodes and leaves), as long as the corresponding subproblems can be solved; and to trees where a decision node may have more than two children and be complete or not”).

Regarding Independent Claim 13, Perpinan teaches a computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions readable by a device to cause the device to: (see, e.g., Perpinan paragraph [0555], “If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a non-transitory computer-readable medium. Computer-readable media may include both computer storage media and nontransitory communication media including any medium that facilitates transfer of a computer program from one place to another. A storage media may be any available media that can be accessed by a general purpose or special purpose computer. By way of example, and not limitation, such non-transitory computer readable media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired …”).
receive a training set; (see, e.g., Perpinan paragraphs [0019] “Generally, the method comprises inputting an initial decision tree and a training set of instances” and [0218] “At step 606, a training set of instances is input [i.e. receiving training set as input]. The training set may be any conventional set of instances used to train decision trees, or may be less convention training sets, including but not limited to …”).
initialize the decision tree by constructing a root node (see, e.g., Perpinan paragraphs [0291-0293]: “Generally, we take the initial tree to be complete with random parameters, of large enough depth … We do this by starting from the root, setting its parameters, and recursively setting its children's parameters … It may also be a previous TAO tree, as in warm-start initialization in a regularization path or in an inner-loop optimization”). 
and training a root solver with the training set; (see, e.g., Perpinan paragraphs [0087] “The tree computes the output for an input x by following a path from the root to exactly one leaf and outputting the latter's prediction for x. Crucially, note that the decision functions are crisp (not stochastic): exactly one child is chosen for a given input x (unlike for soft decision trees, which define a distribution over the children).” and [0240] “TRAIN - DECISION ( f , R ) trains the decision function model f ( with parameters ) on the training set instances indexed in the set R , with corresponding output labels ( pseudolabels ” ) computed internally , and returns the model's parameters 0. This is used for the decision function models at the decision nodes” [i.e., the decision function model is trained with the training set instances, and are used by root nodes]).
grow the decision tree by iteratively splitting nodes of the decision tree, (see, e.g., Perpinan paragraph [0320] “However, we can combine this with growing the tree structure itself. The basic idea is that we interleave growing the tree larger (by splitting each leaf) with optimizing (and pruning) the current tree with TAO. Growing the tree makes it …”).
wherein at a node of the decision tree dimension reduction is performed on features of data of the training set received at the node, (see, e.g., Perpinan paragraphs [0326]: “The previous algorithm generates a sequence (collection) of trees of different structure, from the first one consisting of a single node to the last one, each optimized on the training set.” and [0385]: “A particularly useful and well-known case is that of sparsity regularizers [i.e., a known type of form of dimension reduction], which encourage the parameters of the node to contain few nonzero values (which in turn makes the model simpler and possibly more robust); it is also commonly used for feature selection. TAO can reuse existing sparse optimization algorithms to solve the optimization problem at a decision node or leaf …”).
 and the data having reduced dimension is split based on optimizing a routing function, for routing to another node of the decision tree (see, e.g., Perpinan paragraphs [0335]: “Hence, and importantly, the objective function that TAO optimizes at each split-optimize-prune step still has the form (eq. 4), i.e., a loss and a regularization term. This makes it possible to learn sparse weight vectors at the nodes, and to prune the tree as we grow it.” and [339]: “at step 1405, the leaf does not achieve the minimal loss, then at step 1406 the leaf is split, becomes a decision node) and random parameters are assigned to the children of split leaf. The method then moves to step 1407 and determines if all current leaves have been grown.”).
wherein the dimension reduction and the split is performed together at the node, (see, e.g., Perpinan paragraphs [0335]: “Hence, and importantly, the objective function that TAO optimizes at each split-optimize-prune step still has the form (eq. 4), i.e., a loss and a regularization term. This makes it possible to learn sparse weight vectors at the nodes, and to prune the tree as we grow it” and [0385]: “A particularly useful and well-known case is that of sparsity regularizers [i.e., a known form of dimension reduction], which encourage the parameters of the node to contain few nonzero values (which in turn makes the model simpler and possibly more robust); it is also commonly used for feature selection. TAO can reuse existing sparse optimization algorithms to solve the optimization problem at a decision node or leaf …”).
wherein the decision tree includes routing nodes and leaf nodes; (see, e.g., Perpinan paragraphs [0079] “For inference, the role of the decision nodes is to route the instance, a high-dimensional vector, to its corresponding predictor, which then outputs the prediction [i.e., decision nodes work as routing nodes]”, and [0090] “we first route x to a leaf i=λ(x; Θ) via the decision nodes, and then the leaf makes the actual prediction g.sub.i(x; Ξ.sub.i).”).
and perform training for the routing functions at the routing nodes, (see, e.g., Perpinan paragraphs [0019] “… TAO applies to many different types of loss functions, regularization terms and constraints, and types of models at both the decision nodes [i.e., routing nodes] and the leaves, and makes it possible to learn better decision trees than with traditional algorithms, and to learn trees for problems where traditional algorithms do not apply” and [0085] “The role of a decision node is to route an input x to one of the node's children. A decision node i∈ [set symbol rendered as an image in the original] uses a decision function ƒi(x; Ξi): [domain symbol rendered as an image in the original] →Ci, with parameters ξi, to map an input x to one of its children” [i.e., the decision function works as a routing function at the node]).
solvers at the leaf nodes, (see, e.g., Perpinan paragraph [0090] “This notation makes it clear that, in order to compute the prediction for an input x, we first route x to a leaf i=λ(x; Θ) via the decision nodes, and then the leaf makes the actual prediction g.sub.i(x; Ξ.sub.i). The routing function partitions the input space custom-character into regions, one per leaf, where a specific predictor operates. FIGS. 1A-1C show an example of a decision tree and the partition of the space it defines.” [i.e., predictions are computed at the leaf nodes, acting as solvers]).
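The two roles described in the quoted passages (decision nodes that route an input, and leaf solvers that make the prediction) can be pictured with the minimal Python sketch below; the linear decision function and the constant leaf predictors are illustrative choices for this sketch, not Perpinan's models.

# Illustrative sketch only: crisp routing through decision nodes to a leaf,
# where a simple leaf "solver" produces the prediction.
import numpy as np

class DecisionNode:
    def __init__(self, w, b, left, right):
        self.w, self.b, self.left, self.right = w, b, left, right

    def predict(self, x):
        # crisp decision function: exactly one child is chosen for a given x
        child = self.left if float(np.dot(self.w, x) + self.b) <= 0 else self.right
        return child.predict(x)

class Leaf:
    def __init__(self, value):
        self.value = value

    def predict(self, x):
        return self.value                      # the leaf solver makes the prediction

tree = DecisionNode(w=np.array([1.0, -1.0]), b=0.0,
                    left=Leaf(value=0.0), right=Leaf(value=1.0))
print(tree.predict(np.array([2.0, 0.5])))      # routed to the right leaf -> 1.0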

However, Perpinan fails to explicitly teach the limitation “and dimension reduction at every node of the decision tree simultaneously by an optimization algorithm.”
In the same field, analogous art Passera teaches “and dimension reduction at every node of the decision tree simultaneously by an optimization algorithm” (see, e.g., Passera, col. 13, lines 20-35: “For example, the task of training non-terminal nodes and using them to partition data for the training of leaf node neural networks should be parallelized if it will significantly increase the speed with which the tree can be built and trained. This would be the case if the number of non-terminal nodes becomes very large, or if the amount of computation associated with training each of them becomes large. For example, when the non-terminal nodes have hidden layers, as in FIG. 20, parallelization will tend to be more appropriate.”).
Perpinan and Passera are analogous art because they are each directed to Tree-organized classifiers (see, e.g., Perpinan, paragraph [0017] and Passera, column 2, lines 38-44). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Perpinan regarding the use of sparse optimization algorithms to solve optimization problems at decision nodes or leaf nodes with the teachings of Passera, which encourage the parallelization and partitioning of data and the training of non-terminal nodes for improved efficiency in a decision tree model. Doing so would have allowed Perpinan to incorporate Passera’s parallel training of nodes in order to “significantly increase the speed with which the tree can be built and trained” (see, e.g., Passera, col. 13, lines 20-35).
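To illustrate the parallelization rationale drawn from Passera, the sketch below trains placeholder models for several non-terminal nodes concurrently; the ThreadPoolExecutor, the least-squares stand-in for node training, and the per-node data are assumptions made for this example and do not reproduce either reference's method.

# Illustrative sketch only: train the models at several non-terminal nodes in
# parallel; the per-node "training" is a trivial least-squares placeholder.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def train_node(node_id, X, y):
    w, *_ = np.linalg.lstsq(X, y, rcond=None)   # placeholder node training
    return node_id, w

rng = np.random.default_rng(0)
node_data = {i: (rng.normal(size=(100, 5)), rng.normal(size=100)) for i in range(4)}

with ThreadPoolExecutor() as pool:
    results = dict(pool.map(lambda item: train_node(item[0], *item[1]), node_data.items()))

print(sorted(results))                          # parameters learned for all four nodes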

Regarding claim 14, as discussed above the combination of Perpinan and Passera teaches the method of claim 13. Perpinan further teaches wherein the nodes are iteratively split until a predetermined topology is obtained (see, e.g., Perpinan paragraph [0084] “The graph of the decision tree is a directed rooted tree, which defines the tree structure or topology as shown in FIG. 2. The tree graph will often be binary (i.e., each decision node has two children) and complete (i.e., the leaves can only appear in the deepest level of the tree, hence a complete binary tree of depth Δ has 2.sup.Δ−1 decision nodes and 2.sup.Δ leaves), but not necessarily
We take this structure to be fixed and determined ahead of time.” [i.e., the topology is predetermined] and [0325]: “We repeat the above iteration until either we reach a set number of iterations or the tree after the iteration has the same structure as before the iteration [i.e., a predetermined topology] (which means every split node was pruned back) [i.e., iteratively splitting the nodes] and its parameters (throughout the tree) changed less than a set tolerance.”).

Regarding claim 15, as discussed above the combination of Perpinan and Passera teaches the method of claim 13. Perpinan further teaches wherein the leaf nodes of the decision tree include the solvers that return a predicted target value (see, e.g., Perpinan paragraphs [0405]: “Since the tree prediction occurs at the leaves, then during training with TAO such constraints apply in the leaf optimization only. And, since the leaf predictor is typically a simple model, solving such constrained optimization is usually easy”, and [0433]: “which can be solved exactly in some cases by simply solving over z.sub.n for each leaf predictor model of the tree and picking the one having the lowest objective value (a “tree-leaves projection” algorithm) [i.e., the prediction is made by selecting the leaf node with the lowest objective value, which corresponds to the predicted target value]”).

Regarding claim 16, as discussed above the combination of Perpinan and Passera teaches the method of claim 13. Perpinan further teaches wherein the leaf nodes of the decision tree include a regression model that returns a predicted target value (see, e.g., Perpinan paragraphs [0088]: “Each leaf predictor can be a classifier (for classification trees), regressor (for regression trees), and so on depending on the machine learning task being solved.)” and [0405]: “… And, since the leaf predictor is typically a simple model, solving such constrained optimization is usually easy. For example, if the leaf predictor is a constant value, then the least-squares regression solution is the largest of zero and the average of the data.”).
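The quoted example (a constant-valued leaf predictor whose constrained least-squares solution is “the largest of zero and the average of the data”) can be checked with a short worked computation in Python; the target values below are made up purely for illustration.

# Worked illustration of the quoted example: under a nonnegativity constraint,
# a constant leaf predictor reduces to max(0, mean of the leaf's target values).
import numpy as np

leaf_targets = np.array([-2.0, 1.0, 4.0, 3.0])
constant_prediction = max(0.0, float(leaf_targets.mean()))
print(constant_prediction)    # the mean is 1.5, so the leaf predicts 1.5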

Regarding claim 17, as discussed above the combination of Perpinan and Passera teaches the method of claim 13. Perpinan further teaches wherein the device is further caused to optimize the decision tree using a regularizer (see, e.g., Perpinan paragraph [0385]: “A particularly useful and well-known case is that of sparsity regularizers, which encourage the parameters of the node to contain few nonzero values (which in turn makes the model simpler and possibly more robust); it is also commonly used for feature selection. A large amount of theoretical and computational results have been developed in the last decades for sparse learning, mostly with linear models. TAO can reuse existing sparse optimization algorithms to solve the optimization problem at a decision node or leaf”).
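The quoted passages describe the objective optimized at each step as “a loss and a regularization term” (Perpinan's eq. 4). For reference only, a generic objective of that form can be written as follows; the notation is chosen for illustration and is not copied from the reference.

    E(\Theta) = \sum_{n=1}^{N} L\bigl(y_n,\, T(x_n;\Theta)\bigr) + \alpha \sum_{i \in \mathcal{N}} \phi_i(\theta_i)

where L is the task loss on the training pair (x_n, y_n), T(x_n; \Theta) is the tree's prediction, \mathcal{N} indexes the nodes of the tree, \phi_i is a per-node regularizer (for example a sparsity-inducing norm), and \alpha weights the regularization.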

Regarding Independent claim 19, Perpinan teaches a system comprising: a processor; and a memory device coupled with the processor (see, e.g., Perpinan paragraph [0554], “The steps of a method or algorithm described in connection with the disclosure herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. A distributed computing system may also be utilized.”).
receive a training set; (see, e.g., Perpinan paragraphs [0019] “Generally, the method comprises inputting an initial decision tree and a training set of instances” and [0218] “At step 606, a training set of instances is input [i.e., receiving a training set as input]. The training set may be any conventional set of instances used to train decision trees, or may be less convention training sets, including but not limited to …”).
initialize the decision tree by constructing a root node (see, e.g., Perpinan paragraphs [0291-0293]: “Generally, we take the initial tree to be complete with random parameters7, of large enough depth
We do this by starting from the root, setting its parameters, and recursively setting its children's parameters
It may also be a previous TAO tree, as in warm-start initialization in a regularization path or in an inner-loop optimization”). 
and training a root solver with the training set; (see, e.g., Perpinan paragraphs [0087] “The tree computes the output for an input x by following a path from the root to exactly one leaf and outputting the latter's prediction for x. Crucially, note that the decision functions are crisp (not stochastic): exactly one child is chosen for a given input x (unlike for soft decision trees, which define a distribution over the children).” and [0240] “TRAIN-DECISION(f, R) trains the decision function model f (with parameters θ) on the training set instances indexed in the set R, with corresponding output labels (“pseudolabels”) computed internally, and returns the model's parameters θ. This is used for the decision function models at the decision nodes” [i.e., the decision function model is trained with the training set instances and is used by the root node]).
grow the decision tree by iteratively splitting nodes of the decision tree, (see, e.g., Perpinan paragraph [0320] “… However, we can combine this with growing the tree structure itself. The basic idea is that we interleave growing the tree larger (by splitting each leaf) with optimizing (and pruning) the current tree with TAO. Growing the tree makes it …”).
wherein at a node of the decision tree dimension reduction is performed on features of data of the training set received at the node, (see, e.g., Perpinan paragraphs [0326]: “The previous algorithm generates a sequence (collection) of trees of different structure, from the first one consisting of a single node to the last one, each optimized on the training set.” and [0385]: “A particularly useful and well-known case is that of sparsity regularizers [i.e., a known form of dimension reduction], which encourage the parameters of the node to contain few nonzero values (which in turn makes the model simpler and possibly more robust); it is also commonly used for feature selection. TAO can reuse existing sparse optimization algorithms to solve the optimization problem at a decision node or leaf …”).
and the data having reduced dimension is split based on optimizing a routing function, for routing to another node of the decision tree (see, e.g., Perpinan paragraphs [0335]: “Hence, and importantly, the objective function that TAO optimizes at each split-optimize-prune step still has the form (eq. 4), i.e., a loss and a regularization term. This makes it possible to learn sparse weight vectors at the nodes, and to prune the tree as we grow it.” and [0339]: “at step 1405, the leaf does not achieve the minimal loss, then at step 1406 the leaf is split (becomes a decision node) and random parameters are assigned to the children of split leaf. The method then moves to step 1407 and determines if all current leaves have been grown.”).
wherein the dimension reduction and the split is performed together at the node, (see, e.g., Perpinan paragraphs [0335]: “Hence, and importantly, the objective function that TAO optimizes at each split-optimize-prune step still has the form (eq. 4), i.e., a loss and a regularization term. This makes it possible to learn sparse weight vectors at the nodes, and to prune the tree as we grow it” and [0385]: “A particularly useful and well-known case is that of sparsity regularizers [i.e., a known form of dimension reduction], which encourage the parameters of the node to contain few nonzero values (which in turn makes the model simpler and possibly more robust); it is also commonly used for feature selection. TAO can reuse existing sparse optimization algorithms to solve the optimization problem at a decision node or leaf …”).
wherein the decision tree includes routing nodes and leaf nodes; (see, e.g., Perpinan paragraphs [0079] “For inference, the role of the decision nodes is to route the instance, a high-dimensional vector, to its corresponding predictor, which then outputs the prediction [i.e., decision nodes work as routing nodes]”, and [0090] “we first route x to a leaf i=λ(x; Θ) via the decision nodes, and then the leaf makes the actual prediction g.sub.i(x; Ξ.sub.i).”).
and perform training for the routing functions at the routing nodes, (see, e.g., Perpinan paragraphs [0019] “… TAO applies to many different types of loss functions, regularization terms and constraints, and types of models at both the decision nodes [i.e., routing nodes] and the leaves, and makes it possible to learn better decision trees than with traditional algorithms, and to learn trees for problems where traditional algorithms do not apply” and [0085] “The role of a decision node is to route an input x to one of the node's children. A decision node i∈ [set symbol rendered as an image in the original] uses a decision function ƒi(x; Ξi): [domain symbol rendered as an image in the original] →Ci, with parameters ξi, to map an input x to one of its children” [i.e., the decision function works as a routing function at the node]).
solvers at the leaf nodes, (see, e.g., Perpinan paragraph [0090] “This notation makes it clear that, in order to compute the prediction for an input x, we first route x to a leaf i=λ(x; Θ) via the decision nodes, and then the leaf makes the actual prediction g.sub.i(x; Ξ.sub.i). The routing function partitions the input space custom-character into regions, one per leaf, where a specific predictor operates. FIGS. 1A-1C show an example of a decision tree and the partition of the space it defines.” [i.e., predictions are computed at the leaf nodes, acting as solvers]).

However, Perpinan fails to explicitly teach the limitation “and dimension reduction at every node of the decision tree simultaneously by an optimization algorithm.”
In the same field, analogous art Passera teaches “and dimension reduction at every node of the decision tree simultaneously by an optimization algorithm” (see, e.g., Passera, col. 13, lines 20-35: “For example, the task of training non-terminal nodes and using them to partition data for the training of leaf node neural networks should be parallelized if it will significantly increase the speed with which the tree can be built and trained. This would be the case if the number of non-terminal nodes becomes very large, or if the amount of computation associated with training each of them becomes large. For example, when the non-terminal nodes have hidden layers, as in FIG. 20, parallelization will tend to be more appropriate.”).
Perpinan and Passera are analogous art because they are each directed to Tree-organized classifiers (see, e.g., Perpinan, paragraph [0017] and Passera, column 2, lines 38-44). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have combined the teachings of Perpinan regarding the use of sparse optimization algorithms to solve optimization problems at decision nodes or leaf nodes with the teachings of Passera, which encourage the parallelization and partitioning of data and the training of non-terminal nodes for improved efficiency in a decision tree model. Doing so would have allowed Perpinan to incorporate Passera’s parallel training of nodes in order to “significantly increase the speed with which the tree can be built and trained” (see, e.g., Passera, col. 13, lines 20-35).

Regarding claim 20, as discussed above the combination of Perpinan and Passera teaches the system (machine) of claim 19. Perpinan further teaches wherein the leaf nodes of the decision tree include a regression model that returns a predicted target value (see, e.g., Perpinan paragraph [0405]: “Since the tree prediction occurs at the leaves, then during training with TAO such constraints apply in the leaf optimization only. And, since the leaf predictor is typically a simple model, solving such constrained optimization is usually easy. For example, if the leaf predictor is a constant value, then the least-squares regression solution is the largest of zero and the average of the data.” [i.e., the least-squares regression solution is a predicted target value provided by the regression model at the leaf node]).


18. 	Claim 6 is further rejected under 35 U.S.C. 103 as being unpatentable over the combination of Perpinan and Passera, and further in view of Zhou (US20120166366A1; hereinafter Zhou).

Regarding Claim 6, the combination of Perpinan and Passera teaches the limitations of claim 1, as discussed above. Said combination fails to explicitly teach wherein the regularizer includes an orthogonlaity regularizer. 
Nevertheless, in the same field, analogous art Zhou teaches wherein the regularizer includes an orthogonlaity regularizer8 (see, e.g., Zhou, paragraph [0023] “The classifier at each node of the hierarchical structure may be encouraged to be as different as possible from the classifiers at the parent node. Mathematically, regularization may be used to force the classifying hyperplane at each node to be near-orthogonal9 to the hyperplane at the parent node. Under typical conditions on the regularization parameters, training such a hierarchical SVM may be a convex optimization problem”). 
Perpinan, Passera and Zhou are analogous art because they are each directed to Tree-organized classifiers (see, e.g., Perpinan, paragraph [0017], Passera column 2, lines 38-44, and Zhou paragraph [0004]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perpinan in view of Passera to incorporate the teachings of Zhou to provide techniques for utilizing orthogonality regularizers in tree classifiers.  Doing so would have allowed Perpinan in view of Passera to incorporate Zhou’s method in order to "maximize the orthogonality of a normal vector in relation to the normal vector of the first level” (see, e.g., Zhou, paragraph [0048]) for optimizing classifiers at each node.
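Two simple ways of expressing an orthogonality penalty, in the spirit of Zhou's near-orthogonal hyperplanes and of the specification's orthogonality between the columns of a matrix C, are sketched below in Python; the specific formulas and function names are illustrative choices and are not taken verbatim from either source.

# Illustrative sketch only: orthogonality penalties that are zero when the
# constraint is met and grow as the vectors/columns depart from orthogonality.
import numpy as np

def hyperplane_orthogonality_penalty(w_child, w_parent):
    # zero when the child's hyperplane normal is orthogonal to the parent's
    return float(np.dot(w_child, w_parent) ** 2)

def column_orthogonality_penalty(C):
    # Frobenius-norm penalty pushing the columns of C toward orthonormality
    gram = C.T @ C
    return float(np.linalg.norm(gram - np.eye(C.shape[1]), "fro") ** 2)

print(hyperplane_orthogonality_penalty(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # 0.0
print(column_orthogonality_penalty(np.eye(3)[:, :2]))                                # 0.0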

19. 	Claim 7 is further rejected under 35 U.S.C. 103 as being unpatentable over the combination of Perpinan and Passera, and further in view of Yu (US20210374612A1; hereinafter Yu). Yu was filed on 12/02/2021, and this date is before the earliest effective filing date of this application, i.e., 12/14/2021. Therefore, Yu constitutes prior art under 35 U.S.C. 102(a)(2).

Regarding Claim 7, the combination of Perpinan and Passera teaches the limitations of claim 1, as discussed above. Said combination fails to explicitly teach wherein the regularizer includes a diversification regularizer10. 
Nevertheless, in the same field, analogous art Yu teaches wherein the regularizer includes a diversification regularizer (see, e.g., Yu, paragraph [0046] “The second term is for interpretability where an evidence regularization is used to encourage each prototypical option embedding to be as close to an encoded instance as possible. The third term is a diversity regularization term to learn diversified options, where d.sub.min is a threshold that classifies or determines whether two prototypes are close or not. d.sub.min is set to 1.0 in exemplary embodiments. λ.sub.1, λ.sub.2, λ.sub.3∈[0, 1] are the weights used to balance the three loss terms.”). 
Perpinan, Passera and Yu are analogous art because they are each directed to the improvement of machine learning model interpretability and performance (see, e.g., Perpinan, paragraph [0493], Passera col.2 lines 38-43, and Yu paragraph [0019]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perpinan in view of Passera to incorporate the teachings of Yu to provide techniques for utilizing a diversification regularizer term that learns diversified options, in combination with the learning methods done at the nodes of decision trees. Doing so would have allowed Perpinan in view of Passera to incorporate Yu’s method for improving interpretability in machine learning methods, as suggested by Yu (see, e.g., Yu paragraph [0090]: "For better interpretability, the exemplary methods define several criteria for constructing the prototypes, including option diversity and accuracy”).
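One plausible form of a diversity regularizer that penalizes prototype pairs closer than a threshold d_min is sketched below; Yu's paragraph [0046] describes the threshold idea, while the particular hinge-style formula, the diversity_penalty name, and the example prototypes are assumptions made solely for illustration.

# Illustrative sketch only: penalize pairs of prototypes that lie closer than
# the threshold d_min, encouraging diversified prototypes.
import numpy as np

def diversity_penalty(prototypes, d_min=1.0):
    total = 0.0
    n = len(prototypes)
    for i in range(n):
        for j in range(i + 1, n):
            dist = float(np.linalg.norm(prototypes[i] - prototypes[j]))
            total += max(0.0, d_min - dist) ** 2   # only nearby pairs are penalized
    return total

protos = np.array([[0.0, 0.0], [0.2, 0.0], [3.0, 3.0]])
print(diversity_penalty(protos))   # only the first two prototypes fall within d_min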

20. 	Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Perpinan and Passera, and further in view of Goyal (US20220222931A1; hereinafter Goyal).

Regarding Claim 12, the combination of Perpinan and Passera teaches the limitations of claim 1, as discussed above. Said combination fails to explicitly teach wherein the training set includes imbalanced datasets. Nevertheless, in the same field, analogous art Goyal teaches wherein the training set includes imbalanced datasets (see, e.g., Goyal paragraph [0013]: “The present disclosure relates to machine learning-based classification, and more particularly, to ensemble learning based methods and systems for a binary classification task on an imbalanced dataset, wherein the dataset includes a minority class comprising positive samples and a majority class comprising negative samples”).
Perpinan, Passera and Goyal are analogous art because they are each directed to Tree-organized classifiers (see, e.g., Perpinan, paragraph [0017], Passera, column 2, lines 38-44, and Goyal paragraph [0007]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Perpinan and Passera to incorporate the teachings of Goyal to include imbalanced datasets. Doing so would have allowed Perpinan to use Goyal's method in order to “deal with imbalanced classification without considering a cost-sensitive approach and without relying on feature engineering”, as suggested by Goyal (see, e.g., Goyal, paragraph [0023]).
and a model accuracy performance measurement includes nonlinear metrics (see, e.g., Goyal paragraph [0029]: “In view of the above, embodiments of the invention obtain a final learned classifier that has a low false positive rate. Therefore, embodiments of the invention may rely on an evaluation of F1-Measure, Precision and Average Precision, which are defined as follows: [equations for F1-Measure, Precision and Average Precision rendered as an image in the original]” and [0030]: “The confusion matrix that corresponds to the above evaluation metrics is shown in FIG. 1.”).
The motivation to combine is the same as for the preceding limitation of this claim.
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Perpinan and Passera to incorporate the teachings of Goyal to include precision evaluation measurements in the learning process of nodes of a decision tree. Doing so would have allowed Perpinan to use Goyal's method in order to “obtain a final learned classifier that has a low false positive rate”, as suggested by Goyal (see, e.g., Goyal, paragraph [0029]).
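For reference, the standard definitions behind the F1-Measure and Precision metrics that Goyal evaluates can be computed as below; the confusion-matrix counts used here are made up purely for the example.

# Illustrative computation of precision, recall and F1 from hypothetical
# confusion-matrix counts (true positives, false positives, false negatives).
tp, fp, fn = 40, 10, 20

precision = tp / (tp + fp)                       # 0.8
recall = tp / (tp + fn)                          # about 0.667
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 3), round(recall, 3), round(f1, 3))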

21.	Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over the combination of Perpinan and Passera, in view of Zhou and further in view of Yu.

Regarding claim 18, the combination of Perpinan and Passera teaches the limitations of claim 17, as discussed above. Perpinan further teaches wherein the regularizer includes at least one of an … and a single routing regularizer (see, e.g., Perpinan paragraphs [0090]: “The routing function of the tree, λ(x; Θ): [domain symbol rendered as an image in the original] → [leaf index set rendered as an image in the original], maps an input x to exactly one leaf of the tree, according to the decision functions. Strictly speaking, the parameters of λ are {Ξi} [node index set rendered as an image in the original] ⊂Θ, i.e., the decision functions' parameters”, and [0333]: “Choice of regularization terms in eq. 4: again we have a choice, for each new node created. This can be done by deciding ahead of time the type of regularization term for a decision node and leaf and the value of the hyperparameter α, valid for every node throughout the tree growth” [i.e., the regularization term is used in decision nodes, and can influence the routing function used at each decision node, which routes input to exactly one leaf]).
The combination of Perpinan and Passera fails to explicitly teach wherein the regularizer includes at least one of an orthogonlaity regularizer.

Nevertheless, in the same field, analogous art Zhou teaches wherein the regularizer includes at least one of an orthogonlaity regularizer (see, e.g., Zhou, paragraph [0023] “The classifier at each node of the hierarchical structure may be encouraged to be as different as possible from the classifiers at the parent node. Mathematically, regularization may be used to force the classifying hyperplane at each node to be near-orthogonal11 to the hyperplane at the parent node. Under typical conditions on the regularization parameters, training such a hierarchical SVM may be a convex optimization problem”).
Perpinan, Passera and Zhou are analogous art because they are each directed to Tree-organized classifiers (see, e.g., Perpinan, paragraph [0017], Passera column 2, lines 38-44, and Zhou paragraph [0004]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perpinan in view of Passera to incorporate the teachings of Zhou to provide techniques for utilizing orthogonality regularizers in tree classifiers.  Doing so would have allowed Perpinan in view of Passera to incorporate Zhou’s method in order to "maximize the orthogonality of a normal vector in relation to the normal vector of the first level” (see, e.g., Zhou, paragraph [0048]) for optimizing classifiers at each node.
The combination of Perpinan and Passera fails to explicitly teach wherein the regularizer includes at least one of a diversity regularizer.

Nevertheless, in the same field, analogous art Yu teaches wherein the regularizer includes at least one of a diversity regularizer (see, e.g., Yu, paragraph [0046] “The second term is for interpretability where an evidence regularization is used to encourage each prototypical option embedding to be as close to an encoded instance as possible. The third term is a diversity regularization term to learn diversified options, where d.sub.min is a threshold that classifies or determines whether two prototypes are close or not. d.sub.min is set to 1.0 in exemplary embodiments. λ.sub.1, λ.sub.2, λ.sub.3∈[0, 1] are the weights used to balance the three loss terms.”).
Perpinan, Passera and Yu are analogous art because they are each directed to the improvement of machine learning model interpretability and performance (see, e.g., Perpinan, paragraph [0493], Passera col.2 lines 38-43, and Yu paragraph [0019]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Perpinan in view of Passera to incorporate the teachings of Yu to provide techniques for utilizing a diversification regularizer term that learns diversified options, in combination with the learning methods done at the nodes of decision trees. Doing so would have allowed Perpinan in view of Passera to incorporate Yu’s method for improving interpretability in machine learning methods, as suggested by Yu (see, e.g., Yu paragraph [0090]: "For better interpretability, the exemplary methods define several criteria for constructing the prototypes, including option diversity and accuracy”).
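Given the interpretation in the footnotes below that a single routing regularizer "can encourage routing each observation to a single node," one way such a term could be expressed, if the routing were soft, is an entropy penalty on per-leaf routing probabilities; this formulation is an assumption made solely for illustration and does not appear in Perpinan, Zhou or Yu.

# Illustrative sketch only: with soft per-leaf routing probabilities, an
# entropy penalty is near zero for one-hot (single-leaf) routing and positive
# for diffuse routing, encouraging routing each observation to a single node.
import numpy as np

def single_routing_penalty(routing_probs, eps=1e-12):
    p = routing_probs / routing_probs.sum(axis=1, keepdims=True)
    entropy = -(p * np.log(p + eps)).sum(axis=1)
    return float(entropy.mean())

crisp = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
diffuse = np.array([[0.4, 0.3, 0.3], [0.34, 0.33, 0.33]])
print(single_routing_penalty(crisp))    # approximately 0
print(single_routing_penalty(diffuse))  # clearly positive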



Conclusion
22. 	Any inquiry concerning this communication or earlier communications from the examiner should be directed to JEREMY A HALZEL whose telephone number is (571)272-5290. The examiner can normally be reached Mon-Fri 7:30-5:00 EST. 
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s
supervisor, Kamran Afshar can be reached on (571) 272-7796. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be
obtained from Patent Center. Unpublished application information in Patent Center is available
to registered users. To file and manage patent submissions in Patent Center, visit:
https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for
more information about Patent Center and https://www.uspto.gov/patents/docx for
information about filing in DOCX format. For additional questions, contact the Electronic
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO
Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/JEREMY HALZEL/Examiner, Art Unit 2125                                                                                                                                                                                                        

/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125                                                                                                                                                                                                        


        1 In light of the specification, paragraph [0024] recites: “an orthogonality regularizer can have convex penalty and can preserve a PCA structure by enforcing orthogonality between columns of C”, which is recited at a high level of generality, and is interpreted to be a term that enforces orthogonality between values.
        2 In light of the specification, paragraph [0024] recites “A diversification regularizer can artificially induce hierarchical clustering of observed data; allow a model to fit training data more optimally while still generalizing; and can allow for clustering while learning simultaneously”, which is recited at a high level of generality, and is interpreted to be a term that promotes data clustering of observed similar values.
        
        3 In light of the specification, paragraph [0024] recites “a single routing regularizer can encourage routing each observation to a single node”, and is interpreted accordingly.
        4 Under the broadest reasonable interpretation BRI, initializing the initial tree to be complete with random parameters, forms the structure of the decision tree, and necessarily involves applying the determined parameters to a root node for the construction of a decision tree, which aligns with initializing the decision tree by constructing a root node. 
        
        5 As discussed above in the section 101-rejection, in light of the specification, paragraph [0024] recites “a single routing regularizer can encourage routing each observation to a single node”, and is interpreted accordingly.
        6 Under the broadest reasonable interpretation BRI, initializing the initial tree to be complete with random parameters, forms the structure of the decision tree, and necessarily involves applying the determined parameters to a root node for the construction of a decision tree, which aligns with initializing the decision tree by constructing a root node. 
        
        7 Under the broadest reasonable interpretation BRI, initializing the initial tree to be complete with random parameters, forms the structure of the decision tree, and necessarily involves applying the determined parameters to a root node for the construction of a decision tree, which aligns with initializing the decision tree by constructing a root node. 
        8 As discussed in the 101-rejection section above, in light of the specification, “an orthogonality regularizer can have convex penalty and can preserve a PCA structure by enforcing orthogonality between columns of C”, which is recited at a high level of generality, and is interpreted to be a term that enforces orthogonality between values.
        
        9 Under the BRI, the teaching of Zhou corresponds to the claimed orthogonality regularizer, because Zhou’s use of regularization mathematically enforces orthogonality, transforming classifying hyperplane boundaries at different nodes to differ in angles close to or at orthogonal measurements.
        
        10 As discussed in the 101-rejection section above, in light of the specification, a diversification regularizer can “artificially induce hierarchical clustering of observed data; allow a model to fit training data more optimally while still generalizing; and can allow for clustering while learning simultaneously”
        11 Under the BRI, the teaching of Zhou corresponds to the claimed orthogonality regularizer, because Zhou’s use of regularization mathematically enforces orthogonality, transforming classifying hyperplane boundaries at different nodes to differ in angles close to or at orthogonal measurements.
        
    

