Patent Application 17571899 - DEEP LEARNING BASED EMAIL CLASSIFICATION - Rejection


Application Information

  • Invention Title: DEEP LEARNING BASED EMAIL CLASSIFICATION
  • Application Number: 17571899
  • Submission Date: 2025-05-16
  • Effective Filing Date: 2022-01-10
  • Filing Date: 2022-01-10
  • National Class: 706
  • National Sub-Class: 020000
  • Examiner Employee Number: 99772
  • Art Unit: 2124
  • Tech Center: 2100

Rejection Summary

  • 102 Rejections: 0
  • 103 Rejections: 5

Cited Patents

No patents were cited in this rejection.

Office Action Text


    Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
This Office Action is responsive to Applicants' Amendment filed on February 5, 2025, in which claims 1, 8, 9, 13, and 15 are amended. Claims 1-20 are currently pending.

Response to Arguments
With regards to the rejections of claims 1-20 under 35 U.S.C. §101, Applicant’s arguments that the claims as amended overcome the rejection have been considered but are not persuasive. Applicant argues that the judicial exceptions recited by the claims are integrated into a practical application, and specifically that independent claims 1 and 15 integrate the recited exceptions into a practical application.
Applicant supports this argument with reference to the specification, identifying the following problem within the technical field of email classification: “as noted in paragraph [0021] of the publication, ‘email traffic is often extremely imbalanced. For example, for each phishing email 10,000-100,000 non phishing emails may be received. This creates an issue using traditional classification models that aim to optimize the accuracy (e.g. classification models that are trained using a standard cross-entropy function)’”. Applicant states that their invention solves this problem at least by: “As further noted in paragraph [0022], to account for this risk of low precision, ‘the processor circuitry 22 trains the deep learning algorithm 12 using a loss function (also referred to as a regulation function) configured to compensate for the imbalance in at least two of the multiple categories of email’”.
Applicant states that amended claims 1 and 15 reflect this improvement at least by recitation of the limitations: when the deep learning algorithm outputs a classification score for one of the multiple categories identified as non-critical, a primary loss function is used; and when the deep learning algorithm outputs a classification score for one of the multiple categories identified as critical, a critical loss function is used comprising a compensating function modifying the primary loss function.
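
Purely to illustrate the claimed arrangement, a minimal sketch of such a criticality-dependent loss follows. The function names, the cross-entropy primary loss, and the tanh-based compensating term (with illustrative constants mu and b) are assumptions for exposition, not Applicant’s actual implementation:

    import numpy as np

    def primary_loss(q_c, y_c):
        # Assumed primary loss: per-category binary cross-entropy.
        eps = 1e-12
        return -(y_c * np.log(q_c + eps) + (1 - y_c) * np.log(1 - q_c + eps))

    def compensating(q_c, mu=4.0, b=-2.0):
        # Hypothetical compensating term; monotonically increasing in the
        # classification score q_c, as the claim language requires.
        return np.tanh(mu * q_c + b)

    def loss_for_category(q_c, y_c, critical):
        # Claimed arrangement: the primary loss for non-critical categories;
        # the primary loss modified by the compensating function for critical ones.
        if not critical:
            return primary_loss(q_c, y_c)
        return primary_loss(q_c, y_c) + compensating(q_c)

    # A confident false positive for the critical (phishing) category is
    # punished more harshly than the same score for a non-critical category:
    assert loss_for_category(0.9, 0, critical=True) > loss_for_category(0.9, 0, critical=False)
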
Examiner acknowledges that Applicant’s invention provides a technical improvement to the field of classification of imbalanced datasets via an algorithm for addressing imbalance between different categories within training data, and that this technical improvement is reflected within claims 1 and 15 as amended. However, it is Examiner’s understanding that the recited technical improvement is to the abstract idea of classification, in conjunction with mathematical formulas and calculations, without additional limitations that integrate the recited abstract ideas into a practical application. MPEP 2106.04(d) states “Because a judicial exception alone is not eligible subject matter, if there are no additional claim elements besides the judicial exception, or if the additional claim elements merely recite another judicial exception, that is insufficient to integrate the judicial exception into a practical application”. As detailed in the rejections of the claims under 35 U.S.C. §101 below, all limitations, including those within claims 1 and 15, either do not meaningfully limit the claims due to being mere instructions to apply, insignificant extra-solution activity, etc., or are judicial exceptions themselves.
Additionally, within claims 1 and 15, Examiner does not find it clear how the recited technical improvement relates to the classification of emails beyond being generally linked to the field. The described algorithm appears to be applicable to any classification problem where there is a severe imbalance between categories in the data to be classified. MPEP 2106.04(d) states “A claim that integrates a judicial exception into a practical application will apply, rely on, or use the judicial exception in a manner that imposes a meaningful limit on the judicial exception, such that the claim is more than a drafting effort designed to monopolize the judicial exception”. Claims 1 and 15 as currently amended do not appear to have any limitations that meaningfully apply the judicial exceptions to use with emails in particular, beyond the mere recitation that the classification algorithm is to be used to classify emails into categories such as phishing, and that the training data includes email content such as text and metadata.
Therefore, for at least the above reasons, claims 1-20 remain rejected under 35 U.S.C. §101.
With regards to the previous rejections of claims 1-8 and 13 under 35 U.S.C. §112(b), Applicant has amended the claims to clarify the previously identified indefinite language. However, in regards to the rejection of claims 15-20 under 35 U.S.C. §112(b), amended claim 15 still recites the limitation a low score when the correct classification label of the email being classified identifies the email as being of the critical category, which, as stated in the original rejection of claim 1, uses the indefinite term of degree “low”. Therefore claim 15 and its dependents 16-20 remain rejected under 35 U.S.C. §112(b) on the previous basis. Additionally, the amendments to claims 1 and 15 have introduced new indefinite language, and therefore both claims and their dependents that do not expressly include marketing emails as a category, i.e. claims 1-7 and 9-20, are rejected under 35 U.S.C. §112(b) on new grounds.
With regards to the rejections of claims 1, 2, 4, 7, 15, 16, and 18-20 under 35 U.S.C. §103 as being unpatentable over Gansterer and Polz “E-Mail Classification for Phishing Defense”, in view of Baum and Kull “Cost-sensitive classification with deep neural networks”, further in view of Ryou et al. “Anchor Loss: Modulating Loss Scale based on Prediction Difficulty”, Applicant’s argument that the claims as amended overcome the rejections are persuasive. However, the argument is moot in view of new grounds of rejection, as presented below, that are necessitated by Applicant’s amendment.
Examiner additionally notes that, in regards to amended claims 1 and 15, many of the new limitations added on amendment can be found in the previous combination of references. The previous combination of references does not teach that phishing should be a critical category subject to increased costs; however, it would be obvious to modify the system taught by the previous combination of references to change the category subject to increased costs from ham emails (as taught by Gansterer) to phishing emails, as Gangavarapu et al. “Applicability of Machine Learning in Spam and Phishing Email Filtering: Review and Approaches” teaches that privacy breaches from failing to detect phishing emails are a much more serious concern than the consequences of failing to detect spam.

Claim Rejections - 35 USC § 112b
The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-7 and 9-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 1,
Claim 1 recites the limitations wherein the multiple categories include phishing, spam, and clean and wherein the phishing category is identified as critical, and the spam, marketing, and clean categories are each identified as non-critical. A marketing category is not earlier recited as one of the possible categories of emails in the training system, and it is therefore unclear whether the scope of the claim is meant to be limited to categorization of emails into phishing, spam, marketing, and clean, or just phishing, spam, and clean; thus the scope of the claim is indefinite.
In reference to dependent claims 2-7 and 9-14, claims 2-7 and 9-14 do not cure the deficiencies noted in the rejection of independent claim 1. Therefore, these claims are rejected under the same rationale as claim 1. Dependent claim 8 expressly includes marketing as an additional category of email and thus cures the noted deficiency.

Regarding claim 15,
Claim 15 recites the limitation wherein the compensating function more harshly punishes false positives than false negatives by outputting: a low score when the correct classification label of the email being classified identifies the email as being of the critical category. It is not specified either elsewhere within the claim or within the specification what exactly is considered to be a “low score”, or how it relates to the later recited “lower value” within the limitation such that emails belonging to the critical category that are incorrectly classified as belonging to the non-critical category receive the lower value, making “low score” a relative term without sufficient boundaries, and thus the scope of the claim is indefinite. For examination purposes, a “low score” will be considered to be any value that is lower than the recited “higher value”.
Claim 15 additionally recites the limitations wherein the multiple categories include phishing, spam, and clean and wherein the phishing category is identified as critical, and the spam, marketing, and clean categories are each identified as non-critical. A marketing category is not earlier recited as one of the possible categories of emails in the training system, and it is therefore unclear whether the scope of the claim is meant to be limited to categorization of emails into phishing, spam, marketing, and clean, or just phishing, spam, and clean; thus the scope of the claim is indefinite.
In reference to dependent claims 16-20, claims 16-20 do not cure the deficiencies noted in the rejection of independent claim 15. Therefore, these claims are rejected under the same rationale as claim 15.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101.

Regarding claim 1,
Step 1 - “Is the claim to a process, machine, manufacture or composition of matter?”
Yes, the claim is directed towards a computer training device comprising a processor and a non-volatile memory, i.e. a machine.
Step 2A, Prong 1 - “Is the claim directed to a law of nature, a natural phenomenon (product of nature) or an abstract idea?”:
The limitation of and train the deep learning algorithm to determine multiple classification scores using a loss function configured to compensate for the imbalance in the at least two of the multiple categories of emails recites a mathematical calculation, which is a mathematical concept, which is an abstract idea.
The limitation of wherein the multiple categories include phishing, spam, and clean recites an observation, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
The limitation of wherein the imbalance in the at least two of the multiple categories of emails is between the phishing category and the clean category, such that there are more emails in the clean category than in the phishing category recites an observation, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
The limitation of wherein each of the multiple categories has a criticality identifying the category as non-critical or critical recites a judgement, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
The limitation of wherein the phishing category is identified as critical, and the spam, marketing, and clean categories are each identified as non-critical recites a judgement, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
The limitation of wherein each of the multiple classification scores: is associated with one of the multiple categories recites a judgement, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
The limitation of estimates a probability that the email falls into the category associated with the classification score; and is based on the email content recites evaluations, which are mental processes, which are abstract ideas, regardless of whether they’re implemented on a generic computer.
The limitation of wherein the loss function used to train the deep learning algorithm changes depending on the criticality of the score being determined, such that: when the deep learning algorithm outputs a classification score for one of the multiple categories identified as non-critical, a primary loss function is used; and when the deep learning algorithm outputs a classification score for one of the multiple categories identified as critical, a critical loss function is used recites a judgement, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
The limitation of wherein the compensating function more harshly punishes false positives than false negatives by outputting recites a judgement, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
The limitation of a lower value when the correct classification label of the email being classified identifies the email as being of the critical category, such that emails belonging to the critical category that are incorrectly classified as belonging to the non-critical category receive the lower value; and a higher value when the correct classification label of the email being classified identifies the email as being of the non-critical category, such that emails belonging to the non-critical category that are incorrectly classified as belonging to the critical category receive the higher value, wherein: the higher value output by the compensating function increases based on the classification score, such that a higher classification score signifying a higher probability of the email being classified as being of the critical category results in a higher value of the output of the compensating function recites a mathematical formula, which is a mathematical concept, which is an abstract idea.
Step 2A, Prong 2 - “Does the claim recite additional elements that integrate the judicial exception into a practical application?”:
The limitation of memory comprising a non-transitory computer readable medium and storing the deep learning algorithm; processor circuitry configured to recites computer components recited so generically that they amount to mere instructions to apply the exception on a generic computer, MPEP 2106.05(d) and 2106.05(f).
The limitation of receive training data for multiple emails, wherein for each of the multiple emails: the training data includes email content and a correct classification label; and the email content includes both text of the email and meta data for the email recites the mere extra-solution activity of data gathering, which does not integrate the exception into a practical application, MPEP 2106.05(d) and 2106.05(g).
The limitation of and to output at least one of the determined multiple classification scores recites the mere extra-solution activity of data outputting, which does not integrate the exception into a practical application, MPEP 2106.05(d) and 2106.05(g).
Step 2B - “Does the claim recite additional elements that amount to significantly more than the judicial exception?”:
The limitation of memory comprising a non-transitory computer readable medium and storing the deep learning algorithm; processor circuitry configured to recites computer components recited so generically that they amount to mere instructions to apply the exception on a generic computer, MPEP 2106.05(f).
The limitation of receive training data for multiple emails, wherein for each of the multiple emails: the training data includes email content and a correct classification label; and the email content includes both text of the email and meta data for the email recites receiving data over a network, which is well-understood, routine, and conventional, MPEP 2106.05(d).II.
The limitation of and to output at least one of the determined multiple classification scores recites transmitting data over a network, which is well-understood, routine, and conventional, MPEP 2106.05(d).II.
Therefore, claim 1 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 2,
Claim 2 adds the additional limitation of wherein the primary loss function is a cross-entropy function taking as an input each of the classification scores output by the deep learning algorithm to claim 1, which recites a mathematical formula for the primary loss function, which is a mathematical concept, which is an abstract idea.
Therefore, claim 2 is found to be ineligible subject matter under 35 U.S.C. 101.
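
For reference only, a cross-entropy primary loss over per-category classification scores takes the standard form below; the claim itself does not recite a particular parameterization:

    L_{CE} = -\sum_{c} y_c \log q_c

where q_c is the classification score (predicted probability) for category c and y_c is the one-hot correct classification label.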

Regarding claim 3,
Claim 3 adds the additional limitation of wherein the compensating function is a tanh function that takes as an input each of the classification scores output by the deep learning algorithm to claim 1, which recites a mathematical formula for the compensating function, which is a mathematical concept, which is an abstract idea.
Therefore, claim 3 is found to be ineligible subject matter under 35 U.S.C. 101.
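
A tanh compensating function of the recited kind might, purely for illustration, take a form such as:

    f(q_c) = \tanh(\mu \, q_c + b)

where μ and b are the constants mentioned in Applicant’s specification (quoted in the discussion of claim 3 under 35 U.S.C. §103 below); this exact parameterization is an assumption of this sketch, not a claim requirement.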

Regarding claim 4,
Claim 4 adds the additional limitation of wherein the compensating function is added to the primary loss function to claim 1, which recites a mathematical formula for the loss function, which is a mathematical concept, which is an abstract idea.
Therefore, claim 4 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 5,
Claim 5 adds the additional limitations to claim 1:
before the passing of the email content to the deep learning algorithm, the processor circuitry is configured to apply a transform to embed the text of the email content, such that the embedded text is received with the email content by the deep learning algorithm recites a mathematical calculation, which is a mathematical concept, which is an abstract idea.
the transform applies a label to each word in the text of the received email, such that the applied label is based on the word and a context of the word determined based on text neighboring the word recites an evaluation, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
Therefore, claim 5 is found to be ineligible subject matter under 35 U.S.C. 101.
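
Purely as an illustrative sketch of the context-dependent word labeling recited in claim 5 (the claim does not specify an embedding method; the hash below is a stand-in for a learned embedding):

    def contextual_labels(words, window=2):
        # Label each word based on the word itself and the words within a
        # fixed-size neighborhood (its context), per the claim 5 limitation.
        labels = []
        for i, word in enumerate(words):
            context = tuple(words[max(0, i - window):i] + words[i + 1:i + 1 + window])
            labels.append((word, hash((word, context))))  # stand-in embedding
        return labels

    # The same word receives different labels in different contexts:
    print(contextual_labels("please verify your account".split()))
    print(contextual_labels("your account statement attached".split()))
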

Regarding claim 6,
Claim 6 adds the additional limitation of wherein the processor circuitry is further configured to preprocess the email content by combining the text of the email and the meta data for the email to create a coherent combined space using a neural ordinary differential equation (ODE) to claim 1, which recites a mathematical calculation, which is a mathematical concept, which is an abstract idea.
Therefore, claim 6 is found to be ineligible subject matter under 35 U.S.C. 101.
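
A minimal numerical sketch of combining text and tabular (meta data) features into a single combined space with a neural ODE, integrated with a fixed-step Euler scheme; the dimensions, dynamics function, and random initialization are illustrative assumptions only:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(8, 8))  # parameters of the learned dynamics

    def dynamics(h, t):
        # Toy ODE dynamics dh/dt = f(h, t): a single affine-tanh layer.
        return np.tanh(W @ h)

    def neural_ode_combine(text_vec, tab_vec, steps=20, horizon=1.0):
        # Concatenate text and tabular features, then integrate the ODE
        # forward to produce one combined representation of the email.
        h = np.concatenate([text_vec, tab_vec])
        dt = horizon / steps
        for k in range(steps):
            h = h + dt * dynamics(h, k * dt)
        return h

    combined = neural_ode_combine(rng.normal(size=4), rng.normal(size=4))
    print(combined.shape)  # (8,)
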

Regarding claim 7,
Claim 7 adds the additional limitation of wherein the meta data for the received emails includes at least one of whether the received email includes links, whether the received email is sent using a virtual private network (VPN), a number of recipients of the received email, whether a domain of an email address of a sender of the received email is known, or a linkage between the domain of the email address of the sender to links inside a body of the email to claim 1, which recites mere additional information about the data that is gathered to perform the judicial exceptions performed by the system, and does not serve to integrate the judicial exceptions into a practical application.
Therefore, claim 7 is found to be ineligible subject matter under 35 U.S.C. 101.
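
By way of illustration only, meta data features of the kinds listed in claim 7 could be derived from a raw message roughly as follows; the field choices and heuristics are assumptions of this sketch, not Gansterer’s or Applicant’s method:

    import re
    from email import message_from_string

    raw = ("From: alice@example.com\n"
           "To: bob@example.com, carol@example.com\n"
           "Subject: account notice\n\n"
           "Please log in at http://other.example.net/login")
    msg = message_from_string(raw)
    link_domains = re.findall(r"https?://([^/\s]+)", msg.get_payload())
    sender_domain = msg["From"].split("@")[-1]
    features = {
        "has_links": bool(link_domains),
        "num_recipients": len((msg["To"] or "").split(",")),
        "sender_domain_known": sender_domain in {"example.com"},  # toy allowlist
        "links_match_sender_domain": all(d.endswith(sender_domain) for d in link_domains),
    }
    print(features)
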

Regarding claim 8,
Claim 8 adds the additional limitations to claim 1:
the multiple categories additionally include marketing recites an observation, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
and marketing category is identified as non-critical recites a judgement, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
Therefore, claim 8 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 9,
Claim 9 adds the additional limitations to claim 1:
memory comprising a non-transitory computer readable medium and storing the deep learning algorithm; processor circuitry configured to recites computer components recited so generically that they amount to mere instructions to apply the exception on a generic computer, MPEP 2106.05(d) and 2106.05(f).
receive the incoming emails; for each of the received emails, preprocess email content of the received email; execute the trained deep learning algorithm configured to, for each of the received emails: receive the preprocessed email content recites the mere extra-solution activity of data gathering and data preprocessing, which does not integrate the exception into a practical application, MPEP 2106.05(d) and 2106.05(g), and recites transmitting data over a network, which is well-understood, routine, and conventional, MPEP 2106.05(d).II, and preprocessing email data for spam and phishing detection, which is also well-understood, routine, and conventional, (Gangavarapu Pg. 9) “Sample base or case base filtering techniques are popular in spam and phishing email filtering. Through an email collection model, all the emails, including ham, spam, and phishing, are extracted from every user's email. Then, preprocessing of the raw email data into a machine-processable form is facilitate through feature selection (extraction) and grouping the email data”.
determine multiple classification scores recites an evaluation, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
wherein each of the multiple classification scores: is associated with one of the multiple categories recites a judgement, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
includes a probability that the received email falls into the category associated with the classification score; is based on the email content recites evaluations, which are mental processes, which are abstract ideas, regardless of whether they’re implemented on a generic computer.
output at least one of the determined multiple classification scores recites the mere extra-solution activity of data outputting, which does not integrate the exception into a practical application, MPEP 2106.05(d) and 2106.05(g), and recites transmitting data over a network, which is well-understood, routine, and conventional, MPEP 2106.05(d).II.
classify the incoming emails based on the outputted at least one classification scores recites an evaluation, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
Therefore, claim 9 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 10,
Claim 10 adds the additional limitations to claim 9:
the deep learning algorithm outputs at least two of the determined multiple classification scores recites the mere extra-solution activity of data outputting, which does not integrate the exception into a practical application, MPEP 2106.05(d) and 2106.05(g), and transmitting data over a network is well-understood, routine, and conventional, MPEP 2106.05(d).II.
and for each of the received emails, the processor circuitry is further configured to classify the received email as one of the classification types based on the outputted at least two of the determined multiple classification scores recites an evaluation, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
Therefore, claim 10 is found to be ineligible subject matter under 35 U.S.C. 101.
	
Regarding claim 11,
Claim 11 adds the additional limitations to claim 9:
the processor circuitry is configured to preprocess the email content by applying a transform to embed the text of the email content, such that the embedded text is received with the email content by the deep learning algorithm recites a mathematical calculation, which is a mathematical concept, which is an abstract idea.
the transform applies a label to each word in the text of the received email, such that the applied label is based on the word and a context of the word determined based on text neighboring the word recites an evaluation, which is a mental process, which is an abstract idea, regardless of whether it’s implemented on a generic computer.
Therefore, claim 11 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 12,
Claim 12 adds the additional limitation of wherein the deep learning algorithm combines text data and tabular data to create a coherent combined space using a neural ordinary differential equation (ODE) to claim 9, which recites a mathematical calculation, which is a mathematical concept, which is an abstract idea.
Therefore, claim 12 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 13,
Claim 13 adds the additional limitation of wherein the email content includes both text of the email and meta data for the email, wherein the meta data for the received emails includes at least one of whether the received email includes links, whether the received email is sent using a virtual private network (VPN), a number of recipients of the received email, whether a domain of an email address of a sender of the received email is known, or a linkage between the domain of the email address of the sender to links inside a body of the email to claim 9, which recites mere additional information about the data that is gathered to perform the judicial exceptions performed by the system, and does not serve to integrate the judicial exceptions into a practical application.
Therefore, claim 13 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 14,
Claim 14 adds the additional limitation of the multiple categories include phishing, spam, marketing, and clean; the phishing category is identified as critical; the spam, marketing, and clean categories are each identified as non-critical to claim 9, which recites judgements, which are mental processes, which are abstract ideas, regardless of whether they’re implemented on a generic computer.
Therefore, claim 14 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 15,
Claim 15 recites a method for performing the function of the computer training device of claim 1 with substantially the same limitations, therefore the same analysis and rejection applies. Therefore, claim 15 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 16,
Claim 16 recites a method for performing the function of the computer training device of claim 2 with substantially the same limitations, therefore the same analysis and rejection applies. Therefore, claim 16 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 17,
Claim 17 recites a method for performing the function of the computer training device of claim 3 with substantially the same limitations, therefore the same analysis and rejection applies. Therefore, claim 17 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 18,
Claim 18 recites a method for performing the function of the computer training device of claim 4 with substantially the same limitations, therefore the same analysis and rejection applies. Therefore, claim 18 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 19,
Claim 19 recites a method for performing the function of the computer training device of claim 9. Claim 19 recites all the limitations of claim 9, for which the same analysis and rejection applies. Claim 19 additionally recites the limitation of performing the method of claim 15 for training the deep learning algorithm stored on the non-transitory computer readable medium, which recites mere instructions to apply the method recited in claim 15, MPEP 2106.05(d) and 2106.05(f). Therefore, claim 19 is found to be ineligible subject matter under 35 U.S.C. 101.

Regarding claim 20,
Claim 20 recites substantially the same limitations as those recited in claim 10, therefore the same analysis and rejection applies. Therefore, claim 20 is found to be ineligible subject matter under 35 U.S.C. 101.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4, 7, 9, 10, 13, 15, 16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Gansterer and Polz “E-Mail Classification for Phishing Defense”, hereinafter Gansterer, in view of Baum and Kull “Cost-sensitive classification with deep neural networks”, hereinafter Baum, further in view of Ryou et al. “Anchor Loss: Modulating Loss Scale based on Prediction Difficulty”, hereinafter Ryou, further in view of Gangavarapu et al. “Applicability of Machine Learning in Spam and Phishing Email Filtering: Review and Approaches”, hereinafter Gangavarapu.

Regarding claim 1,
Gansterer teaches:
A computer training device for training a [deep learning] algorithm to classify incoming emails as belonging to one of multiple categories and to avoid a low precision risk caused by a number of received emails for at least two of the multiple categories being imbalanced ((Gansterer Abstract) “Moreover, in contrast to classical binary classification approaches (spam vs. not spam), a more refined ternary classification approach for filtering e-mail data is investigated which automatically distinguishes three message types: ham (solicited e-mail), spam, and phishing”, (Gansterer Pg. 9) “For the SVM, we increased the costs for misclassified ham emails (false positives) to five times the costs of other types of misclassifications…On a correspondingly imbalanced test set the overall accuracy improved to 95,3%”, Gansterer does not teach deep learning)
the [deep learning] training device comprising: memory comprising a non-transitory computer readable medium and storing the deep learning algorithm; processor circuitry configured to ((Gansterer Pg. 7) “Our prototype system was tested on an Intel Duo Core E6600 system with 2 GB RAM and a Linux operating system”, Gansterer does not teach deep learning)
receive training data for multiple emails ((Gansterer Pg. 7) “As training set we selected the oldest 4000 e-mails of each class”)
wherein for each of the multiple emails: the training data includes email content and a correct classification label ((Gansterer Pg. 7) “A sample set of 11000 phishing messages from May 2007 was kindly made available to us by the Phishery...and from the 2007 TREC corpus...which consists of roughly 25000 ham and 52 000 spam messages. As training set we selected the oldest 4000 e-mails of each class”)
and the email content includes both text of the email and meta data for the email ((Gansterer Pg. 6) “For this purpose, five keywords are extracted from the message text using the automatic keyword creating algorithm of the classifier4J software”, (Pg. 4) “Link-domain differs from sender-domain (LinkDifSender) [2] counts how many links point to a different domain than the domain from where the e-mail was sent”, broadest reasonable interpretation of metadata includes domains of email links)
using a loss function configured to compensate for the imbalance in the at least two of the multiple categories of emails ((Gansterer Pg. 9) “For the SVM, we increased the costs for misclassified ham emails (false positives) to five times the costs of other types of misclassifications…On a correspondingly imbalanced test set the overall accuracy improved to 95,3%”)
wherein the multiple categories include phishing, spam, and clean ((Gansterer Pg. 8) “With an imbalanced training set (1000 ham, 4000 spam, 500 phishing) the SVM classifier based on feature set F1 achieved an overall accuracy of 92,5% on a balanced test data set (1000 messages from each class). On a correspondingly imbalanced test set the overall accuracy improved to 95,3%”, the training data set has the categories of phishing, spam, and ham, ham is another term for clean)
wherein the imbalance in the at least two of the multiple categories of emails is between the phishing category and the clean category, such that there are more emails in the clean category than in the phishing category ((Gansterer Pg. 8) “With an imbalanced training set (1000 ham, 4000 spam, 500 phishing) the SVM classifier based on feature set F1 achieved an overall accuracy of 92,5% on a balanced test data set (1000 messages from each class). On a correspondingly imbalanced test set the overall accuracy improved to 95,3%”, there is an imbalance in the training set where there are more emails in the ham category than in the phishing category, ham is another term for clean)
wherein each of the [multiple] classification scores:...and is based on the email content ((Gansterer Pg. 10) “We considered phishing e-mails correctly classified if they reached a score of 5 points or more and thus were put into the spam category by SpamAssassin”, Gansterer only teaches one score)
wherein the compensating function more harshly punishes false positives than false negatives ((Gansterer Pg. 9) “For the SVM, we increased the costs for misclassified ham emails (false positives) to five times the costs of other types of misclassifications”, increasing misclassification cost for false positives is more harshly punishing false positives than false negatives)
Baum teaches the following further limitations that Gansterer does not explicitly teach:
training a deep learning algorithm ((Baum Abstract) “The used techniques involve making neural network cost-sensitive based on the output probabilities”)
and train the deep learning algorithm to determine multiple classification scores ((Baum Pg. 25) “The experiments involve the usage of multiple techniques that are compared using three datasets with different degrees of difficulty. Each dataset has a multiclass and binary version of it”, (Baum Pg. 9) “Another commonly used activation function is softmax and it is mostly used on the output of the network to get probabilities…However, the usage of softmax is voluntary. Sometimes we might prefer to get the raw prediction values for each class instead of probabilities. They are called logits”, broadest reasonable interpretation of classification scores include “the raw prediction values for each class”)
and to output at least one of the determined multiple classification scores ((Baum Pg. 9) “Another commonly used activation function is softmax and it is mostly used on the output of the network to get probabilities…However, the usage of softmax is voluntary. Sometimes we might prefer to get the raw prediction values for each class instead of probabilities. They are called logits”)
wherein each of the multiple categories has a criticality identifying the category as non- critical or critical ((Baum Pgs. 29-30) “Predicting one class mistakenly is a lot more expensive than wrongly predicting other classes”, critical and non-critical are interpreted to mean binary attributes of a category)
wherein each of the multiple classification scores: is associated with one of the multiple categories ((Baum Pg. 25) “The experiments involve the usage of multiple techniques that are compared using three datasets with different degrees of difficulty. Each dataset has a multiclass and binary version of it”, (Baum Pg. 9) “Sometimes we might prefer to get the raw prediction values for each class instead of probabilities. They are called logits”)
estimates a probability that [the email] falls into the category associated with the classification score ((Baum Pg. 9) “Another commonly used activation function is softmax and it is mostly used on the output of the network to get probabilities”, Baum does not teach categorization of emails, but Gansterer does)
wherein the loss function used to train the deep learning algorithm changes depending on the criticality of the score being determined, such that: when the deep learning algorithm outputs a classification score for one of the multiple categories identified as non-critical, a primary loss function is used; and when the deep learning algorithm outputs a classification score for one of the multiple categories identified as critical, a critical loss function is used comprising a compensating function modifying the primary loss function ((Baum Pgs. 29-30) “Predicting one class mistakenly is a lot more expensive than wrongly predicting other classes. Here one row contains cost values from 8-10. All the other wrong predictions have equal cost 1. Ten different matrices are generated from this type for experiments (dataset with 10 classes) so that the expensive predicted class would be always different”, (Baum Pg. 24) “Another technique to train a cost-sensitive neural network is to modify the loss functions [KK98] so that the misclassification costs are taken into consideration when calculating the loss. The easiest way to get a cost-sensitive loss is to first calculate class weights as we did in section 4.3.1. Then for each input, the loss can be calculated by multiplying the instance loss with true class weight”)
by outputting: a lower value when the correct classification label [of the email] being classified identifies [the email] as being of the critical category, such that [emails] belonging to the critical category that are incorrectly classified as belonging to the non-critical category receive the lower value ((Baum Pg. 24) “The easiest way to get a cost-sensitive loss is to first calculate class weights as we did in section 4.3.1. Then for each input, the loss can be calculated by multiplying the instance loss with true class weight”, (Baum Pgs. 29-30) “Predicting one class mistakenly is a lot more expensive than wrongly predicting other classes. Here one row contains cost values from 8-10. All the other wrong predictions have equal cost 1”, the corresponding matrix shows that instances of the critical category (second column) that are classified as a non-critical category (not second row) receive a lower cost of 1, emails taught by Gansterer)

    [media_image1.png: cost matrix reproduced from Baum (Pgs. 29-30); off-diagonal entries in the second column (true critical class) carry cost 1, while entries in the second row (predicted critical class) range from 8 to 10]

and a higher value when the correct classification label [of the email] being classified identifies [the email] as being of the non-critical category, such that [emails] belonging to the non-critical category that are incorrectly classified as belonging to the critical category receive the higher value ((Baum Pg. 24) “The easiest way to get a cost-sensitive loss is to first calculate class weights as we did in section 4.3.1. Then for each input, the loss can be calculated by multiplying the instance loss with true class weight”, (Baum Pgs. 29-30) “Predicting one class mistakenly is a lot more expensive than wrongly predicting other classes. Here one row contains cost values from 8-10. All the other wrong predictions have equal cost 1”, the corresponding matrix shows that instances of the non-critical category (columns other than the second) that are classified as a critical category (second row) receive a higher cost between 8 and 10, emails taught by Gansterer)
At the time of filing, one of ordinary skill in the art would have motivation to combine Gansterer and Baum by modifying the system taught by Gansterer to use a deep learning algorithm that outputs multiple classification scores indicating the probability that an input (email) is in one of multiple categories, taught by Baum, as a deep neural network with softmax or logit outputs is a well-known classification algorithm in the art, and performs a similar function to the classifiers recited in Gansterer such as SVM, and substituting a deep neural network for the classifiers taught by Gansterer yields the predictable result of also classifying the emails. Such a substitution would be obvious. 
Additionally, at the time of filing, one of ordinary skill in the art would have motivation to combine Gansterer and Baum to determine that a number of the categories are critical, and therefore the loss/cost of misclassifying inputs that should be correctly classified as belonging to a given critical category should be greater, which is accomplished by modifying a primary loss function with a compensating function. Baum contains a base deep learning classifier that is improved in this way, and Gansterer teaches comparable classifiers with loss/cost functions such as SVM, so it would be obvious to one of ordinary skill in the art to improve a classifier recited in Gansterer in the same way to yield the predictable result of a classifier that is better at correctly classifying inputs belonging to one very important category, at the expense of reducing the overall accuracy of the classifier.
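
A minimal sketch of the class-weight technique quoted from Baum above, in which each instance loss is multiplied by the weight of the instance’s true class; the weight values are illustrative, chosen to mirror the 8-10 cost range of Baum’s matrix:

    import numpy as np

    def cost_sensitive_loss(q, y_true, class_weights):
        # Baum's quoted technique: multiply the instance loss
        # (cross-entropy here) with the true class's weight.
        eps = 1e-12
        return class_weights[y_true] * -np.log(q[y_true] + eps)

    weights = np.array([1.0, 9.0, 1.0])  # class 1 is expensive to misclassify
    q = np.array([0.7, 0.2, 0.1])        # predicted probabilities over 3 classes
    print(cost_sensitive_loss(q, y_true=1, class_weights=weights))
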
Further, Ryou teaches the following further limitation that neither Gansterer nor Baum explicitly teaches:
wherein the higher value output by the compensating function increases based on the classification score, such that a higher classification score signifying a higher probability of the email being classified as being of the [critical] category results in a higher value of the output of the compensating function ((Ryou Pg. 2) “the proposed loss function leverages the confidence gap between the target and non-target output values to modulate the loss scale of the samples in the training phase”, critical categories taught by Baum)
At the time of filing, one of ordinary skill in the art would have motivation to combine Gansterer, Baum, and Ryou by modifying the system taught by Gansterer and Baum to include the compensating function, which modifies a primary loss function, to output a higher value when the probability of an incorrect classification is higher, as Ryou teaches: (Ryou Pg. 2) “We observe that our loss function encourages the separation gap between the true labeled score and the most competitive hypothesis”.
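
To illustrate the modulation Ryou describes, a sketch of a term of the form (1 + q − q*)^γ (Equation 5, discussed under claim 4 below), which grows with the confidence gap between a competing score q and the true-category score q*; γ is a tuning exponent and the values shown are arbitrary:

    def anchor_modulation(q, q_star, gamma=0.5):
        # Ryou-style modulating term: larger when the competing score q
        # exceeds the true-category score q_star, smaller otherwise.
        return (1.0 + q - q_star) ** gamma

    print(anchor_modulation(q=0.9, q_star=0.1))  # confident mistake -> ~1.34
    print(anchor_modulation(q=0.2, q_star=0.7))  # easy sample      -> ~0.71
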
Further, Gangavarapu teaches the following further limitation that neither Gansterer, nor Baum, nor Ryou, explicitly teaches:
wherein the phishing category is identified as critical, and the spam, marketing, and clean categories are each identified as non-critical ((Gangavarapu Pg. 39) “When a spam email message is misclassified as a ham email, it causes a rather insignificant problem (user only needs to delete such an email). However, when ham emails are misclassified as spam or phishing emails, there is a possibility of losing vital information (specifically in scenarios where spam emails are deleted automatically), while phishing emails that are misclassified as ham emails result in a breach of privacy (a much more serious concern)”)
At the time of filing, one of ordinary skill in the art would have motivation to combine Gansterer, Baum, Ryou, and Gangavarapu by modifying the system taught by Gansterer, Baum, and Ryou to include phishing emails within the critical category of emails, as Gangavarapu teaches: (Gangavarapu Pg. 39) “phishing emails that are misclassified as ham emails result in a breach of privacy (a much more serious concern)”.

Regarding claim 2,
Gansterer, Baum, Ryou, and Gangavarapu teach The computer training device of claim 1.
Baum further teaches:
wherein the primary loss function is a cross-entropy function taking as an input each of the classification scores output by the deep learning algorithm ((Pgs. 10-11) “In this thesis, two different loss functions are used for calculating the loss of given predictions:...1. Cross-entropy...The formula for cross-entropy is [following formula] where q is the predicted probability”)

    [media_image2.png: Baum’s cross-entropy formula, of the standard form −Σ_i p_i log(q_i), where q is the predicted probability]

At the time of filing, one of ordinary skill in the art would have motivation to combine Gansterer, Baum, Ryou, and Gangavarapu to create the system of claim 1. One would have motivation to additionally use the cross-entropy loss function used in Baum, as Baum teaches: (Baum Pg. 52) “The generalization can be biased but Brier score seemed to be a little better choice when the dataset was really easy or extremely complicated. Otherwise, cross-entropy can be prefered. Also, Brier score seemed to perform better in binary datasets than it did in multiclass cases compared to cross-entropy. Hence, in the multiclass case, Brier score may not be worth a try”. Such a combination would be obvious.

Regarding claim 4,
Gansterer, Baum, Ryou, and Gangavarapu teach The computer training device of claim 1.
Ryou further teaches:
wherein the compensating function is added to the primary loss function (Ryou Pg. 4, Equation 5 shows the compensating function (1 + q − q*)^γ being added to the primary loss function)

    [media_image3.png: Ryou Equation 5, showing the compensating term (1 + q − q*)^γ added to the primary loss function]

At the time of filing, one of ordinary skill in the art would have motivation to combine Gansterer, Baum, Ryou, and Gangavarapu to create the system of claim 1. One would have motivation to additionally add the compensating function to the primary loss function, as Ryou does, as it would be obvious to try adding two functions together, since there are a limited number of ways to mathematically combine two functions, and adding two loss functions together to get a final loss would predictably combine them to create a final loss.

Regarding claim 7,
Gansterer, Baum, Ryou, and Gangavarapu teach The computer training device of claim 1.
Gansterer further teaches:
wherein: the meta data for the received emails includes at least one of whether the received email includes links, whether the received email is sent using a virtual private network (VPN), a number of recipients of the received email, whether a domain of an email address of a sender of the received email is known, or a linkage between the domain of the email address of the sender to links inside a body of the email ((Gansterer Pg. 4) “Link-domain differs from sender-domain (LinkDifSender) [2] counts how many links point to a different domain than the domain from where the e-mail was sent”)
It would be obvious to combine Gansterer, Baum, Ryou, and Gangavarapu for the parent claim of claim 7, claim 1. Because all additional limitations in claim 7 are present in Gansterer, no additional rationale for combination is necessary.

Regarding claim 9,
Gansterer, Baum, Ryou, and Gangavarapu teach A computer classification device for using a deep learning algorithm trained by the computer training device of claim 1 to classify incoming emails belonging to one of multiple categories where a number of received emails for two of the multiple categories is imbalanced, the classification device comprising:
Gansterer further teaches:
A computer classification device for using a trained [deep learning] algorithm to classify incoming emails belonging to one of multiple categories ((Gansterer Abstract) “Moreover, in contrast to classical binary classification approaches (spam vs. not spam), a more refined ternary classification approach for filtering e-mail data is investigated which automatically distinguishes three message types: ham (solicited e-mail), spam, and phishing”, Gansterer does not teach deep learning)
where a number of received emails for two of the multiple categories is imbalanced ((Gansterer Pg. 9) “With an imbalanced training set (1000 ham, 4000 spam, 500 phishing) the SVM classifier based on feature set F1 achieved an overall accuracy of 92,5% on a balanced test data set (1000 messages from each class). On a correspondingly imbalanced test set the overall accuracy improved to 95,3%”)
the classification device comprising: memory comprising a non-transitory computer readable medium and storing the [deep learning] algorithm ((Gansterer Pg. 7) “Our prototype system was tested on an Intel Duo Core E6600 system with 2 GB RAM and a Linux operating system”, Gansterer does not teach deep learning)
processor circuitry configured to: receive the incoming emails ((Gansterer Pg. 7) “Test data were sent from an email collection in mbox format using Google Mail Loader”)
for each of the received emails, preprocess email content of the received email ((Gansterer Pg. 6) “Distance of message text and linked domain (DifTextLink) tries to evaluate the ‘distance’ of the message text from the domain names which it links to. For this purpose, five keywords are extracted from the message text using the automatic keyword creating algorithm of the classifier4J software…These keywords are sent to a search engine individually and also in a combined query (several keywords combined with a logical AND). For each of these six queries, the domains of the ten highest ranked hits are compared to the domains linked from the e-mail. The feature value is defined as the number of links in the message which are not found in any of the top ten hits of the queries”)
execute the trained [deep learning] algorithm configured to, for each of the received emails: receive the preprocessed email content (Gansterer Fig. 1 shows that their classifier algorithm uses the preprocessed email content)
Baum further teaches:
a deep learning algorithm trained ((Baum Abstract) “The used techniques involve making neural network cost-sensitive based on the output probabilities”)
determine multiple classification scores, wherein each of the multiple classification scores: is associated with one of the multiple categories ((Baum Pg. 25) “The experiments involve the usage of multiple techniques that are compared using three datasets with different degrees of difficulty. Each dataset has a multiclass and binary version of it”, (Baum Pg. 9) “Another commonly used activation function is softmax and it is mostly used on the output of the network to get probabilities…However, the usage of softmax is voluntary. Sometimes we might prefer to get the raw prediction values for each class instead of probabilities. They are called logits”, broadest reasonable interpretation of classification scores include “the raw prediction values for each class”)
includes a probability that the [received email] falls into the category associated with the classification score ((Baum Pg. 9) “Another commonly used activation function is softmax and it is mostly used on the output of the network to get probabilities”, Baum does not teach categorization of received emails, but Gansterer does)
output at least one of the determined multiple classification scores ((Baum Pg. 9) “Another commonly used activation function is softmax and it is mostly used on the output of the network to get probabilities…However, the usage of softmax is voluntary. Sometimes we might prefer to get the raw prediction values for each class instead of probabilities. They are called logits”)


    [media_image4.png: figure reproduced from Baum]

wherein each of the [multiple classification] scores:…is based on the email content ((Gansterer Pg. 10) “We considered phishing e-mails correctly classified if they reached a score of 5 points or more and thus were put into the spam category by SpamAssassin”, Gansterer only teaches one score)
classify the incoming emails based on the outputted at least one classification score[s] ((Gansterer Pg. 10) “We considered phishing e-mails correctly classified if they reached a score of 5 points or more and thus were put into the spam category by SpamAssassin”, Gansterer only teaches one score)
At the time of filing, one of ordinary skill in the art would have motivation to combine Gansterer, Baum, Ryou, and Gangavarapu by modifying the system taught jointly by Gansterer, Baum, Ryou and Gangavarapu to use a deep learning algorithm that outputs multiple classification scores indicating the probability that an input (email) is in one of multiple categories, taught by Baum, as a deep neural network with softmax or logit outputs is a well-known classification algorithm in the art, and performs a similar function to the classifiers recited in Gansterer such as SVM, and substituting a deep neural network for the classifiers taught by Gansterer yields the predictable result of also classifying the emails. Such a substitution would be obvious. 

Regarding claim 10,
	Gansterer, Baum, Ryou, and Gangavarapu teach The computer classification device of claim 9, wherein:
Gansterer further teaches:
and for each of the received emails, the processor circuitry is further configured to classify the received email as one of the classification types ((Gansterer Abstract) “Moreover, in contrast to classical binary classification approaches (spam vs. not spam), a more refined ternary classification approach for filtering e-mail data is investigated which automatically distinguishes three message types: ham (solicited e-mail), spam, and phishing”)
Baum further teaches:
the deep learning algorithm outputs at least two of the determined multiple classification scores ((Baum Pg. 25) “The experiments involve the usage of multiple techniques that are compared using three datasets with different degrees of difficulty. Each dataset has a multiclass and binary version of it”, (Baum Pg. 9) “Another commonly used activation function is softmax and it is mostly used on the output of the network to get probabilities…However, the usage of softmax is voluntary. Sometimes we might prefer to get the raw prediction values for each class instead of probabilities. They are called logits”, a deep learning algorithm with softmax activations or logit outputs that is trained to predict labels for a multiclass dataset will output at least two scores)
the processor circuitry is further configured to classify…based on the outputted at least two of the determined multiple classification scores ((Baum Pg. 25) “The experiments involve the usage of multiple techniques that are compared using three datasets with different degrees of difficulty. Each dataset has a multiclass and binary version of it”, (Baum Pg. 9) “Another commonly used activation function is softmax and it is mostly used on the output of the network to get probabilities…However, the usage of softmax is voluntary. Sometimes we might prefer to get the raw prediction values for each class instead of probabilities. They are called logits”, a deep learning algorithm with softmax activations or logit outputs that is trained to predict labels for a multiclass dataset will output at least two scores)
At the time of filing, one of ordinary skill in the art would have motivation to combine Gansterer, Baum, Ryou, and Gangavarapu by modifying the system jointly taught by Gansterer, Baum, Ryou, and Gangavarapu to classify emails using the deep learning algorithm that outputs at least two classification scores taught by Baum, as a deep neural network with softmax or logit outputs is a well-known classification algorithm in the art, and performs a similar function to the classifiers recited in Gansterer such as SVM, and substituting a deep neural network for the classifiers taught by Gansterer yields the predictable result of also classifying the emails. Such a substitution would be obvious. 

Regarding claim 13,
Gansterer, Baum, Ryou, and Gangavarapu teach The computer classification device of claim 9,
Gansterer further teaches:
wherein the email content includes both text of the email and meta data for the email, ((Gansterer Pg. 6) “For this purpose, five keywords are extracted from the message text using the automatic keyword creating algorithm of the classifier4J software”, (Gansterer Pg. 4) “Link-domain differs from sender-domain (LinkDifSender) [2] counts how many links point to a different domain than the domain from where the e-mail was sent”; under the broadest reasonable interpretation, metadata includes the domains of email links)
wherein: the meta data for the received emails includes at least one of whether the received email includes links, whether the received email is sent using a virtual private network (VPN), a number of recipients of the received email, whether a domain of an email address of a sender of the received email is known, or a linkage between the domain of the email address of the sender to links inside a body of the email ((Gansterer Pg. 4) “Link-domain differs from sender-domain (LinkDifSender) [2] counts how many links point to a different domain than the domain from where the e-mail was sent”)
The rationale for combining Gansterer, Baum, Ryou, and Gangavarapu is set forth above with respect to claim 9, from which claim 13 depends. Because all additional limitations of claim 13 are taught by Gansterer, no additional rationale for combination is necessary.
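To illustrate the LinkDifSender feature quoted above, the following is a minimal Python sketch (not Gansterer's implementation; the function name and example inputs are hypothetical) that counts links whose domain differs from the sender's domain:

```python
from urllib.parse import urlparse

def link_dif_sender(sender_domain, link_urls):
    # Count links pointing to a domain other than the sender's domain,
    # mirroring the LinkDifSender feature described in Gansterer.
    return sum(
        1 for url in link_urls
        if urlparse(url).netloc.lower() != sender_domain.lower()
    )

# Two of the three links point away from the sender's domain.
print(link_dif_sender(
    "example.com",
    ["http://example.com/a", "http://evil.test/b", "https://evil.test/c"],
))  # prints 2
```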

Regarding claim 15,
Claim 15 recites a method for performing the function of the computer training device of claim 1 with substantially the same limitations; therefore, the same analysis and rejection apply.

Regarding claim 16,
Claim 16 recites a method for performing the function of the computer training device of claim 2 with substantially the same limitations; therefore, the same analysis and rejection apply.

Regarding claim 18,
Claim 18 recites a method for performing the function of the computer training device of claim 4 with substantially the same limitations; therefore, the same analysis and rejection apply.

Regarding claim 19,
Claim 19 recites a method for performing the function of the computer classification device of claim 9 with substantially the same limitations; therefore, the same analysis and rejection apply.

Regarding claim 20,
Claim 20 recites a method for performing the function of the computer classification device of claim 10 with substantially the same limitations; therefore, the same analysis and rejection apply.

Claims 3 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Gansterer, in view of Baum, further in view of Ryou, further in view of Gangavarapu, further in view of Du et al. “Scale-Sensitive IOU Loss: An Improved Regression Loss Function in Remote Sensing Object Detection”, hereinafter Du.

Regarding claim 3,
Gansterer, Baum, Ryou, and Gangavarapu teach The computer training device of claim 1.
Du teaches the following further limitation that neither Gansterer, nor Baum, nor Ryou, nor Gangavarapu explicitly teaches:
wherein the compensating function is a tanh function that takes as an input each of the classification scores output by the deep learning algorithm (Du Pg. 4, Equations 9 and 10 show a compensating function γ that is a tanh function taking the area difference score as input)

[media_image5.png: Du, Equations 9 and 10 (greyscale image)]

At the time of filing, one of ordinary skill in the art would have been motivated to combine Gansterer, Baum, Ryou, Gangavarapu, and Du by modifying the system taught by Gansterer, Baum, Ryou, and Gangavarapu to use a tanh function, as Du teaches: (Du Pg. 5) “as a regulating parameter, γ should be valued at [0, 1], so as not to affect the value of the original loss function. Based on this, the hyperbolic tangent function tanh(x) function is adopted as the basic function”. Such a combination would be obvious; the specification of the instant application notes that tanh is used to exploit the same mathematical property (Pgs. 6-7): “The constants μ and b may be selected to map the probabilities’ interval represented by the classification scores (e.g. [0,1]) to the slope of tanh”.
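Purely for illustration, the following is a minimal sketch of a tanh-based compensating term modifying a primary loss. It assumes, following the specification's description, that constants mu and b map the [0, 1] score interval onto the sloped region of tanh; the exact form of the combination and all names below are hypothetical, not taken from the claims or the cited art:

```python
import torch

def compensating_term(score, mu=2.0, b=-1.0):
    # Hypothetical constants mu and b stretch and shift the [0, 1]
    # classification-score interval onto the sloped region of tanh, keeping
    # the term bounded so it regulates rather than dominates the loss.
    return torch.tanh(mu * score + b)

def critical_loss(primary_loss, critical_score, mu=2.0, b=-1.0):
    # One plausible combination: scale the primary loss by (1 + gamma),
    # increasing the penalty as the critical-category score grows.
    gamma = compensating_term(critical_score, mu, b)
    return primary_loss * (1.0 + gamma)

# Example with dummy tensor values:
loss = critical_loss(torch.tensor(0.7), torch.tensor(0.9))
```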

Regarding claim 17,
Claim 17 recites a method for performing the function of the computer training device of claim 3 with substantially the same limitations, therefore the same analysis and rejection applies.

Claims 5 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Gansterer, in view of Baum, further in view of Ryou, further in view of Gangavarapu, further in view of Lee et al. “D-Fence: A Flexible, Efficient, and Comprehensive Phishing Email Detection System”, hereinafter Lee.

Regarding claim 5,
Gansterer, Baum, Ryou, and Gangavarapu teach The computer training device of claim 1.
Lee teaches the following further limitations that neither Gansterer, nor Baum, nor Ryou, nor Gangavarapu explicitly teaches:
the processor circuitry is configured to preprocess the email content by applying a transform to embed the text of the email content, such that the embedded text is received with the email content by the deep learning algorithm ((Lee Pg. 4) “D-Fence uses pretrained transformers, specifically BERT (bidirectional encoder representations from transformers) [20], for extracting contextual embeddings from readable texts in emails”)
the transform applies a label to each word in the text of the received email, such that the applied label is based on the word and a context of the word determined based on text neighboring the word ((Lee Pg. 4) “BERT is a state-of-the-art technique that encodes a sentence in a bidirectional manner. BERT learns relations between sentences and between tokens by jointly conditioning on both left-to-right and right-to-left directions. The key advantage of BERT and similar techniques (e.g., ELMo [56] and OpenAI GPT [59]) is contextual text embedding. BERT provides different word embedding values for each word based on the input sentences. This contextual text embedding enables the model to differentiate the meaning of sentences comprising of similar words”)
At the time of filing, one of ordinary skill in the art would have been motivated to combine Gansterer, Baum, Ryou, Gangavarapu, and Lee by substituting the keyword extraction in Gansterer with a transform that creates text embeddings. Text embeddings are a well-known method in the art for preprocessing data to be given to a predictive model, and they fulfill the same function as keyword extraction: transforming the raw text data into a form that is easier for the predictive model to work with.
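As a minimal sketch of the contextual-embedding preprocessing Lee describes, using the Hugging Face transformers library (the checkpoint name and example text are assumptions, not taken from Lee):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Load a pretrained BERT encoder; "bert-base-uncased" is an assumed checkpoint.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

email_text = "Your account is locked. Click the link below to verify your password."
inputs = tokenizer(email_text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per token: the same word receives different embeddings
# in different sentence contexts, the property Lee relies on to distinguish
# sentences composed of similar words.
token_embeddings = outputs.last_hidden_state  # shape: (1, num_tokens, 768)
```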

Regarding claim 11,
Gansterer, Baum, Ryou, and Gangavarapu teach The computer classification device of claim 9.
Lee teaches the following further limitations that neither Gansterer, nor Baum, nor Ryou, nor Gangavarapu explicitly teaches:
the processor circuitry is configured to preprocess the email content by applying a transform to embed the text of the email content, such that the embedded text is received with the email content by the deep learning algorithm ((Lee Pg. 4) “D-Fence uses pretrained transformers, specifically BERT (bidirectional encoder representations from transformers) [20], for extracting contextual embeddings from readable texts in emails”)
the transform applies a label to each word in the text of the received email, such that the applied label is based on the word and a context of the word determined based on text neighboring the word ((Lee Pg. 4) “BERT is a state-of-the-art technique that encodes a sentence in a bidirectional manner. BERT learns relations between sentences and between tokens by jointly conditioning on both left-to-right and right-to-left directions. The key advantage of BERT and similar techniques (e.g., ELMo [56] and OpenAI GPT [59]) is contextual text embedding. BERT provides different word embedding values for each word based on the input sentences. This contextual text embedding enables the model to differentiate the meaning of sentences comprising of similar words”)
At the time of filing, one of ordinary skill in the art would have been motivated to combine Gansterer, Baum, Ryou, Gangavarapu, and Lee by substituting the keyword extraction in Gansterer with a transform that creates text embeddings. Text embeddings are a well-known method in the art for preprocessing data to be given to a predictive model, and they fulfill the same function as keyword extraction: transforming the raw text data into a form that is easier for the predictive model to work with.

Claims 6 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Gansterer, in view of Baum, further in view of Ryou, further in view of Gangavarapu, further in view of Park et al. (U.S. Patent Application Publication No. 20230196071), hereinafter Park.

Regarding claim 6,
Gansterer, Baum, Ryou, and Gangavarapu teach The computer training device of claim 1.
Gangavarapu further teaches:
wherein the processor circuitry is further configured to preprocess the email content by combining the text of the email and the meta data for the email to create a coherent combined space ((Gangavarapu Pg. 17) “As explained in Section 4.2, we need to extract forty features (refer Table 4) from the collected raw email data. Before extracting the features, it is vital to parse the email to obtain the email body, subject line, sender address, reply-to address, modal URL, and all the links…The implementations in the Python email library provide extensive support to handle and parse email data and multipurpose internet mail extensions. First, we extracted the raw email data from the string format into the email format, which was then utilized to extract various parts of the email”)
At the time of filing, one of ordinary skill in the art would have been motivated to combine Gansterer, Baum, Ryou, and Gangavarapu to include preprocessing the email content by combining the email text and metadata into a single combined format, as there is a design incentive to preprocess data into features that predictive models can use effectively, and combining text from the email body with metadata into a single format predictably does so. Such a combination would be obvious.
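As an illustrative sketch only (the dimensions and feature names are hypothetical, not taken from Gangavarapu), combining a pooled text embedding with tabular metadata features into one coherent combined space might look like:

```python
import torch

# Hypothetical pooled text embedding for an email body (e.g., from BERT).
text_embedding = torch.randn(1, 768)

# Hypothetical tabular metadata features: link count, LinkDifSender-style
# domain-mismatch count, and number of recipients.
meta_features = torch.tensor([[3.0, 2.0, 1.0]])

# Concatenation places text and metadata in one combined feature space that a
# downstream classifier can consume as a single input vector.
combined = torch.cat([text_embedding, meta_features], dim=-1)  # shape: (1, 771)
```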
Park teaches the following further limitation that neither Gansterer, nor Baum, nor Ryou, nor Gangavarapu explicitly teaches:
using a neural ordinary differential equation (ODE) ((Park [0017]) “According to embodiments of the present disclosure, an apparatus for an artificial intelligence neural network based on co-evolving neural ordinary differential equations (NODEs) includes a main NODE module configured to provide a downstream machine learning task”, (Park [0163]) “to show the efficacy of the method according to the present disclosure, in-depth experiments were performed using various modern NODEs models for various downstream tasks ranging from image classification to time-series prediction”)
At the time of filing, one of ordinary skill in the art would have been motivated to combine Gansterer, Baum, Ryou, Gangavarapu, and Park by using a neural ordinary differential equation as described in Park to perform the data preprocessing as a downstream machine learning task, as substituting the data preprocessing method used in Gangavarapu with one involving a neural ordinary differential equation provides the predictable result of also preprocessing the data.
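For illustration, a minimal neural-ODE sketch follows, using the third-party torchdiffeq package's odeint solver; the network shape, feature dimension, and integration times are assumptions, and this is not Park's implementation:

```python
import torch
from torchdiffeq import odeint  # assumes the torchdiffeq package is installed

class ODEFunc(torch.nn.Module):
    # A small network defining dh/dt = f(t, h); the hidden state evolves
    # continuously rather than through a stack of discrete layers.
    def __init__(self, dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, dim),
            torch.nn.Tanh(),
            torch.nn.Linear(dim, dim),
        )

    def forward(self, t, h):
        return self.net(h)

func = ODEFunc(dim=771)       # hypothetical combined-feature dimension
h0 = torch.randn(1, 771)      # initial state: the combined email features
t = torch.tensor([0.0, 1.0])  # integrate the state from t=0 to t=1
h1 = odeint(func, h0, t)[-1]  # transformed features at the final time
```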

Regarding claim 12,
Gansterer, Baum, Ryou, and Gangavarapu teach The computer classification device of claim 9.
Gangavarapu further teaches:
wherein the deep learning algorithm combines text data and tabular data to create a coherent combined space ((Gangavarapu Pg. 17) “As explained in Section 4.2, we need to extract forty features (refer Table 4) from the collected raw email data. Before extracting the features, it is vital to parse the email to obtain the email body, subject line, sender address, reply-to address, modal URL, and all the links…The implementations in the Python email library provide extensive support to handle and parse email data and multipurpose internet mail extensions. First, we extracted the raw email data from the string format into the email format, which was then utilized to extract various parts of the email”, tabular data corresponds to feature data within a table, email body corresponds to text data)
At the time of filing, one of ordinary skill in the art would have been motivated to combine Gansterer, Baum, Ryou, and Gangavarapu to include preprocessing the email content by combining the email data into a single combined format, as there is a design incentive to preprocess data into features that predictive models can use effectively, and combining text data and tabular data into a single format predictably does so. Such a combination would be obvious.

Park teaches the following further limitation that neither Gansterer, nor Baum, nor Ryou, nor Gangavarapu explicitly teaches:
using a neural ordinary differential equation (ODE) ((Park [0017]) “According to embodiments of the present disclosure, an apparatus for an artificial intelligence neural network based on co-evolving neural ordinary differential equations (NODEs) includes a main NODE module configured to provide a downstream machine learning task”, (Park [0163]) “to show the efficacy of the method according to the present disclosure, in-depth experiments were performed using various modern NODEs models for various downstream tasks ranging from image classification to time-series prediction”)
At the time of filing, one of ordinary skill in the art would have been motivated to combine Gansterer, Baum, Ryou, Gangavarapu, and Park by using a neural ordinary differential equation as described in Park to perform the data preprocessing as a downstream machine learning task, as substituting the data preprocessing method used in Gangavarapu with one involving a neural ordinary differential equation provides the predictable result of also preprocessing the data.

Claims 8 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Gansterer, in view of Baum, further in view of Ryou, further in view of Gangavarapu, further in view of Srivastava (U.S. Patent No. 10,181,957), hereinafter Srivastava.

Regarding claim 8,
Gansterer, Baum, Ryou, and Gangavarapu teach The computer training device of claim 1.
Srivastava teaches the following further limitation that neither Gansterer, nor Baum, nor Ryou, nor Gangavarapu explicitly teaches:
wherein: the multiple categories additionally include marketing ((Srivastava Col. 10, Lines 58-61) “Message derived attributes may include a classification of the message as spam, email marketing, newsletter, suspicious, malicious (e.g., phishing, spear phishing, slow-and-low-attack, malicious link, malicious attachment) etc.”)
At the time of filing, one of ordinary skill in the art would have been motivated to combine Gansterer, Baum, Ryou, Gangavarapu, and Srivastava to include a marketing category as one of the possible categories, as there is a predictable design incentive to classify marketing emails separately from both ham and spam. Users are potentially interested in the products sold by legitimate firms, particularly if they have voluntarily subscribed to receive such offers, so these emails should not be sent to a spam folder where they may be missed. However, certain features of these emails, such as language intended to coax a user into clicking on links, are likely similar to those of spam emails, and placing them into a separate category from ham emails thus assists in distinguishing ham from spam to a finer degree. This matters because a user is unlikely to read emails caught by a spam filter, and mistakenly classifying a marketing email as spam is less consequential than mistakenly classifying a proper ham email, which is much more likely to contain important information. Such a combination would be obvious.
Gangavarapu further teaches:
and the marketing category is identified as non-critical ((Gangavarapu Pg. 39) “When a spam email message is misclassified as a ham email, it causes a rather insignificant problem (user only needs to delete such an email). However, when ham emails are misclassified as spam or phishing emails, there is a possibility of losing vital information (specifically in scenarios where spam emails are deleted automatically), while phishing emails that are misclassified as ham emails result in a breach of privacy (a much more serious concern)”; marketing emails have very similar characteristics to spam and can be considered under the same analysis)
At the time of filing, one of ordinary skill in the art would have been motivated to combine Gansterer, Baum, Ryou, Gangavarapu, and Srivastava to designate the correct categorization of marketing emails as less important than that of phishing emails, as Gangavarapu teaches: (Gangavarapu Pg. 39) “phishing emails that are misclassified as ham emails result in a breach of privacy (a much more serious concern)”.

Regarding claim 14,
Gansterer, Baum, Ryou, and Gangavarapu teach The computer classification device of claim 9.
Gansterer further teaches:
the multiple categories include phishing, spam, [marketing], and clean ((Gansterer Abstract) “Moreover, in contrast to classical binary classification approaches (spam vs. not spam), a more refined ternary classification approach for filtering e-mail data is investigated which automatically distinguishes three message types: ham (solicited e-mail), spam, and phishing”, Gansterer does not explicitly teach a marketing category)
Srivastava teaches the following further limitation that neither Gansterer, nor Baum, nor Ryou, nor Gangavarapu explicitly teaches:
the multiple categories include…marketing ((Srivastava Col. 10, Lines 58-61) “Message derived attributes may include a classification of the message as spam, email marketing, newsletter, suspicious, malicious (e.g., phishing, spear phishing, slow-and-low-attack, malicious link, malicious attachment) etc.”)
At the time of filing, one of ordinary skill in the art would have been motivated to combine Gansterer, Baum, Ryou, Gangavarapu, and Srivastava to include a marketing category as one of the possible categories, as there is a predictable design incentive to classify marketing emails separately from both ham and spam. Users are potentially interested in the products sold by legitimate firms, particularly if they have voluntarily subscribed to receive such offers, so these emails should not be sent to a spam folder where they may be missed. However, certain features of these emails, such as language intended to coax a user into clicking on links, are likely similar to those of spam emails, and placing them into a separate category from ham emails thus assists in distinguishing ham from spam to a finer degree. This matters because a user is unlikely to read emails caught by a spam filter, and mistakenly classifying a marketing email as spam is less consequential than mistakenly classifying a proper ham email, which is much more likely to contain important information. Such a combination would be obvious.
Gangavarapu further teaches:
the phishing category is identified as critical; and the spam, marketing, and clean categories are each identified as non-critical ((Gangavarapu Pg. 39) “When a spam email message is misclassified as a ham email, it causes a rather insignificant problem (user only needs to delete such an email). However, when ham emails are misclassified as spam or phishing emails, there is a possibility of losing vital information (specifically in scenarios where spam emails are deleted automatically), while phishing emails that are misclassified as ham emails result in a breach of privacy (a much more serious concern)”)
At the time of filing, one of ordinary skill in the art would have been motivated to combine Gansterer, Baum, Ryou, Gangavarapu, and Srivastava to designate the correct categorization of phishing emails as uniquely important relative to spam, marketing, and ham emails, as Gangavarapu teaches: (Gangavarapu Pg. 39) “phishing emails that are misclassified as ham emails result in a breach of privacy (a much more serious concern)”.
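Purely as an illustration of the critical/non-critical designation discussed above (the mapping and function name below are hypothetical, not taken from the claims or the cited art), the designation can be read as selecting which loss applies to each category:

```python
# Hypothetical criticality designation: only phishing is critical.
CATEGORY_IS_CRITICAL = {
    "phishing": True,
    "spam": False,
    "marketing": False,
    "clean": False,
}

def loss_for_category(category, primary_loss, compensated_loss):
    # Critical categories use the compensated (critical) loss; all
    # non-critical categories fall back to the primary loss.
    return compensated_loss if CATEGORY_IS_CRITICAL[category] else primary_loss
```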

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Bhatti et al. “Email Classification using LSTM: A Deep Learning Technique” uses deep learning to classify emails into one of four different categories: fraudulent, suspicious, harassment, and normal, with the input dataset for training the neural network being imbalanced.
Hina et al. “SeFACED: Semantic-Based Forensic Analysis and Classification of E-Mail Data Using Deep Learning” uses deep learning to classify emails into one of three different categories: fraudulent, harassing, and suspicious, and uses preprocessing techniques including text embeddings to make their textual data suitable for input into their neural network.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to VICTOR A NAULT whose telephone number is (703) 756-5745. The examiner can normally be reached M - F, 12 - 8.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at (571) 270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/V.A.N./Examiner, Art Unit 2124

/ASHISH THOMAS/Supervisory Patent Examiner, Art Unit 2142

