Patent Application 17889166 - SYSTEMS AND METHODS FOR ZERO-SHOT TEXT - Rejection
Title: SYSTEMS AND METHODS FOR ZERO-SHOT TEXT CLASSIFICATION WITH A CONFORMAL PREDICTOR
Application Information
- Invention Title: SYSTEMS AND METHODS FOR ZERO-SHOT TEXT CLASSIFICATION WITH A CONFORMAL PREDICTOR
- Application Number: 17889166
- Submission Date: 2025-05-16
- Effective Filing Date: 2022-08-16
- Filing Date: 2022-08-16
- National Class: 704
- National Sub-Class: 009000
- Examiner Employee Number: 98748
- Art Unit: 2658
- Tech Center: 2600
Rejection Summary
- 102 Rejections: 0
- 103 Rejections: 1
Cited Patents
No patents were cited in this rejection.
Office Action Text
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

This communication is responsive to the applicant's amendment dated 1/17/2025. The applicant amended claims 1, 11, and 20.

Response to Arguments

Applicant's arguments with respect to 35 U.S.C. 101 (pg. 8, line 18 - pg. 11, line 3) filed 1/17/2025 have been fully considered but they are not persuasive. First, the applicant argues (pg. 9, line 20 - pg. 10, line 2) that "instead of any generic computer component, a specific classifier model is recited in claim 1. Therefore, the amended claim 1 is not directed to merely an 'abstract idea'". The examiner respectfully disagrees. The applicant supports this argument by pointing to specific components of the invention in the specification; however, those components must be recited in the claims to be considered non-generic computer components. The applicant's invention is directed to analyzing text using a generic computer component (a processor). This is a mental process that a human can perform either manually or mentally. Next, the applicant argues (pg. 10, line 3 - pg. 11, line 3) that the abstract idea is integrated into a practical application of classifying topics in news text. The applicant supports this argument with examples from the specification showing how classification technology is improved. However, the practical application and improvement must be evident in the claim language. MPEP 2106.05(a) states, "After the examiner has consulted the specification and determined that the disclosed invention improves technology, the claim must be evaluated to ensure the claim itself reflects the disclosed improvement in technology." In the examiner's opinion, the claim language does not provide a practical application or improvement in technology. Therefore, the 35 U.S.C. 101 rejection is maintained.
Applicant's arguments with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. The applicant argues (pg. 11, line 4 - pg. 12, line 17) that Sewak fails to teach "generating, via a base classifier implemented on one or more processors, a subset of classification labels having a fewer number of classification labels from the set of classification labels for the input text". Additionally, the applicant states that Sewak fails to teach "a predicted classification label selected from the subset of classification labels for the input text". Given that the applicant has amended the claims, a new ground of rejection is now being issued.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Independent claims 1, 11, and 20 recite "receiving, at a zero-shot classification model associated with a set of classification labels having one or more classification labels that the zero-shot classification model has not been trained on, the input text"; "generating, via a base classifier implemented on one or more processors, a subset of classification labels having a fewer number of classification labels from the set of classification labels for the input text by: generating, via the base classifier, a first set of non-conformity scores corresponding to the set of classification labels based on a calibration dataset"; "computing a non-conformity threshold based on the set of non-conformity scores and a pre-defined error rate"; "generating, via the base classifier, a second set of non-conformity scores, wherein each non-conformity score in the second set is obtained by comparing a class prediction of the input text generated by the base classifier and each label from the set of classification labels"; "determining the subset of classification labels by selecting classification labels with corresponding non-conformity scores from the second set of non-conformity scores that are less than the non-conformity threshold"; "generating, via the zero-shot classification model implemented on one or more processors, a predicted classification label selected from the subset of classification labels for the input text"; and "displaying, the predicted classification label on a user interface (UI) application". The limitation of receiving an input text is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting "a communication interface", "a memory", "a plurality of processor", and "a non-transitory processor-readable storage medium", nothing in the claims precludes the steps from practically being performed in the mind.
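The recited steps follow the standard split conformal prediction recipe: calibrate a threshold from non-conformity scores and a pre-defined error rate, then keep only the labels whose scores fall below that threshold. The following is an illustrative Python sketch of those two steps only; it is not the applicant's actual implementation, and the function names, labels, and score values are hypothetical.

```python
import numpy as np

def nonconformity_threshold(calibration_scores, error_rate):
    """Compute the non-conformity threshold as the finite-sample-adjusted
    (1 - error_rate) quantile of the calibration scores (the first set)."""
    n = len(calibration_scores)
    level = min(1.0, np.ceil((n + 1) * (1 - error_rate)) / n)
    return np.quantile(calibration_scores, level)

def candidate_label_subset(input_scores, labels, threshold):
    """Keep only labels whose non-conformity score (the second set) is
    below the threshold, yielding a subset of the full label set."""
    return [lab for lab, s in zip(labels, input_scores) if s < threshold]

# Hypothetical calibration scores and per-label scores for one input text.
calib = np.array([0.1, 0.3, 0.25, 0.6, 0.2, 0.4, 0.35, 0.15])
thr = nonconformity_threshold(calib, error_rate=0.1)
subset = candidate_label_subset(
    [0.05, 0.5, 0.2, 0.9],
    ["sports", "politics", "tech", "finance"],
    thr,
)
```

Under the exchangeability assumption of conformal prediction, a subset formed this way contains the true label with probability at least 1 minus the pre-defined error rate; the zero-shot model then only needs to choose among the surviving labels.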
For example, but for the generic computer components recited above, "receiving" in the context of this claim encompasses receiving user input through text, which a human can do with a pen and paper. Next, the limitation of generating non-conformity scores is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting the generic computer components above, nothing in the claims precludes the steps from practically being performed in the mind. For example, but for the generic computer components recited above, "generating" in the context of this claim encompasses scoring labels, which a human can do with a pen and paper. Next, the limitation of computing a non-conformity threshold based on non-conformity scores and a pre-defined error rate is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting the generic computer components above, nothing in the claims precludes the steps from practically being performed in the mind. For example, but for the generic computer components recited above, "computing" in the context of this claim encompasses calculating a set limit based on other values, which a human can do with a pen and paper. Next, the limitation of generating non-conformity scores by comparing a class prediction and label is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting the generic computer components above, nothing in the claims precludes the steps from practically being performed in the mind.
For example, but for the generic computer components recited above, "generating" in the context of this claim encompasses generating a similarity score based on a prediction and an actual label, which a human can do with a pen and paper. Next, the limitation of determining the subset of classification labels is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting the generic computer components above, nothing in the claims precludes the steps from practically being performed in the mind. For example, but for the generic computer components recited above, "determining" in the context of this claim encompasses determining further subclassifications based on similarity scores and thresholds, which a human can do with a pen and paper. Next, the limitation of generating a predicted classification label is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting the generic computer components above, nothing in the claims precludes the steps from practically being performed in the mind. For example, but for the generic computer components recited above, "generating" in the context of this claim encompasses categorizing labels, which a human can do with a pen and paper. Lastly, the limitation of displaying a classification label is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting the generic computer component of a user interface (UI), nothing in the claims precludes the steps from practically being performed with a pen and paper.
For example, but for the generic computer components recited above, "displaying" in the context of this claim encompasses displaying text, which a human can do with a pen and paper. The judicial exception is not integrated into a practical application. In particular, the claims only recite the elements "a communication interface", "a memory", "a plurality of processor", "a non-transitory processor-readable storage medium", and a user interface (UI). These elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose meaningful limits on practicing the abstract idea. The claims are directed to an abstract idea. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to the integration of the abstract idea into a practical application, the additional elements performing the recited steps amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using generic computer components cannot provide an inventive concept. The claims are not patent eligible. Dependent claims 2-10 and 12-19 are also rejected for the same reasons provided for the independent claims above. The dependent claims, including the further recited limitations, do not integrate the abstract idea into a practical application, and the additional elements, taken individually and in combination, do not contribute to an inventive concept. In other words, the dependent claims are directed to an abstract idea without significantly more.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C.
102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made. Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Sewak et al. US 20220414137 A1 (hereinafter Sewak) in view of Xu et al. US 20230214679 A1 (hereinafter Xu). Regarding claims 1, 11, and 20, Sewak teaches a method for efficient zero-shot classification of classifying a topic for an input text from a news source, the method comprising, a system for efficient zero-shot classification, the system comprising: a communication interface that receives the input text (FIG. 1, [0056]: "the operating system 107 converts audio signal input to a text string and labeling application 110 receives the text string as an input. In an embodiment, operating system 107 receives keystrokes from a keyboard 115 and provides a text string to labeling application 110. The labeling application 110 also receives candidate text to be classified from the user in a similar fashion.
Candidate text might be received by the labeling application 110 from user input or from a document in a corpus 154 of system documents."; examiner interprets 107 to be the communication interface); a memory storing (FIG. 8, 812): a zero-shot classification model associated with a set of classification labels having one or more classification labels that the zero-shot classification model has not been trained on (FIG. 4, 420, [0099]: "the system probes the generative model in zero-shot mode with some arbitrary positive and negative sentence with Boolean/multinominal indexed classes and then the input-sentence for which the model is expected to generate a similar Boolean/Multinominal class label along with its associated 'token probability'"; [0114]: "Zero-Shot Learning. In this approach very large (billions of parameters, e.g., GPT-3) NLP models are trained to generate text on ever larger unlabelled training data"; [0121]: "zero-shot/few-shot classification techniques, which require an existing (unlabelled) dataset that it classifies"); a base classifier (FIG. 1, 142; examiner interprets the labeling service as the base classifier); and a plurality of processor-executable instructions for efficient zero-shot classification (FIG. 8, 814, [0217]: "A system comprising: one or more processors; and one or more computer storage media storing computer-useable instructions that, when used by the one or more processors, cause the one or more processors to perform a method."; [0214]: "The media as in any of the preceding embodiments, wherein the generative model is GPT3 run in zero shot mode."); and one or more hardware processors reading and executing the plurality of processor-executable instructions from the memory to perform operations comprising ([0162]: "various functions may be carried out by a processor executing instructions stored in memory.
The methods may also be embodied as computer-usable instructions stored on computer storage media"), a non-transitory processor-readable storage medium storing a plurality of processor-executable instructions for efficient zero-shot classification, the plurality of processor-executable instructions executed by one or more hardware processors to perform operations comprising: receiving, at a zero-shot classification model associated with a set of classification labels having one or more classification labels that the zero-shot classification model has not been trained on, the input text ([0050]: "A zero-shot generative mode is generally a mode of a generative NLP model capable of generating text without fine-tuning with a specific type of data. A generative NLP model generally receives an input text string and produces a generative result that is text, which is generated at the prompting of the input text string"; examiner interprets the NLP model as the classification model; FIG. 3, [0083]: "At step 310, the labeling service 142 receives a text string defining a label, e.g. from labeling application 110"; [0114]: "Zero-Shot Learning. In this approach very large (billions of parameters, e.g., GPT-3) NLP models are trained to generate text on ever larger unlabelled training data"; [0121]: "zero-shot/few-shot classification techniques, which require an existing (unlabelled) dataset that it classifies"); generating, via the base classifier, a first set of non-conformity scores corresponding to the set of classification labels based on a calibration dataset (FIG. 3, [0096]: "At step 385, the scoring label service 142 accumulates performance, records estimates, similarity weights, and class labels into a library of known performance.
The label scoring service 168 makes use of a repository of vector and similarity algorithms and determines if the label for a present score is similar to labeling methods available in the library"); computing a non-conformity threshold based on the set of non-conformity scores and a pre-defined error rate (FIG. 2, [0066]: "Additionally, display area 209 could provide a display of a result, such as true or false, based on a threshold decision of label class membership applied to the estimate of probability"; examiner interprets the probability for a "false" result as the pre-defined error rate); generating, via the base classifier, a second set of non-conformity scores wherein each non-conformity score in the second set is obtained by comparing a class prediction of the input text generated by the base classifier and each label from the set of classification labels ([0032]: "The labeling service makes use of a generative model to produce a generative result, which estimates the likelihood that the label properly applies to the candidate text. The success rate of the classification can be improved, while maintaining this improved efficiency, by obtaining a second generative result from a generative model and estimating label probability using the second generative result"; [0096]: "At step 385, the scoring label service 142 accumulates performance, records estimates, similarity weights, and class labels into a library of known performance.
The label scoring service 168 makes use of a repository of vector and similarity algorithms and determines if the label for a present score is similar to labeling methods available in the library"); displaying the predicted classification label on a user interface (UI) application ([0056]: "the labeling application 110 provides a result of classification, such as an indication presented on display 120 that the candidate text likely belongs to the user-defined label"). Sewak fails to teach generating, via a base classifier implemented on one or more processors, a subset of classification labels having a fewer number of classification labels from the set of classification labels for the input text by; determining the subset of classification labels by selecting classification labels with corresponding non-conformity scores from the second set of non-conformity scores that are less than the non-conformity threshold; and generating, via the zero-shot classification model implemented on one or more processors, a predicted classification label selected from the subset of classification labels for the input text. However, Xu teaches generating, via a base classifier implemented on one or more processors, a subset of classification labels having a fewer number of classification labels from the set of classification labels for the input text by ([0035]: "Upon receiving the candidate term(s), the key entity classification manager 116 may then apply a classification model to the extracted entities to determine a subset of entities (e.g., key entities of interest) associated with the one or more candidate terms"; examiner interprets the subset of entities to be reduced in number in comparison to the extracted entities); determining the subset of classification labels by selecting classification labels with corresponding non-conformity scores from the second set of non-conformity scores that are less than the non-conformity threshold ([0060]: "the key entity classification manager 116
outputs an extraction report including an identified subset of entities (e.g., key entities) based on those entities having a higher importance score in combination with the subset of entities being associated with the candidate terms 212. For example, the key entity classification manager 116 may output an extraction report 214 including any number of key entities that satisfy a threshold importance score and which are determined to be associated with at least one of the candidate terms 212"); and generating, via the zero-shot classification model implemented on one or more processors, a predicted classification label selected from the subset of classification labels for the input text ([0087]: "the classification model is a zero-shot classification model having been trained to associate a given input term with at least one term from a set of base terms... The classification model may further be configured to associate the subset of entities from the collection of entities with the candidate term based on a determined association between the subset of entities and the base term"). Sewak in view of Xu is considered analogous to the claimed invention because both are in the same field of text classification. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the automatic text-labeling techniques of Sewak with the technique of generating classification labels from a subset of labels taught by Xu in order to classify certain entities using a combination of rule-based and machine learning approaches (see Xu, Abstract). Regarding claims 2 and 12, Sewak in view of Xu teaches all of the limitations of claims 1 and 11, upon which claims 2 and 12 depend.
Additionally, Sewak teaches wherein the base classifier has a smaller size or is more computationally efficient than the zero-shot classification model ([0114]: "Another scalable but insufficient technique is mere Zero-Shot Learning. In this approach very large (billions of parameters, e.g., GPT-3) NLP models are trained to generate text (as opposed to classify text) on ever larger unlabelled training data. It is assumed that when the available few training samples are used as prompt to generate text, the models could serve as a pseudo-NLP-classification model, and hence alleviate the need of training with large, labelled training data"; [0121]: "Additionally, as opposed to zero-shot/few-shot classification techniques, which require an existing (unlabelled) dataset that it classifies, the disclosed method that fulfils both augmentation and pre-classification requirements. The augmentation method disclosed automatically and intelligently acquires and buckets the data samples in the correct data sub-set, ready for any classification model"; examiner interprets labeling service 142 as the base classifier and the NLP model as the classification model). Regarding claims 3 and 13, Sewak in view of Xu teaches all of the limitations of claims 1 and 11, upon which claims 3 and 13 depend.
Additionally, Sewak teaches wherein the first set of non-conformity scores are generated by: receiving the calibration dataset including a plurality of texts and corresponding labels that belong to the set of classification labels ([0065]: "The labeling application 110 sends the two strings (candidate text and label) to labeling service 142"); generating, via a base classifier model, a plurality of predicted labels corresponding to an input of the plurality of texts ([0067]: "Labeling service 142 returns a candidate label-class prediction, such as 1 for true, 0 for false, to provide a binary classification output"); and computing the first set of non-conformity scores by comparing the plurality of predicted labels and the corresponding labels from the calibration dataset (FIG. 3, [0096]: "At step 385, the scoring label service 142 accumulates performance, records estimates, similarity weights, and class labels into a library of known performance. The label scoring service 168 makes use of a repository of vector and similarity algorithms and determines if the label for a present score is similar to labeling methods available in the library"; examiner interprets the content in the library as predicted labels). Regarding claims 4 and 14, Sewak in view of Xu teaches all of the limitations of claims 3 and 13, upon which claims 4 and 14 depend. Additionally, Sewak teaches wherein the first set of non-conformity scores are computed based on a percentage of common tokens between representative tokens corresponding to each classification label and a specific text from the calibration dataset ([0099]: "For this system the system probes the generative model in zero-shot mode with some arbitrary positive and negative sentence with Boolean/multinominal indexed classes and then the input-sentence for which the model is expected to generate a similar Boolean/Multinominal class label along with its associated 'token probability'.
The associated token probability is normalized/scaled with historical model specific range parameters to be used as prediction probability/likelihood"). Regarding claims 5 and 15, Sewak in view of Xu teaches all of the limitations of claims 3 and 13, upon which claims 5 and 15 depend. Additionally, Sewak teaches wherein the first set of non-conformity scores are computed based on a cosine distance between a bag-of-words representation of each classification label description and a specific text from the calibration dataset ([0089]: "An exemplary method of evaluating an overall semantic similarity between the keyword structure of the text snippet and the label keyword structure could be the use of cosine similarity based on a vectorized transformation of graph terms, or some other method provided by vectorization functions 156"; [0177]: "The method 1400, at block 1416, includes determining a label probability estimate by comparing the first ranked score of the positive example result to the second ranked score of the negative example result. The reconciliation rules disclosed herein may be used to estimate probability. In an embodiment label probability is a scaled comparison between the average positive example cosine similarity and the average negative example cosine similarity"). Regarding claims 6 and 16, Sewak in view of Xu teaches all of the limitations of claims 3 and 13, upon which claims 6 and 16 depend. Additionally, Sewak teaches wherein the first set of non-conformity scores are computed as negative of class logits generated by the base classification model in response to a specific text from the calibration dataset (FIG. 12, [0169]: "In the LP method, at steps 1250 the results that exceed a threshold of predictability provides scaling to positive probabilities and to negative probabilities to approximate a label probability"; examiner interprets negative probabilities as negative of class logits).
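The dependent claims discussed above recite alternative ways to compute the first set of non-conformity scores: a percentage of common tokens (claims 4/14), a cosine distance over bag-of-words representations (claims 5/15), and the negative of class logits (claims 6/16). The following Python sketch illustrates what such score functions might look like; it is an illustrative reconstruction from the claim language only, not code from the application or either cited reference, and all names are hypothetical.

```python
from collections import Counter
import math

def token_overlap_score(label_tokens, text_tokens):
    """Non-conformity from the percentage of common tokens (cf. claims 4/14):
    a higher token overlap yields a lower non-conformity score."""
    common = set(label_tokens) & set(text_tokens)
    return 1.0 - len(common) / max(len(set(label_tokens)), 1)

def cosine_bow_score(label_desc, text):
    """Non-conformity as cosine distance between bag-of-words vectors of a
    label description and a calibration text (cf. claims 5/15)."""
    a, b = Counter(label_desc.lower().split()), Counter(text.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / (na * nb) if na and nb else 0.0)

def negative_logit_score(class_logit):
    """Non-conformity as the negative of a class logit (cf. claims 6/16):
    the more confident the base classifier, the lower the score."""
    return -class_logit
```

Each variant maps "more similar / more confident" to a lower score, which is what makes the less-than-threshold selection step in the independent claims meaningful.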
Regarding claims 7 and 17, Sewak in view of Xu teaches all of the limitations of claims 3 and 13, upon which claims 7 and 17 depend. Additionally, Sewak teaches wherein the first set of non-conformity scores are computed as negative entailment probabilities logits generated by the base classification model in response to a specific text from the calibration dataset ([0109]: "At step 1215, the log-probability of the label is determined from the input. At step 1220, the method 1200 takes the negative examples as input with candidate text, e.g. by using a sentence conjunction technique that combines text example with anti-label type. The method proceeds to step 1225 where the log-probability of the anti-label is determined from the input. Similarly, at step 1230 next text is predicted with all combinations of example and candidate text, e.g. by using a sentence conjunction technique to combine text example with label type. The method proceeds to step 1235 where the log-probability of key terms/tokens are derived, and used as a threshold indication. The method proceeds to step 1240 where a test is performed to see if the thresholds obtained ensure that the separation between the log-probability of the candidate text in conjunction with the label is separated sufficiently from the log-probability of the candidate text in conjunction with the anti-label. If the threshold is not valid the method proceeds to step 1245 where an error signal is generated. Otherwise, the method proceeds to step 1250 where the positive and negative probabilities are scaled to generate a prediction probability, and a prediction in favor of the class with the higher score is generated"). Regarding claims 8 and 18, Sewak in view of Xu teaches all of the limitations of claims 3 and 13, upon which claims 8 and 18 depend.
Additionally, Sewak teaches wherein the corresponding labels in the calibration dataset are generated by the zero-shot classification model in response to an input of the plurality of texts ([0081]: "FIG. 3 shows the processing flow of the labeling service 142 that performs a computerized method of rendering a result, such as that shown in display area 209, which is sent to labeling application 110 when the labeling service completes without error message to provide a valid estimate... Some examples of similar models could be (but not limited to) GPT-3, Microsoft DeBerta etc, preferably models with a good zero-shot generative capabilities mode"). Regarding claims 9 and 19, Sewak in view of Xu teaches all of the limitations of claims 1 and 11, upon which claims 9 and 19 depend. Additionally, Sewak teaches wherein a given label from the set of classification labels comprises an ensemble of class descriptions, including any of a hypothesis in natural language inference, or a next sentence for next sentence prediction ([0064]: "When the user enters text to define the label into graphical control 206, the labeling application 110 receives the label text string. A text string defining a label could be a word, a term or a description of an arbitrary concept or idea"; FIG. 11: "The feeling/sentiment tag has been assigned to happy and harmony. The graph structure shown provides richer terms, and also a richer order description which includes not only order but also strength and similarity"; FIG. 12: "at step 1230 next text is predicted with all combinations of example and candidate text, e.g. by using a sentence conjunction technique to combine text example with label type"); or an ensemble of prompts or an ensemble of verbalizers when the zero-shot classification model is a prompt-based classification model ([0050]: "A zero-shot generative mode is generally a mode of a generative NLP model capable of generating text without fine-tuning with a specific type of data.
A generative NLP model generally receives an input text string and produces a generative result that is text, which is generated at the prompting of the input text string"; [0056]: "In an aspect, the technology is directed toward a computerized system, e.g. shown in operating environment 100 that performs a method to classify a text as either belonging to a user-defined text label or not belonging to that label. A labeling application 110 in the operating environment 100 may present a prompt to the user on a display 120."). Regarding claim 10, Sewak in view of Xu teaches all of the limitations of claim 1, upon which claim 10 depends. Additionally, Sewak teaches wherein the calibration dataset comprises data corresponding to a similar task when calibration data from a zero-shot task is unavailable ([0191]: "Generally, program components, including routines, programs, objects, components, data structures, and the like, refer to code that performs particular tasks or implements particular abstract data types"; [0121]: "Additionally, as opposed to zero-shot/few-shot classification techniques, which require an existing (unlabelled) dataset that it classifies, the disclosed method that fulfils both augmentation and pre-classification requirements"; examiner assumes zero-shot is unavailable when an unlabelled dataset that it classifies is unavailable).

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Buchhold et al. (POLIBITS, 60, 73-81 (2019)) teaches a method to detect topics in news articles by training a model that can be seen as a similarity function between a descriptive document and a news article. The model is a neural network that operates on two kinds of inputs. (1) The full texts of the descriptive documents and the news articles are passed through the same recurrent encoder network, and then the distance of the resulting encodings is taken.
(2) A proprietary NLP pipeline and knowledge base are used to recognize named entities and significant keywords, and features are computed based on their overlap for a descriptive document and a news article. The model finally combines the encoding distance with the overlap features and acts as a binary classifier. The authors evaluate and compare several model configurations on two datasets: a large one automatically created from Wikipedia and a smaller one created manually.

Zhao (US 20200251091 A1) teaches a system and method of creating the natural language understanding component of a speech/text dialog system. The method involves a first step of defining user intent in the form of an intent flow graph. Next, (context, intent) pairs are created from each of the plurality of intent flow graphs and stored in a training database. A paraphrase task is then generated from each (context, intent) pair and also stored in the training database. A zero-shot intent recognition model is trained using the plurality of (context, intent) pairs in the training database to recognize user intents from the plurality of paraphrase tasks in the training database. Once trained, the zero-shot intent recognition model is applied to user queries to generate semantic outputs.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. 
In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ZEESHAN SHAIKH whose telephone number is (703) 756-1730. The examiner can normally be reached Monday-Friday 7:30 AM-5:00 PM. Examiner interviews are available via telephone, in person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Richemond Dorvil, can be reached at (571) 272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ZEESHAN MAHMOOD SHAIKH/
Examiner, Art Unit 2658

/RICHEMOND DORVIL/
Supervisory Patent Examiner, Art Unit 2658