Patent Application 17490514 - TRAINING DATASET GENERATION FOR SPEECH-TO-TEXT SERVICE - Rejection

Application Information

  • Invention Title: TRAINING DATASET GENERATION FOR SPEECH-TO-TEXT SERVICE
  • Application Number: 17490514
  • Submission Date: 2025-05-19
  • Effective Filing Date: 2021-09-30
  • Filing Date: 2021-09-30
  • National Class: 704
  • National Sub-Class: 260.000
  • Examiner Employee Number: 80018
  • Art Unit: 2657
  • Tech Center: 2600

Rejection Summary

  • 102 Rejections: 0
  • 103 Rejections: 2

Cited Patents

The following patent publications were cited in the rejection:

  • US 2021/0034662 (Aher et al.)
  • US 2020/0104354 (Anisimovich et al.)
  • US 2020/0335100 (Saon et al.)
  • US 2009/0222268 (Li et al.)

Office Action Text


DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Pre-Brief Conference
A conference was held and the rejection has been withdrawn; however, the claims are rejected for the reasons set forth below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 10-15 and 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Aher et al. (PGPUB 2021/0034662), hereinafter referenced as Aher, in view of Anisimovich et al. (PGPUB 2020/0104354), hereinafter referenced as Anisimovich, and further in view of Saon et al. (PGPUB 2020/0335100), hereinafter referenced as Saon.

Regarding claims 1, 15 and 20, Aher discloses a computer-implemented method, system, and media (hereinafter collectively referenced as a method) for automated speech-to-text training data generation, comprising:
one or more processors (p. 0030-0032); 
memory storing a plurality of stored linguistic expression generation templates following a syntax (p. 0030-0032);
wherein the memory is configured to cause the one or more processors to perform operations (p. 0030-0032) comprising:
based on a stored linguistic expression generation template following a syntax, generating a plurality of generated textual linguistic expressions (new text strings; p. 0026-0027, 0077); and
from the plurality of generated textual linguistic expressions, with a text-to-speech service, generating a plurality of synthetic speech audio recordings for developing a speech-to-text service (taking text expressions, converting them to synthetic speech audio recordings, and running the recordings through an STT; fig. 3 with p. 0026-0027, 0077).  It is noted that Aher trains/conditions/tunes the STT, but does not specifically teach that the training is based on recorded audio.  Aher also does not teach the specific linguistic expression.
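For illustration only, the pipeline mapped above (generated texts converted to synthetic recordings for developing an STT service) might be sketched as follows in Python; synthesize and build_training_set are hypothetical stand-ins, not code from Aher or the claims:

    def synthesize(text: str) -> bytes:
        """Stand-in for the text-to-speech service; returns placeholder audio bytes."""
        return f"<synthetic audio for {text!r}>".encode()

    def build_training_set(generated_texts: list[str]) -> list[tuple[str, bytes]]:
        """Pair each generated textual linguistic expression with a synthetic
        speech audio recording for developing a speech-to-text service."""
        return [(text, synthesize(text)) for text in generated_texts]

    pairs = build_training_set(["book a flight to Boston", "book a flight to Denver"])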
Anisimovich discloses a method comprising:
generating a plurality of generated textual linguistic expressions, the generated textual linguistic expressions (linguistic expression) comprising (1) a first set of two or more alternative tokens (tokens), wherein an alternative token of the set is included within a given linguistic expression of the plurality of generated linguistic expressions (substitution); or (2) a variable configured to be replaced by a retrieved value of the variable in generating a generated linguistic expression of the plurality of generated linguistic expressions, wherein the generating in (1) or (2) comprises generating respective generated linguistic expressions for multiple tokens of the first set of two or more alternative tokens, or generating respective generated linguistic expressions using different retrieved values of the variable (fig. 2A and abstract with p. 0033-0034, 0042-0047, 0058-0072), to enhance information extraction and automatically create templates.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Aher with Anisimovich's template-based expression generation, to assist with natural language processing.
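As a hedged illustration of the two generation modes recited above, alternative tokens (1) and variable substitution (2) could be exercised as below; the {var} template syntax and slot names are assumptions for this sketch, not Anisimovich's notation:

    import itertools

    # Mode (1): a first set of two or more alternative tokens; each generated
    # expression includes one token from the set (substitution).
    alternatives = {"verb": ["acquire", "buy", "purchase"]}

    # Mode (2): a variable replaced by a retrieved value during generation.
    retrieved_values = {"item": ["laptop", "printer"]}

    def expand(template: str, slots: dict[str, list[str]]) -> list[str]:
        """Generate one textual expression per combination of slot values."""
        names = list(slots)
        return [template.format(**dict(zip(names, combo)))
                for combo in itertools.product(*(slots[n] for n in names))]

    expressions = expand("{verb} a {item}", {**alternatives, **retrieved_values})
    # ['acquire a laptop', 'acquire a printer', 'buy a laptop', ...]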
Saon discloses a method comprising training the speech-to-text service using speech audio recordings (p. 0003 and 0034), to improve accuracy.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further incorporate Saon's training on speech audio recordings, to assist with personalization and individual variations of speech.
Regarding claim 10, it is interpreted and rejected for similar reasons as set forth above.  In addition, Anisimovich discloses a method further comprising: 
receiving a target domain for service (classes; p. 0052-0062, 0091); 
wherein: 
generating the plurality of generated textual linguistic expressions comprises applying keywords from the target domain (p. 0062-0064).
Regarding claim 11, it is interpreted and rejected for similar reasons as set forth above.  In addition, Anisimovich discloses a method wherein: 
the syntax supports multiple alternative phrases (acquire, buy, purchase; fig. 3A with p. 0052-0065); and 
at least one of the plurality of stored linguistic expression generation templates incorporates at least one instance of multiple alternative phrases (acquire, buy, purchase; fig. 3A with p. 0052-0065).  
Regarding claim 12, it is interpreted and rejected for similar reasons as set forth above.  In addition, Anisimovich discloses a method wherein: 
the syntax supports optional phrases (acquire, buy, purchase; fig. 3A with p. 0052-0065); and 
at least one of the plurality of stored linguistic expression generation templates incorporates an optional phrase (acquire, buy, purchase; fig. 3A with p. 0052-0065).  
Regarding claim 13, Aher discloses a method further comprising: 
selecting a subset of the plurality of generated synthetic speech audio recordings for training (p. 0014-0017, 0043-0052).  
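Claim 13 does not fix a selection criterion; purely as a placeholder, a reproducible random sample illustrates selecting a training subset:

    import random

    def select_training_subset(recordings: list, k: int, seed: int = 0) -> list:
        """Choose a reproducible subset of generated recordings for training."""
        return random.Random(seed).sample(recordings, k)

    subset = select_training_subset(list(range(100)), k=10)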
Regarding claim 14, it is interpreted and rejected for similar reasons as set forth above.  In addition, Anisimovich discloses a method wherein: 
the syntax supports regular expressions (p. 0078-0091).  
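A minimal expander for a regex-flavored template syntax, covering the multiple alternative phrases of claim 11, the optional phrases of claim 12, and the regular expressions of claim 14, might look like the sketch below; the exact notation is an assumption for illustration, not taken from any cited reference:

    import re

    def expand_pattern(pattern: str) -> list[str]:
        """Expand a regex-flavored template: (a|b) alternatives, (a)? optionals."""
        m = re.search(r"\(([^()]*)\)(\?)?", pattern)
        if m is None:
            return [" ".join(pattern.split())]  # no groups left; normalize spaces
        choices = m.group(1).split("|")
        if m.group(2):            # trailing '?' marks the group optional
            choices.append("")
        results = []
        for choice in choices:
            expanded = pattern[:m.start()] + choice + pattern[m.end():]
            results.extend(expand_pattern(expanded))
        return results

    print(expand_pattern("(acquire|buy|purchase) the( red)? widget"))
    # ['acquire the red widget', 'acquire the widget', 'buy the red widget', ...]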
Regarding claim 18, it is interpreted and rejected for similar reasons as set forth above.  In addition, Anisimovich discloses a method further comprising: 
a dictionary of domain-specific vocabulary comprising nouns of objects acted upon in a particular domain (p. 0033-0034, 0079, 0087); 
wherein the operations further comprise: 
applying the domain-specific vocabulary when generating the plurality of generated textual linguistic expressions (ontology/class; p. 0027-0033).  
Regarding claim 19, it is interpreted and rejected for similar reasons as set forth above.  In addition, Anisimovich discloses a method wherein: 
at least one given template of the linguistic expression generation templates specifies that an attribute value is to be included when generating a textual linguistic expression from the given template (p. 0043-0049, 0054-0056, 0063-0073); and 
generating the plurality of generated textual linguistic expressions comprises including a word from a domain-specific dictionary in the textual linguistic expression (p. 0027-0033, 0052-0062).  
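Claims 10, 18, and 19 add domain targeting: a dictionary of domain-specific vocabulary supplies the values substituted into the templates. A sketch under an invented "helpdesk" domain (the dictionary contents and template are hypothetical):

    # Hypothetical domain dictionary: nouns of objects acted upon in the domain
    # (claim 18) plus attribute values to include (claim 19).
    DOMAIN_VOCAB = {
        "helpdesk": {"object": ["ticket", "incident", "password"],
                     "attribute": ["high priority", "low priority"]},
    }

    def generate_for_domain(template: str, domain: str) -> list[str]:
        """Fill template variables with the target domain's vocabulary (claim 10)."""
        vocab = DOMAIN_VOCAB[domain]
        return [template.format(object=obj, attribute=attr)
                for obj in vocab["object"] for attr in vocab["attribute"]]

    print(generate_for_domain("create a {attribute} {object}", "helpdesk"))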

Claim(s) 2-9 and 16-17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Aher in view of Anisimovich and Saon, and further in view of Li et al. (PGPUB 2009/0222268), hereinafter referenced as Li.

Regarding claim 2, Aher in view of Anisimovich and Saon discloses a method as described above, but does not specifically teach wherein:
generating the plurality of synthetic speech audio recordings comprises adjusting one or more pre-generation speech characteristics.
Li discloses a method wherein generating the plurality of synthetic speech audio recordings comprises adjusting one or more pre-generation speech characteristics (adding background noise; abstract with p. 0036-0038), to assist with speech synthesis.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method as described above, to improve the system.
Regarding claim 3, Aher discloses a method wherein: 
the one or more pre-generation speech characteristics comprise speech accent (accent; p. 0026, 0049, 0074).  
Regarding claim 4, it is interpreted and rejected for similar reasons as set forth above.  In addition, Saon discloses a method wherein: 
the one or more pre-generation speech characteristics comprise speaker gender (p. 0022).  
Regarding claim 5, it is interpreted and rejected for similar reasons as set forth above.  In addition, Li discloses a method wherein: 
the one or more pre-generation speech characteristics comprise speech rate (p. 0028).  
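Claims 2 through 5 vary pre-generation speech characteristics, i.e., parameters handed to the text-to-speech engine before synthesis rather than edits applied to the waveform afterward. A hypothetical parameterized front end (no real TTS API is implied):

    from dataclasses import dataclass
    from itertools import product

    @dataclass(frozen=True)
    class VoiceProfile:
        accent: str   # claim 3: speech accent
        gender: str   # claim 4: speaker gender
        rate: float   # claim 5: speech rate multiplier

    def synthesize(text: str, profile: VoiceProfile) -> bytes:
        """Stand-in for a TTS call that accepts pre-generation characteristics."""
        return f"<{profile.accent}/{profile.gender}/x{profile.rate}: {text}>".encode()

    profiles = [VoiceProfile(a, g, r)
                for a, g, r in product(["en-US", "en-GB"],
                                       ["female", "male"],
                                       [0.9, 1.0, 1.1])]
    recordings = [synthesize("reset my password", p) for p in profiles]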
Regarding claim 6, it is interpreted and rejected for similar reasons as set forth above.  In addition, Li discloses a method further comprising: 
applying a post-generation audio adjustment to at least one of the plurality of synthetic speech audio recordings (adding background noise; abstract with p. 0036-0038).  
Regarding claim 7, it is interpreted and rejected for similar reasons as set forth above.  In addition, Li discloses a method wherein: 
the post-generation adjustment comprises applying background noise (adding background noise; abstract with p. 0036-0038).  
Regarding claim 8, it is interpreted and rejected for similar reasons as set forth above.  In addition, Li discloses a method wherein: 
the plurality of synthetic speech audio recordings are associated with respective original texts before the synthetic speech audio recording is recognized (p. 0017).  
Regarding claim 9, it is interpreted and rejected for similar reasons as set forth above.  In addition, Li discloses a method wherein: 
a given synthetic speech audio recording is associated with original text used to generate the given synthetic speech audio recording (p. 0017); and 
the original text is used during the training (p. 0017).  
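Claims 8 and 9 keep each synthetic recording associated with the original text that produced it, so that text can serve as the reference transcript during training. A toy scoring loop (recognize is a stand-in decoder, not any reference's API):

    import difflib

    def recognize(audio: bytes) -> str:
        """Stand-in decoder; a real system would run the STT model here."""
        return audio.decode(errors="ignore")

    def label_accuracy(pairs: list[tuple[str, bytes]]) -> float:
        """Score hypotheses against each recording's associated original text."""
        ratios = [difflib.SequenceMatcher(None, recognize(audio), text).ratio()
                  for text, audio in pairs]
        return sum(ratios) / len(ratios)

    print(label_accuracy([("reset my password", b"reset my password")]))  # 1.0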
Regarding claim 16, it is interpreted and rejected for similar reasons as set forth above.  In addition, Li discloses a method further comprising: 
a digital representation of background noise (adding background noise; abstract with p. 0036-0038); 
wherein the operations further comprise: 
applying the digital representation of background noise to at least one of the plurality of synthetic speech audio recordings (adding background noise; abstract with p. 0036-0038).  
Regarding claim 17, it is interpreted and rejected for similar reasons as set forth above.  In addition, Li discloses a method wherein the operations further comprise: 
receiving an indication of a custom background noise (adding background noise; abstract with p. 0036-0038); and 
using the custom background noise as the digital representation of background noise (adding background noise; abstract with p. 0036-0038).  
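Claims 6, 7, 16, and 17 cover post-generation adjustment by overlaying background noise, including a user-supplied ("custom") noise recording. A minimal mixing sketch over float samples; the fixed gain is an assumed simplification, not a calibrated signal-to-noise ratio:

    import random

    def mix_background_noise(speech: list[float], noise: list[float],
                             gain: float = 0.1) -> list[float]:
        """Overlay scaled background noise on speech samples (post-generation).

        Samples are mono floats in [-1.0, 1.0]; the noise loops if shorter
        than the speech. `gain` is a simple mixing weight.
        """
        return [s + gain * noise[i % len(noise)] for i, s in enumerate(speech)]

    # Claim 17: a user-supplied ("custom") recording serves as the digital
    # representation of background noise.
    custom_noise = [random.uniform(-1, 1) for _ in range(1600)]  # stand-in samples
    clean_speech = [0.0] * 16000                                 # stand-in speech
    noisy = mix_background_noise(clean_speech, custom_noise)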
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.  This information is detailed in the attached PTO-892 (Notice of References Cited).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAKIEDA R JACKSON whose telephone number is (571)272-7619. The examiner can normally be reached Mon - Fri 6:30a-2:30p.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Daniel Washburn, can be reached at 571-272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JAKIEDA R JACKSON/
Primary Examiner, Art Unit 2657