Patent Application 18330953 - AUDIO SCENE ENCODER AUDIO SCENE DECODER AND - Rejection


Title: AUDIO SCENE ENCODER, AUDIO SCENE DECODER AND RELATED METHODS USING HYBRID ENCODER-DECODER SPATIAL ANALYSIS

Application Information

  • Invention Title: AUDIO SCENE ENCODER, AUDIO SCENE DECODER AND RELATED METHODS USING HYBRID ENCODER-DECODER SPATIAL ANALYSIS
  • Application Number: 18330953
  • Submission Date: 2025-05-21
  • Effective Filing Date: 2023-06-07
  • Filing Date: 2023-06-07
  • National Class: 704
  • National Sub-Class: 500000
  • Examiner Employee Number: 90810
  • Art Unit: 2691
  • Tech Center: 2600

Rejection Summary

  • 102 Rejections: 1
  • 103 Rejections: 1

Cited Patents

The following patents were cited in the rejection:

  • US 11,854,560 (reference patent for the nonstatutory double patenting discussion)
  • US 2015/0071446, Sun et al. (basis of the § 102 rejection)
  • US 2019/0230436, Tsingos et al. (combined with Sun in the § 103 rejection)

Office Action Text


    DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the "right to exclude" granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-l.jsp.

Application #18/330,953
Claim 1: Audio scene encoder for encoding an audio scene, the audio scene comprising at least two component signals, the audio scene encoder comprising:
a core encoder for core encoding the at least two component signals, wherein the core encoder is configured to generate a first encoded representation for a first portion of the at least two component signals, and to generate a second encoded representation for a second portion of the at least two component signals;
a spatial analyzer for analyzing the audio scene comprising the at least two component signals to derive one or more spatial parameters or one or more spatial parameter sets for the second portion of the at least two component signals; and
an output interface for forming an encoded audio scene signal, the encoded audio scene signal comprising the first encoded representation for the first portion of the at least two component signals, the second encoded representation for the second portion of the at least two component signals, and the one or more spatial parameters or the one or more spatial parameter sets for the second portion of the at least two component signals.
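
For readers tracing the claim language, the three claimed elements (core encoder, spatial analyzer, output interface) can be pictured with the minimal Python sketch below. It is an illustration of the claim structure only, not the applicant's implementation: the frequency-based portion split and every helper shown are hypothetical stand-ins.

```python
import numpy as np

def waveform_code(portion):
    # Stand-in for waveform-preserving core coding: crude uniform quantization.
    return np.round(portion.real * 1024).astype(np.int32)

def parametric_code(portion):
    # Stand-in for parametric core coding: one amplitude parameter per channel.
    return np.sqrt(np.mean(np.abs(portion) ** 2, axis=1))

def estimate_spatial_params(portion):
    # Stand-in for spatial analysis (see the DirAC-style sketch further below).
    return {"azimuth_deg": 0.0, "diffuseness": 0.5}

def encode_audio_scene(components, border_bin):
    """components: (n_channels >= 2, n_bins) spectral frame of the component
    signals. Splitting at border_bin assumes the 'portions' are frequency
    subbands; the claim also covers time-frame portions."""
    first_portion = components[:, :border_bin]    # first portion -> first representation
    second_portion = components[:, border_bin:]   # second portion -> second representation
    # Spatial parameters are derived at the encoder only for the second portion.
    return {
        "first_repr": waveform_code(first_portion),
        "second_repr": parametric_code(second_portion),
        "spatial_params": estimate_spatial_params(second_portion),
    }
```

Note that only the second portion's spatial parameters enter the encoded scene signal; in a hybrid scheme of the kind named in the title, a decoder can perform its own spatial analysis for the finely coded first portion.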

Patent #11,854,560
Claim 1: Audio scene encoder for encoding an audio scene, the audio scene comprising at least two component signals, the audio scene encoder comprising:
a core encoder for core encoding the at least two component signals, wherein the core encoder is configured to generate a first encoded representation for a first portion of the at least two component signals, and to generate a second encoded representation for a second portion of the at least two component signals;
a spatial analyzer for analyzing the audio scene comprising the at least two component signals to derive one or more spatial parameters or one or more spatial parameter sets for the second portion of the at least two component signals; and
an output interface for forming an encoded audio scene signal, the encoded audio scene signal comprising the first encoded representation for the first portion of the at least two component signals, the second encoded representation for the second portion of the at least two component signals, and the one or more spatial parameters or the one or more spatial parameter sets for the second portion of the at least two component signals,
wherein the core encoder is configured to generate the first encoded representation with a first frequency resolution and to generate the second encoded representation with a second frequency resolution, the second frequency resolution being lower than the first frequency resolution, from subsequent time frames from the at least two component signals, wherein a first time frame of the subsequent time frames is the first portion of the at least two component signals and a second time frame of the subsequent time frames is the second portion of the at least two component signals, or
wherein a border frequency between a first frequency subband of a time frame and a second frequency subband of the time frame coincides with a border between a scale factor band and an adjacent scale factor band or does not coincide with a border between the scale factor band and the adjacent scale factor band, wherein the scale factor band and the adjacent scale factor band are used by the core encoder, wherein the first frequency subband of the time frame is the first portion of the at least two component signals and the second frequency subband of the time frame is the second portion of the at least two component signals.
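
The scale-factor-band alternative in claim 1 above turns on whether the subband border is snapped to the grid of scale factor bands used by the core encoder. A hypothetical sketch of such snapping (the band edges are invented for illustration):

```python
import numpy as np

# Invented scale factor band edges (in spectral bins) of a core coder.
SFB_EDGES = np.array([0, 4, 8, 16, 24, 36, 52, 72, 100])

def snap_border_to_sfb(requested_border_bin):
    """Return the scale factor band edge nearest the requested border, so the
    first/second subband split coincides with a scale factor band border."""
    return int(SFB_EDGES[np.argmin(np.abs(SFB_EDGES - requested_border_bin))])

print(snap_border_to_sfb(30))  # -> 24 (equidistant edges resolve to the lower one)
```
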
Claim 2: Audio scene encoder of claim 1,
wherein the audio scene comprises, as a first component signal, an omnidirectional audio signal, and, as a second component signal, at least one directional audio signal.
Claim 3: Audio scene encoder of claim 1,
wherein the audio scene comprises, as a first component signal, a signal captured by an omnidirectional microphone positioned at a first position, and, as a second component signal, at least one signal captured by an omnidirectional microphone positioned at a second position different from the first position.
Claim 4: Audio scene encoder of claim 1,
wherein the audio scene comprises, as a first component signal, at least one signal captured by a directional microphone directed to a first direction, and, as a second component signal, at least one signal captured by a directional microphone directed to a second direction, the second direction being different from the first direction.
Claim 5: Audio scene encoder of claim 1,
wherein the audio scene comprises A-format component signals, B-format component signals, First-Order Ambisonics component signals, Higher-Order Ambisonics component signals, or component signals captured by a microphone array with at least two microphone capsules.
Claim 6: Audio scene encoder of claim 1,
wherein the audio scene comprises component signals as determined by a virtual microphone calculation from an earlier recorded or synthesized sound scene.
Claim 7: Audio scene encoder of claim 1,
wherein the first portion of the at least two component signals is a first frequency subband of a time frame and the second portion of the at least two component signals is a second frequency subband of the time frame, and wherein the core encoder is configured to use a predetermined border frequency between the first frequency subband and the second frequency subband.
Claim 8: Audio scene encoder of claim 1,
wherein the core encoder comprises a dimension reducer for reducing a dimension of the audio scene to obtain a lower dimension audio scene, wherein the core encoder is configured to calculate the first encoded representation for the first portion of the at least two component signals from the lower dimension audio scene, and wherein the spatial analyzer is configured to derive the spatial parameters from the audio scene having a dimension being higher than the dimension of the lower dimension audio scene.
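
Claim 8's asymmetry, core coding a reduced-dimension scene while the spatial analyzer keeps the full dimension, can be sketched as follows. Choosing a first-order Ambisonics scene and plain channel selection as the "dimension reducer" are assumptions made only for illustration.

```python
import numpy as np

def encode_with_dimension_reduction(scene):
    """scene: (n_channels, n_samples), e.g. 4 FOA channels (W, X, Y, Z)."""
    # Dimension reducer: keep fewer transport channels (a real reducer might
    # compute a downmix rather than simply selecting channels).
    lower_dim_scene = scene[:2]  # e.g. keep W and X only

    # The core encoder sees only the lower-dimension scene ...
    first_repr = np.round(lower_dim_scene * 1024).astype(np.int32)  # coding stand-in

    # ... while the spatial analyzer derives parameters from the full scene,
    # whose dimension (here 4) is higher than that of the transport signals (2).
    assert scene.shape[0] > lower_dim_scene.shape[0]
    spatial_params = scene.mean(axis=1)  # stand-in for real spatial analysis
    return first_repr, spatial_params
```
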
Claim 9: Audio scene encoder of claim 1,
wherein the core encoder is configured to generate the first encoded representation for the first portion of the at least two component signals comprising M component signals, and to generate the second encoded representation for the second portion of the at least two component signals comprising N component signals, and wherein M is greater than N and N is greater than or equal to 1.
Claim 10: Audio scene encoder of claim 1,
wherein the first portion of the at least two component signals is a first frequency subband of the at least two component signals, and wherein the second portion of the at least two component signals is a second frequency subband of the at least two component signals, and
wherein the spatial analyzer is configured to calculate, for the second frequency subband, as the one or more spatial parameters, at least one of a direction parameter and a non-directional parameter.
Claim 11: Audio scene encoder of claim 1,
wherein the core encoder comprises:
a time-frequency converter for converting sequences of time frames comprising a time frame of the at least two component signals into sequences of spectral frames for the at least two component signals,
a spectral encoder for quantizing and entropy-coding spectral values of a frame of the sequences of spectral frames within a first frequency subband of a spectral frame corresponding to a first frequency subband; and
a parametric encoder for parametrically encoding spectral values of the spectral frame within a second frequency subband of the spectral frame corresponding to a second frequency subband.
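
Claim 11's three core-encoder elements follow the familiar split of a transform codec. A minimal sketch is given below, with an FFT standing in for the codec's real filterbank and byte packing standing in for entropy coding; none of these choices are taken from the application.

```python
import numpy as np

def core_encode_frame(frame, border_bin):
    """frame: (n_channels, frame_len) time-domain samples of the component signals."""
    # Time-frequency converter (an FFT stands in for the codec's filterbank).
    spectrum = np.fft.rfft(frame, axis=1)

    # Spectral encoder: quantize spectral values of the first frequency subband
    # (real part only, for brevity); a real codec would entropy-code the
    # quantized values, stubbed here as byte packing.
    low_q = np.round(spectrum[:, :border_bin].real * 64).astype(np.int16)
    low_code = low_q.tobytes()

    # Parametric encoder: the second frequency subband is represented only by
    # per-channel amplitude parameters instead of individual spectral lines.
    high = spectrum[:, border_bin:]
    amp_params = np.sqrt(np.mean(np.abs(high) ** 2, axis=1))
    return low_code, amp_params
```
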
Claim 12: Audio scene encoder of claim 1,
wherein the core encoder comprises a time domain or a mixed time domain and frequency domain core encoder for performing a time domain or a mixed time domain and frequency domain encoding operation of a lowband portion of a time frame, the lowband portion corresponding to the first portion of the at least two component signals.
Claim 13: Audio scene encoder of claim 1,
wherein the spatial analyzer is configured to subdivide the second portion being a second frequency subband into analysis bands, wherein a bandwidth of an analysis band is greater than or equal to a bandwidth associated with two adjacent spectral values processed by the core encoder within the first portion being a first frequency subband, or is lower than a bandwidth of a lowband portion representing the first frequency subband, and wherein the spatial analyzer is configured to calculate at least one of a direction parameter and a diffuseness parameter for each analysis band of the second frequency subband.
Claim 14: Audio scene encoder of claim 1,
wherein the core encoder and the spatial analyzer are configured to use a common filterbank or different filterbanks having different characteristics.
Claim 15: Audio scene encoder of claim 1,
wherein the spatial analyzer is configured to subdivide the second portion being a second frequency subband into analysis bands, wherein a bandwidth of an analysis band is greater than or equal to a bandwidth associated with two adjacent spectral values processed by the core encoder within the first portion being a first frequency subband, or is lower than a bandwidth of a lowband portion representing the first frequency subband,
wherein the spatial analyzer is configured to calculate a direction parameter and a diffuseness parameter for each analysis band of the second frequency subband, and
wherein the spatial analyzer is configured to use, in the calculation of the direction parameter, an analysis band being smaller than an analysis band used in the calculation of the diffuseness parameter.
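
Claims 13 and 15 call for per-band direction and diffuseness parameters, with the direction analysis bands smaller than the diffuseness analysis bands. A DirAC-style intensity-vector estimate is one well-known way to obtain such parameters; the claims do not fix any particular math, so the sketch below (a 2-D, azimuth-only simplification) is an assumption.

```python
import numpy as np

def direction_and_diffuseness(W, X, Y, dir_bands, diff_bands):
    """W, X, Y: complex B-format spectra (n_bins,) of one frame.
    dir_bands / diff_bands: (start, stop) bin ranges; the direction analysis
    bands are chosen smaller than the diffuseness bands, per the claim."""
    # Active intensity per bin points toward the direction of arrival.
    ix = np.real(np.conj(W) * X)
    iy = np.real(np.conj(W) * Y)

    # Direction parameter per (finer) band: azimuth of the mean intensity vector.
    azimuths_deg = [np.degrees(np.arctan2(iy[a:b].mean(), ix[a:b].mean()))
                    for a, b in dir_bands]

    # Diffuseness per (coarser) band: 1 - |mean intensity| / mean |intensity|;
    # near 0 for a single plane wave, approaching 1 for a diffuse field.
    diffuseness = []
    for a, b in diff_bands:
        i_vec = np.stack([ix[a:b], iy[a:b]])       # shape (2, band_bins)
        num = np.linalg.norm(i_vec.mean(axis=1))
        den = np.mean(np.linalg.norm(i_vec, axis=0)) + 1e-12
        diffuseness.append(1.0 - num / den)
    return azimuths_deg, diffuseness
```

Calling this with, e.g., dir_bands=[(0, 8), (8, 16)] and diff_bands=[(0, 16)] satisfies the smaller-direction-band limitation.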
Claim 16: Audio scene encoder of claim 1,
wherein the core encoder comprises a multi-channel encoder for generating an encoded multi-channel signal for the at least two component signals.
Claim 17: Audio scene encoder of claim 1,
wherein the core encoder comprises a multi-channel encoder for generating two or more encoded multi-channel signals, when a number of component signals of the at least two component signals is three or more.
Claim 18: Audio scene encoder of claim 1,
wherein the core encoder is configured to generate the first encoded representation with a first resolution and to generate the second encoded representation with a second resolution, wherein the second resolution is lower than the first resolution.
Claim 19: Audio scene encoder of claim 1,
wherein the core encoder is configured to generate the first encoded representation with a first time or first frequency resolution and to generate the second encoded representation with a second time or second frequency resolution, the second time or frequency resolution being lower than the first time or frequency resolution.
Claim 20: Audio scene encoder of claim 1,
wherein the output interface is configured for not including any spatial parameters for the first portion of the at least two component signals or a first frequency subband into the encoded audio scene signal, or for including a smaller number of spatial parameters for the first frequency subband into the encoded audio scene signal compared to a number of the spatial parameters for a second frequency subband.
Claim 21: Method of encoding an audio scene, the audio scene comprising at least two component signals, the method comprising:
core encoding the at least two component signals, wherein the core encoding comprises generating a first encoded representation for a first portion of the at least two component signals, and generating a second encoded representation for a second portion of the at least two component signals;
analyzing the audio scene comprising the at least two component signals to derive one or more spatial parameters or one or more spatial parameter sets for the second portion of the at least two component signals; and
forming the encoded audio scene signal, the encoded audio scene signal comprising the first encoded representation, the second encoded representation, and the one or more spatial parameters or the one or more spatial parameter sets for the second portion of the at least two component signals.
Claim 22: Non-transitory storage medium, having stored thereon a computer program for performing, when running on a computer or a processor, the method of claim 21.

Claim 2: Audio scene encoder of claim 1,
wherein the audio scene comprises, as a first component signal, an omnidirectional audio signal, and, as a second component signal, at least one directional audio signal, or

wherein the audio scene comprises, as a first component signal, a signal captured by an omnidirectional microphone positioned at a first position, and, as a second component signal, at least one signal captured by an omnidirectional microphone positioned at a second position different from the first position, or

wherein the audio scene comprises, as a first component signal, at least one signal captured by a directional microphone directed to a first direction, and, as a second component signal, at least one signal captured by a directional microphone directed to a second direction, the second direction being different from the first direction.
Claim 3: Audio scene encoder of claim 1,
wherein the audio scene comprises A-format component signals, B-format component signals, First-Order Ambisonics component signals, Higher-Order Ambisonics component signals, or component signals captured by a microphone array with at least two microphone capsules, or

wherein the audio scene comprises signals as determined by a virtual microphone calculation from an earlier recorded or synthesized sound scene.
Claim 4: Audio scene encoder of claim 1,
wherein the first portion of the at least two component signals is a first frequency subband of a time frame and the second portion of the at least two component signals is a second frequency subband of the time frame, and wherein the core encoder is configured to use a predetermined border frequency between the first frequency subband and the second frequency subband, or

wherein the core encoder comprises a dimension reducer for reducing a dimension of the audio scene to obtain a lower dimension audio scene, wherein the core encoder is configured to calculate the first encoded representation for the first portion of the at least two component signals from the lower dimension audio scene, and wherein the spatial analyzer is configured to derive the spatial parameters from the audio scene having a dimension being higher than the dimension of the lower dimension audio scene, or

wherein the core encoder is configured to generate the first encoded representation for the first portion of the at least two component signals comprising M component signals, and to generate the second encoded representation for the second portion of the at least two component signals comprising N component signals, and wherein M is greater than N and N is greater than or equal to 1.
Claim 5: Audio scene encoder of claim 1,
wherein the first portion of the at least two component signals is a first frequency subband of the at least two component signals, and wherein the second portion of the at least two component signals is a second frequency subband of the at least two component signals, and
wherein the spatial analyzer is configured to calculate, for the second frequency subband, as the one or more spatial parameters, at least one of a direction parameter and a non-directional parameter.
Claim 12: Audio scene encoder for encoding an audio scene, the audio scene comprising at least two component signals, the audio scene encoder comprising:
a core encoder for core encoding the at least two component signals, wherein the core encoder is configured to generate a first encoded representation for a first portion of the at least two component signals, and to generate a second encoded representation for a second portion of the at least two component signals;
a spatial analyzer for analyzing the audio scene comprising the at least two component signals to derive one or more spatial parameters or one or more spatial parameter sets for the second portion of the at least two component signals,
an output interface for forming an encoded audio scene signal, the encoded audio scene signal comprising the first encoded representation for the first portion of the at least two component signals, the second encoded representation for the second portion of the at least two component signals, and the one or more spatial parameters or the one or more spatial parameter sets for the second portion of the at least two component signals,
wherein the core encoder comprises:
a time-frequency converter for converting sequences of time frames comprising a time frame of the at least two component signals into sequences of spectral frames for the at least two component signals,
a spectral encoder for quantizing and entropy-coding spectral values of a frame of the sequences of spectral frames within a first frequency subband of a spectral frame corresponding to a first frequency subband; and
a parametric encoder for parametrically encoding spectral values of the spectral frame within a second frequency subband of the spectral frame corresponding to a second frequency subband, or

wherein the core encoder comprises a time domain or a mixed time domain and frequency domain core encoder for performing a time domain or a mixed time domain and frequency domain encoding operation of a lowband portion of a time frame, the lowband portion corresponding to the first portion of the at least two component signals, or

wherein the spatial analyzer is configured to subdivide the second portion of the at least two component signals being a second frequency subband into analysis bands, wherein a bandwidth of an analysis band is greater than or equal to a bandwidth associated with two adjacent spectral values processed by the core encoder within the first portion of the at least two component signals being a first frequency subband, or is lower than a bandwidth of a lowband portion representing the first frequency subband, or

wherein the core encoder and the spatial analyzer are configured to use a common filterbank or different filterbanks having different characteristics, or

wherein the spatial analyzer is configured to subdivide the second portion of the at least two component signals being a second frequency subband into analysis bands, wherein a bandwidth of an analysis band is greater than or equal to a bandwidth associated with two adjacent spectral values processed by the core encoder within the first portion of the at least two component signals being a first frequency subband, or is lower than a bandwidth of a lowband portion representing the first frequency subband,
wherein the spatial analyzer is configured to calculate a direction parameter and a diffuseness parameter for each analysis band of the second frequency subband, and
wherein the spatial analyzer is configured to use, for calculating the direction parameter, an analysis band being smaller than an analysis band used to calculate the diffuseness parameter.

Claim 6: Audio scene encoder of claim 1,
wherein the core encoder comprises a multi-channel encoder for generating an encoded multi-channel signal for the at least two component signals, or

wherein the core encoder comprises a multi-channel encoder for generating two or more encoded multi-channel signals, when a number of component signals of the at least two component signals is three or more, or

wherein the core encoder is configured to generate the first encoded representation with a first resolution and to generate the second encoded representation with a second resolution, wherein the second resolution is lower than the first resolution, or

wherein the core encoder is configured to generate the first encoded representation with a first time or first frequency resolution and to generate the second encoded representation with a second time or second frequency resolution, the second time or frequency resolution being lower than the first time or frequency resolution, or

wherein the output interface is configured for not including any spatial parameters for the first portion of the at least two component signals or a first frequency subband into the encoded audio scene signal, or for including a smaller number of spatial parameters for the first frequency subband into the encoded audio scene signal compared to a number of the spatial parameters for a second frequency subband.
Claim 13: Method of encoding an audio scene, the audio scene comprising at least two component signals, the method comprising:
core encoding the at least two component signals, wherein the core encoding comprises generating a first encoded representation for a first portion of the at least two component signals, and generating a second encoded representation for a second portion of the at least two component signals;
analyzing the audio scene comprising the at least two component signals to derive one or more spatial parameters or one or more spatial parameter sets for the second portion of the at least two component signals; and
forming the encoded audio scene signal, the encoded audio scene signal comprising the first encoded representation, the second encoded representation, and the one or more spatial parameters or the one or more spatial parameter sets for the second portion of the at least two component signals,
wherein the core encoding comprises generating the first encoded representation with a first frequency resolution and generating the second encoded representation with a second frequency resolution, the second frequency resolution being lower than the first frequency resolution, from subsequent time frames from the at least two component signals, wherein a first time frame of the subsequent time frames is the first portion of the at least two component signals and a second time frame of the subsequent time frames is the second portion of the at least two component signals, or
wherein a border frequency between a first frequency subband of a time frame and a second frequency subband of the time frame coincides with a border between a scale factor band and an adjacent scale factor band or does not coincide with a border between the scale factor band and the adjacent scale factor band, wherein the scale factor band and the adjacent scale factor band are used by the core encoder, wherein the first frequency subband of the time frame is the first portion of the at least two component signals and the second frequency subband of the time frame is the second portion of the at least two component signals, or
wherein the forming comprises not including any spatial parameters from the same parameter kind as the one or more spatial parameters generated by the analyzing for the second portion into the encoded audio scene signal, so that only the second portion of the at least two component signals has the parameter kind, and any parameters of the parameter kind are not included for the first portion of the at least two component signals in the encoded audio scene signal, or
wherein the core encoding comprises performing a parametric encoding operation for the second portion of the at least two component signals, and performing a wave form preserving encoding operation for the first portion of the at least two component signals, or
wherein a start band for the second portion of the at least two component signals is lower than a bandwidth extension start band, and wherein a core noise filling operation performed by the core encoding does not have any fixed crossover band and is gradually used for more parts of core spectra as a frequency increases, or
wherein the core encoding comprises performing a parametric processing for a second frequency subband of a time frame, the parametric processing comprising calculating an amplitude-related parameter for the second frequency subband and quantizing and entropy-coding the amplitude-related parameter instead of individual spectral lines in the second frequency subband, and quantizing and entropy-encoding individual spectral lines in a first frequency subband of the time frame, or
wherein the core encoding comprises performing a parametric processing for a high frequency subband of a time frame corresponding to a second frequency subband of the at least two component signals, the parametric processing comprising calculating an amplitude-related parameter for the high frequency subband and quantizing and entropy-coding the amplitude-related parameter instead of a time domain audio signal in the high frequency subband, and quantizing and entropy-encoding a time domain audio signal in a low frequency subband of the time frame corresponding to the first portion of the at least two component signals, by a time domain coding operation, or
wherein the method of audio encoding operates at different bitrates, wherein a predetermined border frequency between the first portion of the at least two component signals being a first frequency subband and the second portion of the at least two component signals being a second frequency subband depends on a selected bitrate, and wherein the predetermined border frequency is lower for a lower bitrate, or wherein the predetermined border frequency is greater for a greater bitrate.
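
The final alternative of claim 13, a predetermined border frequency that is lower for a lower bitrate, reduces to a monotonic lookup. The breakpoints below are invented solely to show the monotonicity:

```python
def border_frequency_hz(bitrate_bps):
    """Hypothetical bitrate-to-border mapping: more bits widen the
    waveform-coded first subband, so the border grows with bitrate."""
    table = [(16_000, 4_000), (32_000, 8_000), (64_000, 12_000)]  # (max bps, border Hz)
    for max_rate, border_hz in table:
        if bitrate_bps <= max_rate:
            return border_hz
    return 16_000  # at high rates, most of the spectrum is waveform-coded

assert border_frequency_hz(16_000) < border_frequency_hz(64_000)  # lower rate -> lower border
```
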
Claim 16: Non-transitory storage medium, having stored thereon a computer program for performing, when running on a computer or a processor, the method of claim [[15]] 13.


Allowable Subject Matter
Claims 7-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1 and 21-22 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Sun et al. (US #2015/0071446).

Regarding Claim 1, Sun discloses audio scene encoder for encoding an audio scene (Figs. 1, 3, and ¶0048-¶0062 and ¶0066: signal M(k,l,t) encoded by generator 103 of audio scene which includes a desired signal/component and noise signal/component), the audio scene comprising at least two component signals (Figs. 1, 3, and ¶0048-¶0062 and ¶0067: 1 to L channel signals), the audio scene encoder comprising:
a core encoder (Sun Figs. 1, 3, and ¶0048-¶0062 and ¶0067: generator 103) for core encoding the at least two component signals (Sun ¶0048-¶0062 and ¶0067: L channel signals are encoded), wherein the core encoder is configured to generate a first encoded representation for a first portion of the at least two component signals (Sun ¶0048-¶0062: M(k,l,t)), and to generate a second encoded representation for a second portion of the at least two component signals (Sun ¶0048-¶0062: noise signal of frequency subband);
a spatial analyzer for analyzing the audio scene (Sun ¶0048, ¶0054, ¶0061, ¶0066: generator 103 analyzes the audio scene to determine the perceptual hearing property for the L channel signals) comprising the at least two component signals to derive one or more spatial parameters or one or more spatial parameter sets for the second portion of the at least two component signals (Sun ¶0048, ¶0054, ¶0061, ¶0066: the perceptual hearing property assigned for the noise signal frequency subband representation); and
an output interface for forming an encoded audio scene signal (Sun Fig. 3, and ¶0048-¶0062 and ¶0067: the adder output of generator 103 forms the encoded audio scene signal M(k,l,t)), the encoded audio scene signal comprising the first encoded representation for the first portion of the at least two component signals, the second encoded representation for the second portion of the at least two component signals, and the one or more spatial parameters or the one or more spatial parameter sets for the second portion of the at least two component signals (Sun Fig. 3, and ¶0048-¶0062, ¶0066 and ¶0067: encoded signal M(k,l,t) comprises the desired signal of frequency subband of L channel signals, noise signal of frequency subband of L channel signals and the perceptual hearing property assigned for the noise signal frequency subband representation).

Claims 21-22 are rejected for the same reasons as set forth in Claim 1.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-6 are rejected under 35 U.S.C. 103 as being unpatentable over Sun et al. (US #2015/0071446) in view of Tsingos et al. (US #2019/0230436).

Regarding Claim 2, Sun discloses audio scene encoder of claim 1, but may not explicitly disclose wherein the audio scene comprises, as a first component signal, an omnidirectional audio signal, and, as a second component signal, at least one directional audio signal.
However, in a related field of endeavor [i.e., component signals of an audio scene] Tsingos teaches wherein the audio scene comprises, as a first component signal, an omnidirectional audio signal, and, as a second component signal, at least one directional audio signal (Tsingos ¶0027 discloses capturing an audio scene to provide component signals and further teaches in ¶0057 and ¶0064: the audio scene comprises component signals, including a first directional microphone signal [first component signal] captured by the first microphone in one spatial direction [directed to a first direction] and a second directional microphone signal [second component signal] captured by the second microphone in another different spatial direction [directed to a second direction]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Tsingos to Sun to allow Sun's component signals to be provided by a first directional microphone signal and a second directional microphone signal capturing the audio scene at different spatial directions, thus allowing a simple method of providing component signals of an audio scene by using the well-known proven technique of utilizing microphones to provide audio signals, as well as requiring a relatively small number of directional microphones (Tsingos, ¶0027) to record the audio scene allowing a simplified configuration.

Regarding Claim 3, Sun discloses audio scene encoder of claim 1, but may not explicitly disclose wherein the audio scene comprises, as a first component signal, a signal captured by an omnidirectional microphone positioned at a first position, and, as a second component signal, at least one signal captured by an omnidirectional microphone positioned at a second position different from the first position.
However, in a related field of endeavor [i.e., component signals of an audio scene] Tsingos teaches wherein the audio scene comprises, as a first component signal, a signal captured by an omnidirectional microphone positioned at a first position, and, as a second component signal, at least one signal captured by an omnidirectional microphone positioned at a second position different from the first position (Tsingos ¶0027 discloses capturing an audio scene to provide component signals and further teaches in ¶0057 and ¶0064: the audio scene comprises component signals, including a first directional microphone signal [first component signal] captured by the first microphone in one spatial direction [directed to a first direction] and a second directional microphone signal [second component signal] captured by the second microphone in another different spatial direction [directed to a second direction]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Tsingos to Sun to allow Sun's component signals to be provided by a first directional microphone signal and a second directional microphone signal capturing the audio scene at different spatial directions, thus allowing a simple method of providing component signals of an audio scene by using the well-known proven technique of utilizing microphones to provide audio signals, as well as requiring a relatively small number of directional microphones (Tsingos, ¶0027) to record the audio scene allowing a simplified configuration.

Regarding Claim 4, Sun discloses audio scene encoder of claim 1, but may not explicitly disclose wherein the audio scene comprises, as a first component signal, at least one signal captured by a directional microphone directed to a first direction, and, as a second component signal, at least one signal captured by a directional microphone directed to a second direction, the second direction being different from the first direction.
However, in a related field of endeavor [i.e., component signals of an audio scene] Tsingos teaches wherein the audio scene comprises, as a first component signal, at least one signal captured by a directional microphone directed to a first direction, and, as a second component signal, at least one signal captured by a directional microphone directed to a second direction, the second direction being different from the first direction (Tsingos ¶0027 discloses capturing an audio scene to provide component signals and further teaches in ¶0057 and ¶0064: the audio scene comprises component signals, including a first directional microphone signal [first component signal] captured by the first microphone in one spatial direction [directed to a first direction] and a second directional microphone signal [second component signal] captured by the second microphone in another different spatial direction [directed to a second direction]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Tsingos to Sun to allow Sun's component signals to be provided by a first directional microphone signal and a second directional microphone signal capturing the audio scene at different spatial directions, thus allowing a simple method of providing component signals of an audio scene by using the well-known proven technique of utilizing microphones to provide audio signals, as well as requiring a relatively small number of directional microphones (Tsingos, ¶0027) to record the audio scene allowing a simplified configuration.

Regarding Claim 5, Sun discloses audio scene encoder of claim 1,
wherein the audio scene comprises A-format component signals, B-format component signals (Sun ¶0082-¶0083: the audio scene is represented by B-format component signals W, X, Y, Z), or First-Order Ambisonics component signals, Higher-Order Ambisonics component signals (Sun ¶0028 discloses B-format, HOA. ¶0082 discloses the generator and the process described in connection with Figs. 3 and 4, the multi-dimensional auditory presentation method is an ambisonics auditory presentation method. In the ambisonics auditory presentation method, there are generally four channels, i.e., W, X, Y and Z channels in a B-format. The W channel contains omnidirectional sound pressure information, while the remaining three channels, X, Y and Z, represent sound velocity information measured over the three axes in a 3D Cartesian coordinates. ¶0083 discloses there can be three channels W, X, and Y, corresponding to a first order horizontal sound field).
Sun may not explicitly disclose component signals captured by a microphone array with at least two microphone capsules.
However, in a related field of endeavor [i.e., component signals of an audio scene] Tsingos teaches component signals captured by a microphone array with at least two microphone capsules (Tsingos ¶0094 discloses the audio signal emitted by an audio source 200 can be determined based on the first and second microphone signals of the two or more microphone arrays 210, 220, 230. In particular, the method 700 can allow determining the position, the directivity pattern 302 and/or the audio signal of an audio source 200 only using the microphone signals captured by K differently positioned microphone arrays 210, 220, 230. This information can be used to generate an audio representation independent of the listening position. This audio representation can be re-rendered for a listener at an arbitrary listening position within the three-dimensional (3D) environment. In particular, the determined audio signal, the determined position and/or the determined directivity pattern 302 of an audio source 200 can be used to determine how a listener perceives the audio signal emitted by the audio source 200 at an arbitrary listening position within the 3D environment. Hence, an efficient and precise audio representation scheme [e.g., for VR applications] is provided).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Tsingos to Sun to allow Sun's component signals to be provided by a first directional microphone signal and a second directional microphone signal capturing the audio scene at different spatial directions, thus allowing a simple method of providing component signals of an audio scene by using the well-known proven technique of utilizing microphones to provide audio signals, as well as requiring a relatively small number of directional microphones (Tsingos, ¶0027) to record the audio scene allowing a simplified configuration.

Regarding Claim 6, Sun discloses audio scene encoder of claim 1, but may not explicitly disclose wherein the audio scene comprises component signals as determined by a virtual microphone calculation from an earlier recorded or synthesized sound scene.
However, in a related field of endeavor [i.e., component signals of an audio scene] Tsingos teaches wherein the audio scene comprises component signals as determined by a virtual microphone calculation from an earlier recorded or synthesized sound scene (Tsingos ¶0095 discloses indirect components can be determined from the first microphone signal and from the second microphone signal of a microphone array 210, 220, 230. The audio representation can also include the indirect components of one or more microphone arrays 210, 220, 230. For generating these indirect components of the audio representation, the indirect components of a microphone array 210, 220, 230 can be assumed to be originating from a position which is associated with the position of the microphone array 210, 220, 230. By way of example, the virtual source of the indirect components of a microphone array 210, 220, 230 can correspond to or can be equal to the position of the microphone array 210, 220, 230. By taking into account the indirect components when generating an audio representation, the perceived quality of the audio representation can be improved. ¶0096 discloses any of the features described in the present document can be part of a corresponding system for determining the position of at least one audio source 200. The system can comprise a processor for performing the method steps outlined in the present document. In particular, the system can be adapted to capture first and second microphone signals at two or more microphone arrays 210, 220, 230, wherein the two or more microphone arrays 210, 220, 230 are placed at different positions. The two or more microphone arrays 210, 220, 230 can each comprise at least a first microphone capsule to capture a first microphone signal and a second microphone capsule to capture a second microphone signal. Furthermore, the first and second microphone capsules can exhibit differently oriented spatial directivities).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to apply the teachings of Tsingos to Sun to allow Sun's component signals to be provided by a first directional microphone signal and a second directional microphone signal capturing the audio scene at different spatial directions, thus allowing a simple method of providing component signals of an audio scene by using the well-known proven technique of utilizing microphones to provide audio signals, as well as requiring a relatively small number of directional microphones (Tsingos, ¶0027) to record the audio scene allowing a simplified configuration.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YOGESHKUMAR G PATEL whose telephone number is (571)272-3957. The examiner can normally be reached 7:30 AM-4 PM PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen, can be reached at (571) 272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YOGESHKUMAR PATEL/Primary Examiner, Art Unit 2691