
Patent Application 17765002 - APPARATUS AND METHOD FOR AUDIO ENCODING - Rejection


Application Information

  • Invention Title: APPARATUS AND METHOD FOR AUDIO ENCODING
  • Application Number: 17765002
  • Submission Date: 2025-04-08
  • Effective Filing Date: 2022-03-30
  • Filing Date: 2022-03-30
  • National Class: 704
  • National Sub-Class: 500000
  • Examiner Employee Number: 86955
  • Art Unit: 2695
  • Tech Center: 2600

Rejection Summary

  • 102 Rejections: 0
  • 103 Rejections: 3

Cited Patents

The following references are cited in this rejection: Herre et al. (US 20160142853 A1), Johnston et al. (US 20120057715 A1), Chinen et al. (WO 2018180531 A1, also published as US 20200043505 A1), and Vilkamo (US 20190132674 A1).

Office Action Text


    DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the claim amendment filed on January 15, 2025, in which claims 14-15 and 17 were amended.
By virtue of this communication, claims 1-20 are currently pending in this Office Action.
With respect to the objection to the application specification due to a formality issue, as set forth in the previous Office Action, the specification supplement sheet and the accompanying argument (see paragraph 2 of page 9 of the Remarks filed on January 15, 2025) have been fully considered. The argument is found persuasive and, therefore, the objection to the specification set forth in the previous Office Action has been withdrawn.
The Office appreciates the explanation of the amendment and the analysis of the prior art; however, although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993) and MPEP 2145.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 5, 8, 11-16, 19 are rejected under 35 U.S.C. 103 as being unpatentable over Herre et al. (US 20160142853 A1, hereinafter Herre) in view of Johnston et al. (US 20120057715 A1, hereinafter Johnston).
Claim 1: Herre teaches an audio encoding apparatus (title and abstract, ln 1-16, a 3D-audio encoder in fig. 1, paired with a 3D-audio decoder in fig. 2) comprising: 
an audio receiver circuit (part of element 102 for receiving channels 104 and audio objects 106 in fig. 1), 
wherein the audio receiver circuit is arranged to receive a plurality of audio items (audio channels 104 and audio objects 106 in fig. 1, para 41), 
wherein the plurality of audio items represent an audio scene (representing a channel plus object input scene, para 41); 
a metadata receiver circuit (part of OAM encoder 124 to receive object Metadata OAM 108 in fig. 1), 
wherein the metadata receiver circuit is arranged to receive input presentation metadata (OAM 108 received by the OAM encoder 124 in fig. 1, para 37), and 
wherein the input presentation metadata describes presentation constraints for rendering of the plurality of audio items (OAM 108, providing control signals to the object renderer 216 and SAOC decoder optional 220 for rendering, via elements 124, 116 in fig. 1, elements 202, 224 and element mixer 226 in fig. 2, weighting for rendering audio objects by associated object metadata OAM, para 41); 
an audio encoder circuit (SAOC encoder optional 112, para 46, and part of USAC encoder 116, para 42, in fig. 1), 
wherein the audio encoder circuit is arranged to generate encoded audio data (generating mp4 128 via 112, 116 in fig. 1) for the audio scene by encoding the plurality of audio items (the transmitted mp4 128 used for audio decoder, fig. 2, to generate output audio scene based on decompressed object metadata information and user interaction information, para 47); 
a metadata circuit (other part of OAM encoder 124 for encoding the received object metadata OAM 108, para 60), 
wherein the metadata circuit is arranged to generate output presentation metadata (encoded or compressed OAM 126 in fig. 1, wherein the associated metadata specifies the geometrical position and volume of the objects in the 3D space, para 48 and the encoded or compressed OAM 126 is transmitted to an audio decoder, fig. 2, for rendering or presentation), 
wherein the output presentation metadata is arranged to constrain an extent (including mapping from inputs to outputs by using the reproduction layout, the decompressed object metadata OAM, and the user interaction information, para 47, and mapping performed through an application of downmix coefficients, including a mapping of the azimuth/elevation of the input channels to the output channels, para 59-63) by which a user-adaptable parameter of a rendering system (the downmix coefficients, as the claimed user-adaptable parameter, applied in the audio decoder in fig. 2, as the claimed rendering system) can be adapted by a user when rendering the encoded audio data (via the user interaction information, para 47, and including manually tuning the downmix coefficients by an expert while rendering the audio channels to speaker audio signals, e.g., through format conversion 232, etc., in fig. 2, para 62-63, and wherein the position of a virtual sound source is given by the position in space associated with the particular channel, i.e., the loudspeaker position associated with the particular input channel, including azimuth/elevation mapping or downmixing from the input channels to the output channels, para 62-63, or, in binaural downmix rendering, an equalization filter is derived based on empirical expert knowledge and/or measured BRIR data, para 79);
wherein the extent by which the user-adaptable parameter of the rendering system can be adapted includes a plurality of permissible values of the user-adaptable parameter (including downmix coefficients to be tuned by the expert above and applied for mapping from the input channels to the output channels, para 62 and including mapping of the azimuth/elevation of the input channels to the output channels, para 63), and
an output circuit (other part of element 116 in fig. 1), wherein the output circuit is arranged to generate an encoded audio data stream (compressed representation of the audio signal mp4 with encoded metadata OAM 126 and SAOC-SI 118 in fig. 1), 
wherein the encoded data stream comprises the encoded audio data (encoded audio objects and channels 120, 122, and transport channels 114 in fig. 1) and the output presentation metadata (including compressed OAM 126, and as well SAOC side information SAOC-SI 118); and
wherein the output presentation metadata comprises at least one of: an audio item position constraint (metadata information 108 specifies the geometrical position and volume of the object in 3D space, para 72 and weighting the objects for each channel to be rendered based on object metadata OAM, para 41).
However, Herre does not explicitly teach that the output presentation metadata comprises a reverberation constraint.
Johnston teaches an analogous field of endeavor by disclosing an audio encoding apparatus (title and abstract, ln 1-11, an audio encoder aspect in fig. 1, and the paired audio decoder aspect in fig. 2, para 19-20), wherein the output presentation metadata is disclosed to comprise a reverberation constraint (encoded metadata including a representation of reverberation parameters by which a reverberator can be configured in the receiver/decoder, para 50, e.g., T10 reverberation decay-time parameters F1-Fn, etc., in table 1, para 63) and gain constraints (G1-Gn as mixing coefficients, gain values, in the metadata, para 62-63), for the benefit of achieving a desired sound quality (by achieving an environment-friendly and user-flexible mixing scheme, para 105, flexible user choices of a set of parameters, para 46, and practicing a variety of optimal reverberant sounds by providing a more tube-like or flutter echo sound quality, para 93).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the reverberation constraint that is included in the output presentation metadata, as taught by Johnston, to the output presentation metadata in the audio encoding apparatus, as taught by Herre, for the benefits discussed above.
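For illustration only, the following minimal Python sketch (hypothetical field names and values, not drawn from Herre, Johnston, the claims, or the specification) shows one way output presentation metadata could constrain the extent by which a user-adaptable rendering parameter may be adjusted, i.e., restrict it to a plurality of permissible values, in the sense discussed for claim 1 above.

    from dataclasses import dataclass

    @dataclass
    class PresentationConstraint:
        """Hypothetical output presentation metadata: a permissible range
        for one user-adaptable rendering parameter (e.g., reverberation gain)."""
        parameter: str
        min_value: float   # lowest value the content provider permits
        max_value: float   # highest value the content provider permits

    def apply_user_setting(requested: float, constraint: PresentationConstraint) -> float:
        """Clamp the user's requested value to the permissible range carried in
        the presentation metadata, so the parameter can be adapted by a user
        only within the extent the metadata allows."""
        return max(constraint.min_value, min(constraint.max_value, requested))

    # Example: metadata permits reverberation gain between 0.2 and 0.8;
    # a user asking for 1.0 is limited to 0.8 at rendering time.
    reverb_constraint = PresentationConstraint("reverb_gain", 0.2, 0.8)
    print(apply_user_setting(1.0, reverb_constraint))  # -> 0.8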
Claim 14 recites a method of encoding audio that is essentially consistent with the features recited in claim 1 above; thus, claim 14 is rejected for at least reasons similar to those described for claim 1 above.
Claim 15 recites a computer-readable non-transitory medium storing a computer program that, when executed on a processor, performs the method of claim 14 (Herre, a storage medium storing a computer program for performing one of the methods by using a computer, para 122-123, and Johnston, software modules executed by general-purpose microprocessors, para 132).
Claim 2: the combination of Herre and Johnston further teaches, according to claim 1 above, wherein the audio encoder circuit comprises a combiner circuit (Herre, other part of element 102 for mixing by the optional mixer, para 37), wherein the audio encoder circuit is arranged to generate a combined audio item (Herre, channels + prerendered objects 122 in fig. 1) by combining at least a first audio item (Herre, one of the channels 104 and/or one of the objects 106 in fig. 1) and a second audio item (Herre, another of the channels 104 and/or another of the objects 106 in fig. 1) from the plurality of audio items (Herre, the channels 104 and the objects 106 in fig. 1, para 41, and the discussion in claim 1 above) in response to input presentation metadata for the first audio item and input presentation metadata for the second audio item (Herre, mixing and prerendering for ensuring deterministic signal entropy at the encoder input, as disclosed for rendering in the decoder, para 41, by using the OAM to render the audio objects, para 41, rendered with the geometrical position and volume of the objects specified by the associated OAM in the 3D space, para 48), wherein the audio encoder circuit is arranged to generate combined encoded audio data for the first audio item and the second audio item by encoding the combined audio item (Herre, generating the combined audio signal as channel scene 122 to be encoded by the USAC 116 in fig. 1, para 41), wherein the encoded audio data comprises the combined encoded audio data (the USAC 116 is further provided with the element 122 and also audio objects 120, SAOC transport channels 114, as well as SAOC-SI 118 and compressed OAM 126, to generate the mp4 bitstream 128, para 37-39).
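As a rough illustration of the combining discussed for claim 2 (hypothetical names and a simple gain-weighted mix, not the prerenderer/mixer of Herre's fig. 1), two audio items could be combined into one item under their input presentation metadata as sketched below.

    import numpy as np

    def combine_items(item_a: np.ndarray, item_b: np.ndarray,
                      meta_a: dict, meta_b: dict) -> np.ndarray:
        """Combine two audio items into a single combined item, weighting each
        by a gain taken from its input presentation metadata (hypothetical
        'gain' field); the combined item is what the encoder would then encode."""
        gain_a = meta_a.get("gain", 1.0)
        gain_b = meta_b.get("gain", 1.0)
        return gain_a * item_a + gain_b * item_b

    # Example: two one-second items at 48 kHz, mixed 0.7/0.3.
    fs = 48000
    item_a = np.random.randn(fs).astype(np.float32)
    item_b = np.random.randn(fs).astype(np.float32)
    combined = combine_items(item_a, item_b, {"gain": 0.7}, {"gain": 0.3})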
Claim 5: the combination of Herre and Johnston further teaches, according to claim 2 above, wherein the input presentation metadata for the first audio item and the input presentation metadata for the second audio item comprise a position constraint (Herre, the metadata specifies the geometrical position and volume of the objects in the 3D space, para 48, and the discussion in claim 1 above, and Johnston, more than one T60 parameter is transmitted, as the metadata, corresponding to the perceived geometry of the synthetic listening space, para 65).
Claim 8: the combination of Herre and Johnston further teaches, according to claim 2 above, wherein the audio encoder circuit is arranged to adapt a compression of a first audio item (e.g., audio objects 106 to be prerendered and converted with the channels to form a channel scene, para 41, wherein the object signals are weighted and fitted to the channel layout of the audio reproduction side through the prerenderer/mixer, similar to the audio object rendering performed at the audio decoder of fig. 2, para 41) in response to input presentation metadata (Herre, using metadata including the geometrical position and volume of the audio objects in the 3D space, para 48) for a second audio item (Herre, for the channels 104 to be formed as the audio scene 122 in fig. 1).
Claim 11: the combination of Herre and Johnston further teaches, according to claim 8 above, wherein the audio encoder circuit is arranged to adapt the compression of the first audio item in response to input presentation metadata for the first audio item (pre-rendering the audio objects 106 to form the channel scene with the channels 104 and the discussion in claims 1, 8 above).
Claim 12: the combination of Herre and Johnston further teaches, according to claim 1 above, wherein the input presentation metadata comprises priority data for a portion of the plurality of audio items (Herre, the pre-rendering/mixing 112 for forming the channels and the audio objects into a channel scene as an option, para 41, and Johnston, a flag a1 in the metadata specifying that each of the input channels may optionally be left without synthetic diffusion processing, maintaining the intrinsic diffuse characteristic of the channel, i.e., a priority with respect to synthesis processing for each of the channels, para 63), wherein the encoder circuit is arranged to adapt a compression for a first audio item in response to a priority indication for the first audio item in the input presentation metadata (Johnston, encoding audio channels according to whether synthesis diffusion processing is applied or not, para 63, wherein, while a1 is specified, i.e., synthesis diffusion processing is not expected, F1-Fn, a1-an, etc., in the metadata are invalidated for encoding, para 63).
Claim 13: the combination of Herre and Johnston further teaches, according to claim 1 above, wherein the audio encoder circuit is arranged to generate encoding adaptation data, wherein the encoding adaptation data is indicative of how the encoding is adapted in response to the input presentation metadata (the discussion in claim 12 above), wherein the encoded audio data stream comprises the encoding adaptation data (e.g., Johnston, the parameter a1 in the metadata, indicating that some of the channels are not processed with synthesis diffusion processing, para 63).
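To make the priority-driven adaptation discussed for claims 12-13 concrete, the following is a simple sketch (hypothetical priority field and bit-budget figures; neither cited reference specifies these numbers) of allocating encoding bit rate per audio item according to a priority indication, while recording the adaptation decisions that could travel as encoding adaptation data.

    def allocate_bitrates(items_meta: list[dict], total_kbps: float) -> list[dict]:
        """Split a total bit budget across audio items in proportion to a
        hypothetical 'priority' value in each item's input presentation metadata,
        and record the decision as encoding adaptation data."""
        weights = [max(m.get("priority", 1.0), 0.0) for m in items_meta]
        total_w = sum(weights) or 1.0
        adaptation_data = []
        for meta, w in zip(items_meta, weights):
            adaptation_data.append({
                "item": meta.get("name", "unnamed"),
                "assigned_kbps": total_kbps * w / total_w,  # higher priority -> more bits
            })
        return adaptation_data

    # Example: a dialog object marked high priority gets most of a 128 kbps budget.
    print(allocate_bitrates(
        [{"name": "dialog", "priority": 3.0}, {"name": "ambience", "priority": 1.0}],
        128.0))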
Claim 16 recites a method and has been analyzed and rejected according to claims 14 and 2 above.
Claim 19 recites a method and has been analyzed and rejected according to claims 16 and 5 above.

Claims 3-4, 9-10, 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Herre (above) in view of Johnston (above), and further in view of Chinen et al. (WO 2018180531 A1 in IDS, also published as US 20200043505 A1, hereinafter Chinen).
Claim 3: the combination of Herre and Johnston further teaches, according to claim 2 above, wherein the combiner circuit is arranged to select a portion of the first audio item and a portion of the second audio item from the plurality of audio items in response to the input presentation metadata for the first audio item and the input presentation metadata for the second audio item (Johnston, selecting channels for no synthesis diffusion processing by engineer control, as specified in the metadata, e.g., the field a1 in table 1, para 62-63), except selecting the first audio item and the second audio item from the plurality of audio items in response to the input presentation metadata for the first audio item and the input presentation metadata for the second audio item.
Chinen teaches an analogous field of endeavor by disclosing an audio encoding apparatus (title and abstract, ln 1-15, including the metadata encoder 52 and the audio encoder 51 in fig. 10), wherein the first audio item and the second audio item are selected from the plurality of audio items in response to the input presentation metadata for the first audio item and the input presentation metadata for the second audio item (a flag, as metadata included in the bitstream per audio object, para 15, indicating whether the bitstream includes an independent audio object, i.e., an uncombined audio object, or one combined with another audio object, para 260, and combined audio objects are indistinguishable at a predetermined listening position, para 8, i.e., selecting the audio object as uncombined or independent in the bitstream by the flag), for the benefit of improving transmission efficiency (using a transmission bit rate for combined audio objects, para 15, by reducing the amount of transmitted data, para 18).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied selecting the first audio item and the second audio item from the plurality of audio items in response to the input presentation metadata for the first audio item and the input presentation metadata for the second audio item, as taught by Chinen, to the combiner circuit in the audio encoding apparatus, as taught by the combination of Herre and Johnston, for the benefits discussed above.
Claim 4: the combination of Herre, Johnston, and Chinen further teaches, according to claim 2 above, wherein the combiner circuit is arranged to select the first audio item and the second audio item in response to a determination that at least a portion of the input presentation metadata for the first audio item and at least a portion of the input presentation metadata for the second audio item meet a similarity criterion (Johnston, the flag a1 in the metadata in table 1, para 62-63, and the discussion in claims 1, 3 above, and Chinen, indistinguishable audio objects coming from the same direction, where the direction is part of the metadata for the audio objects, para 60, or coming from within a predetermined horizontal angular range, para 71).
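For illustration of the similarity criterion discussed for claims 3-4, the sketch below uses a hypothetical azimuth-based test (Chinen's actual criterion is only characterized above as objects coming from the same direction or within a predetermined horizontal angular range): two items could be selected for combining when their position metadata is sufficiently close.

    def meets_similarity_criterion(meta_a: dict, meta_b: dict,
                                   max_azimuth_diff_deg: float = 10.0) -> bool:
        """Return True when the positional parts of two items' input presentation
        metadata are similar enough (here: azimuths within a predetermined
        horizontal angular range) that the items may be combined."""
        diff = abs(meta_a["azimuth_deg"] - meta_b["azimuth_deg"]) % 360.0
        diff = min(diff, 360.0 - diff)          # handle wrap-around, e.g. 359 deg vs 1 deg
        return diff <= max_azimuth_diff_deg

    print(meets_similarity_criterion({"azimuth_deg": 30.0}, {"azimuth_deg": 35.0}))  # True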
Claim 9: the combination of Herre, Johnston, and Chinen further teaches, according to claim 8 above, wherein the audio encoder circuit is arranged to estimate a masking effect to the first audio item from the second audio item in response to input presentation metadata for the second audio item (Johnston, the flag a1 specifies channels that are exempted from the synthesis diffusion processing as specified in table 1, para 62-63, i.e., masking with respect to the synthesis diffusion processing, and Chinen, masking of one audio object by another when the amount of components of the audio waveform data of one object is over a threshold, so that the sounds are indistinguishable, para 185), wherein the audio encoder circuit is arranged to adapt the compression of the first audio item in response to the masking effect (Johnston, assigning a1 in the metadata on the encoder side and the discussion in claims 8, 4 above, and Chinen, the indistinguishable sounds or objects are encoded as one combined object specified by the flag, para 260).
Claim 10: the combination of Herre, Johnston, and Chinen further teaches, according to claim 9 above, wherein the audio encoder circuit is arranged to estimate the masking effect to the first audio item from the second audio item in response to at least one of a gain constraint and a position constraint for the second audio item (Johnston, the discussion in claim 9 above, and Chinen, the amount of components of one audio object over another, i.e., a gain constraint, para 185), wherein the position constraint is indicated by the input presentation metadata for the second audio item (Herre, the metadata specifies the geometrical position and volume of the objects in the 3D space, para 48, Johnston, the metadata also includes source position parameters, source distance parameters, and mixing coefficients such as gain values in table 1, and a1 in table 1 specifies channels to be masked for synthesis diffusion processing, para 62-63, and Chinen, objects from the same direction, para 60, i.e., masking of the object with the lower amount by the object with the higher amount, specifically at the same direction with respect to the listener position, para 60).
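A minimal sketch of the masking-based adaptation discussed for claims 9-10, under assumed level and angular thresholds (the references are only characterized above as comparing the amount of signal components and the incoming direction), might look as follows.

    def is_masked(meta_first: dict, meta_second: dict,
                  level_margin_db: float = 12.0, max_angle_deg: float = 15.0) -> bool:
        """Estimate whether the first audio item is masked by the second one:
        the second item is markedly louder (gain constraint) and arrives from
        roughly the same direction (position constraint in the metadata)."""
        louder = meta_second["level_db"] - meta_first["level_db"] >= level_margin_db
        angle = abs(meta_first["azimuth_deg"] - meta_second["azimuth_deg"]) % 360.0
        angle = min(angle, 360.0 - angle)
        return louder and angle <= max_angle_deg

    def bitrate_for(meta_first: dict, meta_second: dict, base_kbps: float = 64.0) -> float:
        """Adapt the compression of the first item: spend fewer bits when it is
        estimated to be masked by the second item."""
        return base_kbps * (0.5 if is_masked(meta_first, meta_second) else 1.0)

    meta_dialog = {"level_db": -30.0, "azimuth_deg": 0.0}
    meta_music  = {"level_db": -12.0, "azimuth_deg": 5.0}
    print(bitrate_for(meta_dialog, meta_music))  # -> 32.0 (masked, bit rate halved)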
Claim 17 recites a method and has been analyzed and rejected according to claims 16 and 3 above.
Claim 18 recites a method and has been analyzed and rejected according to claims 16 and 4 above.

Claims 6-7, 20 are rejected under 35 U.S.C. 103 as being unpatentable over Herre (above) in view of Johnston (above), and further in view of Vilkamo (US 20190132674 A1).
Claim 6: the combination of Herre and Johnston further teaches presentation metadata (Herre, OAM and the encoding of OAM in fig. 1 and the discussion in claims 1-2 above, and Johnston, the metadata including the contents of table 1, para 62-63), according to claim 2 above, except wherein the audio encoder circuit is arranged to generate combined presentation metadata for the combined audio item in response to the input presentation metadata for the first audio item and the input presentation metadata for the second audio item, wherein the output presentation metadata comprises the combined presentation metadata.
Vilkamo teaches an analogous field of endeavor by disclosing an audio signal processing apparatus for audio encoding (title and abstract, ln 1-18, a mobile device or tablet computer in fig. 9, and the SPAC is encoded and passed to the decoder, para 107), wherein the device is arranged to generate combined presentation metadata (via element 711 to generate the combined metadata 713 in fig. 7) for the combined audio item (for the combined audio signal via element 895 in fig. 8) in response to the input presentation metadata for the first audio item and the input presentation metadata for the second audio item (the individual audio signals processed based on the direction as the channel metadata through 891, 893 in fig. 8, para 202, or as part of the audio signal combiner in fig. 8), wherein the output presentation metadata comprises the combined presentation metadata (the combined metadata is output from the metadata processor 661 in fig. 6), for the benefit of improving perceptual quality for complex sound scenes (para 6) in a flexible environment (flexible definition of the number of simultaneous SPAC directions, para 7).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have applied the generation of combined presentation metadata for the combined audio item in response to the input presentation metadata for the first audio item and the input presentation metadata for the second audio item, with the output presentation metadata comprising the combined presentation metadata, as taught by Vilkamo, to the metadata of the audio encoder circuit in the audio encoding apparatus, as taught by the combination of Herre and Johnston, for the benefits discussed above.
Claim 7: the combination of Herre, Johnston, and Vilkamo further teaches, according to claim 6 above, wherein the audio encoder circuit is arranged to generate a portion of the combined presentation metadata (Vilkamo, directions or an energy ratio parameter via the metadata processor 561 in fig. 5, para 107) to reflect a constraint for a presentation parameter (Vilkamo, for the merged audio signal, para 107), wherein the presentation parameter is for the combined audio item (Vilkamo, for the merged audio signal, reflecting the contribution of the individual audio signals prior to the merge, para 107), wherein the constraint is determined as a constraint meeting both a constraint for the first audio item and a constraint for the second audio item (Vilkamo, e.g., the energy requirement for the merging, e.g., the energies required for achieving the merged overall energy being 2, para 107; e.g., the individual energy calculation meets the overall energy requirement for merging, para 105).
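As a worked illustration of a constraint meeting both individual constraints, as discussed for claim 7 (hypothetical numeric ranges; Vilkamo's energy requirement is only paraphrased above), combined presentation metadata could carry the intersection of the two items' permissible ranges, as sketched below.

    from typing import Optional, Tuple

    Range = Tuple[float, float]  # (min, max) permissible values for a presentation parameter

    def combine_constraints(a: Range, b: Range) -> Optional[Range]:
        """Return a range that satisfies both input constraints (their intersection),
        or None when the two constraints cannot be met simultaneously."""
        low, high = max(a[0], b[0]), min(a[1], b[1])
        return (low, high) if low <= high else None

    # Example: item A allows reverb gain 0.2-0.8, item B allows 0.5-1.0;
    # the combined item is constrained to 0.5-0.8.
    print(combine_constraints((0.2, 0.8), (0.5, 1.0)))  # (0.5, 0.8)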
Claim 20 recites a method and has been analyzed and rejected according to claims 16 and 6 above.

Response to Arguments

Applicant's arguments filed on January 15, 2025 have been fully considered but are not persuasive.
With respect to the prior art rejection of claim 1 (and similarly claims 14-15), applicant described the background and emphasized that “Current technology enables an encoding of audio material such that rendering parameters, such as the apparent position of each audio item and the reverberation effect, can be controlled to achieve a preferred acoustic experience,” that “applicants have recognized that a conflict exists between enabling a user to freely control the presentation parameters vs. enabling the content provider to control the presentation parameters [to] achieve a desired acoustic effect,” and that “the content provider may constrain the reverberation level to be above a minimum level, or below a maximum level, or both and the user is enabled to increase or decrease the reverberation level to the user’s preference, but not to a level that the content provider considers to be too high or too low …” (paragraph 3 of page 9 and paragraphs 2-3 of page 10 of the Remarks filed on January 15, 2025). Applicant then argued that “Herre does not disclose metadata that constrains a user-adaptable parameter to a plurality of permissible values” because Herre’s “output presentation metadata is unrelated to the downmix coefficients,” although “applicants concur that the downmix coefficients are user-adaptable parameters,” that “the user interaction information does not include manually tuning the downmix coefficients by an expert,” and that Herre does not teach that the user interaction information in Herre’s metadata includes constraints on the downmix coefficients (paragraphs 2-5 of page 11 and paragraphs 1-2 of page 12 of the Remarks). Applicant further argued that “Herre’s metadata includes positioning and other information related to the input audio sources, Herre’s expert provides positioning and other information related to the output audio sources or speakers, Obviously, the positioning of the output speakers is dependent upon the physical environment of the listening area, and is constant regardless of the positioning of the input sources,” and that “As is clearly evident, the reproduction layout 248 created by the expert is not affected by the metadata associated with the input audio material. Herre’s metadata does not constrain the expert’s reproduction layout information. To the contrary, Herre acknowledges that the loudspeaker positions may be random configurations with non-standard loudspeaker positions,” etc. (paragraphs 3-5 of page 12 and paragraphs 1-2 of page 13 of the Remarks).
In response to the argument above, the Office respectfully disagrees because 
(1) the Office appreciates the statement of the advantages of the instant application in light of the background of the field; however, although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims, see In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993) and MPEP 2145. For example, applicant appears to emphasize, in the argument above, a “rendering parameter” (mapped to the claimed “output presentation metadata”) including the “apparent position of each audio item” and “the reverberation effect,” and further emphasizes selection of levels of “the reverberation effect,” etc.; but the claims apply a different and broader manner by using a Markush format, see MPEP 2117, to allow the “output presentation metadata” to comprise “at least one of: an audio item position constraint, and a reverberation constraint,” i.e., comprising at least either “an audio item position constraint” or “a reverberation constraint,” rather than the argued “position of each audio item” AND “the reverberation effect” (emphasis added herein),
(2) the claims recite “An audio encoding apparatus” (claim 1) and a “method” (claim 14), but neither recites that the “apparatus” comprises “a rendering system” that has “a user-adaptable parameter” and “can be adapted by a user when rendering the encoded audio data,” nor recites that the “output presentation metadata” comprises the “user-adaptable parameter” that further includes “a plurality of permissible values.” Instead, the claims broadly recite “wherein the output presentation metadata is arranged to constrain an extent” and “the output presentation metadata comprises at least one of: an audio item position constraint, and a reverberation constraint,” with no recitation of what “an extent” is or comprises; the claims merely recite that, by such an “extent,” “a user-adaptable parameter of a rendering system” would be intended to be “adapted by a user when …” and “the user-adaptable parameter” would be intended to “include[] a plurality of permissible values.” Therefore, the argued features of including “a rendering system” having “a user-adaptable parameter” that “can be adapted by a user when rendering …” and “can have a plurality of permissible values,” etc., would be merely intended purposes for the claimed “apparatus” and “method,”
(3) with respect to the prior art rejection of claims 1, 14-15, Herre not only discloses the argued “downmix coefficients” that relate to rendering the input audio objects through format conversion from inputs to outputs (e.g., practicing virtual sound sources whose positions are associated with particular input channels, para 59-62, fig. 10), but also discloses associated metadata (mapped to the output presentation metadata represented by the encoded OAM, as discussed in the Office Action above) which specifies, among other things, the geometrical position of the audio objects in 3D space (mapped to the claimed “audio item position constraint”; the metadata also specifies the volume of the audio objects in the 3D space, para 48). Herre further discloses that the downmix coefficients (formulated as a downmix matrix, para 52) define a mapping from the received input channel configuration/positions to the desired output channel configuration/positions (mapping in the format conversion of the audio decoder, para 59, and three approaches for mapping, para 63, as to the claimed “extent”). By such mapping through downmix (para 59-63), the downmix coefficients (as to the claimed user-adaptable parameter) applied in the audio decoder (fig. 2, as to the claimed rendering system) correspond to the geometrical positions of the audio objects as inputs (through the mapping discussed above) and are also adjusted by an expert during rendering to speaker audio signals (element 232 of the audio decoder in fig. 2, para 62, as to the claimed adaptation by a user when rendering the encoded audio data) and/or based on the user interaction information (generating the output audio scene is based on, among other things, the user interaction information, para 47); further, the downmix coefficients are inherently permissible values (as to the claimed plurality of permissible values of the user-adaptable parameter). This is essentially consistent with the claimed and argued features, including the “extent” constrained by the “output presentation metadata” whereby, through the mapping or “extent,” “a user-adaptable parameter” of “a rendering system” is “adapted by a user when rendering the encoded audio data” and includes “permissible values,” etc., on which applicant is silent. Moreover, Herre’s expert indeed provides positioning and other information related to the output audio sources, and the positioning of the output speakers is indeed dependent upon the physical environment of the listening area, as indicated by applicant (paragraph 3 of page 12 of the Remarks filed on January 15, 2025); however, the mapping from inputs to outputs must obey and is constrained by the input sources, including the positioning of the input sources (the mapping from inputs to outputs includes a mapping of the azimuth/elevation of the sound sources, para 63). Thus, the argument about “regardless of the positioning of the input sources,” etc., is moot.
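To visualize the mapping relied on here, the following is a generic downmix-matrix sketch with made-up coefficients (not Herre's actual values): input channel signals are mapped to output channel signals by coefficients that depend on the input positions, so adjusting those coefficients adjusts the rendering only within the mapping defined for the given input configuration.

    import numpy as np

    # Hypothetical 5-channel input (L, R, C, Ls, Rs) downmixed to stereo (L, R).
    # Each column holds the coefficients applied to one input channel; the values
    # follow from where that input channel sits relative to the output layout.
    downmix_matrix = np.array([
        # L    R    C      Ls     Rs
        [1.0, 0.0, 0.707, 0.707, 0.0],    # output L
        [0.0, 1.0, 0.707, 0.0,   0.707],  # output R
    ])

    def downmix(input_channels: np.ndarray) -> np.ndarray:
        """Map input channel signals (channels x samples) to output channels
        by applying the downmix coefficients."""
        return downmix_matrix @ input_channels

    out = downmix(np.random.randn(5, 48000))   # -> shape (2, 48000)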
Applicant further argued that “Johnstone’s metadata does not constrain a user-adjustable parameter to a plurality of permissible values” because “Johnstone’s user is the creator, sound engineer, that creates the reverberation metadata,” and that there is no teaching that “the reverberation parameters can be adapted by a user when rendering the encoded audio data,” as asserted in paragraph 3 of page 13 of the Remarks filed on January 15, 2025.
In response to the argument above, the Office further respectfully disagrees because 
(1) the secondary reference Johnston does not have to disclose the features that the primary reference Herre teaches, as discussed above; further, the claim uses a Markush format for selecting either “an audio item position constraint” or “a reverberation constraint,” with no recitation requiring both,
(2) even when considering Johnston’s disclosure alone, as discussed in the Office Action, Johnston’s reverberation parameters are mapped as being included in Johnston’s metadata (mapped to the claimed output presentation metadata), and the claims fail to recite that the “output presentation metadata” has to be “adapted by a user” and has to have “permissible values”; thus, the argued features above are not recited in the claims, and the argument is not persuasive. In addition, Johnston clearly teaches processing the audio signal under the control of a mixing engineer (110), including live sources and in synchronous relationship with the audio signal to match the direct/diffuse audio characteristics of the live sources to synchronized cinematic scene changes, etc. (Johnston, para 49), which is also essentially consistent with the argued “user-adjustable parameter” having “permissible values” in order to practice the “synchronization” above (e.g., synchronized to cinematic scene changes, etc., inherently “permissible values” at different situations and changes and/or at different times); thus, the argument is also not persuasive.
On the basis of the above analyses and the evidence from the prior art, the prior art rejection of independent claims 1, 14-15 under 35 U.S.C. § 103, as set forth in the previous Office Action, is maintained. For at least similar reasons discussed above, the prior art rejection of dependent claims 2-13, 16-20 is also maintained.
In response to this Office Action, the Office respectfully requests that support be shown for language added to any original claims by amendment and for any new claims. That is, indicate support for newly added claim language by specifically pointing to the page(s) and line number(s) in the specification and/or drawing figure(s). This will assist the Office in prosecuting this application.
The Applicant is also respectfully encouraged to contact the Office to clarify the invented subject matter, to discuss further amendments that distinguish over the prior art, and to clarify the claim language, so that prosecution of the application can be expedited in an efficient manner.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action (e.g., amended claims 14-15, etc.). Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LESHUI ZHANG, whose telephone number is (571) 270-5589.  The examiner can normally be reached on Monday-Friday, 6:30am-4:00pm EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin, can be reached at 571-272-7848.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/LESHUI ZHANG/
Primary Examiner, 
Art Unit 2695
