Patent Application 18176884 - PROCESSING METHOD AND ELECTRONIC DEVICE - Rejection
Application Information
- Invention Title: PROCESSING METHOD AND ELECTRONIC DEVICE
- Application Number: 18176884
- Submission Date: 2025-05-15
- Effective Filing Date: 2023-03-01
- Filing Date: 2023-03-01
- National Class: 348
- National Sub-Class: 014080
- Examiner Employee Number: 100775
- Art Unit: 2691
- Tech Center: 2600
Rejection Summary
- 102 Rejections: 0
- 103 Rejections: 7
Cited Patents
No patents were cited in this rejection.
Office Action Text
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 1, 2023, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement

The information disclosure statement(s) submitted on March 1, 2023 has/have been considered by the Examiner and made of record in the application file.

Specification

The disclosure is objected to because of the following informalities: [0035] line 23, “secondo” should read “second”; [0046] line 18, “baes” should read “base”; [0064] line 19, “outputted The target” should read “outputted. The target”. Appropriate correction is required.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 9, 10, 11, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN 112887654 A) in view of Chen (CN 202210139651 A).

Regarding claim 1, Jiao discloses a processing method applied to a first electronic device (“the video collecting component” Jiao (Page 2, ¶6)), comprising: obtaining first audio data and/or first image data (“the at least two image collecting module, for collecting respectively least two first video in different directions” Jiao (Page 2, ¶6)); performing at least one process on the first audio data and/or the first image data to obtain target data to be outputted (“the data processing component is used for receiving the second video transmitted by the image processor; based on the second video, determining the video to be displayed” Jiao (Page 2, ¶8)); and transmitting the target data (“sending the to-be-displayed video” Jiao (Page 3, ¶15)) to be outputted to a target application running on a second electronic device (“to the electronic device and/or display component, so that the electronic device and/or the display component displays the video to be displayed.” Jiao (Page 3, ¶15)) having a communication connection with the first electronic device (the video collecting component (first electronic device) of at least two image collecting module 112 of the output end are connected with the first end of the image processor 111; the second end of the image processor 111 is connected with the first end of the data processing component 12; the second end of the data processing component 12 is connected with the first end of the control processor 131; the second end of the control processor 131 is connected with the display touch screen 132 (second electronic device). Jiao (Page 8, ¶7)), the target application being configured to directly output the target data to be outputted (“the to-be-displayed video and/or third video is displayed in a specific area of the display area of the display component” Jiao (Page 8, ¶2)). Jiao does not expressly teach wherein data size of the target data to be outputted is different from data size of the first audio data and/or the first image data.
However, Chen teaches wherein data size of the target data to be outputted is different from data size of the first audio data and/or the first image data (“In some embodiments, the to-be-processed target data can be target audio data to be processed, so the target field represented by the data processing mode further comprises audio quality analysis operation for the target audio data to be processed…the corresponding target field is different, therefore the target field characterizing the audio conversion and the target field characterizing the audio quality analysis are different attribute fields.” Chen (P11, ¶4)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Chen so as to optimize and improve the security of the data storage process. Chen acknowledges that there is a desire to optimize the data storage process and does so by obtaining the attribute data set of each target data and storing the target data in a data storage space, while ensuring the target data is different from the input data.

Regarding claim 3, Jiao discloses the method of claim 2, wherein performing at least one process on the first audio data to obtain the target data to be outputted includes at least one of the alternatives below. Examiner has chosen to reject alternative 1: performing at least one process on the first audio data based on the change information in the target space environment to obtain the target data to be outputted (“In some embodiments, the data processing component 12 can be based on the sound direction information, from the second image to determine the sound direction information corresponding to the target sound image, then determining the target sound image for specific presentation of the video to be displayed. For example, the target sounding portrait specific presentation may include: and amplifying the target sounding image in the second video.” Jiao (Page 10, ¶2)); or, performing at least one process on the first audio data in response to obtaining instruction information generated by operations acting on the target application to obtain the target data to be outputted; or, performing at least one process on the first audio data based on target space environment information and resource information of the first electronic device to obtain the target data to be outputted.

Regarding claim 9, Jiao discloses the method of claim 1, further comprising: outputting the target data to be outputted to a target output component, the target output component being an output component of the first electronic device and/or a display component and/or an audio output component connected to the first electronic device (“sending the to-be-displayed video to the electronic device and/or display component, so that the electronic device and/or the display component displays the video to be displayed” Jiao (Page 3, ¶14)), wherein: the target data to be outputted is output to the target output component and the target application through the same or different channels (“In some embodiments, the based on the second video determining video to be displayed, comprising: analyzing the second video; determining the target vocalizing portrait; performing specific manner presentation processing to the target sounding portrait in the first video to obtain the video to be displayed.” Jiao (Page 3, ¶16-18)).
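As a purely illustrative aid (not part of the examiner's analysis or the cited references), the claim 1 limitation at issue can be pictured as a three-stage pipeline whose output size differs from its input size. The sketch below assumes hypothetical helper names throughout.

```python
# Illustrative sketch only; all names are hypothetical and nothing here is
# taken from the application or the cited references.
from dataclasses import dataclass


@dataclass
class Frame:
    pixels: bytes  # stand-in for "first image data"


def obtain_first_image_data() -> Frame:
    # Stand-in for collection by the first device's camera array.
    return Frame(pixels=bytes(640 * 480 * 3))


def process(frame: Frame) -> bytes:
    # "At least one process" (e.g., crop, downscale, or encode). The output
    # size intentionally differs from the input size, which is the limitation
    # the rejection attributes to Chen.
    return frame.pixels[: len(frame.pixels) // 4]


def transmit_to_target_application(target_data: bytes) -> None:
    # Stand-in for sending over the communication connection to the target
    # application on the second device, which outputs the data directly.
    print(f"sent {len(target_data)} bytes to the target application")


frame = obtain_first_image_data()
target_data = process(frame)
assert len(target_data) != len(frame.pixels)  # data sizes differ
transmit_to_target_application(target_data)
```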
Regarding claim 10, Jiao discloses an electronic device, as a first electronic device, comprising: a body (“the conference device 10 may include a main body and a column body” Jiao (Page 6, ¶5)); a microphone array (“the video collecting component further comprises a first microphone array and a voice processor” Jiao (Page 2, ¶14)) arranged on the body for collecting audio data in a target space environment (“the voice processor is used for receiving the first sound information sent by the first microphone array” Jiao (Page 2, ¶16); Fig. 11); a camera array arranged on the body for collecting image data in the target space environment (“the video collecting component 11 comprises an image processor 111 and at least two image collecting module 112 different from the optical axis direction.” Jiao (Page 5, ¶16); Fig. 3A, 1); a processing device disposed in the body (“a data processing component 12” Jiao (Page 5, ¶15)), the processing device being configured to: obtain first audio data and/or first image data, the first audio data including or not including the audio data collected by the microphone array, the first image data including or not including the image data collected by the camera array (“the at least two image collecting module, for collecting respectively least two first video in different directions.” Jiao (Page 2, ¶6)); perform at least one process on the first audio data and/or the first image data to obtain target data to be outputted (“the data processing component is used for receiving the second video transmitted by the image processor; based on the second video, determining the video to be displayed” Jiao (Page 2, ¶8)); and transmit the target data to be outputted to a target application running on a second electronic device having a communication connection with the electronic device (the video collecting component (first electronic device) of at least two image collecting module 112 of the output end are connected with the first end of the image processor 111; the second end of the image processor 111 is connected with the first end of the data processing component 12; the second end of the data processing component 12 is connected with the first end of the control processor 131; the second end of the control processor 131 is connected with the display touch screen 132 (second electronic device). Jiao (Page 8, ¶7)), the target application being configured to directly output the target data to be outputted (“sending the video to be displayed to the electronic device and/or the display component, receiving the third video transmitted by the electronic device, outputting the third video to the display component, so that the display component displays the third video, and the electronic device and/or the display component displays the video to be displayed.” Jiao (Page 3, ¶15)). Jiao does not expressly teach data size of the target data to be outputted being different from data size of the first audio data and/or the first image data.
However, Chen teaches data size of the target data to be outputted being different from data size of the first audio data and/or the first image data (“In some embodiments, the to-be-processed target data can be target audio data to be processed, so the target field represented by the data processing mode further comprises audio quality analysis operation for the target audio data to be processed…the corresponding target field is different, therefore the target field characterizing the audio conversion and the target field characterizing the audio quality analysis are different attribute fields.” Chen (P11, ¶4)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Chen so as to optimize and improve the security of the data storage process. Chen acknowledges that there is a desire to optimize the data storage process and does so by obtaining the attribute data set of each target data and storing the target data in a data storage space, while ensuring the target data is different from the input data.

Regarding claim 11, Jiao discloses a processing device comprising: an acquisition module configured to obtain first audio data and/or first image data (“the at least two image collecting module, for collecting respectively least two first video in different directions” Jiao (Page 2, ¶6)); a processing module configured to perform at least one process on the first audio data and/or the first image data to obtain target data to be outputted (“the data processing component is used for receiving the second video transmitted by the image processor; based on the second video, determining the video to be displayed” Jiao (Page 2, ¶8)); and an output module configured to transmit the target data to be outputted (“sending the to-be-displayed video” Jiao (Page 3, ¶15)) to a target application running on a second electronic device (“to the electronic device and/or display component” Jiao (Page 3, ¶15)) having a communication connection with the electronic device (the video collecting component (first electronic device) of at least two image collecting module 112 of the output end are connected with the first end of the image processor 111; the second end of the image processor 111 is connected with the first end of the data processing component 12; the second end of the data processing component 12 is connected with the first end of the control processor 131; the second end of the control processor 131 is connected with the display touch screen 132 (second electronic device). Jiao (Page 8, ¶7)), the target application being configured to directly output the target data to be outputted (“the to-be-displayed video and/or third video is displayed in a specific area of the display area of the display component” Jiao (Page 8, ¶2)). Jiao does not expressly teach wherein data size of the target data to be outputted is different from data size of the first audio data and/or the first image data.
However, Chen teaches wherein data size of the target data to be outputted is different from data size of the first audio data and/or the first image data (“In some embodiments, the to-be-processed target data can be target audio data to be processed, so the target field represented by the data processing mode further comprises audio quality analysis operation for the target audio data to be processed…the corresponding target field is different, therefore the target field characterizing the audio conversion and the target field characterizing the audio quality analysis are different attribute fields.” Chen (P11, ¶4)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Chen so as to optimize and improve the security of the data storage process. Chen acknowledges that there is a desire to optimize the data storage process and does so by obtaining the attribute data set of each target data and storing the target data in a data storage space, while ensuring the target data is different from the input data.

Regarding claim 13, Jiao discloses the processing device of claim 12, wherein the processing module is further configured to perform at least one of the alternatives below. Examiner has chosen to reject alternative 1: perform at least one process on the first audio data based on the change information in the target space environment to obtain the target data to be outputted (“In some embodiments, the data processing component 12 can be based on the sound direction information, from the second image to determine the sound direction information corresponding to the target sound image, then determining the target sound image for specific presentation of the video to be displayed. For example, the target sounding portrait specific presentation may include: and amplifying the target sounding image in the second video.” Jiao (Page 10, ¶2)); or, perform at least one process on the first audio data in response to obtaining instruction information generated by operations acting on the target application to obtain the target data to be outputted; or, perform at least one process on the first audio data based on target space environment information and resource information of the first electronic device to obtain the target data to be outputted.

Regarding claim 19, Jiao discloses the processing device of claim 11, wherein the output module is further configured to: transmit the target data to be outputted to a target output component, the target output component being an output component of the processing device and/or a display component and/or an audio output component connected to the processing device (“sending the to-be-displayed video to the electronic device and/or display component, so that the electronic device and/or the display component displays the video to be displayed” Jiao (Page 3, ¶12)), the target data to be outputted being output to the target output component and the target application through the same or different channels (“In some embodiments, the based on the second video determining video to be displayed, comprising: analyzing the second video; determining the target vocalizing portrait; performing specific manner presentation processing to the target sounding portrait in the first video to obtain the video to be displayed.” Jiao (Page 3, ¶16-18)).
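For illustration only, the claim 9/19 limitation of delivering the same target data to a target output component and to the target application "through the same or different channels" can be sketched as a simple fan-out; the channel objects below are invented stand-ins, not anything disclosed by Jiao.

```python
# Hypothetical sketch of the claim 9/19 fan-out. All names are invented.
from typing import Callable


def make_channel(name: str) -> Callable[[bytes], None]:
    def send(data: bytes) -> None:
        print(f"{name}: delivered {len(data)} bytes")
    return send


display_channel = make_channel("output component channel")
app_channel = make_channel("target application channel")  # may be the same channel

target_data = b"\x00" * 1024
for channel in (display_channel, app_channel):  # same or different channels
    channel(target_data)
```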
Claims 2 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN 112887654 A) in view of Tanaka et al. (US 20180322896 A1, hereinafter Tanaka).

Regarding claim 2, Jiao discloses the method of claim 1, wherein obtaining the first audio data and/or the first image data includes at least one of the alternatives below. Examiner has chosen to reject alternative 1: using a microphone array and/or a camera array of the first electronic device (“the video collecting component” Jiao (Page 2, ¶6)) to collect audio data and/or image data in a target space environment as the first audio data and/or the first image data (“the first microphone array is used for collecting the first sound information” Jiao (Page 2, ¶15)); or, using audio data and/or image data from the target application as the first audio data and/or the first image data; or, using the audio data and/or the image data in the target space environment collected by the microphone array and/or the camera array of the first electronic device, and the audio data and/or the image data from the target application as the first audio data and/or the first image data; or, using the audio data and/or the image data in the target space environment collected by the microphone array and/or the camera array of the first electronic device, the audio data and/or the image data from the target application, and audio data and/or image data collected by a third electronic device as the first audio data and/or the first image data. Jiao does not expressly teach “wherein the target space environment being a space environment where the first electronic device is located, the microphone array and/or the camera array being configured to adjust their collection ranges in the target space environment based on change information in the target space environment, the target application including one application or multiple applications of the same or different types.”

However, Tanaka does teach wherein the target space environment being a space environment where the first electronic device is located, the microphone array and/or the camera array being configured to adjust their collection ranges in the target space environment based on change information in the target space environment, the target application including one application or multiple applications of the same or different types (“In addition, in the present embodiment, scanning for specifying a direction to an environmental noise source can be performed to improve voice recognition performance. At step S6, the control unit 1 narrows the sound collection range of the microphone unit 2a and changes the sound collection range.” Tanaka [0103]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Tanaka so as to improve performance of voice recognition of the target. In particular, when any other person's voice is generated as environmental noise to dictation target voice, performance of voice recognition of the target voice degrades so that dictation cannot be reliably performed in some cases. Tanaka acknowledges that there is a desire to improve target voice detection and does so by adjusting the collection range of the microphone in the target space environment.

Regarding claim 12, Jiao discloses the processing device of claim 1, wherein the acquisition module is further configured to perform at least one of the alternatives below. Examiner has chosen to reject alternative 1:
use a microphone array and/or a camera array of the processing device to collect audio data and/or image data in a target space environment as the first audio data and/or the first image data (“In some embodiments, the video collecting component further comprises a first microphone array and a voice processor; the first microphone array is used for collecting the first sound information; sending the first sound information to the voice processor” Jiao (Page 2, ¶14-15)); or, use audio data and/or image data from the target application as the first audio data and/or the first image data; or, use the audio data and/or the image data in the target space environment collected by the microphone array and/or the camera array of the processing device, and the audio data and/or the image data from the target application as the first audio data and/or the first image data; or, use the audio data and/or the image data in the target space environment collected by the microphone array and/or the camera array of the processing device, the audio data and/or the image data from the target application, and audio data and/or image data collected by a third electronic device as the first audio data and/or the first image data, the target space environment being a space environment where the processing device is located. Jiao does not expressly teach “wherein the microphone array and/or the camera array are configured to adjust their collection ranges in the target space environment based on change information in the target space environment, the target application including one application or multiple applications of the same or different types.”

However, Tanaka does teach wherein the target space environment being a space environment where the first electronic device is located, the microphone array and/or the camera array being configured to adjust their collection ranges in the target space environment based on change information in the target space environment, the target application including one application or multiple applications of the same or different types (“In addition, in the present embodiment, scanning for specifying a direction to an environmental noise source can be performed to improve voice recognition performance. At step S6, the control unit 1 narrows the sound collection range of the microphone unit 2a and changes the sound collection range.” Tanaka [0103]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Tanaka so as to improve performance of voice recognition of the target. In particular, when any other person's voice is generated as environmental noise to dictation target voice, performance of voice recognition of the target voice degrades so that dictation cannot be reliably performed in some cases. Tanaka acknowledges that there is a desire to improve target voice detection and does so by adjusting the collection range of the microphone in the target space environment.
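For illustration only, the Tanaka mapping relied on above (narrowing a microphone's sound collection range when change information appears in the environment) can be sketched as follows; the angles, threshold, and function name are invented for this sketch.

```python
# Hedged illustration of the Tanaka mapping. Values are invented placeholders.
from typing import Optional


def adjust_collection_range(current_range_deg: float,
                            noise_direction_deg: Optional[float]) -> float:
    """Return a new collection range based on change information."""
    if noise_direction_deg is None:
        return current_range_deg             # no change detected; keep range
    return max(30.0, current_range_deg / 2)  # narrow toward the target speaker


print(adjust_collection_range(120.0, None))  # 120.0 (environment unchanged)
print(adjust_collection_range(120.0, 75.0))  # 60.0 (range narrowed)
```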
Claims 4 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN 112887654 A) in view of Ostap et al. (US 11350029 B1, hereinafter Ostap).

Regarding claim 4, Jiao discloses the method of claim 2, wherein performing at least one process on the first image data to obtain the target data to be outputted includes at least one of the alternatives below. Examiner has chosen to reject alternative 1: performing at least one process on the first image data to obtain the target data to be outputted (For example, the middleware software can intercept each character in the second video, combining into a head portrait splicing image and outputting to the conference software, or middleware software can intercept and amplify some area in the 360 degrees panoramic picture, only the intercepted part is output to the conference software; or overlapping a small display window on the screenshot, for displaying the portrait picture of the appointed character (target data). Jiao (Page 12, ¶3)); or, performing at least one process on the first image data in response to obtaining instruction information generated by operations acting on the target application to obtain the target data to be outputted; or, performing at least one process on the first image data based on target space environment information and resource information of the first electronic device to obtain the target data to be outputted. Jiao does not expressly teach “based on the change information in the target space environment.”

However, Ostap does teach based on the change information in the target space environment (“Typically, the survey frames are analyzed at the beginning of the video-conferencing session, e.g., to detect conference participants 306, and periodically throughout the video-conferencing session to detect changes in the video-conferencing session, such as participants leaving, participants changing location, new participants joining, changes in participant activity (changes in who is speaking) and shifting participant engagement levels.” Ostap [39]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Ostap so as to improve the target data acquisition during a videoconference. Ostap acknowledges that there is a desire to optimize this and does so by prioritizing a certain region-of-interest during a videoconference to ensure the desired view of the target data is displayed.

Regarding claim 14, Jiao discloses the processing device of claim 12, wherein the processing module is further configured to perform at least one of the alternatives below. Examiner has chosen to reject alternative 1: perform at least one process on the first image data to obtain the target data to be outputted (For example, the middleware software can intercept each character in the second video, combining into a head portrait splicing image and outputting to the conference software, or middleware software can intercept and amplify some area in the 360 degrees panoramic picture, only the intercepted part is output to the conference software; or overlapping a small display window on the screenshot, for displaying the portrait picture of the appointed character (target data). Jiao (Page 12, ¶3)); or, perform at least one process on the first image data in response to obtaining instruction information generated by operations acting on the target application to obtain the target data to be outputted; or, perform at least one process on the first image data based on configuration information and/or usage information of an output component for outputting the target data to be outputted to obtain the target data to be outputted. Jiao does not expressly teach “based on the change information in the target space environment.”

However, Ostap does teach based on the change information in the target space environment (“Typically, the survey frames are analyzed at the beginning of the video-conferencing session, e.g., to detect conference participants 306, and periodically throughout the video-conferencing session to detect changes in the video-conferencing session, such as participants leaving, participants changing location, new participants joining, changes in participant activity (changes in who is speaking) and shifting participant engagement levels.” Ostap [39]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Ostap so as to improve the target data acquisition during a videoconference. Ostap acknowledges that there is a desire to optimize this and does so by prioritizing a certain region-of-interest during a videoconference to ensure the desired view of the target data is displayed.
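For illustration only, the kind of change information Ostap describes (participants joining, leaving, or changing activity between survey frames) can be sketched as a comparison of successive detection results; the identifiers are hypothetical.

```python
# Illustrative only: derive change information by comparing the participants
# detected in successive survey frames. All identifiers are hypothetical.
def detect_changes(previous: set, current: set) -> dict:
    return {"joined": current - previous, "left": previous - current}


changes = detect_changes({"alice", "bob"}, {"bob", "carol"})
print(changes)  # {'joined': {'carol'}, 'left': {'alice'}}
```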
Claims 5-6 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN 112887654 A) in view of Feng (CN 113573161 A).

Regarding claim 5, Jiao discloses the method of claim 2, wherein performing at least one process on the first audio data and the first image data to obtain the target data to be outputted includes: processing a plurality of first audio data obtained through a control signal into target audio data (“the second microphone array, for collecting the second sound information, sending the second sound information to the signal processor” Jiao (Page 3, ¶5); “sending the third sound information to the data processing component, so that the data processing component forwards the third sound information to the electronic device, so that the electronic device plays the third sound information.” Jiao (Page 3, ¶9)); processing a plurality of first image data (“In some embodiments, the based on the second video determining video to be displayed, comprising: intercepting all the images in the second video; arranging all the images to obtain a fourth video; determining the fourth video as the video to be displayed, or superimposing the fourth video on the second video to obtain the video to be displayed” Jiao (Page 4, ¶1-4)) obtained through the control signal into target image data (“obtaining the trigger instruction generated by triggering the appointed area in the video to be displayed” Jiao (Page 14, ¶9)); wherein the control signal at least including a signal for triggering the microphone array and/or the camera array of the first electronic device to collect corresponding data (“obtaining the trigger instruction generated by triggering the appointed area in the video to be displayed” Jiao (Page 14, ¶9)). Jiao does not expressly teach merging the target audio data and the target image data.

However, Feng does teach merging the target audio data and the target image data (“a fusion module, for extracting key video segment from the target video data, fusing the key video segment and the secondary song segment of the target audio data, obtaining the multimedia data comprising the key video segment and the secondary song segment of the target audio data.” Feng (P2, ¶13)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Feng so as to improve the obtaining efficiency of the target data to meeting participants. Feng acknowledges that there is a desire to optimize obtaining efficiency and does so by fusing the target audio and image data to be outputted together.

Regarding claim 6, Jiao discloses the method of claim 2, wherein performing at least one process on the first audio data and the first image data to obtain the target data to be outputted includes: determining a use mode of the first electronic device (“collecting mode”, “first structure mode”, “second structure mode” Jiao (Page 6, ¶6)); and selecting target audio data and target image data from the first audio data and the first image data based at least on the use mode (For example, in the first structure mode, at least two image collecting module 112 can collect the panoramic image, adapted to the panoramic collection mode of the conference scene; In the second structure mode, at least two image collecting module 112 can collect the image in a certain viewing angle range, adapted to the single direction collecting mode of the conference scene. In some embodiments, the first structure mode, at least two image collecting module 112 can be annularly set, the second structure mode, at least two image collecting module 112 can be set on the plane, or can be set in the preset angle range, to collect a certain visual angle under the video; so as to obtain the video of the specific view angle. Jiao (Page 6, ¶6)). Jiao does not expressly teach performing fusion processing on the target audio data and the target image data based at least on the use mode to obtain the target data to be outputted.

However, Feng teaches performing fusion processing on the target audio data and the target image data based at least on the use mode to obtain the target data to be outputted (“a fusion module, for extracting key video segment from the target video data, fusing the key video segment and the secondary song segment of the target audio data, obtaining the multimedia data comprising the key video segment and the secondary song segment of the target audio data.” Feng (P2, ¶13)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Feng so as to improve the obtaining efficiency of the target data to meeting participants. Feng acknowledges that there is a desire to optimize obtaining efficiency and does so by fusing the target audio and image data to be outputted together.
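For illustration only, claims 5-6 as combined with Feng amount to mode-based selection followed by fusion of the selected audio and image data. The sketch below treats fusion as a simple container merge under invented mode names; a real implementation would mux the streams with a media library.

```python
# A minimal sketch, assuming invented mode names; "fusion" here is just a
# container merge standing in for the step attributed to Feng.
def select_and_fuse(use_mode: str, audio: dict, video: dict) -> dict:
    # Selection based at least on the use mode (claim 6).
    target_video = video["panorama"] if use_mode == "panoramic" else video["front"]
    target_audio = audio["beamformed"]
    # Fusion processing: merge the target audio data and target image data.
    return {"mode": use_mode, "audio": target_audio, "video": target_video}


audio = {"beamformed": b"a" * 160}
video = {"panorama": b"p" * 1200, "front": b"f" * 400}
fused = select_and_fuse("panoramic", audio, video)
print(fused["mode"], len(fused["video"]))  # panoramic 1200
```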
Regarding claim 15, Jiao discloses the processing device of claim 12, wherein the processing module is further configured to: process a plurality of first audio data obtained through a control signal into target audio data (“the second microphone array, for collecting the second sound information, sending the second sound information to the signal processor” Jiao (Page 3, ¶5); “sending the third sound information to the data processing component, so that the data processing component forwards the third sound information to the electronic device, so that the electronic device plays the third sound information.” Jiao (Page 3, ¶9)); process a plurality of first image data (“In some embodiments, the based on the second video determining video to be displayed, comprising: intercepting all the images in the second video; arranging all the images to obtain a fourth video; determining the fourth video as the video to be displayed, or superimposing the fourth video on the second video to obtain the video to be displayed” Jiao (Page 4, ¶1-4)) obtained through the control signal into target image data (“obtaining the trigger instruction generated by triggering the appointed area in the video to be displayed” Jiao (Page 14, ¶9)); based on the control signal to obtain the target data to be outputted, the control signal at least including a signal for triggering the microphone array and/or the camera array of the first electronic device to collect corresponding data (“obtaining the trigger instruction generated by triggering the appointed area in the video to be displayed; based on the trigger instruction, intercepting the designated area in the video to be displayed; obtaining the video of the designated area; based on the specified area of the video, determining the target video; obtaining the trigger instruction generated by triggering the appointed area in the video to be displayed” Jiao (Page 14, ¶9)). Jiao does not expressly teach “merging the target audio data and the target image data.”

However, Feng does teach merging the target audio data and the target image data (“a fusion module, for extracting key video segment from the target video data, fusing the key video segment and the secondary song segment of the target audio data, obtaining the multimedia data comprising the key video segment and the secondary song segment of the target audio data.” Feng (P2, ¶13)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Feng so as to improve the obtaining efficiency of the target data to meeting participants. Feng acknowledges that there is a desire to optimize obtaining efficiency and does so by fusing the target audio and image data to be outputted together.
Regarding claim 16, Jiao discloses the processing device of claim 12, wherein the processing module is further configured to: determine a use mode of the processing device (“collecting mode”, “first structure mode”, “second structure mode” Jiao (Page 6, ¶6)); and select target audio data and target image data from the first audio data and the first image data based at least on the use mode (For example, in the first structure mode, at least two image collecting module 112 can collect the panoramic image, adapted to the panoramic collection mode of the conference scene; In the second structure mode, at least two image collecting module 112 can collect the image in a certain viewing angle range, adapted to the single direction collecting mode of the conference scene. In some embodiments, the first structure mode, at least two image collecting module 112 can be annularly set, the second structure mode, at least two image collecting module 112 can be set on the plane, or can be set in the preset angle range, to collect a certain visual angle under the video; so as to obtain the video of the specific view angle. Jiao (Page 6, ¶6)). Jiao does not expressly teach to perform fusion processing on the target audio data and the target image data based at least on the use mode to obtain the target data to be outputted.

However, Feng teaches to perform fusion processing on the target audio data and the target image data based at least on the use mode to obtain the target data to be outputted (“a fusion module, for extracting key video segment from the target video data, fusing the key video segment and the secondary song segment of the target audio data, obtaining the multimedia data comprising the key video segment and the secondary song segment of the target audio data.” Feng (P2, ¶13)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Feng so as to improve the obtaining efficiency of the target data to meeting participants. Feng acknowledges that there is a desire to optimize obtaining efficiency and does so by fusing the target audio and image data to be outputted together.

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN 112887654 A) in view of Albadawi et al. (US 2018017506 W, hereinafter Albadawi).

Regarding claim 7, Jiao discloses the method of claim 1, wherein performing at least one process on the first audio data and/or the first image data to obtain the target data to be outputted includes (“the voice processor is used for receiving the first sound information sent by the first microphone array; based on the first sound information, determining the sound direction information of the first sound information;” Jiao (Page 2, ¶16); Fig. 11).
Jiao does not expressly teach obtaining system resource information of the first electronic device, determining a target algorithm set from an algorithm library preset by the first electronic device based on the system resource information, and performing corresponding processing on the first audio data and/or the first image data by using an algorithm model in the target algorithm set to obtain the target data to be outputted, the algorithm library being located in the first electronic device or in the space environment where the first electronic device is located, the target algorithm set being updated correspondingly based on changes in the system resource information; or, obtaining the system resource information of the first electronic device, optimizing an original algorithm model based on the system resource information, and performing the corresponding processing on the first audio data and/or the first image data by using the optimized target algorithm model or the target algorithm set to obtain the target data to be outputted, the target algorithm set or the target algorithm model being updated correspondingly based on changes in the system resource information.

However, Albadawi discloses at least one of the alternatives below. Examiner chooses to reject alternative 2: obtaining system resource information of the first electronic device, determining a target algorithm set from an algorithm library preset by the first electronic device based on the system resource information, and performing corresponding processing on the first audio data and/or the first image data by using an algorithm model in the target algorithm set to obtain the target data to be outputted, the algorithm library being located in the first electronic device or in the space environment where the first electronic device is located, the target algorithm set being updated correspondingly based on changes in the system resource information; or, obtaining the system resource information of the first electronic device (“The selection module 914 may select from among the tracking algorithms based on the available computing resources of the computing device(s) with which tracking is performed.” Albadawi [157]), optimizing an original algorithm model based on the system resource information (“the selection module 914 may select the tracking algorithm that consumes the least power when used to process image data if the remaining battery life is less than a predetermined threshold (e.g., 10%), and may forego selection of the other tracking algorithms.” Albadawi [157]), and performing the corresponding processing on the first audio data and/or the first image data by using the optimized target algorithm model or the target algorithm set to obtain the target data to be outputted, the target algorithm set or the target algorithm model being updated correspondingly based on changes in the system resource information (“Algorithm feedback in this manner may enable other types of updates, including updates to a tracking algorithm by the face detection algorithm, and updates between two or more tracking algorithms, as described in further detail below with reference to FIG. 25.” Albadawi [162]; “The selection module 914 then determines a less computationally intensive manner in which to track the first and second people 930 and 932 by selecting one or more of the tracking algorithms 920-926.” Albadawi [165]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Albadawi so as to improve the processor load during image analysis. In order to prevent stalls and a degraded user experience, the tracking algorithm can be optimized such that the image processing does not consume more than the available share of logic processor resources (Albadawi [158]). Albadawi acknowledges that there is a desire to optimize processor load during image analysis, such as selecting a tracking algorithm that does not consume more than the available share of logic processor, i.e., less processor intensive (for further support see ¶157, ¶158, ¶162, and ¶165 of Albadawi).

Regarding claim 17, Jiao discloses the processing device of claim 11 (“the data processing component” Jiao (Page 2, ¶8)). Jiao does not expressly teach to obtain system resource information of the processing device, determine a target algorithm set from an algorithm library preset by the processing device based on the system resource information, and perform corresponding processing on the first audio data and/or the first image data by using an algorithm model in the target algorithm set to obtain the target data to be outputted, the algorithm library being located in the processing device or in the space environment where the processing device is located, the target algorithm set being updated correspondingly based on changes in the system resource information; or, obtain the system resource information of the processing device, optimize an original algorithm model based on the system resource information, and perform the corresponding processing on the first audio data and/or the first image data by using the optimized target algorithm model or the target algorithm set to obtain the target data to be outputted, the target algorithm set or the target algorithm model being updated correspondingly based on changes in the system resource information. However, Albadawi discloses at least one of the alternatives below. Examiner chooses to reject alternative 2:
to obtain system resource information of the processing device, determine a target algorithm set from an algorithm library preset by the processing device based on the system resource information, and perform corresponding processing on the first audio data and/or the first image data by using an algorithm model in the target algorithm set to obtain the target data to be outputted, the algorithm library being located in the processing device or in the space environment where the processing device is located, the target algorithm set being updated correspondingly based on changes in the system resource information; or, obtain the system resource information of the processing device (“The selection module 914 may select from among the tracking algorithms based on the available computing resources of the computing device(s) with which tracking is performed.” Albadawi [157]), optimize an original algorithm model based on the system resource information (“the selection module 914 may select the tracking algorithm that consumes the least power when used to process image data if the remaining battery life is less than a predetermined threshold (e.g., 10%), and may forego selection of the other tracking algorithms.” Albadawi [157]), and perform the corresponding processing on the first audio data and/or the first image data by using the optimized target algorithm model or the target algorithm set to obtain the target data to be outputted, the target algorithm set or the target algorithm model being updated correspondingly based on changes in the system resource information (“Algorithm feedback in this manner may enable other types of updates, including updates to a tracking algorithm by the face detection algorithm, and updates between two or more tracking algorithms, as described in further detail below with reference to FIG. 25.” Albadawi [162]; “The selection module 914 then determines a less computationally intensive manner in which to track the first and second people 930 and 932 by selecting one or more of the tracking algorithms 920-926.” Albadawi [165]).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Albadawi so as to improve the processor load during image analysis. In order to prevent stalls and a degraded user experience, the tracking algorithm can be optimized such that the image processing does not consume more than the available share of logic processor resources (Albadawi [158]). Albadawi acknowledges that there is a desire to optimize processor load during image analysis, such as selecting a tracking algorithm that does not consume more than the available share of logic processor, i.e., less processor intensive (for further support see ¶157, ¶158, ¶162, and ¶165 of Albadawi).
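For illustration only, the Albadawi mapping (selecting and updating a processing algorithm based on system resource information) can be sketched as a lookup against a preset algorithm library; the entries and thresholds below are invented placeholders.

```python
# Hedged sketch: pick an algorithm from a preset library using system
# resource information, re-selecting as resources change. Invented values.
ALGORITHM_LIBRARY = {          # ordered best-first (dicts preserve order)
    "high_quality": 0.50,      # minimum battery fraction required
    "balanced": 0.20,
    "low_power": 0.00,
}


def select_algorithm(battery_level: float) -> str:
    """Return the first algorithm whose resource floor is satisfied."""
    for name, min_battery in ALGORITHM_LIBRARY.items():
        if battery_level >= min_battery:
            return name
    return "low_power"


print(select_algorithm(0.80))  # high_quality
print(select_algorithm(0.08))  # low_power (selection updated with resources)
```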
Claims 8 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN 112887654 A) in view of Bu et al. (CN 112866619 A, hereinafter Bu).

Regarding claim 8, Jiao discloses the method of claim 2, wherein outputting the target data to be outputted to the target application running on the second electronic device having the communication connection with the first electronic device includes at least one of the alternatives below. Examiner chooses to reject alternative 1: if the first audio data and/or the first image data includes audio data and/or image data from a first target application (“sending the to-be-displayed video to the electronic device and/or display component, so that the electronic device and/or the display component displays the video to be displayed” Jiao (Page 3, ¶14)); or, if the first audio data and/or the first image data includes the audio data and/or the image data from the first target application, outputting the target data to be outputted to a third target application identical to the first target application, the first target application and the third target application being run on different second electronic devices; or, in response to obtaining a sharing request from the first target application, the sharing request including a sharing object of the target data to be outputted, outputting the target data to be outputted to a fourth target application corresponding to the sharing object, the fourth target application and the first target application being the same or different applications running on different second electronic devices. Jiao does not expressly teach the entirety of alternative 1: outputting the target data to be outputted to the second target application different from the first target application, the first target application and the second target application being run on different second electronic devices.

However, Bu discloses the remainder of alternative 1: outputting the target data to be outputted to the second target application different from the first target application, the first target application and the second target application being run on different second electronic devices (“taking the network easy cloud conference as an example, the service end can run the transfer server, receiving remote control instruction sent by the external computer (i.e., remote control terminal), opening the network easy cloud conference software application, and login by the conference account of the network easy cloud conference, then initiating a remote conference; the participants of other remote conference are added in, or an existing remote conference is added. the transfer server of the service end also can open the remote conference login page through the browser, login by the conference account number, then initiating a remote conference, the participants of other remote conference are added in, or adding an existing remote conference.” Bu (Pages 6-7, ¶10)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Bu, the motivation being to allow multiple devices to view the target data in the target application across multiple electronic devices.

Regarding claim 18, Jiao discloses the processing device of claim 12, wherein the output module is further configured to perform at least one of the alternatives below. Examiner chooses to reject alternative 1: if the first audio data and/or the first image data includes audio data and/or image data from a first target application (“sending the to-be-displayed video to the electronic device and/or display component, so that the electronic device and/or the display component displays the video to be displayed” Jiao (Page 3, ¶14));
or, if the first audio data and/or the first image data includes the audio data and/or the image data from the first target application, transmit the target data to be outputted to a third target application identical to the first target application, the first target application and the third target application being run on different second electronic devices; or, in response to obtaining a sharing request from the first target application, the sharing request including a sharing object of the target data to be outputted, transmit the target data to be outputted to a fourth target application corresponding to the sharing object, the fourth target application and the first target application being the same or different applications running on different second electronic devices. Jiao does not expressly teach the entirety of alternative 1: transmit the target data to be outputted to the second target application different from the first target application, the first target application and the second target application being run on different second electronic devices.

However, Bu discloses the remainder of alternative 1: transmit the target data to be outputted to the second target application different from the first target application, the first target application and the second target application being run on different second electronic devices (“taking the network easy cloud conference as an example, the service end can run the transfer server, receiving remote control instruction sent by the external computer (i.e., remote control terminal), opening the network easy cloud conference software application, and login by the conference account of the network easy cloud conference, then initiating a remote conference; the participants of other remote conference are added in, or an existing remote conference is added. the transfer server of the service end also can open the remote conference login page through the browser, login by the conference account number, then initiating a remote conference, the participants of other remote conference are added in, or adding an existing remote conference.” Bu (Pages 6-7, ¶10)). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Bu, the motivation being to allow multiple devices to view the target data in the target application across multiple electronic devices.
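For illustration only, the Bu mapping (transmitting target data to a different target application on a different second electronic device) can be sketched as a routing step; the device and application identifiers below are hypothetical.

```python
# Illustrative stand-in for the Bu mapping. All identifiers are hypothetical.
def route(target_data: bytes, source: tuple, destination: tuple) -> None:
    src_device, src_app = source
    dst_device, dst_app = destination
    # The destination application/device must differ from the source.
    assert (src_device, src_app) != (dst_device, dst_app)
    print(f"{len(target_data)} bytes: {src_app}@{src_device} -> {dst_app}@{dst_device}")


route(b"\x00" * 512, ("device-A", "conference-app"), ("device-B", "viewer-app"))
```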
Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Jiao (CN 112887654 A) in view of Witriol et al. (US 12294919 B2, hereinafter Witriol).

Regarding claim 20, Jiao discloses the processing device of claim 16. Jiao does not expressly teach wherein the use mode of the processing device includes at least one of the alternatives below. Examiner chooses to reject alternative 2: a whiteboard mode, a speech mode, a comparison mode, a display mode. However, Witriol does teach wherein the use mode of the processing device includes at least a whiteboard mode, a speech mode (“a speech mode” Witriol [32]; Fig. 3, 303), a comparison mode, a display mode. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Jiao with Witriol. Videoconferences may have different use cases, i.e., a conference meeting with twenty participants or a personal one-on-one, which could necessitate a variety of use modes to better improve the user experience. Witriol acknowledges that there is a motivation to do so through including a variety of use modes, thereby improving the user experience.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SAAD AHMED SYED, whose telephone number is (571) 272-6777. The examiner can normally be reached Monday - Friday, 8:30 am - 5:00 pm.

Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Duc Nguyen, can be reached at (571) 272-7503. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SAAD AHMED SYED/
Examiner, Art Unit 2691

/DUC NGUYEN/
Supervisory Patent Examiner, Art Unit 2691