Patent Application 18077465 - APPARATUS AND METHOD WITH IMAGE PROCESSING AND TARGET TRACKING - Rejection
Title: APPARATUS AND METHOD WITH IMAGE PROCESSING AND TARGET TRACKING
Application Information
- Invention Title: APPARATUS AND METHOD WITH IMAGE PROCESSING AND TARGET TRACKING
- Application Number: 18077465
- Submission Date: 2025-05-12
- Effective Filing Date: 2022-12-08
- Filing Date: 2022-12-08
- National Class: 382
- National Sub-Class: 100
- Examiner Employee Number: 85722
- Art Unit: 2673
- Tech Center: 2600
Rejection Summary
- 102 Rejections: 0
- 103 Rejections: 1
Cited Patents
The following patents were cited in the rejection:
- Rhee et al., KR 2019023389 A1 (as provided)
- Koivisto et al., US Pub. No. 2019022776 A1
Office Action Text
DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 18 and 19 are rejected under 35 U.S.C. 101 as directed to an abstract idea. 35 U.S.C. 101 requires that a claimed invention fall within one of the four eligible categories of invention (i.e., process, machine, manufacture, or composition of matter) and not be directed to subject matter encompassing a judicially recognized exception as interpreted by the courts. MPEP 2106. Three categories of subject matter are judicially recognized exceptions to 35 U.S.C. § 101 (i.e., patent ineligible): (1) laws of nature, (2) physical phenomena, and (3) abstract ideas. MPEP 2106(II). To be patent eligible, a claim directed to a judicial exception must, as a whole, amount to significantly more than the exception itself. See 2014 Interim Guidance on Patent Subject Matter Eligibility, 79 Fed. Reg. 74618, 74624 (Dec. 16, 2014). Hence, the claim must describe a process or product that applies the exception in a meaningful way, such that it is more than a drafting effort designed to monopolize the exception.

Claims 18 and 19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., an abstract idea) without significantly more. Claim 18 is directed to augmenting or synthesizing image data of an image area of an image using augmentation methods and selecting an image area from among the N augmented image areas. These features are similar to the mental process of augmenting data in a given image, in paper or digital form, and creating one augmented image using various methods, including selecting specific objects in the image (or features of those objects, such as color or contrast) using pencil and paper, and then selecting one specific target object of interest from the augmented data. The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception, and they do not impose any requirements that would make the invention impossible to carry out manually. The claimed subject matter is also similar to the organization of human activity of storing, retrieving, combining, and tracking image areas. There is no inventive concept sufficient to transform the claimed subject matter into patent-eligible subject matter. See Planet Bingo, LLC v. VKGS LLC (US Patent No. 6,398,646); Bilski; Alice Corp.

Claim 19 is directed to performing target tracking based on the target image area, which is similar to tracking one specific object in images using image area features, carried out as a mental process by a human using pen and paper. The dependent claim therefore does not add significantly more to the steps of claim 18, and claim 19 is similarly rejected.
Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103, which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-8 and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Rhee et al. (KR 2019023389 A1, as provided) in view of Koivisto et al. (US Pub. No. 2019022776 A1).

Regarding Claim 1, Rhee discloses a target tracking method comprising: determining whether data augmentation is to be used to augment a target tracking process (Rhee, Overview of Multi-class multi-object tracker, discloses that the object is represented by the proposed regions traced by the tracking algorithm. The tracking algorithm divides the tracking trajectory into several segments based on likelihood calculation, considering possible interactions or occlusion. Taking into account the decision of the easy-to-follow tracker, we observe within the segment, compare the localized bounding boxes, and monitor sudden changes through change point detection. The motion-based tracking component adaptively applies the Lucas-Kanade tracker to predict the area of the next tracking point from the current tracking point. In the present invention, a multi-class object detector based on a deep feature point is used as a global object detector and a local object detector. At this time, the number of object categories may be expanded according to the capability of the object detector; objects within video image data are processed to determine whether the segments obtained are to be combined (augmentation) or not); and, based on determining that data augmentation is to be used, performing the target tracking process based on an augmented image area obtained by the data augmentation on an image area (Rhee, Overview of Multi-class multi-object tracker, as quoted above; objects (target tracking) within video image data are processed to determine whether the segments (image area) obtained are to be combined (augmentation) or not). Rhee does not explicitly disclose outputting a tracking result generated by the target tracking process.
Koivisto discloses outputting a tracking result generated by the target tracking process (Koivisto, [0061], [0317], FIG. 16, discloses that the object detector 106 may comprise one or more machine learning models trained to generate the detected object data from features extracted from the sensor data (e.g., the image data). In some examples, the object detector 106 is configured to determine a set of detected object data (e.g., a coverage value and detected object region and/or location) for each spatial element region of a field of view and/or image. Locations and areas of the spatial element regions may be defined by corresponding spatial elements (e.g., outputs) of the object detector 106. For example, the spatial element regions for the same spatial element for different field(s) of view and/or images may be in a same location and a same area, which corresponds to the spatial element. In various examples, a spatial element may also refer to a grid cell, an output cell, a super-pixel, and/or an output pixel of the object detector 106. Computing device 1600 is suitable for use in implementing some embodiments of the present disclosure and may include a bus 1602 that directly or indirectly couples the following devices: memory 1604, one or more central processing units (CPUs) 1606, one or more graphics processing units (GPUs) 1608, a communication interface 1610, input/output (I/O) ports 1612, input/output components 1614, a power supply 1616, and one or more presentation components 1618 (e.g., display(s)); tracked object results are output). Rhee discloses the claimed invention except for outputting a tracking result generated by the target tracking process. Koivisto teaches that it is known to output results obtained as images are processed to determine their augmentation. It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to modify Rhee, which tracks multiple objects and determines tracking results depending on whether augmentation is applied, to output those results to the user or to display systems, as taught by Koivisto, in order to improve the presentation of tracking results to the user.
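For context, Rhee's motion-based tracking component, as quoted above, adaptively applies the Lucas-Kanade tracker to predict the area of the next tracking point from the current tracking point. The following is a minimal sketch of that kind of step using OpenCV's pyramidal Lucas-Kanade optical flow; the function, its parameters, and the median-shift box update are illustrative assumptions for this summary, not Rhee's actual implementation:

```python
import cv2
import numpy as np

def predict_next_region(prev_gray, next_gray, box):
    """Predict where a tracked (x, y, w, h) box moves between two
    grayscale frames via pyramidal Lucas-Kanade flow (illustrative)."""
    x, y, w, h = box
    # Sample trackable corner features inside the current box.
    pts = cv2.goodFeaturesToTrack(prev_gray[y:y + h, x:x + w],
                                  maxCorners=50, qualityLevel=0.01,
                                  minDistance=3)
    if pts is None:
        return box  # nothing trackable; keep the previous region
    pts = pts.astype(np.float32) + np.float32([x, y])
    nxt, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, None, winSize=(21, 21), maxLevel=3)
    ok = status.flatten() == 1
    if not ok.any():
        return box
    # Shift the box by the median displacement of the matched points.
    dx, dy = np.median(nxt[ok].reshape(-1, 2) - pts[ok].reshape(-1, 2),
                       axis=0)
    return (int(round(x + dx)), int(round(y + dy)), w, h)
```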
Regarding Claim 2, the combination of Rhee and Koivisto discloses outputting, as the tracking result, a first original tracking result obtained from the target tracking process based on the image area, when it is determined that the data augmentation is not to be used (Rhee, Summary of the invention, discloses a recording medium embodied as computer-readable code for multi-class multi-object tracking of an input video sequence, the method comprising the steps of: determining, based on the computation of likelihood, segments for object presence probability positions and setting a plurality of trajectory trains connecting the segments; detecting a change point with respect to a time point at which the data of the segments on the trajectory change more than a threshold value; and performing a forward-backward check on the segments on the trajectory containing the change point, and combining the segments on the trajectory based on the result of the determination to determine the final trace segments. According to the multi-class multi-object tracking method according to the present invention, it is possible to track by changing the likelihood of a proposed region that ideally moves a multi-object unlimited-class which changes in various ways. Unlike existing techniques that estimate only limited types of objects, such as pedestrians and automobiles, an efficient convolutional neural network based multi-class object detector can be applied to calculate the likelihood of multiple object classes. In addition, a change point detection algorithm based on static observations and dynamic observations evaluates the failure of tracking. The drift of multiple object tracking can be investigated by detecting a sudden change point in the time series without change represented by the tracking segment; change detection (tracking results of targets moving in dynamic frames) of target tracking of objects in images is determined under static and dynamic (moving environment or images) conditions, and whether combination (augmentation) of such image segments (regions) of such targets is applied or not is determined).

Regarding Claim 3, the combination of Rhee and Koivisto discloses wherein the determining that the data augmentation is to be used comprises: obtaining a first original tracking result by performing the target tracking process based on the image area; and determining whether the data augmentation is to be used according to the first original tracking result (Rhee, Summary of the invention, as quoted in the rejection of claim 2 above) (Koivisto, [0128], discloses that augmented images may be passed to the object detector 106 to perform forward pass computations. The object detector 106 may perform feature extraction and prediction on a per spatial element basis (e.g., prediction of object classes, bounding boxes, and/or other outputs on a per grid square basis). Loss functions may simultaneously measure the error in the tasks of predicting the various outputs (e.g., the object coverage for each object class and regressing object bounding box coordinates, or more generally in all tasks when additional outputs are included)).
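Rhee's change-point detection, quoted above for claims 2 and 3, flags a time point at which the data of the segments on the trajectory change by more than a threshold value. As a toy illustration of that idea (the choice of IoU as the per-frame feature and the 0.5 threshold are assumptions, not taken from Rhee):

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ix = max(0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def change_points(boxes, threshold=0.5):
    """Frame indices where the tracked box jumps abruptly, i.e. where
    overlap with the previous frame's box falls below the threshold."""
    return [t for t in range(1, len(boxes))
            if iou(boxes[t - 1], boxes[t]) < threshold]
```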
Regarding Claim 4, the combination of Rhee and Koivisto discloses wherein the first original tracking result comprises a first original predicted position of a tracked target and a first original confidence score corresponding to the first original predicted position of the tracked target, and the determining of whether the data augmentation is to be used is based on the first original confidence score (Rhee, Summary of the invention, as quoted in the rejection of claim 2 above) (Koivisto, [0053], Fig. 1B, discloses an example process 118 for detecting objects and determining corresponding detection confidence scores, in accordance with some embodiments of the present disclosure. The object detector 106 may be configured to analyze sensor data, such as image data, received from the communications manager 104 and generate detected object data that is representative of detected objects captured in the sensor data. The detected object clusterer 108 may be configured to generate or determine one or more clusters of the detected objects based at least in part on the detected object data. The feature determiner 110 may be configured to generate or determine features of the clusters for use as inputs to the confidence score generator 112. The confidence score generator 112 may be configured to compute confidence scores for one or more of the clusters based at least in part on the inputs. The object tracker 114 may be configured to track objects and/or detected objects across frames (e.g., video frames) and/or images, such as in a time-domain. The detected object filter 116 of FIG. 1A may include one or more of a detected object filter 116A configured to filter detected objects from the detected object data, or a detected object filter 116B configured to filter clusters from the clusters of detected objects; confidence scores of features in image regions are determined in target tracking, and object detection is then performed for image frames). Additionally, the rationale and motivation to combine the references Rhee and Koivisto as applied in the rejection of claim 1 apply to this claim.
Regarding Claim 5, the combination of Rhee and Koivisto discloses wherein the determining of whether the data augmentation is to be used, according to the first original confidence score, comprises: when the first original confidence score is within a first preset score range, determining that the data augmentation is to be used (Koivisto, [0053], [0080], [0127-0128], Fig. 1B, discloses that the object detector 106 (or the object detector 306) may be trained using various possible approaches. In some examples, the object detector 106 may be trained in a fully supervised manner. Training images together with their labels may be grouped in minibatches, where the size of the minibatches may be a tunable hyperparameter, in some examples. Each minibatch may be passed to an online data augmentation layer which may apply transformations to images in that minibatch. The data augmentation may be used to alleviate possible overfitting of the object detector 106 to the training data. The data augmentation transformations may include (but are not limited to) spatial transformations such as left-right flipping, zooming in/out, random translations, etc., color transformations such as hue, saturation, and contrast adjustment, or additive noise. Labels may be transformed to reflect corresponding transformations made to training images. Augmented images may be passed to the object detector 106 to perform forward pass computations. The object detector 106 may perform feature extraction and prediction on a per spatial element basis (e.g., prediction of object classes, bounding boxes, and/or other outputs on a per grid square basis). Loss functions may simultaneously measure the error in the tasks of predicting the various outputs (e.g., the object coverage for each object class and regressing object bounding box coordinates, or more generally in all tasks when additional outputs are included). See also Koivisto, [0053], Fig. 1B, as quoted in the rejection of claim 4 above. The detected object filter 116B may be used to filter aggregated detections based on associated coverage scores and/or other criteria (e.g., any of the various information described herein that may be represented by the associated detected object data, aggregated detected object data, or extracted therefrom). For example, an aggregated detection may be retained based at least in part on the confidence score exceeding a threshold (e.g., an adjustable value). This filtering may be performed to reduce false positives; confidence scores of features in image regions are determined in target tracking, and object detection is then performed for image frames). Additionally, the rationale and motivation to combine the references Rhee and Koivisto as applied in the rejection of claim 1 apply to this claim.
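The gating recited in claims 4 and 5 amounts to enabling augmentation only when the original tracking confidence falls inside a preset score range. A hedged sketch of such a decision rule (the range endpoints are placeholders; neither reference discloses these particular values):

```python
def should_augment(confidence, low=0.3, high=0.7):
    """Trigger data augmentation only when the original result is
    neither clearly reliable nor clearly lost (placeholder bounds)."""
    return low <= confidence <= high

# Example: augment only for borderline confidence scores.
assert should_augment(0.5) and not should_augment(0.9)
```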
Regarding Claim 6, the combination of Rhee and Koivisto discloses wherein the performing of the target tracking process based on the augmented image area comprises: obtaining N tracking results by performing the target tracking process based on N augmented image areas (Rhee, Multi Object Tracking, discloses that recent multi-object tracking studies are focused on detection-based-tracking theory for performing data clustering that links object detections present in continuous images. The offline-based multi-object tracking method uses the information of the pre-frame to perform better multi-object tracking using hierarchical track clustering, network flow, and global tracking trajectory optimization. However, offline methods require high computational complexity. On the other hand, the online method requires only a small amount of computation, using only the past and present frame information to solve a multi-object tracking problem. The online method is more suitable for real-world problems, but drift easily occurs due to noise, light intensity, pose, camera angle, shadow, changes in the number of objects, and sudden changes. The number of dynamically changing objects is difficult to handle, especially in crowded scenes. Most multi-object tracking methods rely on observations based on different kinds of features and are susceptible to drift. For these nonstationary and nonlinear conditions, probability-based tracking methods such as Kalman filters or particle filters are known to dominate deterministic tracking methods; multiple continuous (N number) images are obtained, tracked, and processed to determine augmentation on image data based on change detection of image regions); and determining a first augmented tracking result by comparing the N tracking results to one another, wherein the N augmented image areas are obtained by augmenting data on the image area by using N data augmentation processing methods, respectively (Rhee, Summary of the invention, as quoted in the rejection of claim 2 above). Additionally, the rationale and motivation to combine the references Rhee and Koivisto as applied in the rejection of claim 1 apply to this claim.
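Claim 6's step of obtaining N tracking results from N augmented image areas and comparing them can be pictured as running one tracker over each augmented variant and keeping the highest-confidence result. A minimal sketch under that reading (the `tracker` callable, assumed to return a (box, confidence) pair, and the list of augmentation functions are hypothetical):

```python
def track_over_augmentations(tracker, image_area, augmentations):
    """Track on each augmented copy of the image area and return the
    (box, confidence) result with the highest confidence score."""
    results = [tracker(augment(image_area)) for augment in augmentations]
    return max(results, key=lambda result: result[1])
```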
Regarding Claim 7, the combination of Rhee and Koivisto discloses wherein the outputting of the tracking result comprises: when a first augmented confidence score included in the first augmented tracking result is within a second preset score interval, outputting the first augmented tracking result as the tracking result; and when the first augmented confidence score included in the first augmented tracking result is not within the second preset score interval, outputting a first original tracking result as the tracking result (Rhee, Summary of the invention, as quoted in the rejection of claim 2 above) (Koivisto, [0053], Fig. 1B, as quoted in the rejection of claim 4 above; confidence scores of features in image regions are determined in target tracking, and object detection is then performed for image frames). Additionally, the rationale and motivation to combine the references Rhee and Koivisto as applied in the rejection of claim 1 apply to this claim.
Regarding Claim 8, the combination of Rhee and Koivisto discloses wherein the image area comprises a template image area comprising a tracked target within a frame image, or a search area within the frame image (Rhee, Summary of the invention, as quoted in the rejection of claim 2 above) (Koivisto, [0053], Fig. 1B, as quoted in the rejection of claim 4 above). Additionally, the rationale and motivation to combine the references Rhee and Koivisto as applied in the rejection of claim 1 apply to this claim.
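Claim 8 distinguishes a template image area containing the tracked target from a search area within the frame image, a split familiar from template-matching and Siamese-style trackers. A simple illustration of cropping both regions from a frame array (the 2x search padding factor is an assumption):

```python
def crop_template_and_search(frame, box, search_scale=2.0):
    """Crop the target template and a larger search area centered on it.
    `frame` is an H x W (or H x W x C) array; `box` is (x, y, w, h)."""
    x, y, w, h = box
    template = frame[y:y + h, x:x + w]
    sw, sh = int(w * search_scale), int(h * search_scale)
    sx = max(0, int(x + w / 2 - sw / 2))
    sy = max(0, int(y + h / 2 - sh / 2))
    search = frame[sy:sy + sh, sx:sx + sw]
    return template, search
```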
Regarding Claim 13, the combination of Rhee and Koivisto discloses wherein the N augmented image areas are obtained by augmenting data on the image area, using the N data augmentation processing methods, respectively, through obtaining an augmented image area corresponding to the image area by augmenting the image area using augmentation chains for each of the N data augmentation processing methods, respectively (Rhee, Summary of the invention, as quoted in the rejection of claim 2 above) (Koivisto, [0053], Fig. 1B, as quoted in the rejection of claim 4 above). Additionally, the rationale and motivation to combine the references Rhee and Koivisto as applied in the rejection of claim 1 apply to this claim.
Regarding Claim 14, the combination of Rhee and Koivisto discloses wherein the obtaining of the augmented image area corresponding to the image area by augmenting the image area using the augmentation chains comprises: performing an augmentation processing on the image area using each augmentation chain; and obtaining the augmented image area corresponding to the image area by performing a weighted combination of the output results of the respective augmentation chains (Rhee, Summary of the invention, as quoted in the rejection of claim 2 above) (Koivisto, [0048], [0053], [0127-0128], Fig. 1B, discloses determining a coverage value for a spatial element region when the spatial element region corresponds to multiple object regions (indicating different objects). In some examples, the system may determine which object region is the active object region and may use a coverage value for the active object region as the coverage value for the spatial element region (or give the active object region greater weight than the other object regions). The active object region may, in some cases, be set to the object region for the object that has the highest coverage value of the objects. Additionally or alternatively, the active object region may be set to the closest or front-most of the objects in the image. See also Koivisto, [0053], Fig. 1B, as quoted in the rejection of claim 4 above; confidence scores of features in image regions are determined in target tracking, object detection is then performed for image frames, a weighted combination of image regions is performed according to the weights of the image region objects, and minibatches (augmentation chains) are passed on to other online augmentation layers (chains) for augmentation based on the weights of the regions determined in the images). Additionally, the rationale and motivation to combine the references Rhee and Koivisto as applied in the rejection of claim 1 apply to this claim.
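Claim 14 recites obtaining the augmented image area by a weighted combination of the output results of the respective augmentation chains. One way to picture this is a normalized per-pixel weighted sum of the chain outputs; the sketch below assumes H x W x C uint8 arrays of equal shape and illustrative weights:

```python
import numpy as np

def weighted_combine(chain_outputs, weights):
    """Blend several augmentation-chain outputs into one image area."""
    w = np.asarray(weights, dtype=np.float32)
    w = w / w.sum()  # normalize so the blend stays in the input range
    stack = np.stack([out.astype(np.float32) for out in chain_outputs])
    return (w[:, None, None, None] * stack).sum(axis=0).astype(np.uint8)
```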
Regarding Claim 15, the combination of Rhee and Koivisto discloses wherein at least one of the augmentation chains is randomly selected from among augmentation chain candidates (Koivisto, [0127-0129], discloses the supervised training and online data augmentation of minibatches described in the rejection of claim 5 above, and further that component losses for the various outputs may be combined together in a single loss function that applies to the whole minibatch (see further discussion of potential cost functions). Then, backward pass computations may take place to recursively compute gradients of the cost function with respect to trainable parameters (typically at least the weights and biases of the object detector 106, but not limited to this, as there may be other trainable parameters, e.g., when batch normalization is used). Forward and backward pass computations may typically be handled by a deep learning framework and software stack underneath; any minibatch (random augmentation chain) is passed on to another augmentation layer). Additionally, the rationale and motivation to combine the references Rhee and Koivisto as applied in the rejection of claim 1 apply to this claim.
Regarding Claim 16, the combination of Rhee and Koivisto discloses wherein each of the augmentation chain candidates is linked to at least one augmentation primitive among a plurality of augmentation primitives (Koivisto, [0127-0129], as quoted in the rejections of claims 5 and 15 above; any minibatch (random augmentation chain) is passed on to (linked with) another augmentation layer). Additionally, the rationale and motivation to combine the references Rhee and Koivisto as applied in the rejection of claim 1 apply to this claim.
Regarding Claim 17, the combination of Rhee and Koivisto discloses wherein the augmentation primitives comprise a contrast primitive, a color primitive, a brightness primitive, a sharpness primitive, a clipping primitive, a blur primitive, and a translation primitive (Koivisto, [0127-0129], as quoted in the rejections of claims 5 and 15 above; any minibatch (random augmentation chain) is passed on to (linked with) another augmentation layer based on primitives including color, translation, contrast, hue, and saturation). Additionally, the rationale and motivation to combine the references Rhee and Koivisto as applied in the rejection of claim 1 apply to this claim.
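Claims 15-17 describe augmentation chains that are randomly selected from candidates and built from primitives such as contrast, color, brightness, sharpness, clipping, blur, and translation. A hedged sketch of that structure, using Pillow enhancement operations as stand-ins for a few of the claimed primitives (the enhancement factors and chain length are arbitrary):

```python
import random
from PIL import Image, ImageEnhance, ImageFilter

# Stand-ins for a few of the claimed augmentation primitives.
PRIMITIVES = [
    lambda im: ImageEnhance.Contrast(im).enhance(1.5),    # contrast
    lambda im: ImageEnhance.Color(im).enhance(1.3),       # color
    lambda im: ImageEnhance.Brightness(im).enhance(0.8),  # brightness
    lambda im: ImageEnhance.Sharpness(im).enhance(2.0),   # sharpness
    lambda im: im.filter(ImageFilter.GaussianBlur(1)),    # blur
]

def random_chain(length=2):
    """Randomly pick primitives and compose them into one chain."""
    ops = random.sample(PRIMITIVES, k=length)
    def chain(image: Image.Image) -> Image.Image:
        for op in ops:
            image = op(image)
        return image
    return chain

# Example: build N chains, one per data augmentation processing method.
chains = [random_chain() for _ in range(4)]
```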
Claims 18 and 19 recite methods with steps corresponding to the method steps recited in Claims 1 and 2, respectively. Therefore, the recited steps of method Claims 18 and 19 are mapped to the proposed combination in the same manner as the corresponding steps of Claims 1 and 2, respectively. Additionally, the rationale and motivation to combine the Rhee and Koivisto references presented in the rejection of Claim 1 apply to these claims. Furthermore, the combination of Rhee and Koivisto further discloses obtaining N augmented image areas by augmenting data on an image area of an image, using N data augmentation processing methods, respectively; and selecting a target image area among the N augmented image areas, according to a preset criterion (Rhee, Summary of the invention, as quoted in the rejection of claim 2 above) (Koivisto, [0053], Fig. 1B, as quoted in the rejection of claim 4 above; confidence scores of features in image regions are determined in target tracking, and object detection is then performed for image frames).
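The selection step recited in claims 18 and 19, choosing one target image area among the N augmented areas according to a preset criterion, can be illustrated with a simple scoring rule. Here image contrast (intensity standard deviation) stands in for the preset criterion; this is purely an assumption for illustration:

```python
import numpy as np

def select_target_area(augmented_areas):
    """Pick the augmented area maximizing a preset criterion; here,
    contrast measured as the standard deviation of pixel intensity."""
    return max(augmented_areas, key=lambda area: float(np.std(area)))
```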
Claim 20 recites an apparatus with elements corresponding to the method steps recited in Claim 1. Therefore, the recited elements of apparatus Claim 20 are mapped to the proposed combination in the same manner as the corresponding steps of Claim 1. Additionally, the rationale and motivation to combine the Rhee and Koivisto references presented in the rejection of Claim 1 apply to this claim. Furthermore, the combination of Rhee and Koivisto further discloses a target tracking apparatus comprising: one or more processors; and storage hardware storing instructions configured to, when executed by the one or more processors, cause the one or more processors to perform operations (Koivisto, [0201-0202], discloses that the client device(s) 1420 may include at least some of the components, features, and functionality of the example computing device 1600 described herein with respect to FIG. 16. By way of example and not limitation, a client device 1420 may be embodied as a personal computer (PC), a laptop computer, a mobile device, a smartphone, a tablet computer, a smart watch, a wearable computer, a personal digital assistant (PDA), an MP3 player, a global positioning system (GPS) or device, a video player, a handheld communications device, a gaming device or system, an entertainment system, a vehicle computer system, an embedded system controller, a remote control, an appliance, a consumer electronic device, a workstation, any combination of these delineated devices, or any other suitable device. In any example, at least one client device 1420 may be part of a vehicle, such as the vehicle 1500 of FIGS. 15A-15D, described in further detail herein. The client device(s) 1420 may include one or more processors and one or more computer-readable media. The computer-readable media may include computer-readable instructions executable by the one or more processors. The instructions may, when executed by the one or more processors, cause the one or more processors to perform any combination and/or portion of the methods described herein and/or implement any portion of the functionality of the object detection system 100 of FIG. 1A).
Allowable Subject Matter

Claims 9-12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: US 20200380701 A1 ([0263]: An illustrative training process that may be used in one or more embodiments is to have one or more people move through a store, and to sample projected camera images at fixed time intervals (for example, every one second). The sampled images may be labeled and processed as illustrated in FIG. 41. On each training iteration, a random subset of the cameras in an area may be selected to be used as inputs. The plane projections may also be performed on randomly selected planes parallel to the floor within some height range above the store. In addition, random data augmentation may be performed to generate additional samples; for example, synthesized images may be generated to deform the shapes or colors of persons, or to move their images to different areas of the store (and to move the labeled positions accordingly)); and US 20190222776 A1 (Abstract: Systems and methods are provided for identifying one or more portions of images or video frames that are appropriate for augmented overlay of advertisement or other visual content, and augmenting the image or video data to include such additional visual content. Identifying the portions appropriate for overlay or augmentation may include employing one or more machine learning models configured to identify objects or regions of an image or video frame that meet criteria for visual augmentation. The pose of the augmented content presented within the image or video frame may correspond to the pose of one or more real-world objects in the real-world scene captured within the original image or video).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to PINALBEN V PATEL, whose telephone number is (571) 270-5872. The examiner can normally be reached M-F, 10am-8pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Chineyere Wills-Burns, can be reached at 571-272-9752. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Pinalben Patel/
Examiner, Art Unit 2673