Patent Application 17601225 - DETERMINING POSITION OF AN IMAGE CAPTURE DEVICE - Rejection

Application Information

  • Invention Title: DETERMINING POSITION OF AN IMAGE CAPTURE DEVICE
  • Application Number: 17601225
  • Submission Date: 2025-04-10
  • Effective Filing Date: 2021-10-04
  • Filing Date: 2021-10-04
  • National Class: 382
  • National Sub-Class: 103000
  • Examiner Employee Number: 98268
  • Art Unit: 2667
  • Tech Center: 2600

Rejection Summary

  • 102 Rejections: 0
  • 103 Rejections: 3

Cited Patents

The following patents were cited in the rejection:

  • Michel (US-20170098309-A1)
  • Sun (US 20200057778 A1)
  • Kahle et al. (US-10347008-B2)
  • Newcombe et al. (US-20120194644-A1)

Office Action Text



    DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on March 13th, 2025, has been entered and made of record.

Response to Amendment
This is in response to Applicant’s Arguments/Remarks filed on March 13th, 2025, which have been entered and made of record.

Response to Arguments
Claim Rejections - 35 USC § 103
Applicant’s arguments regarding the current claims have been fully considered. However, the arguments are directed to the claims as amended and are believed to be answered by, and therefore moot in view of, the new grounds of rejection presented below.

Regarding the amended feature of the independent claims requiring a plurality of predefined categories evaluated separately for discrepancies, examiner would like to clarify how the newly cited prior art in the new grounds of rejection is being used below:
Without importing limitations from applicant’s specification into the claim, but merely for understanding, applicant describes in at least one embodiment in [0070]: “Optionally, the simplified site image is a depth map wherein the plurality of predefined categories comprises a depth value indicating a distance between the camera [and] a respective location within the imaged building site corresponding to the pixel”. Hence, it appears that the plurality of predefined categories can encompass a depth map where each depth value indicates a category.

Michel generally teaches localization of a robot in a building by comparing edges of a captured image to edges of known reference images with known poses. Further, Michel teaches in [0006] and [0030] that some implementations may include a camera with a depth channel and thereby a 2.5D image. Sun generally teaches estimating pose by comparing a captured 2.5D depth image to 2.5D depth images with known poses derived from a model and stored in a database, in at least [0014]-[0015]. One of ordinary skill in the art could have substituted Sun’s teachings into Michel, so that instead of comparing edges as taught by Michel, one could compare depth values for pose estimation as taught by Sun, for the same purpose of localization by comparison with known reference images. See the rejection below.
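For illustration only, the following minimal Python sketch shows the kind of substitution described above: scoring candidate camera poses by comparing a captured depth image against depth images rendered from a model at each candidate pose, and keeping the best-scoring pose. It is not drawn from Michel or Sun; the renderer stand-in and all names are hypothetical, and numpy is assumed.

    import numpy as np

    def render_depth_from_model(model, pose, shape=(120, 160)):
        # Hypothetical stand-in for a renderer that would produce a depth image
        # of the model as seen from `pose`; a real system would use a 3D renderer.
        rng = np.random.default_rng(abs(hash(tuple(pose))) % (2 ** 32))
        return rng.uniform(0.5, 10.0, size=shape)

    def estimate_pose(captured_depth, model, candidate_poses):
        # Score each candidate pose by mean per-pixel depth discrepancy; keep the best.
        best_pose, best_score = None, float("inf")
        for pose in candidate_poses:
            rendered = render_depth_from_model(model, pose, captured_depth.shape)
            score = float(np.abs(captured_depth - rendered).mean())
            if score < best_score:
                best_pose, best_score = pose, score
        return best_pose, best_score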

Status of Claims
Claims 1, 4, 6-7, 9-13, 16-19, 21-24 are pending. Claims 1, 6, and 13 were amended. Claims 2-3, 5, 8, 14-15, and 20 were canceled. No new claims were added. Claims 1, 4, 6-7, 9-13, 16-19, 21-24 are considered below.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 4, 7, 9-11, 13, 16-17, 19, 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Michel (US-20170098309-A1) in view of Sun (US 20200057778 A1), Kahle et al. (US-10347008-B2, hereinafter Kahle), and Kropp et al. (Christopher Kropp, Interior construction state recognition with 4D registered image sequences, Automation in Construction, 2018, Vol. 86, pp. 11-32, hereinafter Kropp).
Regarding Claim 1, representative of claims 13 and 17,  Michel teaches a locator module for determining a position of an image capture device (ICD) within a building site, the module comprising: a processor that operates, based on executing a set of instruction stored in a memory to ([0070]: Processor(s) 402 are in communication with memory): 
receive an image captured in the building site by the ICD ([0021]: capturing a camera image by a camera of a robot in an environment, [0028]: robot 130 is mobile and has access to one or more portions of an indoor environment such as a building); 
receive an initial ICD position of the ICD at the time the image was captured ([0060]: the system identifies an estimated pose of the camera in the environment); 
generate a plurality of proposed ICD positions based on the initial ICD position ([0006] The estimated pose is used to render, from a three-dimensional model of the environment, a model image, [0009] In some implementations… one or more additional model images of the environment may be rendered from the 3D model—each with points of view from different candidate poses); and 

generate a plurality of expected site images by generating a respective expected site image for each one of the plurality of proposed ICD positions, each generated expected site image is a three dimensional (3D) representation of the building site as would have been captured by the ICD from the respective proposed ICD position, based on a model of the building site ([0053]: The pose determination engine 156 determines a current pose of the camera 105 of the robot based on comparing the camera image edges to the model image edges as described herein, [0028]: an example three-dimensional (“3D”) model of an environment 170. The robot 130 is mobile and has access to one or more portions of an indoor environment such as a building, [0009] In some implementations… one or more additional model images of the environment may be rendered from the 3D model—each with points of view from different candidate poses) 

determine an optimized ICD position ([0053] The pose determination engine 156 determines a current pose of the camera 105 of the robot based on comparing the camera image edges to the model image edges as described herein) 

Michel does not explicitly teach generate at least one simplified site image based on the captured image, by assigning to each pixel of the captured image one of a plurality of predefined categories; for each pixel of each of the plurality of expected site images, assign one of said plurality of predefined categories; 
for each one of the plurality of expected site images:
evaluate, separately, for each one of the plurality of predefined categories a discrepancy according to a discrepancy between categories assigned to each pixel in a pair of a pixel of the simplified site image and a corresponding pixel in a respective expected site image, to compute a plurality of categories discrepancies; and
combine the computed plurality of categories discrepancies to a combined measure of discrepancy; and
determine an optimized ICD position by selecting from the plurality of proposed ICD positions, a proposed ICD position with the most reduced combined measure of discrepancy.

Sun teaches generate at least one simplified site image based on the captured image, by assigning to each pixel of the captured image one of a plurality of predefined categories ([0086]: Using orthographic projections, the image processor 16 outputs labels specific to pixels or locations of the object displayed in a photograph or video from the 2.5D data. The pose is determined by matching with different poses stored in the memory. Examiner interpreting predefined categories as the depth values of the image);
for each pixel of each of the plurality of expected site images, assign one of said plurality of predefined categories ([0007]: depth images of the camera poses of the database, [0087]: depth measurements of the orthographic projections from the 3D data. Examiner interpreting predefined categories as the depth values of the image); 
for each one of the plurality of expected site images ([0089]: The orthographic projection from the 2.5D data is compared to any number of the other orthographic projections in the database):
evaluate, separately, for each one of the plurality of predefined categories a discrepancy according to a discrepancy between categories assigned to each pixel in a pair of a pixel of the simplified site image and a corresponding pixel in a respective expected site image, to compute a plurality of categories discrepancies ([0087]: depth measurements from the 2.5D data may be compared with the depth measurements of the orthographic projections from the 3D data, [0088]: The size of the pixels scales to be the same so that the depth measurements correspond to the same pixel or area size, [0089] Once scaled, the orthographic projection from the 2.5D data is matched with one or more orthographic projections from the 3D data. Examiner notes pixel scaling size prior to matching indicates a pixel by pixel comparison between categories/depths); and
combine the computed plurality of categories discrepancies to a combined measure of discrepancy ([0089]: The orthographic projection from the 2.5D data is compared to any number of the other orthographic projections in the database. A normalized cross-correlation, minimum sum of absolute differences, or other measure of similarity may be used to match based on the comparison); and
determine an optimized ICD position by selecting from the plurality of proposed ICD positions, a proposed ICD position with the most reduced combined measure of discrepancy ([0002] In another approach, the depth camera image is compared to images in the database. The images in the database are of the object from different poses. The comparison finds the closest image of the database to the depth camera image, providing the pose, [0089]: One or more matches may be found. Alternatively, a best one or other number of matches are found).
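A second illustrative sketch, under the examiner's reading that discretized depth values can serve as the claimed "predefined categories" (spec [0070]): each pixel of the simplified image and of each expected image is assigned a category, a discrepancy is evaluated separately per category, the per-category discrepancies are combined, and the proposed position with the smallest combined discrepancy is selected. All function names are hypothetical, numpy is assumed, and this is not code from any cited reference.

    import numpy as np

    def categorize(depth_image, bin_edges):
        # Assign each pixel one of the predefined categories (here, a depth bin).
        return np.digitize(depth_image, bin_edges)

    def combined_discrepancy(simplified, expected, n_categories):
        # Evaluate, separately for each category, the fraction of that category's
        # pixels whose assigned category differs between the two images, then
        # combine the per-category discrepancies into a single measure (mean).
        per_category = []
        for c in range(n_categories):
            mask = (simplified == c)
            if mask.any():
                per_category.append(float(np.mean(expected[mask] != c)))
        return float(np.mean(per_category)) if per_category else 1.0

    def select_pose(simplified, expected_images, proposed_positions, n_categories):
        # Pick the proposed position whose expected image yields the lowest
        # combined measure of discrepancy.
        scores = [combined_discrepancy(simplified, e, n_categories) for e in expected_images]
        return proposed_positions[int(np.argmin(scores))]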

Neither Michel nor Sun explicitly teaches generating expected site images based on a state of a progress of a building project of said building site and a presumed level of progress of the building project according to a state of a flow model associated with the building project at the time the image was captured.

	Kahle teaches based on a state of a progress of a building project of said building site  ([0002]: disclosure relates to systems and methods that facilitate positioning points and objects in a work space or at a worksite, such as for example at a construction site,  [0038]: process 700 for determining a position and/or orientation of a camera at a worksite is illustrated. Process 700 begins in step 704 with retrieving a model of the worksite. The model is a three-dimensional model. In some embodiments, the model is a Building Information Modeling. Examiner notes a worksite indicates a site where construction or repair is in progress, hence the 3D model of a worksite would reflect a state of progress. Further, Examiner notes building information modelling (BIM) is a known building project tool).

	Kahle does not explicitly teach based on a presumed level of progress of the building project according to a state of a flow model associated with the building project at the time the image was captured.

	Kropp teaches based on a presumed level of progress of the building project according to a state of a flow model associated with the building project at the time the image was captured ([Section 3.1.4]: makes use of BIM information to accurately align the camera pose to the building model. The currently expected state of the building model is analyzed, [Section 3, paragraph 1]: the method considers extensive access to information present in the 4D BIM model, [Introduction, paragraph 1]: application of 4D building models by linking activities of a schedule with corresponding building elements is very common. Based on 4D building models the construction sequence can be analyzed and progress monitoring can be supported).
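The schedule-linked (4D) aspect can be illustrated with a short hypothetical sketch: only building elements whose scheduled completion falls on or before the image capture date are included in the model state used to render expected images. The data layout and names below are assumptions for illustration, not Kropp's implementation.

    from datetime import date

    def elements_expected_at(model_elements, capture_date):
        # Keep only elements the schedule says should exist by the capture date.
        return [e["geometry"] for e in model_elements
                if e["scheduled_completion"] <= capture_date]

    elements = [
        {"geometry": "slab_level_1", "scheduled_completion": date(2021, 3, 1)},
        {"geometry": "wall_axis_A", "scheduled_completion": date(2021, 6, 15)},
        {"geometry": "wall_axis_B", "scheduled_completion": date(2021, 12, 1)},
    ]
    print(elements_expected_at(elements, date(2021, 10, 1)))
    # ['slab_level_1', 'wall_axis_A']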
	
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to have modified Michel by substituting the image and edge-detection feature comparison with the depth-image comparison taught by Sun. Doing so would provide the predictable result of comparing a captured image with a set of known images having respective known poses to determine the pose of the captured image.

	Further, it would have been obvious to one of ordinary skill in the art to have modified the Michel and Sun combination to include the teachings of Kahle. Michel, Sun, and Kahle all involve determining a position of a camera by comparing a captured image to an image derived from a model. The model in both Michel’s and Sun’s disclosures may include a building; however, neither explicitly states whether the 3D representation of the building site, used to generate the corresponding expected site image, is generated to reflect a state of a progress of a building project of said building site. Kahle discloses using a model that is a Building Information Modelling (BIM) file of a worksite in the refinement of a pose estimation process, where a worksite is a building under construction; hence Kahle teaches that the 3D representation of the building site, used to generate the corresponding expected site image, is generated to reflect a state of a progress of a building project of said building site. Modifying the Michel and Sun combination to include the teachings of Kahle would enable a pose estimation method to be used to position tools in a construction process more quickly, thus reducing construction cost and time (Kahle, [0019]).

	Further, it would have been obvious to one of ordinary skill in the art to substitute the BIM of the Michel, Sun, and Kahle combination with the 4D BIM taught by Kropp, which includes a state of a flow model/schedule. Using information regarding the current state of a building in relation to a schedule would improve the accuracy of the feature comparison involved in the pose estimation of the camera, and thereby improve the accuracy of pose estimation.

Regarding Claim 4, representative of claim 16, the Michel, Sun, Kahle, and Kropp combination (hereinafter the Michel combination) teaches the locator module according to claim 1. In addition, Michel teaches wherein the expected site image comprises a computer-rendered image of the building site as would be captured by a virtual camera having a location and an orientation within the 3D representation based on the proposed ICD position ([0006]: render, from a three-dimensional model of the environment, a model image … the rendered model image is a rendered image of the model with the point of view of the rendered image having the same location and orientation as the estimated pose).

Regarding Claim 7, representative of claim 19, the Michel combination teaches the locator module according to claim 6. In addition, Sun teaches wherein the at least one simplified site image comprises one or more of: a corner map, a depth map ([0066]: a depth from the camera and depth sensor 12 to each pixel or groups of pixels is captured), a boundary map, or a semantically segmented image.

Regarding Claim 9, representative of claim 21, the Michel combination teaches the locator module according to claim 1. In addition, Michel teaches wherein the optimized ICD position is selected responsive to reducing a measure of discrepancy ([0007]: the current pose of the camera may be determined by modifying the estimated pose in view of the differences between…camera image (taken by the camera at its actual pose) …and the model image (rendered from the point of view of the estimated pose).) between a proposed ICD position ([0007]: the model image (rendered from the point of view of the estimated pose)) of the plurality of proposed ICD positions ([0009]: one or more additional model images of the environment may be rendered from the 3D model—each with points of view from different candidate poses) and a calculated ICD position ([0011]: estimated pose of the camera in the environment) based on one or more of: a previously determined ICD position and inertial measurement data responsive to movement of the ICD ([0047]: the estimated pose may be determined based on modifying the immediately preceding current pose of the robot's camera based on sensor data…from an inertial measurement unit).
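As a purely illustrative sketch of this limitation (hypothetical names, numpy assumed, not code from Michel): a calculated position can be dead-reckoned from the previously determined position and IMU-derived motion, and proposed positions can then be scored by their discrepancy from that calculated position.

    import numpy as np

    def calculated_position(previous_position, imu_velocity, elapsed_s):
        # Dead-reckoned position from the prior fix plus IMU-derived velocity.
        return np.asarray(previous_position) + np.asarray(imu_velocity) * elapsed_s

    def nearest_proposal(proposed_positions, calc_pos):
        # Proposal with the smallest discrepancy to the calculated position.
        dists = [np.linalg.norm(np.asarray(p) - calc_pos) for p in proposed_positions]
        return proposed_positions[int(np.argmin(dists))]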

Regarding Claim 10, representative of claim 22, the Michel combination teaches the locator module according to claim 9. In addition, Michel teaches wherein the previously determined ICD position is a chronologically earlier position of the ICD ([0013]: estimated pose of the camera in the environment is determined based on modifying the immediately preceding pose based on sensor data from one or more additional sensors of the robot, such as an inertial measurement unit sensor, [0056]: In some implementations, the example in FIG. 2 may be performed in real-time).

Regarding Claim 11, representative of claim 23, the Michel combination teaches the locator module according to claim 9. In addition, Michel teaches wherein the previously determined ICD position is a position of the ICD in a chronologically later time than a position of the ICD when said image was captured ([0056]: Multiple iterations of the example of FIG. 2 above may be performed, each time using a newly captured camera image 101 from the robot's camera and using an “estimated pose” that is determined by engine 158 based on the immediately preceding determined current pose of the camera 105. [0056]: In some implementations, the example in FIG. 2 may be performed in real-time, examiner notes the implication that some implementations of the method in Fig. 2 may not be performed in real-time, hence reading on the limitation chronologically later).

Claim(s) 6 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Michel (US-20170098309-A1), Sun (US 20200057778 A1), Kahle (US-10347008-B2), and Kropp (Christopher Kropp, Interior construction state recognition with 4D registered image sequences, Automation in Construction, 2018, Vol. 86, pp. 11-32) in view of Ochotorena (C. A. Ochotorena, C. N. Ochotorena and E. Dadios, "Gradient-guided filtering of depth maps using deep neural networks," 2015 International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment and Management (HNICEM), Cebu, Philippines, 2015, pp. 1-8, doi: 10.1109/HNICEM.2015.7393265).
Regarding Claim 6, representative of claim 18, the Michel combination teaches the locator module according to claim 1. However, none explicitly teach the remaining limitation of Claim 6. Ochotorena teaches wherein the at least one simplified site image is generated from the captured image based on evaluation of the captured image with a neural network ([abstract]: propose a filter that is specifically tuned to operate on noisy depth maps, [introduction, paragraph 3]:  filter derived from training neural networks).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to have modified the Michel combination to include the teachings of Ochotorena by including a preprocessing filtering step for a depth map done by a trained neural network. Doing so would improve the quality of the image prior to comparison. 
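For illustration only, a generic learned filter of this kind can be sketched as below (PyTorch assumed). This is not Ochotorena's gradient-guided filter, and the architecture is an arbitrary assumption; it only shows a neural network applied to a noisy depth map as a preprocessing step before comparison, with weights that would in practice be trained on pairs of noisy and clean depth maps.

    import torch
    import torch.nn as nn

    class DepthDenoiser(nn.Module):
        def __init__(self):
            super().__init__()
            # Small convolutional stack mapping a 1-channel depth map to a filtered one.
            self.net = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 1, kernel_size=3, padding=1),
            )

        def forward(self, depth):  # depth shape: (N, 1, H, W)
            return self.net(depth)

    filtered = DepthDenoiser()(torch.rand(1, 1, 120, 160))  # untrained, structure only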

Claim(s) 12 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Michel (US-20170098309-A1), Sun (US 20200057778 A1), Kahle (US-10347008-B2), and Kropp (Christopher Kropp, Interior construction state recognition with 4D registered image sequences, Automation in Construction, 2018, Vol. 86, pp.11-32) in view of Newcombe et al. (US-20120194644-A1, hereinafter Newcombe).

Regarding Claim 12, representative of claim 24, the Michel combination teaches the locator module according to claim 9. In addition, Michel teaches wherein the measure of discrepancy is based on the proposed ICD position not exceeding a physical limit for a possible ICD position ([0007]: model image rendered from…estimated pose, [0013]: estimated pose …based on sensor data…such as inertial measurement unit), the physical limit being based on the previously determined ICD position, a time elapsed between the initial ICD position and the previously determined ICD position ([0013]: estimated pose of the camera in the environment is determined based on modifying the immediately preceding pose based on sensor data from one or more additional sensors of the robot, such as an inertial measurement unit sensor).
None of Michel, Sun, Kahle, or Kropp teaches and a maximum speed for a vehicle onto which the ICD was mounted.
However, Newcombe teaches and a maximum speed for a vehicle onto which the ICD was mounted ([0050]: the mobile environment sensor 300 moves with a random walk with a maximum linear velocity).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the present invention to have modified the Michel combination to include the teachings of Newcombe. Doing so would help to detect tracking failures of a mobile camera (Newcombe, [0050]), improving the accuracy of estimation of a camera position.
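A short illustrative sketch of such a physical-plausibility gate (hypothetical names and values, not drawn from Newcombe): a proposed position is kept only if reaching it from the previously determined position within the elapsed time would not require exceeding the vehicle's maximum speed.

    import numpy as np

    def physically_possible(proposed, previous, elapsed_s, max_speed_mps):
        # Reachable only if the implied speed stays at or below the maximum.
        distance = float(np.linalg.norm(np.asarray(proposed) - np.asarray(previous)))
        return distance <= max_speed_mps * elapsed_s

    candidates = [(2.0, 1.0, 0.0), (40.0, 5.0, 0.0)]
    feasible = [c for c in candidates
                if physically_possible(c, (0.0, 0.0, 0.0), elapsed_s=2.0, max_speed_mps=1.5)]
    # Only the first candidate survives: ~40 m in 2 s would imply ~20 m/s > 1.5 m/s.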

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JANICE VAZ whose telephone number is (703) 756-4685. The examiner can normally be reached Monday-Friday, 8:00 am-5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella can be reached on (571) 272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JANICE E. VAZ/Examiner, Art Unit 2667

/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667