Patent Application 15192992 Rejection Details

Title: SCATTER TO GATHER OPERATION
Application Information

Invention Title: SCATTER TO GATHER OPERATION
Application Number: 15192992
Submission Date: 2025-04-09T00:00:00.000Z
Effective Filing Date: 2016-06-24T00:00:00.000Z
Filing Date: 2016-06-24T00:00:00.000Z
National Class: 712
National Sub-Class: 225000
Examiner Employee Number: 93560
Art Unit: 2182
Tech Center: 2100
Rejection Summary

102 Rejections: 0
103 Rejections: 2
Office Action Text



    DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This office action is in response to the amendment filed on 12/24/2024. Claims 9, 12-13, 20, 24, 26-28, and 30-34 are pending. Claims 32-33 are amended. Claims 1-8, 10-11, 14-19, 21-23, 25, and 29 are canceled. 

Response to Arguments
Applicant's arguments filed 12/24/2024 have been fully considered but they are not persuasive. 
Applicant submits:
“First, each of claims 9 and 20 recites that "each memory-to-memory copy instruction in the sequence copies one of the two or more data elements from a corresponding source address in the list of two or more source addresses provided by the processor to the corresponding destination address in the list of two or more destination addresses provided by the processor." Thus, the plain language of the claim recites that each memory-to-memory copy is from a source address in a source address list that was provided by the processor to a destination address in a destination address list that was provided by the processor, and NOT using a source or destination of address that was calculated based on received parameters, such as is disclosed by Holt and Dunlap.
Second, the Examiner's statement that "the first element of each block that is copied by each instruction is mapped to the claimed 'one of the two or more data elements' which are copied from a corresponding source address in the source list to a corresponding destination address in the destination list" reveals that the Examiner clearly understands that the second and subsequent elements of each block copied by the combination of Holt and Dunlap would NOT use a source address that was provided by the processor, would NOT use a destination address in a destination address list that was provided by the processor, or both. Instead, it would use a source address, destination address, or both, that was NOT present in the lists provided by the processor but was instead algorithmically calculated.” (Remarks, pages 9-10)
However, this argument is not persuasive because it does not consider that Holt teaches a host system (i.e., a processor) that provides a scatter/gather list to a host adapter, see col 6 lines 19-42, and DMA 100 in the host adapter performs DMA transfers in accordance with the scatter/gather list, see col 8 lines 23-29, which includes copying an element from a corresponding source address in the list provided by the host system/processor to a corresponding destination address in the list provided by the host system/processor. Dunlap is relied on to modify Holt to perform each transfer from a source address to a destination address using a memory-to-memory copy instruction. Applicant’s argument focuses on Holt copying other elements after it copies an element from a source address in the list to a corresponding destination address in the list. However, these other copies are not relied on in the rejection; the rejection only maps the elements that are copied from each source address in the list to a corresponding destination address in the list to the claimed data elements. Each of these elements is copied (by a memory-to-memory copy instruction in the combination) from a corresponding source address in the list of two or more source addresses provided by the processor to the corresponding destination address in the list of two or more destination addresses provided by the processor, as required by the claims.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 32-34 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Claim 32 recites:
providing, by a processor, per-element addressing of two or more data elements, comprising a list of two or more source addresses, wherein each source address corresponds to one of the two or more data elements, wherein the two or more source addresses are orthogonal or independent and non-contiguous with one or more data elements at one or more other source addresses arranged between two or more data elements at the two or more source addresses;
executing, by a transaction sequencer separate from the processor, a sequence of memory-to-memory copy instructions, the sequence consisting of two or more memory-to-memory copy instructions, wherein each memory-to-memory copy instruction in the sequence copies one of the two or more data elements from a corresponding source address in the list of two or more source addresses provided by the processor to a buffer within a memory; and
copying, by the transaction sequencer separate from the processor, data from the buffer within the memory to a register in the processor.
However, the specification does not describe executing memory-to-memory copy instructions to copy data elements from source addresses to a buffer within the memory. While [0027] describes instructions that copy data elements from source addresses to a buffer, it does not disclose that the instructions are memory-to-memory copy instructions. The specification only mentions “memory-to-memory copy operations” in [0031] in the embodiment that copies data elements from source addresses to destination addresses, and, as explained in the previous 112(a) rejection of claim 32, this embodiment is different from the embodiment that copies data elements from source addresses to a buffer. 
Examiner further notes that the specification does not appear to describe “per-element addressing of two or more data elements, comprising a list of two or more source addresses” with respect to the embodiment that copies data elements from source addresses to a buffer. The only portion of the specification that mentions the terms “per-element addressing” and “list” is [0021], which is directed to the embodiment that copies elements from source addresses to destination addresses.
Claims 33-34 are rejected based on their dependence from a rejected base claim.
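The two steps recited in claim 32 (element copies into a buffer, then a copy from the buffer to a register) can be sketched as follows. This is a minimal illustration only, assuming a flat byte-addressable memory modeled as a Python bytearray and a register modeled as a tuple; the names gather_to_buffer and load_register are hypothetical and do not appear in the specification or in any cited reference.

```python
def gather_to_buffer(memory: bytearray, source_addresses: list, buffer_base: int) -> int:
    """Copy one element per listed source address into a contiguous buffer.

    Each loop iteration stands in for one memory-to-memory copy instruction.
    Returns the number of elements copied.
    """
    for i, src in enumerate(source_addresses):
        memory[buffer_base + i] = memory[src]
    return len(source_addresses)


def load_register(memory: bytearray, buffer_base: int, count: int) -> tuple:
    """Copy the contiguous buffer contents into a (modeled) processor register."""
    return tuple(memory[buffer_base:buffer_base + count])


# Hypothetical memory image in which the value stored at address a is a itself.
memory = bytearray(range(32))
# Non-contiguous source addresses with other elements lying between them.
sources = [3, 9, 20]
n = gather_to_buffer(memory, sources, buffer_base=24)
register = load_register(memory, 24, n)
```

In this sketch the source list is supplied by the caller (standing in for the processor) and both functions stand in for the separate transaction sequencer; whether the specification supports that arrangement is the question addressed in the rejection above.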

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 9, 12-13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Holt US 6,105,080 in view of Dunlap US 8,432,409.
Regarding claim 9, Holt teaches:
9. A method of performing a memory operation, the method comprising: 
providing, by a processor (Fig. 1 104), per-element addressing of two or more data elements, comprising a list of two or more source addresses and a list of a corresponding two or more destination addresses, wherein each source address and corresponding destination address corresponds to one of the two or more data elements (col 6 lines 19-42 and col 8 lines 23-29: a scatter/gather list is provided by 104 which includes source addresses and destination addresses of a destination memory for data blocks, see also col 2 lines 14-26, where the first unit of data transferred for each data block is an element, see Fig. 3 304, and each source and destination address corresponds to the first element of the block being transferred from the source address to the destination address; since each entry of the scatter/gather list has a source address and destination address, see also col 9 lines 35-39, the set of source addresses across all entries in the scatter/gather list is a list of source addresses and the set of destination addresses across all entries in the scatter/gather list is a list of destination addresses, see also col 3 lines 52-54 describing that separate source and destination lists may be provided), wherein the two or more source addresses are orthogonal or independent and non-contiguous with one or more data elements at one or more other source addresses arranged between two or more data elements at the two or more source addresses (col 2 lines 14-26 and col 6 lines 19-42: the source data for reads and writes may be stored in non-contiguous blocks, i.e., the source addresses of the non-contiguous blocks are non-contiguous, where two of the non-contiguous blocks will have addresses and data between them since the blocks are non-contiguous; col 9 lines 31-38: the source addresses of the non-contiguous blocks are independent since each source address is specified in separate entries of a list), and wherein the two or more destination addresses are orthogonal and non-contiguous or independent and non-contiguous in memory (col 2 lines 14-26 and col 6 lines 19-42: the destination locations/addresses of reads and writes may be non-contiguous in the destination memory, and the destination addresses are also independent since they are specified in separate entries of a list, see also col 9 lines 31-38); and
copying, by a transaction sequencer separate from the processor (col 6 lines 14-18 and lines 26-28: DMA 100, i.e., a transaction sequencer, receives the scatter/gather list from 104, and copies the data elements, see also Fig. 1 showing DMA 100 separate from processor 104), one of the two or more data elements from a corresponding source address in the list of two or more source addresses provided by the processor to the corresponding destination address in the list of two or more destination addresses provided by the processor (col 8 lines 43-46: the first data element of each block is transferred from a source address to the corresponding destination address, as specified within the list entry, within the destination memory since a first element transfer from a source to destination according to a first entry of the scatter/gather list happens when processing first enters 304 to transfer a first block and a second element transfer from a second source to a second destination according to second entry of the scatter/gather list happens when processing goes from 318 back to 304 to transfer a second block, see col 9 lines 35-39), without an intermediate copy to a register in the processor (col 2 lines 5-13: the transfers are directly between the host system memory and local memory, which indicates there is no intermediate copy to a register in the host system).
	Holt does not explicitly teach:
executing a sequence of memory-to-memory copy instructions, the sequence consisting of two or more memory-to-memory copy instructions, wherein each memory-to-memory copy instruction in the sequence copies one of the data elements from a corresponding source address to the corresponding destination address.
	However, Dunlap teaches a strided block transfer instruction that transfers a data block between two memories, see col 3 lines 52-57.
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the DMA controller of Holt to use the block transfer instruction of Dunlap to transfer each block from its corresponding source to its corresponding destination. In this combination, Holt will execute a sequence of instructions consisting of two or more block transfer instructions, where each block transfer instruction in the sequence will copy a corresponding block from a source address to a destination address, where the first element of each block that is copied by each instruction is mapped to the claimed “one of the two or more data elements” which are copied from a corresponding source address in the source list to a corresponding destination address in the destination list. One of ordinary skill in the art would have been motivated to make this modification to improve efficiency (Dunlap col 3 lines 52-57) and because executing instructions to transfer data is a known technique on the known device of a processor for controlling hardware to transfer data and would yield the predictable result of increasing control of the hardware, since an instruction allows a user to specify operations for the hardware to perform (see MPEP 2143 example D).
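The combination described above can be sketched in code. The following is a minimal illustration only, assuming a flat byte-addressable memory modeled as a Python bytearray and scatter/gather entries of the form (source address, destination address, length); the function name run_copy_sequence and the entry layout are hypothetical and are not taken from Holt or Dunlap.

```python
from typing import List, Tuple


def run_copy_sequence(memory: bytearray, sg_list: List[Tuple[int, int, int]]) -> int:
    """Issue one memory-to-memory block copy per scatter/gather entry.

    Each entry is (source_address, destination_address, length).
    Returns the number of copy operations issued.
    """
    issued = 0
    for src, dst, length in sg_list:
        # One block transfer: the first byte moved is the element located
        # exactly at the listed source address and written to the listed
        # destination address.
        memory[dst:dst + length] = memory[src:src + length]
        issued += 1
    return issued


# Hypothetical memory image with two non-contiguous source blocks.
memory = bytearray(64)
memory[0:4] = b"AAAA"
memory[16:20] = b"BBBB"

# Two list entries with independent, non-contiguous sources and destinations.
sg_list = [(0, 40, 4), (16, 32, 4)]
count = run_copy_sequence(memory, sg_list)
```

In this sketch, two list entries produce a sequence of exactly two copy operations, and the element at each listed source address lands at the corresponding listed destination address. The remaining bytes of each block are also moved, but they are not what the rejection maps to the claimed data elements.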

	Regarding claim 12, Holt in view of Dunlap teaches:
12. The method of claim 9, wherein copying the two or more data elements from the two or more source addresses to the corresponding two or more destination addresses within the memory comprises executing a single instruction multiple data (SIMD) copy instruction (the block transfer instruction of Dunlap, see col 3 lines 52-57, is a SIMD copy instruction since it is a single instruction that copies multiple data of a block from a source memory to a destination memory, where copying the two or more data elements (which would be the first data elements of multiple blocks in the current mapping) comprises executing a SIMD copy/ block transfer instruction since a block transfer instruction is executed to copy one of the blocks; Examiner notes that the BRI of this limitation allows for multiple SIMD copy instructions to be executed to copy the two or more data elements since it does not require that only one SIMD copy instruction is executed to copy the two or more data elements).

	Regarding claim 13, Holt in view of Dunlap teaches:
13. The method of claim 12, comprising executing the SIMD copy instruction in a background mode without direction by the processor (Holt col 5 lines 40-41 and col 8 lines 19-22: the transfer is performed by the DMA controller after the scatter/gather list is received, i.e., in a background mode without direction by the host system/processor). 

	Regarding claim 20, Holt teaches:
20. An apparatus comprising: 
a processor (Fig. 1 104) configured to provide per-element addressing of two or more data elements, comprising a list of two or more source addresses and a list of a corresponding two or more destination addresses, wherein each source address and corresponding destination address corresponds to one of the two or more data elements (col 6 lines 19-42 and col 8 lines 23-29: a scatter/gather list is provided by 104 which includes source addresses and destination addresses of a destination memory for data blocks, see also col 2 lines 14-26, where the first unit of data transferred for each data block is an element, see Fig. 3 304, and each source and destination address corresponds to the first element of the block being transferred from the source address to the destination address; since each entry of the scatter/gather list has a source address and destination address, see also col 9 lines 35-39, the set of source addresses across all entries in the scatter/gather list is a list of source addresses and the set of destination addresses across all entries in the scatter/gather list is a list of destination addresses, see also col 3 lines 52-54 describing that separate source and destination lists may be provided), wherein the two or more source addresses are orthogonal or independent and non-contiguous with one or more data elements at one or more other source addresses arranged between two or more data elements at the two or more source addresses (col 2 lines 14-26 and col 6 lines 19-42: the source data for reads and writes may be stored in non-contiguous blocks, i.e., the source addresses of the non-contiguous blocks are non-contiguous, where two of the non-contiguous blocks will have addresses and data between them since the blocks are non-contiguous; col 9 lines 31-38: the source addresses of the non-contiguous blocks are independent since each source address is specified in separate entries of a list), and wherein the two or more destination addresses are orthogonal and non-contiguous or independent and non-contiguous in memory (col 2 lines 14-26 and col 6 lines 19-42: the destination locations/addresses of reads and writes may be non-contiguous in the destination memory, and the destination addresses are also independent since they are specified in separate entries of a list, see also col 9 lines 31-38); and
logic circuitry separate from the processor and configured to receive the per-element addressing of the two or more data elements from the processor and to copy (col 6 lines 14-18 and lines 26-28: DMA 100, i.e., logic circuitry, receives the scatter/gather list from 104, and copies the data elements, see also Fig. 1 showing DMA 100 separate from processor 104) one of the two or more data elements from a corresponding source address in the list of two or more source addresses provided by the processor to the corresponding destination address in the list of two or more destination addresses provided by the processor (col 8 lines 43-46: the first data element of each block is transferred from a source address to the corresponding destination address, as specified within the list entry, within the destination memory since a first element transfer from a source to destination according to a first entry of the scatter/gather list happens when processing first enters 304 to transfer a first block and a second element transfer from a second source to a second destination according to second entry of the scatter/gather list happens when processing goes from 318 back to 304 to transfer a second block, see col 9 lines 35-39), without an intermediate copy to a register in the processor (col 2 lines 5-13: the transfers are directly between the host system memory and local memory, which indicates there is no intermediate copy to a register in the host system).
	Holt does not teach:
logic circuitry configured to: execute a sequence of memory-to-memory copy instructions, the sequence consisting of two or more memory-to-memory copy instructions, wherein each memory-to-memory copy instruction in the sequence copies one of the data elements from a corresponding source address to the corresponding destination address.
	However, Dunlap teaches a strided block transfer instruction that transfers a data block between two memories, see col 3 lines 52-57.
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the DMA controller of Holt to use the block transfer instruction of Dunlap to transfer each block from its corresponding source to its corresponding destination. In this combination, Holt will execute a sequence of instructions consisting of two or more block transfer instructions, where each block transfer instruction in the sequence will copy a corresponding block from a source address to a destination address, where the first element of each block that is copied by each instruction is mapped to the claimed “one of the two or more data elements” which are copied from a corresponding source address in the source list to a corresponding destination address in the destination list. One of ordinary skill in the art would have been motivated to make this modification to improve efficiency (Dunlap col 3 lines 52-57) and because executing instructions to transfer data is a known technique on the known device of a processor for controlling hardware to transfer data and would yield the predictable result of increasing control of the hardware, since an instruction allows a user to specify operations for the hardware to perform (see MPEP 2143 example D).

Claims 24, 26-28, and 30-31 are rejected under 35 U.S.C. 103 as being unpatentable over Holt US 6,105,080 in view of Dunlap US 8,432,409 and Agarwal US 5,887,183.
	Regarding claim 24, Holt in view of Dunlap teaches:
	24. The method of claim 9, 
	Holt in view of Dunlap does not teach:
wherein the two or more memory-to-memory copy instructions are executed according to a relaxed memory ordering. 
	However, Agarwal teaches copy instructions are executed according to a relaxed memory ordering (col 8 lines 46-65 describes that the load/copy instructions may be executed out-of-order, which is according to a relaxed memory ordering since out-of-order execution “relaxes” ordering constraints relative to in-order execution). 
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Holt in view of Dunlap to execute the copy/load instructions out-of-order as taught by Agarwal. One of ordinary skill in the art would have been motivated to make this modification to enable parallel execution of the load/copy instructions (Agarwal col 8 lines 49-50), which would speed up processing time.
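The effect of relaxing the order of independent copies can be sketched as follows. This is a minimal illustration only, assuming single-byte elements and assuming that no destination address coincides with any source address or with another destination address; that independence assumption is made for the illustration and is not drawn from Agarwal. The name apply_copies is hypothetical.

```python
import itertools


def apply_copies(initial: bytes, copies) -> bytes:
    """Apply (source, destination) single-element copies in the given order."""
    memory = bytearray(initial)
    for src, dst in copies:
        memory[dst] = memory[src]
    return bytes(memory)


# Hypothetical memory image in which the value stored at address a is a itself.
initial = bytes(range(16))
# Three independent copies: no destination is also a source or another destination.
copies = [(1, 10), (4, 12), (7, 14)]

# Collect the final memory image for every possible ordering of the copies.
results = {apply_copies(initial, order) for order in itertools.permutations(copies)}
```

Under the stated assumption, every ordering yields the same final memory image, which is why such copies may be issued in parallel or completed out-of-order without changing the result.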

	Regarding claim 26, Holt in view of Dunlap teaches:
	26. The method of claim 9, 
	Holt in view of Dunlap does not teach:
wherein the two or more memory-to-memory copy instructions are completed out-of-order.
	However, Agarwal teaches copy instructions are completed out-of-order (col 8 lines 46-65 describes that the load/copy instructions may be executed out-of-order, which indicates that the instructions may also complete out-of-order). 
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Holt in view of Dunlap to execute/complete the copy/load instructions out-of-order as taught by Agarwal. One of ordinary skill in the art would have been motivated to make this modification to enable parallel execution of the load/copy instructions (Agarwal col 8 lines 49-50), which would speed up processing time.

	Regarding claim 27, Holt in view of Dunlap teaches:
	27. The method of claim 9, 
	Holt in view of Dunlap does not teach:
wherein the two or more memory-to-memory copy instructions are executed according to a relaxed memory ordering and are completed out-of-order. 
However, Agarwal teaches copy instructions are executed according to a relaxed memory ordering and are completed out-of-order (col 8 lines 46-65 describes that the load/copy instructions may be executed out-of-order, which is according to a relaxed memory ordering since out-of-order execution “relaxes” ordering constraints relative to in-order execution, where out-of-order execution indicates that the instructions may complete out-of-order). 
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Holt in view of Dunlap to execute/complete the copy/load instructions out-of-order as taught by Agarwal. One of ordinary skill in the art would have been motivated to make this modification to enable parallel execution of the load/copy instructions (Agarwal col 8 lines 49-50), which would speed up processing time.

	Regarding claim 28, Holt in view of Dunlap teaches:
	28. The apparatus of claim 20, 
	Holt in view of Dunlap does not teach:
wherein the logic circuitry configured to execute the two or more memory-to-memory copy instructions is configured to execute the memory-to-memory copy instructions according to a relaxed memory ordering. 
	However, Agarwal teaches executing copy instructions according to a relaxed memory ordering (col 8 lines 46-65 describes that the load/copy instructions may be executed out-of-order, which is according to a relaxed memory ordering since out-of-order execution “relaxes” ordering constraints relative to in-order execution). 
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Holt in view of Dunlap to execute the copy/load instructions out-of-order as taught by Agarwal. One of ordinary skill in the art would have been motivated to make this modification to enable parallel execution of the load/copy instructions (Agarwal col 8 lines 49-50), which would speed up processing time.

	Regarding claim 30, Holt in view of Dunlap teaches:
	30. The apparatus of claim 20, 
	Holt in view of Dunlap does not teach:
wherein the logic circuitry configured to execute the two or more memory-to-memory copy instructions is configured to allow the memory-to-memory copy instructions to complete out-of-order. 
	However, Agarwal teaches allowing copy instructions to complete out-of-order (col 8 lines 46-65 describes that the load/copy instructions may be executed out-of-order, which indicates that the instructions may complete out-of-order). 
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Holt in view of Dunlap to execute/complete the copy/load instructions out-of-order as taught by Agarwal. One of ordinary skill in the art would have been motivated to make this modification to enable parallel execution of the load/copy instructions (Agarwal col 8 lines 49-50), which would speed up processing time.

Regarding claim 31, Holt in view of Dunlap teaches:
	31. The apparatus of claim 20, 
	Holt in view of Dunlap does not teach:
wherein the logic circuitry configured to execute the two or more memory-to-memory copy instructions is configured to execute the memory-to-memory copy instructions according to a relaxed memory ordering and to allow the memory-to-memory copy instructions to complete out-of-order. 
However, Agarwal teaches executing copy instructions according to a relaxed memory ordering and allowing the copy instructions to complete out-of-order (col 8 lines 46-65 describes that the load/copy instructions may be executed out-of-order, which is according to a relaxed memory ordering since out-of-order execution “relaxes” ordering constraints relative to in-order execution, where out-of-order execution indicates that the instructions may complete out-of-order). 
	It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Holt in view of Dunlap to execute/complete the copy/load instructions out-of-order as taught by Agarwal. One of ordinary skill in the art would have been motivated to make this modification to enable parallel execution of the load/copy instructions (Agarwal col 8 lines 49-50), which would speed up processing time.

Prior Art Considerations
	While no prior art rejection is given for claims 32-34, these claims are rejected under 112(a) and are thus not allowable at this time. The prior art considerations are similar to those given in the Non-Final Rejection dated 09/25/2024, as claim 32 recites similar limitations that were previously considered.
	Specifically, the known prior art of record, taken alone or in combination, was not found to teach, in combination with other limitations in the claims, a processor that provides a list of source addresses and a transaction sequencer, separate from the processor, that copies data elements from a source address in the list to a buffer and from the buffer to a register, as described in claim 32. 
	The closest prior art of record for these claims was found to be Citron US 2012/0151156 (cited in 892 dated 04/09/2018) and Eichenberger US 2012/0060016 (cited in IDS dated 12/05/2017). Citron teaches a vector gather buffer (VGB) in a processor that handles vector gather instructions that specify distinct source addresses by gathering data from the addresses in memory and then storing the data to a vector register, see [0033]-[0034]. However, Citron does not teach the VGB being separate from the processor. Eichenberger similarly teaches a buffer in a gather unit of a processor, see Fig. 1. However, Eichenberger also does not teach the buffer being separate from the processor.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
US 2015/0347475 teaches a control unit in a processor that writes elements from memory to buffers and then from the buffers to corresponding registers, see Abstract and [0061].
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to KASIM ALLI whose telephone number is (571)270-1476. The examiner can normally be reached Monday - Friday, 9am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Caldwell, can be reached at (571) 272-3702. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ANDREW CALDWELL/Supervisory Patent Examiner, Art Unit 2182



/KASIM ALLI/Examiner, Art Unit 2182