Patent Application 18815639 - Integrated Multimodal Neural Network Platform - Rejection
Title: Integrated Multimodal Neural Network Platform for Generating Content based on Scalable Sensor Data
Application Information
- Invention Title: Integrated Multimodal Neural Network Platform for Generating Content based on Scalable Sensor Data
- Application Number: 18815639
- Submission Date: 2025-05-13
- Effective Filing Date: 2024-08-26
- Filing Date: 2024-08-26
- Examiner Employee Number: 95574
- Art Unit: 2127
- Tech Center: 2100
Rejection Summary
- 102 Rejections: 0
- 103 Rejections: 5
Cited Patents
No patents were cited in this rejection.
Office Action Text
Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of the Claims

Claims 1, 11-14, 16, and 18-20 have been amended. Claims 1-20 are pending and have been considered by the Examiner.

Specification

The disclosure is objected to because of the following informalities: Claim 19 recites "a large language model (LLM) having a self-attention based transformer structure" in lines 2-3. Specification paragraph [0065] in lines 6-9 states, "The LLM 150 is trained and executed on the server system 106. For example, the largest GPT-3 model uses 175 billion parameters, 96 self-attention layers, 2048 tokens window size of a mask, and 96 heads of self-attention per multi-head self-attention layer." GPT-3 is short for Generative Pre-trained Transformer 3. Since this is the only part of the specification that discloses the LLM having a self-attention based transformer structure, Examiner recommends replacing "GPT-3" in paragraph [0065] with "Generative Pre-trained Transformer 3 (GPT-3)". Appropriate correction is required.

Claim Objections

Claims 1, 12, 14, 17 and 20 are objected to because of the following informalities: The preamble of claim 1 recites "A method for presenting sensor data" but the final limitation recites presenting the multimodal output. Examiner suggests correcting the preamble because claim 1 does not recite a step of presenting the sensor data. In claim 12, line 4, Examiner suggests amending the limitation "one or more information items" to recite either "the one or more information items" or "one or more of the information items". In claim 14, line 2, the term "having" should recite "has". Since all the limitations in claim 17 modify the limitations in claim 16, lines 5-6, Examiner suggests changing "further comprising instructions for" as presently recited in claim 17, line 1 to "wherein processing the sensor data comprises instructions for". In claim 20, line 3, "include" should recite "including". Appropriate correction is required.

Claim Rejections - 35 USC § 101

35 U.S.C. 101 reads as follows: Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

CLAIM 1

Step 2A Prong 1: Detecting one or more signature events in the sensor data is an observation mental process which can reasonably be performed in the human mind with the aid of pencil and paper. Generating one or more information items characterizing the one or more signature events detected in the sensor data, independently of the sensor types of the plurality of sensor devices is a judgement and evaluation mental process which can reasonably be performed in the human mind with the aid of pencil and paper. The claim recites an abstract idea. Step 2A Prong 2: A computer system having one or more processors and memory amounts to generic computer components for applying the abstract ideas on a generic computer under MPEP 2106.05(f).
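For reference, the self-attention based transformer structure quoted in the Specification objection above (paragraph [0065], describing GPT-3) can be illustrated with a minimal sketch of one multi-head self-attention layer. This is illustrative only, not the applicant's LLM 150 or any cited reference's implementation; the causal mask and head splitting echo the quoted GPT-3 description, while the toy dimensions, weight initialization, and function names are assumptions chosen only to keep the example small.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, w_q, w_k, w_v, w_o, num_heads):
    """Scaled dot-product self-attention over a token sequence.

    tokens: (seq_len, d_model) input embeddings
    w_q, w_k, w_v, w_o: (d_model, d_model) projection matrices
    """
    seq_len, d_model = tokens.shape
    d_head = d_model // num_heads
    # Project inputs to queries, keys, and values, then split into heads.
    q = (tokens @ w_q).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    k = (tokens @ w_k).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    v = (tokens @ w_v).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    # Causal mask so each position attends only to earlier positions,
    # mirroring the "window size of a mask" language quoted from [0065].
    mask = np.triu(np.full((seq_len, seq_len), -np.inf), k=1)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head) + mask
    attended = softmax(scores) @ v                        # (heads, seq, d_head)
    merged = attended.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o

# Toy dimensions; GPT-3 itself uses 96 heads, 96 layers, and a 2048-token window.
rng = np.random.default_rng(0)
d_model, num_heads, seq_len = 64, 8, 16
w = [rng.normal(0, 0.02, (d_model, d_model)) for _ in range(4)]
out = multi_head_self_attention(rng.normal(size=(seq_len, d_model)), *w, num_heads)
print(out.shape)  # (16, 64)
```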
Streaming the sensor data from a plurality of sensor devices during a time duration, the plurality of sensor devices including at least two distinct sensor types and disposed in a physical environment amounts to mere data-gathering, an insignificant extra-solution activity under MPEP 2106.05(g). While streaming the sensor data: applying a large behavior model to process the one or more information items associated with the sensor data and generate a multimodal output associated with the sensor data in real time while the sensor data are being streamed, the multimodal output describing the one or more signature events associated with the sensor data in one of a plurality of predefined output modalities amounts to insignificant extra-solution activity under MPEP 2106.05(g). While streaming the sensor data: presenting the multimodal output according to the one of the plurality of predefined output modalities amounts to insignificant extra-solution activity under MPEP 2106.05(g). The additional elements as disclosed above, alone or in combination, do not integrate the abstract ideas into a practical application, as they are mere insignificant extra-solution activities in combination with generic computer functions that are implemented to perform the abstract ideas disclosed above. The claim is directed to an abstract idea. Step 2B: A computer system having one or more processors and memory amounts to generic computer components for applying the abstract ideas on a generic computer under MPEP 2106.05(f). Streaming the sensor data from a plurality of sensor devices during a time duration, the plurality of sensor devices including at least two distinct sensor types and disposed in a physical environment amounts to well-understood, routine, conventional activity recognized by the courts and is analogous to receiving data over a network under MPEP 2106.05(d)(II). While streaming the sensor data: applying a large behavior model to process the one or more information items associated with the sensor data and generate a multimodal output associated with the sensor data in real time while the sensor data are being streamed, the multimodal output describing the one or more signature events associated with the sensor data in one of a plurality of predefined output modalities amounts to well-understood, routine, conventional activity under MPEP 2106.05(d)(I). De Barros et al. (US 20250036695 A1) at paragraph [0003] provides Berkheimer evidence for large language models configured to generate outputs based upon text inputs set forth by a user and in near real-time. While streaming the sensor data: presenting the multimodal output according to the one of the plurality of predefined output modalities amounts to well-understood, routine, conventional activity recognized by the courts and is analogous to presenting offers and gathering statistics under MPEP 2106.05(d)(II). The additional elements as disclosed above, in combination with the abstract ideas, are not sufficient to amount to significantly more than the abstract ideas, as they are well-understood, routine and conventional activities in combination with generic computer functions that are implemented to perform the abstract ideas disclosed above. The claim is not patent eligible.

CLAIM 2 incorporates the rejection of claim 1. Step 2A Prong 1: The abstract ideas of claim 1 are incorporated.
A first information item is generated based on the subset of sensor data to characterize the first signature event is a judgement and evaluation mental process which can reasonably be performed in the human mind with the aid of pencil and paper. Step 2A Prong 2 and Step 2B: A subset of sensor data corresponds to a first signature event, and includes a first temporal sequence of sensor samples obtained from a first sensor device and a second temporal sequence of sensor samples obtained from a second sensor device amounts to a description of sensor data, which is a field of use and technological environment under MPEP 2106.05(h). A first sensor type of the first sensor device is different from a second sensor type of the second sensor device amounts to a field of use and technological environment under MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 3 incorporates the rejection of claim 2. Step 2A Prong 1: The abstract ideas of claim 2 are incorporated. Step 2A Prong 2: The first temporal sequence of sensor samples and the second temporal sequence of sensor samples are concurrently measured, and wherein the first temporal sequence of sensor samples has a first sampling rate, and the second temporal sequence of sensor samples has a second sampling rate that is different from the first sampling rate amounts to mere data-gathering, an insignificant extra-solution activity under MPEP 2106.05(g). Step 2B: The first temporal sequence of sensor samples and the second temporal sequence of sensor samples are concurrently measured, and wherein the first temporal sequence of sensor samples has a first sampling rate, and the second temporal sequence of sensor samples has a second sampling rate that is different from the first sampling rate amounts to well-understood, routine, conventional activity recognized by the courts and is analogous to receiving data over a network under MPEP 2106.05(d)(II). The claim is not patent eligible.

CLAIM 4 incorporates the rejection of claim 2. Step 2A Prong 1: The abstract ideas of claim 2 are incorporated. Step 2A Prong 2 and Step 2B: Applying at least a universal event projection model to process the first temporal sequence of sensor samples and the second temporal sequence of sensor samples jointly to generate the first information item amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f). The claim is not patent eligible.

CLAIM 5 incorporates the rejection of claim 2. Step 2A Prong 1: The abstract ideas of claim 2 are incorporated. Step 2A Prong 2 and Step 2B: Applying at least a first event projection model to process the first temporal sequence of sensor samples to generate the first information item amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f). Applying at least a second event projection model to process the second temporal sequence of sensor samples to generate the first information item, the first event projection model distinct from the second event projection model amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f). The claim is not patent eligible.

CLAIM 6 incorporates the rejection of claim 5. Step 2A Prong 1: The abstract ideas of claim 5 are incorporated.
Selecting each of the first event projection model and the second event projection model based on a respective device type of the first sensor device and the second sensor device is a judgement and evaluation mental process which can reasonably be performed in the human mind with the aid of pencil and paper. Step 2A Prong 2 and Step 2B: The claim does not recite any additional elements which, alone or in combination, would integrate the abstract ideas into a practical application. The claim does not recite any additional elements which, in combination with the abstract ideas, would be sufficient to amount to significantly more than the abstract ideas. The claim is not patent eligible.

CLAIM 7 incorporates the rejection of claim 1. Step 2A Prong 1: The abstract ideas of claim 1 are incorporated. Generating an ordered sequence of respective sensor data features defining a respective parametric representation of the temporal sequence of respective sensor samples, independently of a sensor type of the respective sensor device is a judgement and evaluation mental process which can reasonably be performed in the human mind with the aid of pencil and paper. Step 2A Prong 2 and Step 2B: Each sensor device corresponds to a temporal sequence of respective sensor samples amounts to a field of use and technological environment under MPEP 2106.05(h). Providing the ordered sequence of respective sensor data features to an event projection model amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f). The claim is not patent eligible.

CLAIM 8 incorporates the rejection of claim 1. Step 2A Prong 1: The abstract ideas of claim 1 are incorporated. Associating each sensor data item of the temporal sequence of sensor data with a respective timestamp and a subset of respective sensor samples that are grouped based on the temporal window is a judgement and evaluation mental process which can reasonably be performed in the human mind with the aid of pencil and paper. Step 2A Prong 2: The sensor data includes a temporal sequence of sensor data amounts to a description of sensor data, which is a field of use and technological environment under MPEP 2106.05(h). Obtaining the sensor data further comprises: obtaining a stream of context data measured continuously by the plurality of sensor devices, the stream of context data including the temporal sequence of respective sensor samples that are grouped for each sensor device based on a temporal window, the temporal window configured to move with a time axis amounts to mere data-gathering, an insignificant extra-solution activity under MPEP 2106.05(g). Step 2B: The sensor data includes a temporal sequence of sensor data amounts to a description of sensor data, which is a field of use and technological environment under MPEP 2106.05(h). Obtaining the sensor data further comprises: obtaining a stream of context data measured continuously by the plurality of sensor devices, the stream of context data including the temporal sequence of respective sensor samples that are grouped for each sensor device based on a temporal window, the temporal window configured to move with a time axis amounts to well-understood, routine, conventional activity recognized by the courts and is analogous to receiving data over a network under MPEP 2106.05(d)(II). The claim is not patent eligible.

CLAIM 9 incorporates the rejection of claim 1. Step 2A Prong 1: The abstract ideas of claim 1 are incorporated.
Step 2A Prong 2: Storing the one or more information items associated with the one or more signature events, the one or more information items including a timestamp and a location of each of the one or more signature events amounts to insignificant extra-solution activity under MPEP 2106.05(g). Step 2B: Storing the one or more information items associated with the one or more signature events, the one or more information items including a timestamp and a location of each of the one or more signature events amounts to well-understood, routine, conventional activity recognized by the courts and is analogous to storing information in memory under MPEP 2106.05(d)(II). The claim is not patent eligible.

CLAIM 10 incorporates the rejection of claim 1. Step 2A Prong 1: The abstract ideas of claim 1 are incorporated. Determining a behavior pattern based on the one or more signature events for the time duration is a judgement and evaluation mental process which can reasonably be performed in the human mind with the aid of pencil and paper. Generating a subset of the one or more information items describing the behavior pattern is a judgement and evaluation mental process which can reasonably be performed in the human mind with the aid of pencil and paper. Step 2A Prong 2: Examiner treats "providing the subset of the one or more information items of the behavior pattern associated with the sensor data" as outputting the subset, which amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f) and insignificant extra-solution activity under MPEP 2106.05(g). Step 2B: Providing the subset of the one or more information items of the behavior pattern associated with the sensor data amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f) and well-understood, routine, conventional activity recognized by the courts which is analogous to presenting offers and gathering statistics under MPEP 2106.05(d)(II). The claim is not patent eligible.

CLAIM 11 incorporates the rejection of claim 1. Step 2A Prong 1: The abstract ideas of claim 1 are incorporated. Based on a predefined loss function, training the large behavior model using the plurality of training inputs and associated ground truths is a mathematical calculation. Step 2A Prong 2: Obtaining a plurality of training inputs, each training input including a training text prompt and an information item associated with a training signature event amounts to mere data-gathering, an insignificant extra-solution activity under MPEP 2106.05(g). Obtaining a ground truth corresponding to each training input, the ground truth including a sample multimodal output preferred for the training input amounts to mere data-gathering, an insignificant extra-solution activity under MPEP 2106.05(g). Step 2B: Obtaining a plurality of training inputs, each training input including a training text prompt and an information item associated with a training signature event amounts to well-understood, routine, conventional activity recognized by the courts and is analogous to receiving data over a network under MPEP 2106.05(d)(II). Obtaining a ground truth corresponding to each training input, the ground truth including a sample multimodal output preferred for the training input amounts to well-understood, routine, conventional activity recognized by the courts and is analogous to receiving data over a network under MPEP 2106.05(d)(II). The claim is not patent eligible.
CLAIM 12 incorporates the rejection of claim 1. Step 2A Prong 1: The abstract ideas of claim 1 are incorporated. Step 2A Prong 2: Obtaining a plurality of training inputs, each training input including one or more test tags of a sequence of signature events, the one or more test tags having a predefined description format in which one or more information items and an associated timestamp of each signature event is organized amounts to mere data-gathering, an insignificant extra-solution activity under MPEP 2106.05(g). Step 2B: Obtaining a plurality of training inputs, each training input including one or more test tags of a sequence of signature events, the one or more test tags having a predefined description format in which one or more information items and an associated timestamp of each signature event is organized amounts to well-understood, routine, conventional activity recognized by the courts and is analogous to receiving data over a network under MPEP 2106.05(d)(II). The claim is not patent eligible.

Claim 13 is directed to a system which recites the same features as the method of claim 1 and is therefore rejected for at least the same reasons therein. In Step 2A Prong 2 and in Step 2B, one or more processors and memory having instructions stored thereon, which when executed by the one or more processors cause the processors to perform operations amount to generic computer components for applying the abstract ideas on a generic computer under MPEP 2106.05(f). The claim is not patent eligible.

CLAIM 14 incorporates the rejection of claim 13. Step 2A Prong 1: The abstract ideas of claim 13 are incorporated. Step 2A Prong 2 and Step 2B: For a temporal window corresponding to a subset of sensor data, the memory further having instructions for: applying at least a universal event projection model to process the subset of sensor data within the respective temporal window and detect the one or more signature events amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f). The claim is not patent eligible.

CLAIM 15 incorporates the rejection of claim 13. Step 2A Prong 1: The abstract ideas of claim 13 are incorporated. Step 2A Prong 2 and Step 2B: The plurality of sensor devices include one or more of: a presence sensor, a proximity sensor, a microphone, a motion sensor, a gyroscope, an accelerometer, a Radar, a Lidar scanner, a camera, a temperature sensor, a heartbeat sensor, and a respiration sensor amount to generic computer components for applying the abstract ideas on a generic computer under MPEP 2106.05(f), and a field of use and technological environment under MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 16 incorporates the rejection of claim 13. Step 2A Prong 1: The abstract ideas of claim 13 are incorporated. Step 2A Prong 2: Processing the sensor data to generate one or more sets of intermediate items successively and iteratively, until generating the one or more information items amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f). Storing the one or more information items or the multimodal output in a database amounts to insignificant extra-solution activity under MPEP 2106.05(g). Step 2B: Processing the sensor data to generate one or more sets of intermediate items successively and iteratively, until generating the one or more information items amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f).
Storing the one or more information items or the multimodal output in a database amounts to well-understood, routine, conventional activity recognized by the courts and is analogous to storing information in memory and electronic recordkeeping under MPEP 2106.05(d)(II). The claim is not patent eligible.

CLAIM 17 incorporates the rejection of claim 16. Step 2A Prong 1: The abstract ideas of claim 13 are incorporated. Step 2A Prong 2: Processing the sensor data to generate a first set of intermediate items at a first time amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f). Storing the first set of intermediate items in the database amounts to insignificant extra-solution activity under MPEP 2106.05(g). Processing the first set of intermediate items to generate one or more second sets of intermediate items successively at one or more successive second times following the first time amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f). Successively storing the one or more second sets of intermediate items in the database, and deleting the first set of intermediate items from the database amounts to insignificant extra-solution activity under MPEP 2106.05(g). Processing a most recent intermediate set of the one or more second sets of intermediate items to generate the one or more information items at a third time following the one or more successive second times amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f). Step 2B: Processing the sensor data to generate a first set of intermediate items at a first time amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f). Storing the first set of intermediate items in the database amounts to well-understood, routine, conventional activity recognized by the courts and is analogous to storing information in memory and electronic recordkeeping under MPEP 2106.05(d)(II). Processing the first set of intermediate items to generate one or more second sets of intermediate items successively at one or more successive second times following the first time amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f). Successively storing the one or more second sets of intermediate items in the database, and deleting the first set of intermediate items from the database amounts to storing information in memory and electronic recordkeeping under MPEP 2106.05(d)(II). Processing a most recent intermediate set of the one or more second sets of intermediate items to generate the one or more information items at a third time following the one or more successive second times amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f). The claim is not patent eligible.

Claim 18 is directed to a product which recites the same features as the method of claim 1 and is therefore rejected for at least the same reasons therein. In Step 2A Prong 2 and Step 2B, a non-transitory computer-readable storage medium, having instructions stored thereon, which when executed by one or more processors cause the one or more processors to perform operations amount to generic computer components for applying the abstract ideas on a generic computer under MPEP 2106.05(f). The claim is not patent eligible.

CLAIM 19 incorporates the rejection of claim 18. Step 2A Prong 1: The abstract ideas of claim 18 are incorporated.
Step 2A Prong 2 and Step 2B: This limitation "wherein the large behavior model includes a large language model (LLM) having a self-attention based transformer structure" modifies the limitation of applying a behavior model, which amounts to mere instructions to apply the abstract ideas on a generic computer under MPEP 2106.05(f). The multimodal output includes one or more of: description, timestamp, numeral information, statistic summary, warning message, and recommended action associated with the one or more signature events amounts to a description of output data, which is a field of use and technological environment under MPEP 2106.05(h). The claim is not patent eligible.

CLAIM 20 incorporates the rejection of claim 18. Step 2A Prong 1: The abstract ideas of claim 18 are incorporated. Identifying the plurality of predefined output modalities including two or more distinct modalities of: textual statements, software code, an image or video, an information dashboard having a predefined format, a user interface, and a heatmap is a judgement and evaluation mental process which can reasonably be performed in the human mind with the aid of pencil and paper. Selecting the one of the plurality of predefined output modalities is a judgement and evaluation mental process which can reasonably be performed in the human mind with the aid of pencil and paper. Step 2A Prong 2 and Step 2B: The claim does not recite any additional elements which, alone or in combination, would integrate the abstract ideas into a practical application. The claim does not recite any additional elements which, in combination with the abstract ideas, would be sufficient to amount to significantly more than the abstract ideas. The claim is not patent eligible.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis (i.e., changing from AIA to pre-AIA) for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status. The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action: A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

This application currently names joint inventors.
In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-4, 7-10, 13-15, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Pandya et al. (US 11803955 B1) in view of Newcomb (US 20220031105 A1).

Regarding claim 1, Pandya teaches: A method for presenting sensor data, comprising: at a computer system having one or more processors and memory: (C. 26, L. 29-35 discloses an edge computing system 310. C. 27, L. 65 to C. 28, L. 2 and 13-15 discloses the edge computing system may be a server comprising a processor and memory.) streaming the sensor data from a plurality of sensor devices during a time duration, the plurality of sensor devices including at least two distinct sensor types and disposed in a physical environment; and (C. 8, L. 23 to "site" in line 34 disclose a sensor data stream being sent to an edge computing device for inference, and L. 39-41 disclose a hazard worksite, which is a physical environment. C. 17, L. 12-18 discloses sensors. A sensor data stream contains sensor data during a time duration.) while streaming the sensor data: (C. 7, L. 37-53 discloses issuing a "real-time alert or warning to workers" by utilizing "real-time multimodal sensor data". This indicates that processing sensor data streams and delivering an alert to workers happens continuously while streaming the sensor data.) … generating one or more information items characterizing the one or more signature events detected in the sensor data, independently of the sensor types of the plurality of sensor devices; (Each of Fig. 2, C. 17, L. 59-62, and C. 18, L. 12-17 discloses a module 211 for generating an input feature dataset to be processed by the trained predictive model 213. A signature event is an incident, a behavior/action not in compliance with safety protocol, or fatigue level, as disclosed in C. 17, L. 62-66. Information items are input features in the input feature dataset. The module 211 generates features independently of the sensor types because it processes data from every sensor without consideration for the sensor type.) applying a large behavior model to process the one or more information items associated with the sensor data and generate a multimodal output associated with the sensor data in real time while the sensor data are being streamed, the multimodal output describing the one or more signature events associated with the sensor data in one of a plurality of predefined output modalities; and (With regard to a "multimodal output", C. 10, L. 3-10 teaches delivering an alert (e.g., vibration or audio alarm) in response to a detection of an incident or an intervention for changing behavior. An alert is a first predefined output modality, and an intervention is a second predefined output modality. C. 10, L. 10-15 lists types of interventions. With regard to a "large behavior model", each of Fig. 2, C. 17, L. 12-18, 47-55, and 59 to C. 18, L. 2 teaches a predictive model 213 for processing input features to generate the output 215.
Alerts and interventions at the output 215 describe an incident, a behavior/action not in compliance with safety protocol, or fatigue level. The predictive model 213 is a "large behavior model" because it processes behaviors of construction workers.) presenting the multimodal output according to the one of the plurality of predefined output modalities. (C. 10, L. 3-15 teaches delivering an alert as a vibration and delivering an intervention as a rhythmic cue. Delivering the alert or intervention to the worker corresponds to "presenting the multimodal output.") However, Pandya does not explicitly teach: while streaming the sensor data: detecting one or more signature events in the sensor data; But Newcomb teaches: while streaming the sensor data: detecting one or more signature events in the sensor data; (On page 9 in the left column, lines 33-40 teaches, "Computer vision uses software of the one or more cameras 16 for analyzing sequential frames of a live video feed for differences, and registers a motion event when a large enough change is detected. In one embodiment, a significant change in pixels over short periods of time are used for comparison to the longer term average to determine that something may have occurred." The limitation of a signature event as claimed corresponds to the motion event.) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Newcomb's computer vision software into Pandya's computer vision system. A motivation for the combination is to execute Pandya's safety inference engine 210 only when a significant change in the environment happens (Newcomb, page 9, left column, lines 33-40). Furthermore, it would have been obvious to have incorporated similar software into Pandya's LIDAR system 203 that analyzes sequential frames of LIDAR data for differences and registers a motion event when a large enough change is detected.

Regarding claim 2, the combination of Pandya and Newcomb teaches: The method of claim 1, wherein: Pandya teaches: a subset of sensor data corresponds to a first signature event, and includes a first temporal sequence of sensor samples obtained from a first sensor device and a second temporal sequence of sensor samples obtained from a second sensor device; (C. 10, L. 3-6 discloses detection of an incident such as a worker tripping. C. 17, L. 12-18 discloses a first sensor is a computer vision (CV) system 201 and a second sensor is a LIDAR system 203. A first signature event is the incident, and a subset of sensor data includes sequences of CV and LIDAR data samples corresponding to this incident. Additionally, C. 18, L. 21 to "time" in L. 29 discloses that data captured by camera and LIDAR may be aligned with respect to time, which indicates there is a temporal sequence of camera/CV data samples and a temporal sequence of LIDAR data samples.) a first sensor type of the first sensor device is different from a second sensor type of the second sensor device; and (C. 17, L. 12-18 discloses a first sensor is a CV system 201 and a second sensor is a LIDAR system 203.) a first information item is generated based on the subset of sensor data to characterize the first signature event. (Each of Fig. 2, C. 17, L. 59-62, and C. 18, L. 12-17 discloses a module 211 for generating an input feature dataset. A first information item includes an input feature within the dataset that characterizes the incident.)
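As an illustration of the frame-differencing approach described in the Newcomb passage quoted above (registering a motion event when short-term pixel change is large relative to a longer-term average), the following is a minimal sketch in Python. It is not Newcomb's or Pandya's implementation; the threshold, the exponential-average decay, and the function names are assumptions chosen only for illustration.

```python
import numpy as np

def detect_motion_events(frames, alpha=0.05, threshold=25.0):
    """Register a motion event when the mean absolute pixel change of the
    current frame, relative to a slowly updated long-term average frame,
    exceeds a threshold.

    frames: iterable of 2-D grayscale arrays (H, W)
    alpha: weight of each new frame in the running long-term average
    threshold: mean absolute difference counted as a "large enough" change
    """
    events = []
    long_term = None
    for idx, frame in enumerate(frames):
        frame = frame.astype(np.float32)
        if long_term is None:
            long_term = frame.copy()
            continue
        change = np.abs(frame - long_term).mean()  # short-term change vs. long-term average
        if change > threshold:
            events.append(idx)                     # a candidate "signature event"
        long_term = (1 - alpha) * long_term + alpha * frame
    return events

# Toy usage: a static scene with a burst of change at frame 50.
rng = np.random.default_rng(1)
frames = [np.full((48, 64), 100.0) + rng.normal(0, 1, (48, 64)) for _ in range(100)]
frames[50] += 80.0
print(detect_motion_events(frames))  # expected to include 50
```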
Regarding claim 3, the combination of Pandya and Newcomb teaches: The method of claim 2, Pandya teaches: wherein the first temporal sequence of sensor samples and the second temporal sequence of sensor samples are concurrently measured, and (C. 2, L. 22-37 discloses fusing CV and LIDAR sensor data, which indicates they had been concurrently measured.) wherein the first temporal sequence of sensor samples has a first sampling rate, and the second temporal sequence of sensor samples has a second sampling rate that is different from the first sampling rate. (C. 18, L. 21 to "frequency" in L. 25 teaches the CV sensor may capture data at a different frequency from the LIDAR sensor. Data captured at different frequencies have different sampling rates.)

Regarding claim 4, the combination of Pandya and Newcomb teaches: The method of claim 2, further comprising: Pandya teaches: applying at least a universal event projection model to process the first temporal sequence of sensor samples and the second temporal sequence of sensor samples jointly to generate the first information item. (Each of Fig. 2, C. 17, L. 59-62, and C. 18, L. 12-17 discloses a module 211 for generating an input feature dataset to be processed by the trained predictive model 213. A universal event projection model is module 211, and the first information item includes an input feature within the dataset that characterizes the tripping incident.)

Regarding claim 7, the combination of Pandya and Newcomb teaches: The method of claim 1, Pandya teaches: wherein each sensor device corresponds to a temporal sequence of respective sensor samples, (C. 18, L. 21 to "time" in L. 29 discloses that data captured by camera and LIDAR may be aligned with respect to time, which indicates the system captures a temporal sequence of camera/CV data samples and a temporal sequence of LIDAR data samples.) the method further comprising, for each sensor device: generating an ordered sequence of respective sensor data features defining a respective parametric representation of the temporal sequence of respective sensor samples, independently of a sensor type of the respective sensor device; and (C. 18, L. 21 to "time" in L. 29 teaches pre-processing raw sensor data by aligning the data with respect to time. Each "temporal sequence of respective sensor samples" corresponds to raw sensor data from a particular sensor (e.g., camera or LIDAR). Each "ordered sequence of respective sensor features defining a respective parametric representation" corresponds to time-aligned sensor data from a particular sensor. The time-aligned sensor data contains sensor features, and it would have been aligned based on parameters of time. The module 211 performs data alignment independently of the sensor types because it aligns data from every sensor without consideration for the sensor type.) providing the ordered sequence of respective sensor data features to an event projection model. (C. 18, L. 14-17 and 21-29 teaches the input feature generation module 211 generates an input feature dataset based on the time-aligned sensor data. An event projection model is the portion of module 211 which extracts/generates features from sensor data.)
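As a concrete illustration of the time alignment and temporal-window grouping discussed above for claims 3 and 7 (and for claim 8 below), the following is a minimal sketch of grouping timestamped samples from two sensors with different sampling rates into a window that moves along the time axis. It is illustrative only, not Pandya's module 211; the window length, sampling rates, and data structures are assumptions.

```python
from collections import defaultdict

def group_by_temporal_window(samples, window_s=1.0, step_s=0.5):
    """Group timestamped samples from multiple sensors into overlapping
    temporal windows that move along the time axis.

    samples: list of (timestamp_seconds, sensor_id, value), any sampling rate
    Returns a list of (window_start, {sensor_id: [values...]}) tuples.
    """
    if not samples:
        return []
    samples = sorted(samples)                      # time-align all streams
    t0, t_end = samples[0][0], samples[-1][0]
    windows = []
    start = t0
    while start <= t_end:
        grouped = defaultdict(list)
        for t, sensor_id, value in samples:
            if start <= t < start + window_s:
                grouped[sensor_id].append(value)   # per-sensor group inside the window
        windows.append((start, dict(grouped)))
        start += step_s                            # the window moves with the time axis
    return windows

# Toy usage: a "camera" stream at 10 Hz and a "lidar" stream at 4 Hz.
camera = [(i / 10.0, "camera", i) for i in range(20)]
lidar = [(i / 4.0, "lidar", 100 + i) for i in range(8)]
for start, groups in group_by_temporal_window(camera + lidar)[:2]:
    print(f"window starting at {start:.1f}s -> " +
          ", ".join(f"{k}: {len(v)} samples" for k, v in sorted(groups.items())))
```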
Regarding claim 8, the combination of Pandya and Newcomb teaches: The method of claim 1, Pandya teaches: wherein the sensor data includes a temporal sequence of sensor data, and (C. 18, L. 21 to "time" in L. 29 discloses that data captured by camera and LIDAR may be aligned with respect to time, which indicates the system captures a temporal sequence of camera/CV data samples and a temporal sequence of LIDAR data samples.) obtaining the sensor data further comprises: obtaining a stream of context data measured continuously by the plurality of sensor devices, the stream of context data including the temporal sequence of respective sensor samples that are grouped for each sensor device based on a temporal window, the temporal window configured to move with a time axis; and (C. 3, L. 56 to C. 4, L. 4 discloses receiving a data stream including the CV output data generated by the CV component and the 3D point cloud data generated by the LIDAR component. Examiner treats this data stream as "a stream of context data". It is measured continuously by the sensors, and it includes the CV data samples and LIDAR data samples. The CV data samples over time form a first group and the LIDAR data samples over time form a second group. A temporal window for each data type is configured to move along the time axis from an initial time to a final time.) associating each sensor data item of the temporal sequence of sensor data with a respective timestamp and a subset of respective sensor samples that are grouped based on the temporal window. (C. 10, L. 3-6 discloses detection of an incident such as a worker tripping. Sensor data items include CV data samples corresponding to this incident and LIDAR data samples corresponding to this incident. In C. 19, L. 40-42, the term "time stamps" indicates each data sample for each sensor is associated with a timestamp. Thus, CV data samples corresponding to the incident are associated with timestamps and LIDAR data samples corresponding to the incident are associated with the same timestamps.)

Regarding claim 9, the combination of Pandya and Newcomb teaches: The method of claim 1, further comprising: Pandya teaches: storing the one or more information items associated with the one or more signature events, the one or more information items including a timestamp and a location of each of the one or more signature events. (C. 18, L. 12-17 discloses a module 211 for generating an input feature dataset. Information items are input features in the input feature dataset. C. 18, L. 21 to "time" in L. 29 discloses that data captured by camera and LIDAR may be aligned with respect to time, which indicates that the input feature dataset is associated with timestamps. A location of every incident is the construction site, based on C. 8, L. 31 to "site" in L. 34.)

Regarding claim 10, the combination of Pandya and Newcomb teaches: The method of claim 1, further comprising: Pandya teaches: generating a subset of the one or more information items describing the behavior pattern; and (Based on C. 17, L. 62-67, the signature event (an incident, a behavior/action not in compliance with safety protocol, or fatigue level) is also a behavior pattern that occurs during a time duration. C. 18, L. 12-17 discloses a module 211 for generating an input feature dataset to be processed by the trained predictive model 213. Since information items are input features in the input feature dataset, any information item itself is a subset as claimed.) providing the subset of the one or more information items of the behavior pattern associated with the sensor data. (C. 18, L. 12-14 discloses providing the input features to the trained predictive model 213 for processing.)
However, Pandya does not explicitly teach: determining a behavior pattern based on the one or more signature events for the time duration; But Newcomb teaches: determining a behavior pattern based on the one or more signature events for the time duration; (On page 9 in the left column, lines 33-40 teaches analyzing sequential frames of a live video feed for differences, and registering a motion event when a large enough change is detected. The limitations of a behavior pattern and a signature event as claimed correspond to the motion event.) A motivation for the combination is to execute Pandya's safety inference engine 210 only when a significant change in the environment happens (Newcomb, page 9, left column, lines 33-40).

Claim 13 is directed to a system which recites the same features as the method of claim 1 and is therefore rejected for at least the same reasons therein. Pandya teaches: one or more processors and memory having instructions stored thereon, which when executed by the one or more processors cause the processors to perform: (C. 28, L. 10-13)

Regarding claim 14, the combination of Pandya and Newcomb teaches: The computer system of claim 13, Pandya teaches: memory further having instructions (C. 28, L. 10-13) However, Pandya does not explicitly teach: wherein for a temporal window corresponding to a subset of sensor data, the memory further having instructions for: applying at least a universal event projection model to process the subset of sensor data within the respective temporal window and detect the one or more signature events. But Newcomb teaches: wherein for a temporal window corresponding to a subset of sensor data. A motivation for the combination is to execute Pandya's safety inference engine 210 only when a significant change in the environment happens (Newcomb, page 9, left column, lines 33-40).

Regarding claim 15, the combination of Pandya and Newcomb teaches: The computer system of claim 13, Pandya teaches: wherein the plurality of sensor devices include one or more of: a presence sensor, a proximity sensor, a microphone, a motion sensor, a gyroscope, an accelerometer, a Radar, a Lidar scanner, a camera, a temperature sensor, a heartbeat sensor, and a respiration sensor. (C. 10, L. 29-32 teaches at least a Lidar scanner and a camera.)

Claim 18 is directed to a product which recites the same features as the method of claim 1 and is therefore rejected for at least the same reasons therein. Pandya teaches: A non-transitory computer-readable storage medium, having instructions stored thereon, which when executed by one or more processors cause the one or more processors to perform: (C. 28, L. 10-13)

Regarding claim 20, the combination of Pandya and Newcomb teaches: The non-transitory computer-readable storage medium of claim 18, further comprising instructions for: Pandya teaches: identifying the plurality of predefined output modalities include two or more distinct modalities of: textual statements, software code, an image or video, an information dashboard having a predefined format, a user interface, and a heatmap; and (C. 10, L. 3-6 states, "In some cases, the mobile tag device may be capable of delivering an alert (e.g., vibration, audio alarm, etc.) in response to a detection of an incident". C. 10, L. 13-15 states, "intervention such as rhythmic cue, audio, visual, or tactile stimulus may be delivered to the worker via the wearable device". C. 17, L. 33-34 states, "In some cases, the output 215 may include feedback information such as an alert" and L.
49-51 states, "For example, the output 215 may further include interventions delivered to the associated individual". An audio device for delivering an audio alarm alert is a first type of user interface. A visual device for delivering a visual intervention stimulus is a second type of user interface.) selecting the one of the plurality of predefined output modalities. (Based on C. 10, L. 3-7 and 13-16, the audio device would be selected when the predictive model outputs an audio alarm alert and the visual device would be selected when the predictive model outputs a visual intervention stimulus.)

Claims 5-6 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Pandya et al. (US 11803955 B1) in view of Newcomb (US 20220031105 A1) and Lee et al. (US 20220067479 A1, cited in PTO-892 issued 01/23/2025).

Regarding claim 5, the combination of Pandya and Newcomb teaches: The method of claim 2, further comprising: Pandya teaches: applying at least a first event projection model to process the first temporal sequence of sensor samples to generate the first information item; and (C. 18, L. 12-17 discloses a module 211 for generating an input feature dataset. A first event projection model is module 211.) Pandya discloses applying the same feature generation module 211 to extract features from all the sensor data, but Pandya and Newcomb do not explicitly teach: applying at least a second event projection model to process the second temporal sequence of sensor samples to generate the first information item, the first event projection model distinct from the second event projection model. But Lee teaches: applying at least a second event projection model to process the second temporal sequence of sensor samples to generate the first information item, the first event projection model distinct from the second event projection model. ([0058], lines 1-2 teaches, "Continuous signal streams from multiple sensors may be received and synchronized." Each continuous signal stream would contain a temporal sequence of sensor samples. [0064], from line 12 to "2)" in line 16 teaches feature extraction modules. First and second event projection models correspond to feature extraction modules 1 and 2, respectively. The limitation of "first information item" corresponds to all extracted features collectively.) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Lee's feature extraction modules into Pandya's input feature generation module 211. A motivation for the combination is that different types of sensor data may require different algorithms to perform feature extraction. For example, Lee at [0066] discloses that a mel-frequency spectral feature may be extracted from an audio signal and other features may be extracted from a three-channel motion signal.

Regarding claim 6, the combination of Pandya, Newcomb, and Lee teaches: The method of claim 5. However, Pandya and Newcomb do not explicitly teach: further comprising: selecting each of the first event projection model and the second event projection model based on a respective device type of the first sensor device and the second sensor device. But Lee teaches: selecting each of the first event projection model and the second event projection model based on a respective device type of the first sensor device and the second sensor device. ([0060], lines 1-4 discloses receiving an audio signal and a motion sensor signal. The audio signal is detected by an audio sensor and the motion sensor signal is detected by a motion sensor. [0064], from line 12 to "2)" in line 16 and all of [0066] discloses that feature extraction module 1 extracts mel-frequency spectral features when the sensor device type is an audio sensor, and feature extraction module 2 extracts features from motion signals when the sensor device type is a motion signal sensor. Thus, the respective feature extraction modules have been selected based on respective device types.) A motivation for the combination is that different types of sensor data may require different algorithms to perform feature extraction. (Lee, [0066])
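As an illustration of the per-sensor-type feature extraction discussed for claims 5 and 6 above (distinct extraction modules selected by device type, such as mel-frequency-style spectral features for audio versus simple statistics for a three-channel motion signal), the following is a minimal sketch. It is not Lee's or Pandya's implementation; the specific features, the module registry, and the names are assumptions, and the audio features shown are plain FFT magnitudes standing in for a mel-frequency analysis.

```python
import numpy as np

def audio_features(samples, n_bins=8):
    """Coarse spectral features: magnitudes of the first few FFT bins
    (a stand-in for mel-frequency spectral features)."""
    spectrum = np.abs(np.fft.rfft(samples))
    return spectrum[:n_bins]

def motion_features(samples):
    """Simple per-axis statistics for a three-channel motion signal."""
    samples = np.asarray(samples)                 # shape (n, 3)
    return np.concatenate([samples.mean(axis=0), samples.std(axis=0)])

# Selection of a feature extraction module based on the sensor device type.
FEATURE_EXTRACTORS = {
    "audio": audio_features,
    "motion": motion_features,
}

def extract_information_item(sensor_type, samples):
    extractor = FEATURE_EXTRACTORS[sensor_type]   # module chosen by device type
    return extractor(samples)

# Toy usage: one audio stream and one three-channel motion stream.
rng = np.random.default_rng(2)
audio = np.sin(2 * np.pi * 5 * np.linspace(0, 1, 256)) + rng.normal(0, 0.1, 256)
motion = rng.normal(0, 1, (128, 3))
print(extract_information_item("audio", audio).shape)    # (8,)
print(extract_information_item("motion", motion).shape)  # (6,)
```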
Regarding claim 12, the combination of Pandya and Newcomb teaches: The method of claim 1, further comprising: Pandya teaches: obtaining a plurality of training inputs, each training input including one or more test tags having a predefined description format in which one or more information items and an associated timestamp of each signature event is organized (as discussed for claim 9, the input feature dataset contains timestamps; timestamps describe the incident and thus constitute a predefined description format). However, Pandya and Newcomb do not explicitly teach: a sequence of signature events But Lee teaches: a sequence of signature events ([0067], lines 13-23 teach sub-event states such as walking, stopped, etc., and a sequence model calculates detection confidence based on a sequence of the sub-event states. The limitation of "signature events" corresponds to sub-event states and the limitation of "a sequence of signature events" corresponds to a sequence of the sub-event states.) Lee's different sub-event states are analogous to Pandya's different types of incidents such as a worker tripping and falling. It would have been obvious to a person having ordinary skill in the art to have incorporated Lee's sequence of sub-event states into the combination of Pandya and Newcomb as a sequence of incidents such as a worker tripping and then falling. A motivation for the combination is that training a predictive model to predict sequences of events/incidents would allow for more specific types of classifications when compared to predicting single events/incidents. (Lee, [0067])

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Pandya et al. (US 11803955 B1) in view of Newcomb (US 20220031105 A1) and Daredia et al. (US 20200403818 A1, cited in PTO-892 issued 01/23/2025).

Regarding claim 11, the combination of Pandya and Newcomb teaches: The method of claim 1, further comprising: Pandya teaches: obtaining a plurality of training inputs, each training input including an information item associated with a training signature event; and obtaining a ground truth corresponding to each training input, the ground truth including a sample multimodal output preferred for the training input; and (C. 20, L. 36-41 discloses the validators accumulate statistics of true and false positives and true and false negatives. By validating the predictive model output, the validators assign a ground truth label to each safety-related event. When the model outputs an alert or an intervention, the validator would assign a correct alert or intervention based on each safety-related event.) However, Pandya and Newcomb do not explicitly teach: each training input including a training text prompt and based on a predefined loss function, training the large behavior model But Daredia teaches: obtaining a plurality of training inputs, each training input including a training text prompt ([0161], lines 3-6 discloses generating a redacted copy of the digital transcript from an unredacted copy of the digital transcript, and [0165] discloses training a transcript redaction neural network. The limitation of training inputs each including a training text prompt corresponds to unredacted transcripts. The transcript redaction neural network generates a redacted transcript at the output based on the unredacted transcript received at the input.) based on a predefined loss function, training the large behavior model ([0135]-[0136] disclose training the digital transcription neural network using a transcription training loss model ("predefined loss function") which determines a transcription error amount between the neural network output and ground truth. The transcript redaction neural network at [0165] would be trained analogously. The neural network output is the redacted transcript predicted by the neural network and the ground truth is the correct redacted transcript.) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Daredia's training loss models and ground truth data into the combination of Pandya and Newcomb. A motivation for the combination is to improve the prediction accuracy of a machine learning model that processes text. (Daredia, [0136])
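As an illustration of the loss-driven training referenced in the claim 11 discussion above (a predefined loss function measuring the error between a model's output and a ground truth, used to update the model), the following is a minimal sketch of one training loop. It is not Daredia's transcription or redaction network and not the claimed large behavior model; the linear model, mean-squared-error loss, and learning rate are assumptions chosen only to keep the example self-contained.

```python
import numpy as np

def mse_loss(prediction, ground_truth):
    """Predefined loss function: mean squared error between output and ground truth."""
    return float(np.mean((prediction - ground_truth) ** 2))

def train(inputs, ground_truths, epochs=200, lr=0.1):
    """Fit a linear model y = X @ w by gradient descent on the predefined loss."""
    n_features = inputs.shape[1]
    w = np.zeros(n_features)
    for _ in range(epochs):
        predictions = inputs @ w
        # Gradient of the MSE loss with respect to the weights.
        grad = 2.0 * inputs.T @ (predictions - ground_truths) / len(inputs)
        w -= lr * grad
    return w, mse_loss(inputs @ w, ground_truths)

# Toy usage: training inputs paired with ground-truth outputs.
rng = np.random.default_rng(3)
X = rng.normal(size=(64, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + rng.normal(0, 0.01, 64)
w, final_loss = train(X, y)
print(np.round(w, 2), round(final_loss, 4))  # weights near true_w, small final loss
```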
Claims 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Pandya et al. (US 11803955 B1) in view of Newcomb (US 20220031105 A1) and Aimone et al. (US 20220027712 A1, cited in PTO-892 issued 01/23/2025).

Regarding claim 16, the combination of Pandya and Newcomb teaches: The computer system of claim 13, further comprising instructions for: Pandya teaches: processing the sensor data; and storing the one or more information items or the multimodal output in a database. (C. 13, L. 51-67 teaches that a local database 141 may store data about a predictive model and data generated by a predictive model including an output of the model. The input feature dataset is the model input and thus comprises data about the model.) However, Pandya and Newcomb do not explicitly teach: processing the sensor data to generate one or more sets of intermediate items successively and iteratively, But Aimone teaches: processing the sensor data to generate one or more sets of intermediate items successively and iteratively. It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Aimone's spiking neural network and temporal buffer circuit into Pandya's input feature generation module. A motivation for the combination is to perform efficient analog vector matrix operations that underpin many of the relevant computations in neural computation. (Aimone, [0019], lines 1-5)

Regarding claim 17, the combination of Pandya, Newcomb, and Aimone teaches: The computer system of claim 16, further comprising instructions for: Pandya at C. 18, L. 12-17 teaches processing sensor data to generate an input feature dataset. However, Pandya and Newcomb do not explicitly teach: processing the sensor data to generate a first set of intermediate items at a first time; storing the first set of intermediate items in the database; processing the first set of intermediate items to generate one or more second sets of intermediate items successively at one or more successive second times following the first time; successively storing the one or more second sets of intermediate items in the database, and deleting the first set of intermediate items from the database; and processing a most recent intermediate set of the one or more second sets of intermediate items to generate the one or more information items at a third time following the one or more successive second times.
But Aimone teaches: processing the sensor data to generate a first set of intermediate items at a first time; storing the first set of intermediate items in the database; ([0025], lines 1-3 discloses a temporal buffer circuit that holds spiking activation signals for a delay time. The database corresponds to the temporal buffer circuit, and the spiking activation signal generated at the first time step corresponds to the first set of intermediate items.) processing the first set of intermediate items to generate one or more second sets of intermediate items successively at one or more successive second times following the first time; ([0025] and [0031] disclose that each mosaic can be sequentially computed using the temporal buffer. [0054], lines 1-6 and [0055], lines 1-3 disclose inputting spiking activation signals ("the first set of intermediate items") back into the crossbar stack as second input data for a second time step. At the second time step, the second input data is processed to generate an output spiking activation signal ("one or more second sets of intermediate items") according to [0051]-[0052].) successively storing the one or more second sets of intermediate items in the database, and deleting the first set of intermediate items from the database; and ([0052], lines 3-5 together with [0055], lines 1-3 discloses storing the output spiking activation signal generated at the second time step. The output spiking activation signal at the first time step would be an intermediate result for an intermediate layer according to [0045]-[0046]. Therefore, the temporal buffer circuit would be overwritten with a new output spiking activation signal at each time step.) processing a most recent intermediate set of the one or more second sets of intermediate items to generate the one or more information items at a third time following the one or more successive second times. ([0025] and [0031] disclose that each mosaic can be sequentially computed using the temporal buffer. [0054], lines 1-6 and [0055], lines 1-3 disclose inputting spiking activation signals ("a most recent intermediate set of the one or more sets of intermediate items") back into the crossbar stack as third input data for a third time step. At the third time step, the third input data is processed to generate an output spiking activation signal ("the one or more information items") according to [0051]-[0052].) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Aimone's spiking neural network and temporal buffer circuit into Pandya's input feature generation module. A motivation for the combination is to perform efficient analog vector matrix operations that underpin many of the relevant computations in neural computation. (Aimone, [0019], lines 1-5)
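As a software illustration of the successive, iterative processing pattern addressed in claims 16 and 17 above (a first set of intermediate items is stored, each later step overwrites the stored set with its own intermediate result, and the final step yields the information items), the following is a minimal sketch. It is only an illustrative software analogue, not Aimone's spiking neural network or temporal buffer circuit; the stage functions and the single-slot buffer are assumptions.

```python
import numpy as np

def process_successively(sensor_data, stages):
    """Run a chain of processing stages, keeping only the most recent
    intermediate result in a single-slot buffer.

    sensor_data: initial input array
    stages: list of functions; each consumes the previous intermediate result
    Returns (final_result, buffer), where buffer holds only the last intermediate set.
    """
    buffer = {}                                   # stands in for the database/temporal buffer
    intermediate = sensor_data
    for stage in stages:
        intermediate = stage(intermediate)        # generate the next set of intermediate items
        buffer.clear()                            # delete the previously stored intermediate set
        buffer["latest"] = intermediate           # store only the most recent one
    return intermediate, buffer

# Toy usage: three stages standing in for the first, second, and third processing times.
stages = [
    lambda x: x - x.mean(),                       # first time: center the samples
    lambda x: np.abs(x),                          # second time: rectify
    lambda x: np.array([x.mean(), x.max()]),      # third time: produce the "information items"
]
rng = np.random.default_rng(4)
result, buffer = process_successively(rng.normal(size=32), stages)
print(result, buffer["latest"] is result)         # final items; buffer holds only the latest set
```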
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Pandya et al. (US 11803955 B1) in view of Newcomb (US 20220031105 A1) and Fabian et al. (US 20240303422 A1).

Regarding claim 19, the combination of Pandya and Newcomb teaches: The non-transitory computer-readable storage medium of claim 18. Pandya teaches: the multimodal output includes one or more of: description, timestamp, numeral information, statistic summary, warning message, and recommended action associated with the one or more signature events. (C. 10, L. 3-10 discloses an alert ("warning message") and an intervention ("recommended action").) Pandya, C. 18, L. 1-11 teaches different types of machine learning networks used for the trained predictive model 213. However, Pandya and Newcomb do not explicitly teach: wherein the large behavior model includes a large language model (LLM) having a self-attention based transformer structure. But Fabian teaches: wherein the large behavior model includes a large language model (LLM) having a self-attention based transformer structure. ([0025], lines 1-6) It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Fabian's LLM into Pandya's trained predictive model 213. A motivation for the combination is to capture contextual relationships between words in a sentence or text passage. (Fabian, [0025])
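For illustration only, the following Python sketch shows the kind of self-attention based transformer structure referred to in the claim 19 limitation. The layer sizes, number of heads, and block composition are assumptions for the sketch and are not taken from Fabian or from the specification.

```python
# Minimal, illustrative self-attention transformer block; dimensions are arbitrary
# and do not reflect any cited reference or the application's disclosed model.
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                nn.Linear(4 * d_model, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        # Self-attention lets every token position attend to every other position,
        # capturing the contextual relationships cited as the motivation to combine.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ff(x))

tokens = torch.randn(1, 10, 64)               # a batch of 10 token embeddings (illustrative)
contextualized = TransformerBlock()(tokens)   # contextualized token representations
```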
Response to Arguments

The following is the Examiner's response to Applicant's arguments filed 04/14/2025.

Applicant's Arguments IV. (Remarks pages 11-12): Applicant argues that no human mind with the aid of pencil and paper can continuously accomplish the concurrent activities of claim 1 as a computer system can. Even assuming, arguendo, that the independent claims recite an abstract idea, the independent claims are directed to processing sensor data and content data to generate a user-defined output in real time while or after the sensor data are collected, which constitutes a practical application.

Examiner's Response: Applicant's arguments have been fully considered but they are not persuasive. In Step 2A Prong 1, the limitation "while streaming the sensor data: detecting one or more signature events in the sensor data" is an observation mental process which can reasonably be performed in the human mind with the aid of pencil and paper. The limitation "while streaming the sensor data: generating one or more information items characterizing the one or more signature events detected in the sensor data, independently of the sensor types of the plurality of sensor devices" is a judgment and evaluation mental process which can reasonably be performed in the human mind with the aid of pencil and paper. The claim recites an abstract idea. Applicant's arguments that the detecting and generating limitations in claim 1, lines 7-10 allegedly cannot be performed in the human mind fail to comply with 37 CFR 1.111(b) because they amount to a general allegation. Applicant has not persuasively explained why the human mind is allegedly incapable of detecting one or more signature events in the sensor data and generating one or more information items while sensor data is being streamed. In the previous and current rejections of claim 1 under 35 U.S.C. 101, the limitation of "presenting the multimodal output according to the one of the plurality of predefined output modalities" has been treated as an additional element to be evaluated in Step 2A Prong 2, not as a mental process. In Step 2A Prong 2, the limitation "while streaming the sensor data: applying a large behavior model to process the one or more information items associated with the sensor data and generate a multimodal output associated with the sensor data in real time while the sensor data are being streamed, the multimodal output describing the one or more signature events associated with the sensor data in one of a plurality of predefined output modalities" amounts to insignificant extra-solution activity under MPEP 2106.05(g). The limitation "while streaming the sensor data: presenting the multimodal output according to the one of the plurality of predefined output modalities" amounts to insignificant extra-solution activity under MPEP 2106.05(g). The additional elements as disclosed above, alone or in combination, do not integrate the abstract ideas into a practical application, as they are mere insignificant extra-solution activities performed in combination with generic computer functions that are implemented to carry out the abstract ideas identified above. The claim is directed to an abstract idea. In Step 2B, the limitation "while streaming the sensor data: applying a large behavior model to process the one or more information items associated with the sensor data and generate a multimodal output associated with the sensor data in real time while the sensor data are being streamed, the multimodal output describing the one or more signature events associated with the sensor data in one of a plurality of predefined output modalities" amounts to well-understood, routine, conventional activity under MPEP 2106.05(d)(I). De Barros et al. (US 20250036695 A1) at paragraph [0003] provides Berkheimer evidence for large language models configured to generate outputs based upon text inputs set forth by a user in near real-time. This additional element does not recite any improvement to the large behavior model itself, but merely describes how it operates in a streaming environment. Amending the claim with details from the written disclosure that demonstrate an improvement to the large behavior model might advance prosecution under 35 U.S.C. 101. The limitation "while streaming the sensor data: presenting the multimodal output according to the one of the plurality of predefined output modalities" amounts to well-understood, routine, conventional activity recognized by the courts and is analogous to presenting offers and gathering statistics under MPEP 2106.05(d)(II). The additional elements as disclosed above, in combination with the abstract ideas, are not sufficient to amount to significantly more than the abstract ideas, as they are well-understood, routine, and conventional activities performed in combination with generic computer functions that are implemented to carry out the abstract ideas identified above. The claim is not patent eligible.
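For illustration only, the following Python sketch shows the general streaming arrangement the claim language describes: events are detected as sensor readings arrive, each detected event is characterized as an information item, and an output is produced without waiting for the stream to end. The generator, the fixed threshold, and the summarize() stub (standing in for a large behavior model) are hypothetical and are not drawn from the application or any cited reference.

```python
# Hypothetical sketch of detecting signature events and producing output while
# sensor data is still being streamed; all names and thresholds are illustrative.
import random

def sensor_stream(n=50):
    """Stand-in for live sensor data arriving one reading at a time."""
    for _ in range(n):
        yield random.gauss(0.0, 1.0)

def summarize(value):
    """Stub for the large behavior model; a real system might call an LLM here."""
    return f"Warning: signature event with value {value:.2f} detected."

for reading in sensor_stream():
    if abs(reading) > 2.0:                    # detect a signature event in the stream
        item = {"value": reading}             # information item characterizing the event
        print(summarize(item["value"]))       # present output while streaming continues
```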
Applicant's Arguments V. (Remarks pages 12-13): Applicant argues that the combination of Lee and Daredia does not teach or suggest "applying a large behavior model to process the one or more information items associated with the sensor data and generate a multimodal output associated with the sensor data in real time while the sensor data are being streamed."

Examiner's Response: Applicant's arguments with respect to claim 1 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any nonprovisional extension fee (37 CFR 1.17(a)) pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon, whose telephone number is (571) 270-7648. The examiner can normally be reached Monday - Friday, 9:00 am - 6:00 pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO-supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, Abdullah Al Kawsar, can be reached at (571) 270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/A.H.J./ Examiner, Art Unit 2127
/ABDULLAH AL KAWSAR/ Supervisory Patent Examiner, Art Unit 2127