18086921. REALISTIC SAFETY VERIFICATION FOR DEEP REINFORCEMENT LEARNING simplified abstract (International Business Machines Corporation)

From WikiPatents

REALISTIC SAFETY VERIFICATION FOR DEEP REINFORCEMENT LEARNING

Organization Name

International Business Machines Corporation

Inventor(s)

Kevin Eykholt of White Plains NY (US)

Wenbo Guo of State College PA (US)

Taesung Lee of Ridgefield CT (US)

Jiyong Jang of Chappaqua NY (US)

REALISTIC SAFETY VERIFICATION FOR DEEP REINFORCEMENT LEARNING - A simplified explanation of the abstract

This abstract first appeared in US patent application 18086921, titled 'REALISTIC SAFETY VERIFICATION FOR DEEP REINFORCEMENT LEARNING'.

Abstract: Safety verification for reinforcement learning can include receiving a policy generated by deep reinforcement learning, where the policy is used to act in an environment having a set of states. Responsive to determining that the policy is a non-deterministic policy, the non-deterministic policy can be decomposed into a set of deterministic policies. Responsive to determining that a state-transition function associated with the set of states is unknown, the state-transition function can be approximated at least by training a deep neural network and transforming the deep neural network into a polynomial. Using a constraint solver, the policy with the state-transition function can be verified. Runtime shielding can be performed.
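The abstract does not say how a non-deterministic policy is decomposed. As a minimal sketch of one possible reading, the Python below assumes a small tabular policy (the `stochastic_policy` table and the `decompose` helper are hypothetical names, not from the application) and enumerates every deterministic policy that picks, in each state, one action the original policy can take with nonzero probability.

```python
from itertools import product

# Hypothetical tabular stochastic policy: state -> {action: probability}.
stochastic_policy = {
    "s0": {"left": 0.7, "right": 0.3},
    "s1": {"left": 1.0},
    "s2": {"left": 0.5, "right": 0.5},
}


def decompose(policy):
    """Enumerate the deterministic policies contained in a stochastic one.

    Each result maps every state to a single action drawn from the support
    (the actions with nonzero probability) of the stochastic policy.
    """
    states = sorted(policy)
    supports = [
        [action for action, prob in sorted(policy[s].items()) if prob > 0]
        for s in states
    ]
    return [dict(zip(states, choice)) for choice in product(*supports)]


deterministic_policies = decompose(stochastic_policy)
for pi in deterministic_policies:
    print(pi)
```

The number of deterministic policies is the product of the support sizes across states, so plain enumeration like this is only practical for very small state spaces.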

Key Features and Innovation:

- Safety verification for reinforcement learning
- Decomposition of non-deterministic policies into deterministic policies
- Approximation of unknown state-transition functions using deep neural networks
- Transformation of deep neural networks into polynomials
- Verification of policies with state-transition functions using constraint solvers
- Implementation of runtime shielding
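The source does not describe how a deep neural network is transformed into a polynomial. The sketch below illustrates one simple way such a transformation could work, under stated assumptions: a one-hidden-unit `tanh` network with made-up weights (`W1`, `B1`, `W2`, `B2`) stands in for a trained transition model, and the activation is replaced by its degree-5 Taylor series. The result is a polynomial in the state, and the error is measured on a bounded input range.

```python
import math

# Hypothetical weights of a tiny network standing in for a trained
# transition model: next_state = W2 * tanh(W1 * state + B1) + B2.
W1, B1, W2, B2 = 0.5, 0.1, 2.0, 0.0


def network(state):
    """Evaluate the stand-in neural transition model."""
    return W2 * math.tanh(W1 * state + B1) + B2


def polynomial(state):
    """Polynomial surrogate: tanh(z) replaced by z - z^3/3 + 2*z^5/15."""
    z = W1 * state + B1
    return W2 * (z - z**3 / 3 + 2 * z**5 / 15) + B2


# Measure the worst-case gap on a grid over the bounded domain [-1, 1].
grid = [i / 100 for i in range(-100, 101)]
max_error = max(abs(network(s) - polynomial(s)) for s in grid)
print(f"max |network - polynomial| on [-1, 1]: {max_error:.5f}")
```

A Taylor surrogate is only accurate near the expansion point, so any guarantee derived from the polynomial would need to account for the approximation error over the states of interest.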

Potential Applications:

- Autonomous vehicles
- Robotics
- Gaming industry
- Industrial automation
- Healthcare systems

Problems Solved:

- Ensuring safety in reinforcement learning systems
- Handling non-deterministic policies effectively
- Addressing unknown state-transition functions
- Providing runtime shielding for enhanced safety
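The abstract states that verification uses a constraint solver. The sketch below does not use one. As a dependency-free stand-in, it performs a bounded exhaustive reachability check on a toy one-dimensional world, which shows the question a solver would be asked: can any allowed initial state reach an unsafe state within a fixed number of steps? The world, the two policies, and all function names are assumptions made for illustration.

```python
def transition(state, action):
    # Hypothetical degree-1 polynomial transition model on integer positions.
    return state + action


def is_unsafe(state):
    # Hypothetical safety property: the agent must stay strictly inside (-3, 3).
    return abs(state) >= 3


def toward_origin(state):
    # Deterministic policy that steps toward position 0.
    return -1 if state > 0 else (1 if state < 0 else 0)


def drift_right(state):
    # Deterministic policy that always steps right.
    return 1


def verify(policy, initial_states, horizon):
    """Return None if no unsafe state is reached within `horizon` steps,
    otherwise return the first violating trajectory found."""
    for start in initial_states:
        state, trace = start, [start]
        for _ in range(horizon):
            state = transition(state, policy(state))
            trace.append(state)
            if is_unsafe(state):
                return trace
    return None


initial_states = range(-2, 3)
print("toward_origin:", verify(toward_origin, initial_states, horizon=5))
print("drift_right:  ", verify(drift_right, initial_states, horizon=5))
```

Brute-force enumeration only works for tiny discrete examples. A constraint solver would instead reason symbolically about the polynomial transition model over continuous or very large state sets.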

Benefits:

- Improved safety in AI systems
- Enhanced decision-making capabilities
- Increased reliability of reinforcement learning algorithms
- Potential for real-time adaptation to changing environments

Commercial Applications:

The technology can be applied in various industries such as autonomous vehicles, robotics, gaming, industrial automation, and healthcare systems to enhance safety and decision-making processes.

Questions about Safety Verification for Reinforcement Learning:

1. How does the technology of approximating unknown state-transition functions using deep neural networks improve safety in reinforcement learning systems?
2. What are the potential implications of implementing runtime shielding in AI applications?
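To make the second question concrete, here is a minimal sketch of what a runtime shield could look like. The abstract says only that runtime shielding can be performed, so the design below is an assumption: a wrapper (the hypothetical `make_shield`) uses a transition model to predict the next state and substitutes a fallback action whenever the policy's proposed action would enter an unsafe state.

```python
def transition(state, action):
    # Hypothetical transition model on integer positions.
    return state + action


def is_unsafe(state):
    # Hypothetical safety property: stay strictly inside (-3, 3).
    return abs(state) >= 3


def drift_right(state):
    # Unverified policy that always steps right.
    return 1


def stay(state):
    # Fallback action assumed to be safe from any safe state.
    return 0


def make_shield(policy, fallback_action):
    """Wrap `policy` so an action predicted to enter an unsafe state
    is replaced by `fallback_action`."""
    def shielded(state):
        action = policy(state)
        if is_unsafe(transition(state, action)):
            return fallback_action(state)
        return action
    return shielded


shielded = make_shield(drift_right, stay)

# Roll the shielded policy forward and print the visited states.
state = 0
for step in range(6):
    state = transition(state, shielded(state))
    print(step, state)
```

The shield is only as reliable as the transition model it consults and the fallback action it substitutes, which is one practical implication of relying on shielding at runtime.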

Frequently Updated Research:

Research on safety verification for reinforcement learning systems is ongoing, including work on deep learning algorithms and constraint-solving approaches.


Original Abstract Submitted

Safety verification for reinforcement learning can include receiving a policy generated by deep reinforced learning, where the policy is used in acting in an environment having a set of states. Responsive to determining that the policy is a non-deterministic policy, the non-deterministic policy can be decomposed into a set of deterministic policies. Responsive to determining that a state-transition function associated with the set of states is unknown, the state-transition function can be approximated at least by training a deep neural network and transforming the deep neural network into a polynomial. Using a constraint solver the policy with the state-transition function can be verified. Runtime shielding can be performed.