17805946. GENERATING MULTI-TURN DIALOG DATASETS simplified abstract (INTERNATIONAL BUSINESS MACHINES CORPORATION)


GENERATING MULTI-TURN DIALOG DATASETS

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor(s)

Zilu Tang of Cambridge MA (US)

Zhongshen Zeng of Shenzhen (CN)

Yara Rizk of Cambridge MA (US)

GENERATING MULTI-TURN DIALOG DATASETS - A simplified explanation of the abstract

This abstract first appeared for US patent application 17805946, titled 'GENERATING MULTI-TURN DIALOG DATASETS'.

Simplified Explanation

The embodiment described in this patent application is a method for generating multi-turn dialog datasets to train dialog or conversational agents. Here are the key points:

  • The embodiment selects an agent from a set of agents.
  • It automatically identifies sentences from the training data of the selected agent that satisfy the first sequential node condition of a randomly selected dialog node.
  • It determines an approach for responding to that condition, either by satisfying the condition or by inserting a multi-turn conversational property.
  • It generates a response based on the determined approach.
  • It repeats the process for subsequent sequential child nodes, determining approaches to respond to each condition and generating corresponding responses.
  • It collects and stores data related to the selected agent and the generated responses.
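
The key points above can be sketched in code. The following is a minimal illustration only, not the implementation claimed in the application: the `Agent` and `DialogNode` types, the `PROPERTY_TURNS` list standing in for a "multi-turn conversational property", the 0.7 probability, and the choice to follow only the first child of each node are all assumptions made for the example.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DialogNode:
    condition: str  # hypothetical: a label that a user turn must match
    children: List["DialogNode"] = field(default_factory=list)


@dataclass
class Agent:
    name: str
    training_data: Dict[str, List[str]]  # condition label -> example sentences
    nodes: List[DialogNode]              # entry nodes of the agent's dialog tree


# Hypothetical stand-ins for turns that insert a multi-turn conversational property.
PROPERTY_TURNS = ["Sorry, can you repeat that?", "Wait, let me start over."]


def respond(agent, node, rng, satisfy_prob=0.7):
    """Choose an approach for one node condition and generate a turn."""
    if rng.random() < satisfy_prob:  # assumed probability, not from the source
        # Approach 1: use a training sentence that satisfies the condition.
        text = rng.choice(agent.training_data[node.condition])
        return {"condition": node.condition, "approach": "satisfy", "text": text}
    # Approach 2: insert a multi-turn conversational property instead.
    text = rng.choice(PROPERTY_TURNS)
    return {"condition": node.condition, "approach": "insert_property", "text": text}


def generate_dialog(agents, seed=0):
    """Generate one record: the selected agent plus one turn per visited node."""
    rng = random.Random(seed)
    agent = rng.choice(agents)      # select an agent from the set
    node = rng.choice(agent.nodes)  # select a random dialog node
    turns = []
    while node is not None:
        turns.append(respond(agent, node, rng))
        # Assumption: "subsequent sequential child nodes" form a chain,
        # so only the first child of each node is followed.
        node = node.children[0] if node.children else None
    return {"agent": agent.name, "turns": turns}


# Usage with a toy agent (all data here is made up for illustration).
toy_agent = Agent(
    name="banking",
    training_data={"greet": ["hi", "hello"], "balance": ["what is my balance"]},
    nodes=[DialogNode("greet", [DialogNode("balance")])],
)
dataset = [generate_dialog([toy_agent], seed=s) for s in range(3)]
```

Each record in `dataset` pairs the selected agent with its generated turns, which corresponds to the final collect-and-store step. A real system would persist these records rather than keep them in a list.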

Potential applications of this technology:

  • Training dialog or conversational agents: This method can be used to generate datasets for training AI agents to engage in multi-turn conversations.
  • Virtual assistants: The generated datasets can be used to improve the conversational abilities of virtual assistants, making them more effective in understanding and responding to user queries.
  • Customer service chatbots: By training chatbots with multi-turn dialog datasets, they can provide more natural and context-aware responses to customer inquiries.

Problems solved by this technology:

  • Lack of multi-turn dialog datasets: Generating high-quality datasets for training dialog agents can be time-consuming and resource-intensive. This method automates the process, making it more efficient.
  • Contextual understanding: By considering the sequential nature of dialog nodes and generating responses that maintain context, this method helps dialog agents better understand and respond to user queries.

Benefits of this technology:

  • Improved training data quality: Each generated response either satisfies a dialog node condition or inserts a multi-turn conversational property, so the resulting datasets reflect both the agent's dialog structure and realistic multi-turn behavior.
  • Time and resource efficiency: Automating the dataset generation process saves time and resources compared to manual creation.
  • Enhanced conversational abilities: Dialog agents trained with datasets generated using this method can provide more context-aware and natural responses, improving user experience.


Original Abstract Submitted

An embodiment for generating multi-turn dialog datasets for training of dialog or conversational agents. The embodiment may select an agent from a set of agents. The embodiment may automatically identify sentences from training data of the selected agent that satisfy a first sequential node condition of the selected random dialog node. The embodiment may automatically determine an approach for responding to the first sequential node condition of the selected random dialog node that either satisfies the first sequential dialog node condition, or inserts a multi-turn conversational property, and generate a corresponding response. The embodiment may automatically determine additional approaches for responding to each condition within subsequent sequential child nodes of the selected random dialog node that either satisfy each subsequent sequential child node condition or insert a multi-turn conversational property, and generate corresponding responses. The embodiment may collect and store data relating to the selected agent and the generated responses.