ADVERSARIAL LANGUAGE IMITATION WITH CONSTRAINED EXEMPLARS

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Hamid Palangi of Bellevue WA (US)

Saadia Kai Gabriel of Seattle WA (US)

Thomas Hartvigsen of Cambridge MA (US)

Dipankar Ray of Redmond WA (US)

Semiha Ece Kamar Eden of Redmond WA (US)

ADVERSARIAL LANGUAGE IMITATION WITH CONSTRAINED EXEMPLARS - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240339111, titled 'ADVERSARIAL LANGUAGE IMITATION WITH CONSTRAINED EXEMPLARS'.

The abstract describes devices, systems, and methods for generating a phrase that is confusing to a language classifier. The method it outlines includes the following steps:

Determining, with the language classifier, a first classification score for a prompt, indicating whether the prompt belongs to a first class or a second class.
Predicting, with a pre-trained language model and based on the prompt, likely next words and a probability for each of them.
Determining, with the language classifier, a second classification score for each likely next word.
Using an adversarial classifier to compute a score for each likely next word from the prompt's first classification score, the word's second classification score, and the word's probability.
Selecting, with the adversarial classifier, the next word based on those scores.
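
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method claimed in the application: the abstract does not say how the three inputs are combined, so the scoring rule here (log-probability of the word plus a weighted log of how strongly the classifier places the word in the class opposite to the prompt's) is hypothetical. The names `select_next_word`, `classify`, `predict_next`, and `weight` are also invented for this example, and classification scores are assumed to be probabilities of the first class in the range 0 to 1.

```python
import math
from typing import Callable, Dict, Tuple

EPS = 1e-12  # keeps log() defined when a probability is exactly 0


def select_next_word(
    prompt: str,
    classify: Callable[[str], float],
    predict_next: Callable[[str], Dict[str, float]],
    weight: float = 1.0,
) -> Tuple[str, Dict[str, float]]:
    """Pick the next word that is likely under the language model but
    pulls the classifier away from the prompt's class.

    classify(text)       -> assumed probability that text is in the first class
    predict_next(prompt) -> assumed mapping of candidate word to probability
    """
    first_cs = classify(prompt)        # first classification score (prompt)
    candidates = predict_next(prompt)  # likely next words with probabilities

    scores: Dict[str, float] = {}
    for word, prob in candidates.items():
        second_cs = classify(word)     # second classification score (word)
        # Hypothetical rule: if the prompt looks like the first class,
        # reward words the classifier places in the second class, and
        # the other way around.
        steer = 1.0 - second_cs if first_cs >= 0.5 else second_cs
        scores[word] = math.log(prob + EPS) + weight * math.log(steer + EPS)

    best = max(scores, key=scores.get)
    return best, scores


# Toy stand-ins for the language classifier and the pre-trained language model.
def toy_classify(text: str) -> float:
    return {"some prompt": 0.9, "alpha": 0.8, "beta": 0.2}[text]


def toy_predict_next(prompt: str) -> Dict[str, float]:
    return {"alpha": 0.6, "beta": 0.4}


word, scores = select_next_word("some prompt", toy_classify, toy_predict_next)
```

In this toy setup the language model prefers "alpha", but the classifier treats "alpha" as belonging to the same class as the prompt, so the combined score favors "beta". Setting `weight` to 0 removes the classifier term and falls back to plain most-likely-word decoding.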

Potential Applications:
- Enhancing the security of language classifiers.
- Improving the accuracy of language processing systems.
- Preventing malicious actors from manipulating language models.

Problems Solved:
- Increasing the robustness of language classifiers.
- Mitigating the risk of adversarial attacks on language models.

Benefits:
- Enhanced protection against adversarial attacks.
- Improved performance of language processing systems.
- Increased trust in the accuracy of language classifiers.

Commercial Applications:
Title: Advanced Language Security System
This technology could be used in cybersecurity systems, AI-powered chatbots, and automated content moderation platforms. It has implications for industries such as finance, healthcare, and e-commerce.

Questions about Language Security Systems:

1. How does this technology impact the field of natural language processing?
By generating phrases that confuse a language classifier, this technology can expose weaknesses in that classifier, which may help make language processing systems more robust to adversarial attacks and improve classification accuracy.

2. What are the potential risks associated with using language classifiers in sensitive applications?
Sensitive applications that rely on language classifiers may be vulnerable to adversarial attacks, which can lead to misinformation or security breaches.

Original Abstract Submitted

Generally discussed herein are devices, systems, and methods for generating a phrase that is confusing to a language classifier. A method can include determining, by the LC, a first classification score (CS) of a prompt indicating whether the prompt is a first class or a second class, predicting, based on the prompt and by a pre-trained language model (PLM), likely next words and a corresponding probability for each of the likely next words, determining, by the LC, a second CS for each of the likely next words, determining, by an adversarial classifier, respective scores for each of the likely next words, the respective scores determined based on the first CS of the prompt, the second CS of the likely next words, and the probabilities of the likely next words, and selecting, by an adversarial classifier, a next word of the likely next words based on the respective scores.

Microsoft Technology Licensing, LLC (20240339111). ADVERSARIAL LANGUAGE IMITATION WITH CONSTRAINED EXEMPLARS simplified abstract
