18660104. PHISHING URL DETECTION USING TRANSFORMERS simplified abstract (Microsoft Technology Licensing, LLC)

PHISHING URL DETECTION USING TRANSFORMERS

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Jack Wilson Stokes III of North Bend WA (US)

Pranav Ravindra Maneriker of Columbus OH (US)

Arunkumar Gururajan of Sammamish WA (US)

Diana Anca Carutasu of Bellevue WA (US)

Edir Vinicio Garcia Lazo of Seattle WA (US)

PHISHING URL DETECTION USING TRANSFORMERS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18660104, titled 'PHISHING URL DETECTION USING TRANSFORMERS'.

The technology described in this patent application uses transformers to identify phishing URLs. It tokenizes useful features of the subject URL: the text of the URL itself and associated data such as certificate information, a referrer URL, and an IP address. The features may be encoded with a joint Byte Pair Encoding, and the resulting token encoding is processed through a transformer. The transformer output, which may be described as a token embedding, is input to a classifier that determines whether the URL is a phishing URL. Additional or improved training data may be generated by permuting token order, by simulating a homoglyph attack, and by simulating a compound word attack.

  • Tokenizes useful features from URLs
  • Uses a transformer to process the token encoding
  • Generates token embeddings for classification
  • Includes methods for generating additional training data
  • Utilizes Byte Pair Encoding for feature encoding
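
The tokenization step listed above can be sketched in a few lines. This is a minimal illustration, not the applicant's implementation: it learns a character-level Byte Pair Encoding over all feature strings at once (one reading of a "joint" encoding) and then encodes a record as one token sequence. The field names, the `[URL]`/`[REF]`/`[CERT]`/`[IP]` markers, the sample records, and the helper names are all hypothetical.

```python
from collections import Counter

# Hypothetical field markers; the abstract does not say how features are delimited.
FIELD_MARKERS = {"url": "[URL]", "referrer": "[REF]", "cert_issuer": "[CERT]", "ip": "[IP]"}

def apply_merge(symbols, pair):
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out

def learn_bpe(strings, num_merges):
    """Learn up to `num_merges` merges, starting from single characters."""
    seqs = [list(s) for s in strings]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in seqs:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        best, count = pairs.most_common(1)[0]
        if count < 2:  # nothing left that repeats
            break
        merges.append(best)
        seqs = [apply_merge(seq, best) for seq in seqs]
    return merges

def encode_record(record, merges):
    """Encode every known field with the shared merges, prefixed by its marker."""
    tokens = []
    for field, marker in FIELD_MARKERS.items():
        if field in record:
            tokens.append(marker)
            symbols = list(record[field])
            for pair in merges:
                symbols = apply_merge(symbols, pair)
            tokens.extend(symbols)
    return tokens

# Made-up example records (reserved example domains and documentation IPs).
records = [
    {"url": "https://login.example.com/account",
     "referrer": "https://mail.example.org/inbox",
     "cert_issuer": "Example CA",
     "ip": "192.0.2.10"},
    {"url": "https://login.example.net/account", "ip": "192.0.2.44"},
]

# "Joint": one merge table is learned from all fields of all records together.
corpus = [value for rec in records for value in rec.values()]
merges = learn_bpe(corpus, num_merges=50)
tokens = encode_record(records[0], merges)
```

Because the merge table is shared, a substring that recurs in both the URL and the referrer (for example `https://`) tends to become a single token in either field. A production system would more likely train the vocabulary on a large URL corpus with an established tokenizer library.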

Potential Applications:

  • Cybersecurity and threat detection systems
  • Phishing prevention tools for individuals and organizations
  • Enhancing URL filtering and blocking mechanisms

Problems Solved:

  • Identification of phishing URLs
  • Improving accuracy of phishing detection
  • Generating diverse training data for machine learning models

Benefits:

  • Enhanced security against phishing attacks
  • Improved accuracy in identifying malicious URLs
  • Scalable and adaptable technology for evolving threats

Commercial Applications: Advanced Phishing URL Detection Technology

This technology can be used in cybersecurity software products for businesses, financial institutions, and government agencies to enhance their defenses against phishing attacks. It can also be integrated into web browsers and email clients to provide real-time protection for individual users.

Questions about Phishing URL Detection:

1. How does this technology compare to traditional methods of detecting phishing URLs?

The technology classifies a transformer-generated token embedding built from the URL text and its associated data (certificate data, referrer URL, IP address), which is intended to improve detection accuracy. The abstract itself gives no comparative figures.

2. Can this technology adapt to new types of phishing attacks?

The abstract describes generating additional training data by permuting token order and by simulating homoglyph and compound word attacks, which is intended to help the model handle new variants of these tactics.
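
The abstract names three ways of generating training data but does not define them. The sketch below shows one plausible reading of each, under stated assumptions: "permuting token order" is taken to mean shuffling query parameters, a homoglyph attack swaps one host character for an ASCII lookalike, and a compound word attack hyphenates a host label at word boundaries. The lookalike table, the word list, and every function name are hypothetical.

```python
import random
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical ASCII lookalikes; real attacks often use Unicode confusables.
HOMOGLYPHS = {"o": "0", "l": "1", "i": "1", "e": "3", "m": "rn"}

def permute_query_params(url, rng):
    """Shuffle query parameters (assumed reading of 'permuting token order')."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    rng.shuffle(params)
    return urlunsplit(parts._replace(query=urlencode(params)))

def homoglyph_attack(url, rng):
    """Swap one character of the host (excluding the TLD) for a lookalike."""
    parts = urlsplit(url)
    if "." not in parts.netloc:
        return url
    name, tld = parts.netloc.rsplit(".", 1)
    positions = [i for i, ch in enumerate(name) if ch in HOMOGLYPHS]
    if not positions:
        return url
    i = rng.choice(positions)
    name = name[:i] + HOMOGLYPHS[name[i]] + name[i + 1:]
    return urlunsplit(parts._replace(netloc=name + "." + tld))

def segment(label, vocab):
    """Split `label` into words from `vocab`, or return None if impossible."""
    if not label:
        return []
    for end in range(len(label), 0, -1):  # prefer longer words first
        if label[:end] in vocab:
            rest = segment(label[end:], vocab)
            if rest is not None:
                return [label[:end]] + rest
    return None

def compound_word_attack(url, vocab):
    """Hyphenate host labels that split into two or more known words."""
    parts = urlsplit(url)
    labels = parts.netloc.split(".")
    for idx, label in enumerate(labels[:-1]):  # leave the TLD untouched
        words = segment(label, vocab)
        if words and len(words) > 1:
            labels[idx] = "-".join(words)
    return urlunsplit(parts._replace(netloc=".".join(labels)))
```

For example, with the word list `{"bank", "of", "america"}`, `compound_word_attack` turns the host `www.bankofamerica.com` into `www.bank-of-america.com`. How the generated URLs are labeled for training is not stated in the abstract.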


Original Abstract Submitted

The technology described herein can identify phishing URLs using transformers. The technology tokenizes useful features from the subject URL. The useful features can include the text of the URL and other data associated with the URL, such as certificate data for the subject URL, a referrer URL, an IP address, etc. The technology may build a joint Byte Pair Encoding for the features. The token encoding may be processed through a transformer, resulting in a transformer output. The transformer output, which may be described as a token embedding, may be input to a classifier to determine whether the URL is a phishing URL. Additional or improved URL training data may be generated by permuting token order, by simulating a homoglyph attack, and by simulating a compound word attack.
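
The path from token encoding to classifier can be illustrated with a toy forward pass. This is a sketch of the data flow only, assuming a single self-attention head, a `[CLS]` position whose output serves as the token embedding, and a logistic classifier on top. Positional encodings, residual connections, layer normalization, feed-forward layers, and training are all omitted, the weights are random, and every name and dimension is hypothetical, so the resulting score carries no meaning.

```python
import math
import random

def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over token vectors x."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]          # transpose of k
    scores = matmul(q, k_t)                       # n x n attention logits
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)

def init_params(vocab_size, dim, rng):
    """Random, untrained parameters (illustration only)."""
    return {
        "emb": rand_matrix(vocab_size, dim, rng),
        "wq": rand_matrix(dim, dim, rng),
        "wk": rand_matrix(dim, dim, rng),
        "wv": rand_matrix(dim, dim, rng),
        "w_out": [rng.gauss(0.0, 0.1) for _ in range(dim)],
        "b_out": 0.0,
    }

def phishing_score(tokens, vocab, params):
    """Token ids -> embeddings -> attention -> [CLS] vector -> sigmoid score."""
    ids = [vocab[t] for t in ["[CLS]"] + tokens]
    x = [params["emb"][i] for i in ids]
    h = self_attention(x, params["wq"], params["wk"], params["wv"])
    cls_vec = h[0]  # assumed: the [CLS] output is the embedding fed to the classifier
    logit = sum(w * v for w, v in zip(params["w_out"], cls_vec)) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))
```

A call such as `phishing_score(tokens, vocab, params)` returns a value between 0 and 1 that a trained system would compare against a decision threshold. In practice the transformer would be a multi-layer model built with a deep learning framework and trained on labeled URLs.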