IDENTIFYING THE TRANLATABILITY OF HARD-CODED STRINGS IN SOURCE CODE VIA POS TAGGING

Organization Name

International Business Machines Corporation

Inventor(s)

Jin Shi of Ningbo (CN)

Chih-Yuan Lin of Xindian Dist. (TW)

Shu-Chih Chen of Banqiao Dist. (TW)

Pei-Yi Lin of New Taipei City 234 (TW)

Chao Yuan Huang of Taipei (TW)

IDENTIFYING THE TRANLATABILITY OF HARD-CODED STRINGS IN SOURCE CODE VIA POS TAGGING - A simplified explanation of the abstract

This abstract first appeared for US patent application 17897123, titled 'IDENTIFYING THE TRANLATABILITY OF HARD-CODED STRINGS IN SOURCE CODE VIA POS TAGGING'.

Simplified Explanation

The abstract describes a method for identifying hard-coded strings in source code and classifying each one as translatable or non-translatable. The method parses source code and its associated localization resource files, assigns each string a confidence score for translatability, replaces each string with a single equivalence word, prepares tagged training data, trains a parts-of-speech (POS) tagging model, and uses that model to tag potential hard-coded strings at runtime.

Method for identifying hard-coded strings in source code
Parses source code and localization resource files
Assigns each hard-coded string a confidence score for translatability
Transforms each string into a single equivalence word based on its score
Prepares training data by tagging strings as translatable or non-translatable
Trains a parts-of-speech tagging model
Tags potential hard-coded strings at runtime
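
The first three steps above (parsing, scoring, and transforming) can be illustrated with a minimal sketch. The abstract does not say how the confidence score is computed or what the equivalence words look like, so everything here is an assumption for illustration: the `TRSTR`/`NTSTR` placeholder words, the scoring heuristic, the 0.5 threshold, and the choice of a Java-style `.properties` file as the localization resource format.

```python
import re

# Hypothetical equivalence words; the abstract does not name them.
TRANSLATABLE_WORD = "TRSTR"
NON_TRANSLATABLE_WORD = "NTSTR"

# Matches a double-quoted string literal, allowing backslash escapes.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_resource_values(properties_text):
    """Collect the values from a Java-style .properties resource file."""
    values = set()
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        _, value = line.split("=", 1)
        values.add(value.strip())
    return values


def confidence_translatable(literal, resource_values):
    """Assumed heuristic score in [0, 1]; not the scoring from the patent."""
    if literal in resource_values:
        return 1.0  # already externalized elsewhere, so very likely user-facing
    score = 0.5
    if " " in literal.strip():
        score += 0.3  # multi-word text tends to be user-facing
    if re.fullmatch(r"[A-Za-z0-9_./:\-]+", literal):
        score -= 0.4  # looks like an identifier, path, or key
    return max(0.0, min(1.0, score))


def to_equivalence_word(score, threshold=0.5):
    """Map a confidence score to one of the two placeholder words."""
    return TRANSLATABLE_WORD if score > threshold else NON_TRANSLATABLE_WORD


def transform_line(source_line, resource_values):
    """Replace each string literal in a source line with its equivalence word."""
    def replace(match):
        score = confidence_translatable(match.group(1), resource_values)
        return to_equivalence_word(score)
    return STRING_LITERAL.sub(replace, source_line)
```

For example, with a resource file containing `greeting=Hello, world`, the line `log.info("Hello, world"); cfg.get("db.host");` becomes `log.info(TRSTR); cfg.get(NTSTR);` under this heuristic. Collapsing each literal to one word presumably lets a tagger treat the string as a single token in its code context, though the abstract does not state the motivation.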

Potential Applications

Software development
Localization of software
Quality assurance in software development

Problems Solved

Identifying hard-coded strings in source code
Improving efficiency in localization efforts
Ensuring accurate translation of software

Benefits

Streamlining localization processes
Enhancing accuracy in translation efforts
Improving overall quality of software products

Original Abstract Submitted

A method for identifying hard-coded strings in source code is disclosed. In one embodiment, such a method parses source code and associated localization resource files to identify hard-coded strings and their associated context. The method provides a confidence score for each hard-coded string that indicates whether the hard-coded string is translatable or non-translatable. Based on the confidence score for each hard-coded string, the method transforms each hard-coded string into a single equivalence word. The method then prepares training data by tagging the hard-coded strings in the source code and associated localization resource files as one of translatable and non-translatable. The method then trains a parts-of-speech (POS) tagging model using the training data. At runtime, the method fetches potential hard-coded strings and tags each hard-coded string as one of translatable and non-translatable using the POS tagging model. A corresponding system and computer program product are also disclosed.
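
The training and runtime tagging steps can be sketched in the same spirit. The abstract does not specify the model architecture or the features used, so the code below swaps in a deliberately simple stand-in: a frequency-based tagger that predicts a string's tag from the nearest identifier to its left (for instance, the method being called). The class name `ContextTagger`, the tag labels, and the context feature are all hypothetical; a real POS tagger such as an HMM or perceptron model would use richer context.

```python
import re
from collections import Counter, defaultdict

# Tokens: a quoted string literal, an identifier, or any single non-space char.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|[A-Za-z_][A-Za-z0-9_]*|\S')
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(line):
    """Split a source line into string literals, identifiers, and symbols."""
    return TOKEN.findall(line)


def context_of(tokens, i):
    """Return the nearest identifier to the left of position i, or '<none>'."""
    for tok in reversed(tokens[:i]):
        if IDENTIFIER.fullmatch(tok):
            return tok
    return "<none>"


class ContextTagger:
    """Assumed stand-in for the POS tagging model: most frequent tag per context."""

    def __init__(self):
        self.by_context = defaultdict(Counter)
        self.overall = Counter()

    def train(self, examples):
        """examples: iterable of (source_line, [one tag per string literal])."""
        for line, tags in examples:
            tokens = tokenize(line)
            positions = [i for i, t in enumerate(tokens) if t.startswith('"')]
            for i, tag in zip(positions, tags):
                self.by_context[context_of(tokens, i)][tag] += 1
                self.overall[tag] += 1

    def tag(self, line):
        """Return (literal, tag) pairs for each string literal in the line."""
        tokens = tokenize(line)
        result = []
        for i, tok in enumerate(tokens):
            if not tok.startswith('"'):
                continue
            # Fall back to the overall tag distribution for unseen contexts.
            counts = self.by_context.get(context_of(tokens, i)) or self.overall
            tag = counts.most_common(1)[0][0] if counts else "unknown"
            result.append((tok, tag))
        return result
```

A usage sketch: after training on `('label.setText("Save changes?");', ["translatable"])` and `('config.get("db.host");', ["non-translatable"])`, calling `tag('title.setText("Welcome back");')` tags the literal as translatable because it follows `setText`, a context seen only with translatable strings in the training data.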

