17897123. IDENTIFYING THE TRANLATABILITY OF HARD-CODED STRINGS IN SOURCE CODE VIA POS TAGGING simplified abstract (International Business Machines Corporation)
Organization Name
International Business Machines Corporation
Inventor(s)
Chih-Yuan Lin of Xindian Dist. (TW)
Shu-Chih Chen of Banqiao Dist. (TW)
Pei-Yi Lin of New Taipei City 234 (TW)
Chao Yuan Huang of Taipei (TW)
This abstract first appeared in US patent application 17897123, titled 'IDENTIFYING THE TRANLATABILITY OF HARD-CODED STRINGS IN SOURCE CODE VIA POS TAGGING'.
Simplified Explanation
The abstract describes a method for identifying hard-coded strings in source code. The method parses source code and its localization resource files, assigns each string a confidence score indicating whether it is translatable, transforms each string into a single equivalence word, prepares labeled training data, trains a parts-of-speech (POS) tagging model, and uses that model to tag potential hard-coded strings at runtime.
- Method for identifying hard-coded strings in source code
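- Parses source code and localization resource files to find hard-coded strings and their context
- Assigns each string a confidence score indicating whether it is translatable
- Transforms each string into a single equivalence word based on that score
- Prepares training data by tagging strings as translatable or non-translatable
- Trains a parts-of-speech (POS) tagging model on the training data
- Uses the model at runtime to tag potential hard-coded strings as translatable or non-translatable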
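The first three steps (parsing, confidence scoring and transformation into an equivalence word) can be illustrated with a minimal Python sketch. The abstract does not say how the confidence score is computed or what an equivalence word looks like, so everything below is an assumption for illustration: the regular expression only finds double-quoted literals, the `.properties` reader is deliberately simplified, the scoring heuristic is invented, and the helper names (`load_resource`, `find_hard_coded_strings`, `confidence_translatable`, `to_equivalence_word`, `transform_line`) and the tokens `STR_TRANS`, `STR_NONTRANS` and `STR_UNSURE` are hypothetical.

```python
import re

# Double-quoted string literals (Java/C-style), allowing escaped characters inside.
# Simplification: single-quoted strings, text blocks and comments are not handled.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')


def load_resource(properties_text: str) -> dict[str, str]:
    """Very small .properties reader: key=value lines, '#' or '!' comments only."""
    entries: dict[str, str] = {}
    for raw in properties_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            entries[key.strip()] = value.strip()
    return entries


def find_hard_coded_strings(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, literal text, source line as context) for each literal."""
    found = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in STRING_LITERAL.finditer(line):
            found.append((line_no, match.group(1), line.strip()))
    return found


def confidence_translatable(text: str, resource: dict[str, str]) -> float:
    """Hypothetical heuristic score: 0.0 = non-translatable, 1.0 = translatable."""
    if text in resource:                  # literal is a resource *key* used for lookup
        return 0.0
    if text in resource.values():         # literal duplicates translatable resource text
        return 1.0
    if not re.search(r"[A-Za-z]", text):  # numbers, punctuation, empty string
        return 0.0
    if re.fullmatch(r"[A-Za-z0-9_.\-/:]+", text) and re.search(r"[_./:]|[a-z][A-Z]", text):
        return 0.1                        # looks like an identifier, config key or path
    score = 0.5
    if " " in text:                       # multi-word, natural-language phrase
        score += 0.3
    if text[0].isupper():                 # sentence-style capitalisation
        score += 0.1
    return min(score, 0.9)


def to_equivalence_word(score: float) -> str:
    """Collapse a literal into one token; the token names are hypothetical."""
    if score >= 0.7:
        return "STR_TRANS"
    if score <= 0.3:
        return "STR_NONTRANS"
    return "STR_UNSURE"


def transform_line(line: str, resource: dict[str, str]) -> str:
    """Replace every literal on a line with its single equivalence word."""
    return STRING_LITERAL.sub(
        lambda m: to_equivalence_word(confidence_translatable(m.group(1), resource)),
        line,
    )


if __name__ == "__main__":
    resource = load_resource("# messages.properties\nerror.save=Save failed, please retry\n")
    source = (
        'label.setText("Save failed, please retry");\n'
        'String msg = bundle.getString("error.save");\n'
        'log.debug("cfg.timeout");\n'
        'button.setText("Cancel");\n'
    )
    for line_no, text, _context in find_hard_coded_strings(source):
        score = confidence_translatable(text, resource)
        print(line_no, repr(text), score, to_equivalence_word(score))
```

Run as a script, the sketch scores the literal that duplicates a resource value as translatable, the resource key and the dotted config key as non-translatable, and leaves `"Cancel"` as `STR_UNSURE`. Ambiguous single words like that one are the kind of case that fixed rules cannot settle and that a model looking at the surrounding code might resolve.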
Potential Applications
- Software development
- Localization of software
- Quality assurance in software development
Problems Solved
- Identifying hard-coded strings in source code
- Improving efficiency in localization efforts
- Ensuring accurate translation of software
Benefits
- Streamlining localization processes
- Enhancing accuracy in translation efforts
- Improving overall quality of software products
Original Abstract Submitted
A method for identifying hard-coded strings in source code is disclosed. In one embodiment, such a method parses source code and associated localization resource files to identify hard-coded strings and their associated context. The method provides a confidence score for each hard-coded string that indicates whether the hard-coded string is translatable or non-translatable. Based on the confidence score for each hard-coded string, the method transforms each hard-coded string into a single equivalence word. The method then prepares training data by tagging the hard-coded strings in the source code and associated localization resource files as one of translatable and non-translatable. The method then trains a parts-of-speech (POS) tagging model using the training data. At runtime, the method fetches potential hard-coded strings and tags each hard-coded string as one of translatable and non-translatable using the POS tagging model. A corresponding system and computer program product are also disclosed.
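The training and runtime steps can be sketched in the same hedged way. The abstract does not name the POS tagging algorithm, so this sketch swaps in a much simpler stand-in: a count-based contextual tagger that predicts a tag from the two tokens preceding each equivalence word and backs off to less specific context when a pattern never appeared in training. The class and function names, the `STR_` placeholder prefix and the `TRANS`/`NONTRANS` tag names are hypothetical, and the input lines are assumed to already have their literals replaced by equivalence words.

```python
import re
from collections import Counter, defaultdict

# Code tokens: identifiers/keywords/numbers, or single punctuation characters.
TOKEN = re.compile(r"\w+|[^\w\s]")
# Hypothetical prefix shared by all equivalence words (e.g. STR_UNSURE);
# assumes no ordinary identifier in the code starts with it.
PLACEHOLDER_PREFIX = "STR_"


def tokenize(line: str) -> list[str]:
    return TOKEN.findall(line)


def context_keys(tokens: list[str], i: int) -> list[tuple[str, ...]]:
    """Context features for position i, most specific first; '<s>' pads line start."""
    prev1 = tokens[i - 1] if i >= 1 else "<s>"
    prev2 = tokens[i - 2] if i >= 2 else "<s>"
    word = tokens[i]
    return [(prev2, prev1, word), (prev2, word), (word,)]


class BackoffStringTagger:
    """Stand-in for a POS tagging model: tags each equivalence word by majority vote."""

    def __init__(self, default_tag: str = "NONTRANS") -> None:
        self.counts: dict[tuple[str, ...], Counter] = defaultdict(Counter)
        self.default_tag = default_tag

    def train(self, examples: list[tuple[str, list[str]]]) -> None:
        """examples: (line with equivalence words, one gold tag per equivalence word)."""
        for line, tags in examples:
            tokens = tokenize(line)
            positions = [i for i, t in enumerate(tokens) if t.startswith(PLACEHOLDER_PREFIX)]
            if len(positions) != len(tags):
                raise ValueError(f"{line!r}: {len(positions)} placeholders but {len(tags)} tags")
            for i, tag in zip(positions, tags):
                for key in context_keys(tokens, i):
                    self.counts[key][tag] += 1

    def tag(self, line: str) -> list[tuple[str, str]]:
        """Runtime step: (equivalence word, predicted tag) for each one on the line."""
        tokens = tokenize(line)
        tagged = []
        for i, token in enumerate(tokens):
            if not token.startswith(PLACEHOLDER_PREFIX):
                continue
            predicted = self.default_tag  # used when no context was seen in training
            for key in context_keys(tokens, i):
                if key in self.counts:  # membership test does not create entries
                    predicted = self.counts[key].most_common(1)[0][0]
                    break
            tagged.append((token, predicted))
        return tagged


if __name__ == "__main__":
    training_data = [
        ("label.setText(STR_UNSURE);", ["TRANS"]),
        ("title.setText(STR_TRANS);", ["TRANS"]),
        ("log.debug(STR_UNSURE);", ["NONTRANS"]),
        ("map.put(STR_NONTRANS, STR_UNSURE);", ["NONTRANS", "NONTRANS"]),
    ]
    tagger = BackoffStringTagger()
    tagger.train(training_data)
    print(tagger.tag("button.setText(STR_UNSURE);"))
    print(tagger.tag("log.warn(STR_UNSURE);"))
```

Trained on the four labelled lines above, the tagger labels the placeholder in `button.setText(STR_UNSURE);` as `TRANS` because it matches the `setText(` context seen in training, while `log.warn(STR_UNSURE);` falls back to the token-only count and is labelled `NONTRANS`. A real POS tagging model trained on far more labelled data would take the place of this stand-in.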