17897123. IDENTIFYING THE TRANLATABILITY OF HARD-CODED STRINGS IN SOURCE CODE VIA POS TAGGING simplified abstract (International Business Machines Corporation)

IDENTIFYING THE TRANLATABILITY OF HARD-CODED STRINGS IN SOURCE CODE VIA POS TAGGING

Organization Name

International Business Machines Corporation

Inventor(s)

Jin Shi of Ningbo (CN)

Chih-Yuan Lin of Xindian Dist. (TW)

Shu-Chih Chen of Banqiao Dist. (TW)

Pei-Yi Lin of New Taipei City 234 (TW)

Chao Yuan Huang of Taipei (TW)

IDENTIFYING THE TRANLATABILITY OF HARD-CODED STRINGS IN SOURCE CODE VIA POS TAGGING - A simplified explanation of the abstract

This abstract first appeared in US patent application 17897123, titled 'IDENTIFYING THE TRANLATABILITY OF HARD-CODED STRINGS IN SOURCE CODE VIA POS TAGGING'.

Simplified Explanation

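The abstract describes a method for identifying hard-coded strings in source code and classifying each one as translatable or non-translatable. The method parses the source code and its associated localization resource files, assigns each hard-coded string a confidence score for translatability, and replaces each string with a single equivalence word based on that score. It then prepares training data by tagging the strings as translatable or non-translatable and uses that data to train a parts-of-speech (POS) tagging model. At runtime, the trained model tags potential hard-coded strings as translatable or non-translatable.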

  • Method for identifying hard-coded strings in source code
  • Parses source code and localization resource files
  • Assigns confidence scores for translatability
  • Transforms each string into a single equivalence word, based on its confidence score
  • Prepares training data by tagging strings as translatable or non-translatable
  • Trains a parts-of-speech tagging model
  • Tags potential hard-coded strings at runtime as translatable or non-translatable, using the trained model
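
The parsing and scoring steps listed above can be illustrated with a minimal Python sketch. The abstract does not say how strings are extracted or how the confidence score is computed, so everything here is an assumption made for illustration: the function names (load_resource_values, extract_string_literals, translatability_score), the key=value resource file format, and the scoring heuristic itself. The sketch treats Python as the source language only because the standard ast module makes literal extraction easy.

```python
import ast


def load_resource_values(properties_text: str) -> set[str]:
    """Collect the values from a key=value resource file (assumed format)."""
    values = set()
    for line in properties_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without a key/value separator.
        if not line or line.startswith("#") or "=" not in line:
            continue
        _, value = line.split("=", 1)
        values.add(value.strip())
    return values


def extract_string_literals(source: str) -> list[tuple[int, str]]:
    """Return (line number, value) for every string literal in Python source.

    Docstrings are string literals too, so they are included.
    """
    tree = ast.parse(source)
    found = [
        (node.lineno, node.value)
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant) and isinstance(node.value, str)
    ]
    return sorted(found)


def translatability_score(text: str, resource_values: set[str]) -> float:
    """Hypothetical heuristic: 1.0 means very likely translatable text."""
    # A string already present in a resource file is known user-facing text.
    if text in resource_values:
        return 1.0
    words = text.split()
    if not words:
        return 0.0
    # Fraction of tokens that look like natural-language words.
    alpha = sum(w.strip(".,!?:;").isalpha() for w in words)
    score = alpha / len(words)
    # Single tokens are often keys or identifiers, so halve their score.
    if len(words) == 1:
        score *= 0.5
    return score
```

Under this heuristic, a literal such as "Save file" that also appears in the resource file scores 1.0, while an identifier-like literal such as "user_id" scores 0.0. A real system would presumably also use the surrounding code context, which the abstract mentions but does not detail.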

Potential Applications

  • Software development
  • Localization of software
  • Quality assurance in software development

Problems Solved

  • Identifying hard-coded strings in source code
  • Improving efficiency in localization efforts
  • Ensuring accurate translation of software

Benefits

  • Streamlining localization processes
  • Enhancing accuracy in translation efforts
  • Improving overall quality of software products


Original Abstract Submitted

A method for identifying hard-coded strings in source code is disclosed. In one embodiment, such a method parses source code and associated localization resource files to identify hard-coded strings and their associated context. The method provides a confidence score for each hard-coded string that indicates whether the hard-coded string is translatable or non-translatable. Based on the confidence score for each hard-coded string, the method transforms each hard-coded string into a single equivalence word. The method then prepares training data by tagging the hard-coded strings in the source code and associated localization resource files as one of translatable and non-translatable. The method then trains a parts-of-speech (POS) tagging model using the training data. At runtime, the method fetches potential hard-coded strings and tags each hard-coded string as one of translatable and non-translatable using the POS tagging model. A corresponding system and computer program product are also disclosed.
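
As a rough illustration of the equivalence-word, training, and runtime tagging steps in the abstract, the sketch below collapses each string literal into one placeholder token and trains a tiny count-based tagger over token sequences. The abstract does not specify the tagging model, the token vocabulary, or the tag set, so the placeholder names (STR_HI, STR_LO), the 0.5 threshold, the tags (CODE, TRANSLATABLE, NON_TRANSLATABLE), the ContextTagger class, and the toy training data are all hypothetical. The count-based tagger is a stand-in for a real POS tagging model, not the patent's method.

```python
from collections import Counter, defaultdict


def equivalence_word(score: float, threshold: float = 0.5) -> str:
    """Collapse a whole string literal into one placeholder token."""
    return "STR_HI" if score >= threshold else "STR_LO"


class ContextTagger:
    """Tags each token from (previous token, token) counts, with backoff."""

    def __init__(self) -> None:
        self.bigram = defaultdict(Counter)   # (prev, token) -> tag counts
        self.unigram = defaultdict(Counter)  # token -> tag counts
        self.tags = Counter()                # overall tag counts

    def train(self, sentences) -> None:
        for sentence in sentences:
            prev = "<s>"  # start-of-sequence marker
            for token, tag in sentence:
                self.bigram[(prev, token)][tag] += 1
                self.unigram[token][tag] += 1
                self.tags[tag] += 1
                prev = token

    def tag(self, tokens):
        """Tag a token sequence; assumes train() has been called."""
        tagged = []
        prev = "<s>"
        for token in tokens:
            if (prev, token) in self.bigram:
                counts = self.bigram[(prev, token)]
            elif token in self.unigram:
                counts = self.unigram[token]  # back off to the token alone
            else:
                counts = self.tags            # back off to the majority tag
            tagged.append((token, counts.most_common(1)[0][0]))
            prev = token
        return tagged


# Toy training data: each code statement is reduced to a token sequence in
# which the string literal has been replaced by its equivalence word.
TRAINING = [
    [("show_message", "CODE"), ("STR_HI", "TRANSLATABLE")],
    [("show_message", "CODE"), ("STR_HI", "TRANSLATABLE")],
    [("getenv", "CODE"), ("STR_LO", "NON_TRANSLATABLE")],
    [("getenv", "CODE"), ("STR_HI", "NON_TRANSLATABLE")],
]

tagger = ContextTagger()
tagger.train(TRAINING)
print(tagger.tag(["show_message", "STR_HI"]))
print(tagger.tag(["getenv", "STR_HI"]))
```

The design point this tries to show is that the same equivalence word can receive different tags depending on the surrounding code tokens: in the toy data, STR_HI after show_message is tagged TRANSLATABLE, while STR_HI after getenv is tagged NON_TRANSLATABLE.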