file2xliff4j
Class XliffImporter

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by file2xliff4j.XliffImporter
All Implemented Interfaces:
Converter, org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler, org.xml.sax.ext.LexicalHandler

public class XliffImporter
extends org.xml.sax.helpers.DefaultHandler
implements Converter, org.xml.sax.ext.LexicalHandler

The XliffImporter is used to normalize "outside" XLIFF to a smaller subset of XLIFF. This importer assumes that the "owner" of the XLIFF file used as input is responsible for its associated skeleton and other files. This converter creates skeleton and format files, but only for the purpose of exporting the reduced XLIFF file to the original format. (Upon export, the XLIFF file will have additional translation unit targets, and existing targets might be modified.)

This importer replaces bpt, ept, sub, it and ph elements (which "mask off codes left inline") with x, bx and ex elements (which "remove codes"). The codes removed from the bpt, ept, sub, it and ph elements are placed in a format file. At export time, the information from the format file is used to restore the original bpt, ept, sub, it and ph elements.

This importer also replaces opening g tags with bx elements and closing g tags with ex elements.
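
The g-tag replacement described above can be illustrated with a minimal, self-contained SAX sketch. This is not the importer's actual code: the class name GTagSketch, the id/rid numbering scheme, and the decision to drop attributes on other elements are all assumptions made for illustration, and no character escaping is performed.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: rewrites <g> ... </g> as <bx/> ... <ex/> pairs.
class GTagSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    // rid values of the g elements that are currently open (innermost first).
    private final Deque<Integer> openRids = new ArrayDeque<>();
    private int nextId = 1;   // assumed numbering: one running id per emitted element
    private int nextRid = 1;  // assumed numbering: one rid per bx/ex pair

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("g".equals(qName)) {
            int rid = nextRid++;
            openRids.push(rid);
            out.append("<bx id=\"").append(nextId++)
               .append("\" rid=\"").append(rid).append("\"/>");
        } else {
            // Attributes of other elements are dropped to keep the sketch short.
            out.append('<').append(qName).append('>');
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("g".equals(qName)) {
            int rid = openRids.pop();
            out.append("<ex id=\"").append(nextId++)
               .append("\" rid=\"").append(rid).append("\"/>");
        } else {
            out.append("</").append(qName).append('>');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length); // no escaping in this sketch
    }

    static String rewrite(String xml) throws Exception {
        GTagSketch handler = new GTagSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rewrite("<source>Click <g id=\"1\">here</g> now</source>"));
    }
}
```

The stack keeps each ex paired with the bx that opened it, so nested g elements close in the right order.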

Author:
Weldon Whipple <weldon@whipple.org>

Field Summary
 java.util.ArrayList<file2xliff4j.TuListItem> subTuList
           
 java.util.ArrayList<file2xliff4j.TuListItem> tuList
           
 
Fields inherited from interface file2xliff4j.Converter
BLKSIZE, formatSuffix, skeletonSuffix, startXliff, stylesTSkeletonSuffix, tSkeletonSuffix, xliffSuffix, xmlDeclaration
 
Constructor Summary
XliffImporter()
          Constructor for the XLIFF importer.
 
Method Summary
 void characters(char[] ch, int start, int length)
          Called whenever characters are encountered.
 void comment(char[] text, int start, int length)
          Method defined by the LexicalHandler interface that we don't care about.
 ConversionStatus convert(ConversionMode mode, java.util.Locale language, java.lang.String phaseName, int maxPhase, java.nio.charset.Charset nativeEncoding, FileType nativeFileType, java.lang.String inputXliffFileName, java.lang.String baseDir, Notifier notifier)
          Deprecated. 
 ConversionStatus convert(ConversionMode mode, java.util.Locale language, java.lang.String phaseName, int maxPhase, java.nio.charset.Charset nativeEncoding, FileType nativeFileType, java.lang.String inputXliffFileName, java.lang.String baseDir, Notifier notifier, SegmentBoundary boundary, java.io.StringWriter generatedFileName)
          Convert an XLIFF file to a reduced subset of XLIFF for storage within a repository.
 ConversionStatus convert(ConversionMode mode, java.util.Locale language, java.lang.String phaseName, int maxPhase, java.nio.charset.Charset nativeEncoding, FileType nativeFileType, java.lang.String inputXliffFileName, java.lang.String baseDir, Notifier notifier, SegmentBoundary boundary, java.io.StringWriter generatedFileName, java.util.Set<f2xutils.XMLTuXPath> skipList)
          Convert an XLIFF file to a reduced subset of XLIFF for storage within a repository.
 void endCDATA()
          Method defined by the LexicalHandler interface that we don't care about.
 void endDocument()
          When the end-of-document is encountered, write what follows the final translation unit.
 void endDTD()
          Method defined by the LexicalHandler interface that we don't care about.
 void endElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qualifiedName)
          Method called whenever an end element is encountered.
 void endEntity(java.lang.String name)
          Method that the SAX parser calls whenever it reaches the end of an entity (e.g. gt, lt, apos, ...).
 java.lang.Object getConversionProperty(java.lang.String property)
          Return an object representing a format-specific (and converter-specific) property.
 FileType getFileType()
          Return the file type that this converter handles.
 void setConversionProperty(java.lang.String property, java.lang.Object value)
          Set a format-specific property that might affect the way that the conversion occurs.
 void startCDATA()
          Method defined by the LexicalHandler interface that we don't care about.
 void startDocument()
          Method called by the SAX parser at the beginning of document parsing.
 void startDTD(java.lang.String name, java.lang.String publicId, java.lang.String systemId)
          Method defined by the LexicalHandler interface that we don't care about.
 void startElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qualifiedName, org.xml.sax.Attributes atts)
          Method called whenever a start element is encountered.
 void startEntity(java.lang.String name)
          Method that the SAX parser calls whenever it encounters an entity (e.g. gt, lt, apos, ...).
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

tuList

public java.util.ArrayList<file2xliff4j.TuListItem> tuList

subTuList

public java.util.ArrayList<file2xliff4j.TuListItem> subTuList
Constructor Detail

XliffImporter

public XliffImporter()
Constructor for the XLIFF importer.

Method Detail

convert

public ConversionStatus convert(ConversionMode mode,
                                java.util.Locale language,
                                java.lang.String phaseName,
                                int maxPhase,
                                java.nio.charset.Charset nativeEncoding,
                                FileType nativeFileType,
                                java.lang.String inputXliffFileName,
                                java.lang.String baseDir,
                                Notifier notifier,
                                SegmentBoundary boundary,
                                java.io.StringWriter generatedFileName)
                         throws ConversionException
Convert an XLIFF file to a reduced subset of XLIFF for storage within a repository. Create a second XLIFF file (normalized to meet our internal constraints), as well as skeleton and format files. (The skeleton and format files are used on export to "un-normalize" our XLIFF and return it to the format used by the original XLIFF.)

Specified by:
convert in interface Converter
Parameters:
mode - The mode of conversion (to or from XLIFF). The value must be TO_XLIFF.
language - The primary source language of the XLIFF file to be imported. (This is the language of the <source> elements in the input XLIFF file.)
phaseName - The name of the phase to convert. (This parameter is currently ignored by this importer.)
maxPhase - The maximum phase number. This value is currently ignored.
nativeEncoding - The encoding of the input XLIFF file (probably UTF-8, but the XLIFF specification allows UTF-16 as well). If null, the encoding specified in the input XLIFF file's XML header is used.
nativeFileType - The type of the input file. Must be XLIFF.
inputXliffFileName - The name of the input XLIFF file.
baseDir - The directory that contains the input XLIFF file--from which we will read the input file. This is also the directory in which the output xliff, skeleton and format files will be written. The output files will be named as follows:
  • <original_file_name>.xliff (Note: If the input file name already ends in ".xliff", the output file name will end in ".xliff.xliff".)
  • <original_file_name>.skeleton
  • <original_file_name>.format
notifier - Instance of a class that implements the Notifier interface (to send notifications in case of conversion error).
boundary - The boundary on which to segment translation units (e.g., on paragraph or sentence boundaries).
generatedFileName - If non-null, the converter will write the name of the file (without parent directories) to which the generated XLIFF file was written.
Returns:
Indicator of the status of the conversion.
Throws:
ConversionException - If a conversion exception is encountered.
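
The output naming rule described for baseDir can be sketched as follows. The class and method names are hypothetical, and the suffixes are written as literals taken from the description above rather than from the Converter constants (xliffSuffix, skeletonSuffix, formatSuffix), whose values are not shown on this page.

```java
// Hypothetical sketch of the output file naming described for baseDir.
class OutputNamesSketch {
    // Returns the XLIFF, skeleton and format file names, in that order.
    static String[] outputNames(String inputXliffFileName) {
        return new String[] {
            inputXliffFileName + ".xliff",     // "doc.xliff" becomes "doc.xliff.xliff"
            inputXliffFileName + ".skeleton",
            inputXliffFileName + ".format"
        };
    }

    public static void main(String[] args) {
        for (String name : outputNames("doc.xliff")) {
            System.out.println(name);
        }
    }
}
```

Because the suffix is appended rather than substituted, the generated XLIFF file never overwrites the input file in baseDir.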

convert

public ConversionStatus convert(ConversionMode mode,
                                java.util.Locale language,
                                java.lang.String phaseName,
                                int maxPhase,
                                java.nio.charset.Charset nativeEncoding,
                                FileType nativeFileType,
                                java.lang.String inputXliffFileName,
                                java.lang.String baseDir,
                                Notifier notifier,
                                SegmentBoundary boundary,
                                java.io.StringWriter generatedFileName,
                                java.util.Set<f2xutils.XMLTuXPath> skipList)
                         throws ConversionException
Convert an XLIFF file to a reduced subset of XLIFF for storage within a repository. Create a second XLIFF file (normalized to meet our internal constraints), as well as skeleton and format files. (The skeleton and format files are used on export to "un-normalize" our XLIFF and return it to the format used by the original XLIFF.)

Specified by:
convert in interface Converter
Parameters:
mode - The mode of conversion (to or from XLIFF). The value must be TO_XLIFF.
language - The primary source language of the XLIFF file to be imported. (This is the language of the <source> elements in the input XLIFF file.)
phaseName - The name of the phase to convert. (This parameter is currently ignored by this importer.)
maxPhase - The maximum phase number. This value is currently ignored.
nativeEncoding - The encoding of the input XLIFF file (probably UTF-8, but the XLIFF specification allows UTF-16 as well). If null, the encoding specified in the input XLIFF file's XML header is used.
nativeFileType - The type of the input file. Must be XLIFF.
inputXliffFileName - The name of the input XLIFF file.
baseDir - The directory that contains the input XLIFF file--from which we will read the input file. This is also the directory in which the output xliff, skeleton and format files will be written. The output files will be named as follows:
  • <original_file_name>.xliff (Note: If the input file name already ends in ".xliff", the output file name will end in ".xliff.xliff".)
  • <original_file_name>.skeleton
  • <original_file_name>.format
notifier - Instance of a class that implements the Notifier interface (to send notifications in case of conversion error).
boundary - The boundary on which to segment translation units (e.g., on paragraph or sentence boundaries).
generatedFileName - If non-null, the converter will write the name of the file (without parent directories) to which the generated XLIFF file was written.
skipList - (Not used by this converter.)
Returns:
Indicator of the status of the conversion.
Throws:
ConversionException - If a conversion exception is encountered.
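
One way the nativeEncoding rule could be honored with standard SAX classes is sketched below. This is an assumption about a possible approach, not the importer's implementation; EncodingSketch, openSource and parseText are hypothetical names. When a charset is supplied, the bytes are decoded by a Reader; when it is null, the parser is handed the raw byte stream and reads the encoding from the XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of the "null means use the XML header" rule.
class EncodingSketch {
    static InputSource openSource(InputStream in, Charset nativeEncoding) {
        if (nativeEncoding == null) {
            // Byte stream: the parser detects the encoding from the XML declaration.
            return new InputSource(in);
        }
        // Character stream: the caller-supplied charset decides how bytes are decoded.
        return new InputSource(new InputStreamReader(in, nativeEncoding));
    }

    // Parses the document and returns its concatenated character data.
    static String parseText(byte[] xml, Charset nativeEncoding) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
            openSource(new ByteArrayInputStream(xml), nativeEncoding),
            new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><source>caf\u00e9</source>"
            .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseText(xml, null));
        System.out.println(parseText(xml, StandardCharsets.ISO_8859_1));
    }
}
```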

convert

@Deprecated
public ConversionStatus convert(ConversionMode mode,
                                           java.util.Locale language,
                                           java.lang.String phaseName,
                                           int maxPhase,
                                           java.nio.charset.Charset nativeEncoding,
                                           FileType nativeFileType,
                                           java.lang.String inputXliffFileName,
                                           java.lang.String baseDir,
                                           Notifier notifier)
                         throws ConversionException
Deprecated. 

Convert an XLIFF file to a reduced subset of XLIFF for storage within a repository. Create a second XLIFF file (normalized to meet our internal constraints), as well as skeleton and format files. (The skeleton and format files are used on export to "un-normalize" our XLIFF and return it to the format used by the original XLIFF.)

Specified by:
convert in interface Converter
Parameters:
mode - The mode of conversion (to or from XLIFF). The value must be TO_XLIFF.
language - The primary source language of the XLIFF file to be imported. (This is the language of the <source> elements in the input XLIFF file.)
phaseName - The name of the phase to convert. (This parameter is currently ignored by this importer.)
maxPhase - The maximum phase number. This value is currently ignored.
nativeEncoding - The encoding of the input XLIFF file (probably UTF-8, but the XLIFF specification allows UTF-16 as well). If null, the encoding specified in the input XLIFF file's XML header is used.
nativeFileType - The type of the input file. Must be XLIFF.
inputXliffFileName - The name of the input XLIFF file.
baseDir - The directory that contains the input XLIFF file--from which we will read the input file. This is also the directory in which the output xliff, skeleton and format files will be written. The output files will be named as follows:
  • <original_file_name>.xliff (Note: If the input file name already ends in ".xliff", the output file name will end in ".xliff.xliff".)
  • <original_file_name>.skeleton
  • <original_file_name>.format
notifier - Instance of a class that implements the Notifier interface (to send notifications in case of conversion error).
Returns:
Indicator of the status of the conversion.
Throws:
ConversionException - If a conversion exception is encountered.

startDocument

public void startDocument()
                   throws org.xml.sax.SAXException
Method called by the SAX parser at the beginning of document parsing.

Specified by:
startDocument in interface org.xml.sax.ContentHandler
Overrides:
startDocument in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException - If any problems are found.

startElement

public void startElement(java.lang.String namespaceURI,
                         java.lang.String localName,
                         java.lang.String qualifiedName,
                         org.xml.sax.Attributes atts)
                  throws org.xml.sax.SAXException
Method called whenever a start element is encountered.

Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.DefaultHandler
Parameters:
namespaceURI - The URI of the namespace.
localName - The local name (without prefix), or the empty string if namespace processing is not being performed.
qualifiedName - The qualified name (with prefix), or the empty string if qualified names are not available.
atts - The specified or defaulted attributes.
Throws:
org.xml.sax.SAXException

endElement

public void endElement(java.lang.String namespaceURI,
                       java.lang.String localName,
                       java.lang.String qualifiedName)
                throws org.xml.sax.SAXException
Method called whenever an end element is encountered.

Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.DefaultHandler
Parameters:
namespaceURI - The URI of the namespace.
localName - The local name (without prefix), or the empty string if namespace processing is not being performed.
qualifiedName - The qualified name (with prefix), or the empty string if qualified names are not available.
Throws:
org.xml.sax.SAXException

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws org.xml.sax.SAXException
Called whenever characters are encountered.

Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class org.xml.sax.helpers.DefaultHandler
Parameters:
ch - Array containing the characters encountered.
start - Position in the array of the first applicable character.
length - The number of characters of interest, starting at start.
Throws:
org.xml.sax.SAXException

endDocument

public void endDocument()
                 throws org.xml.sax.SAXException
When the end-of-document is encountered, write what follows the final translation unit.

Specified by:
endDocument in interface org.xml.sax.ContentHandler
Overrides:
endDocument in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException

startEntity

public void startEntity(java.lang.String name)
                 throws org.xml.sax.SAXException
Method that the SAX parser calls whenever it encounters an entity (e.g. gt, lt, apos, ...). We implement this method (an implementation of the method by the same name in the LexicalHandler interface) in order to preserve the XML entities in the original XLIFF as we import it into "our" XLIFF.

The inEntity instance variable is checked by the characters method of the ContentHandler (DefaultHandler) extension (above). The SAX parser calls the characters method whenever it expands an entity, passing it *only* the expansion of the entity it just encountered. Since we want to write out the unexpanded version of the entity, this (startEntity) method writes out the entity, and characters() just returns without outputting the expansion of the entity (if inEntity is true).

Note: the endEntity method (below) sets the inEntity variable to false.

Specified by:
startEntity in interface org.xml.sax.ext.LexicalHandler
Parameters:
name - The name of the entity (e.g. "lt", "gt", etc.--without a leading ampersand or trailing semicolon.)
Throws:
org.xml.sax.SAXException
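
The interplay between startEntity, endEntity and characters described above can be sketched with standard SAX classes. This is a minimal illustration under stated assumptions, not the importer's code: the class name EntityEchoSketch is hypothetical, and it extends org.xml.sax.ext.DefaultHandler2 (which already implements LexicalHandler) instead of implementing the interface by hand. Whether a given parser reports the predefined entities (lt, gt, ...) through startEntity depends on its configuration, so the sketch uses an entity declared in an internal DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical sketch: write entity references unexpanded while copying text.
class EntityEchoSketch extends DefaultHandler2 {
    private final StringBuilder out = new StringBuilder();
    private boolean inEntity = false;

    @Override
    public void startEntity(String name) {
        if (isGeneralEntity(name)) {
            out.append('&').append(name).append(';'); // write the reference itself
            inEntity = true;                          // suppress its expansion
        }
    }

    @Override
    public void endEntity(String name) {
        if (isGeneralEntity(name)) {
            inEntity = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inEntity) {
            return; // this is the expansion of the entity just written out
        }
        out.append(ch, start, length);
    }

    // Parameter entities ("%...") and the external DTD subset ("[dtd]") are skipped.
    private static boolean isGeneralEntity(String name) {
        return !name.startsWith("%") && !name.equals("[dtd]");
    }

    static String echo(String xml) throws Exception {
        EntityEchoSketch handler = new EntityEchoSketch();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        // Standard SAX2 property for registering a LexicalHandler.
        parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(echo("<!DOCTYPE r [<!ENTITY co \"ACME\">]><r>a &co; b</r>"));
    }
}
```

The flag works because the parser brackets the expansion of each entity between the startEntity and endEntity calls, so any characters call made while inEntity is true carries only expanded text.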

endEntity

public void endEntity(java.lang.String name)
               throws org.xml.sax.SAXException
Method that the SAX parser calls whenever it reaches the end of an entity (e.g. gt, lt, apos, ...). See comments for startEntity (above) for more information on how this works.

Specified by:
endEntity in interface org.xml.sax.ext.LexicalHandler
Parameters:
name - The name of the entity (e.g. "lt", "gt", etc.--without a leading ampersand or trailing semicolon.)
Throws:
org.xml.sax.SAXException

startDTD

public void startDTD(java.lang.String name,
                     java.lang.String publicId,
                     java.lang.String systemId)
              throws org.xml.sax.SAXException
Method defined by the LexicalHandler interface that we don't care about.

Specified by:
startDTD in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

endDTD

public void endDTD()
            throws org.xml.sax.SAXException
Method defined by the LexicalHandler interface that we don't care about.

Specified by:
endDTD in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

startCDATA

public void startCDATA()
                throws org.xml.sax.SAXException
Method defined by the LexicalHandler interface that we don't care about.

Specified by:
startCDATA in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

endCDATA

public void endCDATA()
              throws org.xml.sax.SAXException
Method defined by the LexicalHandler interface that we don't care about.

Specified by:
endCDATA in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

comment

public void comment(char[] text,
                    int start,
                    int length)
             throws org.xml.sax.SAXException
Method defined by the LexicalHandler interface that we don't care about.

Specified by:
comment in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException

getConversionProperty

public java.lang.Object getConversionProperty(java.lang.String property)
Return an object representing a format-specific (and converter-specific) property.

Specified by:
getConversionProperty in interface Converter
Parameters:
property - The name of the property to return.
Returns:
An Object that represents the property's value.

getFileType

public FileType getFileType()
Return the file type that this converter handles. (For importers, this means the file type that it imports to XLIFF; for exporters, it is the file type that it exports to from XLIFF.)

Specified by:
getFileType in interface Converter
Returns:
The XLIFF file type. (Note: This is an anomaly ...)

setConversionProperty

public void setConversionProperty(java.lang.String property,
                                  java.lang.Object value)
                           throws ConversionException
Set a format-specific property that might affect the way that the conversion occurs.

Note: This converter needs no format-specific properties. If any are passed, they will be silently ignored.

Specified by:
setConversionProperty in interface Converter
Parameters:
property - The name of the property.
value - The value of the property.
Throws:
ConversionException - If the property isn't recognized (and if it matters).