file2xliff4j
Class XMLImporter

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by file2xliff4j.XMLImporter
All Implemented Interfaces:
Converter, org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler, org.xml.sax.ext.LexicalHandler

public class XMLImporter
extends org.xml.sax.helpers.DefaultHandler
implements Converter, org.xml.sax.ext.LexicalHandler

The generic XML importer will convert most XML to XLIFF.

Author:
Weldon Whipple <weldon@whipple.org>

Field Summary
 
Fields inherited from interface file2xliff4j.Converter
BLKSIZE, formatSuffix, skeletonSuffix, startXliff, stylesTSkeletonSuffix, tSkeletonSuffix, xliffSuffix, xmlDeclaration
 
Constructor Summary
XMLImporter()
          Constructor for the XML importer.
 
Method Summary
 void characters(char[] ch, int start, int length)
          Called whenever characters are encountered.
 void comment(char[] text, int start, int length)
          Method defined by the LexicalHandler interface that we *probably* don't care about.
 ConversionStatus convert(ConversionMode mode, java.util.Locale language, java.lang.String phaseName, int maxPhase, java.nio.charset.Charset nativeEncoding, FileType nativeFileType, java.lang.String inputXmlFileName, java.lang.String baseDir, Notifier notifier)
          Deprecated. 
 ConversionStatus convert(ConversionMode mode, java.util.Locale language, java.lang.String phaseName, int maxPhase, java.nio.charset.Charset nativeEncoding, FileType nativeFileType, java.lang.String inputXmlFileName, java.lang.String baseDir, Notifier notifier, SegmentBoundary boundary, java.io.StringWriter generatedFileName)
          Convert an XML file to XLIFF.
 ConversionStatus convert(ConversionMode mode, java.util.Locale language, java.lang.String phaseName, int maxPhase, java.nio.charset.Charset nativeEncoding, FileType nativeFileType, java.lang.String inputXmlFileName, java.lang.String baseDir, Notifier notifier, SegmentBoundary boundary, java.io.StringWriter generatedFileName, java.util.Set<f2xutils.XMLTuXPath> skipList)
          Convert an XML file to XLIFF.
 void endCDATA()
          Method called by the SAX parser when it encounters the end of a CDATA section.
 void endDocument()
          When the end-of-document is encountered, write what follows the final translation unit.
 void endDTD()
          Method defined by the LexicalHandler interface that we don't care about.
 void endElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qualifiedName)
          Method called whenever an end element is encountered.
 void endEntity(java.lang.String name)
          Method that the SAX parser calls whenever it reaches the end of an entity (e.g. gt, lt, apos, ...).
 java.lang.Object getConversionProperty(java.lang.String property)
          Return an object representing a format-specific (and converter-specific) property.
 FileType getFileType()
          Return the file type that this converter handles.
 void setConversionProperty(java.lang.String property, java.lang.Object value)
          Set a format-specific property that might affect the way that the conversion occurs.
 void setDocumentLocator(org.xml.sax.Locator locator)
          Method called by the SAX parser before it calls startDocument.
 void startCDATA()
          Method called by the SAX parser when it encounters the start of a CDATA section.
 void startDocument()
          Method called by the SAX parser at the beginning of document parsing.
 void startDTD(java.lang.String name, java.lang.String publicId, java.lang.String systemId)
          Method defined by the LexicalHandler interface that we don't care about.
 void startElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qualifiedName, org.xml.sax.Attributes atts)
          Method called whenever a start element is encountered.
 void startEntity(java.lang.String name)
          Method that the SAX parser calls whenever it encounters an entity (e.g. gt, lt, apos, ...).
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, skippedEntity, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

XMLImporter

public XMLImporter()
Constructor for the XML importer.

Method Detail

convert

public ConversionStatus convert(ConversionMode mode,
                                java.util.Locale language,
                                java.lang.String phaseName,
                                int maxPhase,
                                java.nio.charset.Charset nativeEncoding,
                                FileType nativeFileType,
                                java.lang.String inputXmlFileName,
                                java.lang.String baseDir,
                                Notifier notifier,
                                SegmentBoundary boundary,
                                java.io.StringWriter generatedFileName,
                                java.util.Set<f2xutils.XMLTuXPath> skipList)
                         throws ConversionException
Convert an XML file to XLIFF. Additionally create skeleton and format files. (The skeleton and format files are used to export translated targets back to the original XML format.)

Specified by:
convert in interface Converter
Parameters:
mode - The mode of conversion (to or from XLIFF). The value must be TO_XLIFF.
language - The language of the XML file to be imported.
phaseName - The name of the phase to convert. (This parameter is currently ignored by this importer.)
maxPhase - The maximum phase number. This value is currently ignored.
nativeEncoding - The encoding of the input XML file. This value is currently ignored, allowing the SAX parser to interpret any byte order marks and encoding specified in the input file.
nativeFileType - The type of the input file. Must be XML.
inputXmlFileName - The name of the input XML file.
baseDir - The directory that contains the input XML file, from which the input file is read. The output XLIFF, skeleton and format files are also written to this directory, named as follows:
  • <original_file_name>.xliff
  • <original_file_name>.skeleton
  • <original_file_name>.format
notifier - Instance of a class that implements the Notifier interface (to send notifications in case of conversion error).
boundary - The boundary on which to segment translation units (e.g., on paragraph or sentence boundaries).
generatedFileName - If non-null, the converter will write the name of the file (without parent directories) to which the generated XLIFF file was written.
skipList - A set of potential translatable structures to omit. This converter requires that the Set consist of XMLTuXPath objects.
Returns:
Indicator of the status of the conversion.
Throws:
ConversionException - If a conversion exception is encountered.
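
The output naming described for baseDir can be sketched as follows. This is only an illustration under stated assumptions: OutputNames and outputPath are hypothetical names that are not part of file2xliff4j, the suffix literals are copied from the list above rather than read from the Converter constants, and the suffix is assumed to be appended to the complete input file name.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of file2xliff4j.
class OutputNames {

    // Builds the path of one output file inside baseDir.
    // Assumption: the suffix is appended to the whole input file name.
    static Path outputPath(String baseDir, String inputXmlFileName, String suffix) {
        return Paths.get(baseDir, inputXmlFileName + suffix);
    }

    public static void main(String[] args) {
        // Suffixes taken from the list in the baseDir description.
        String[] suffixes = {".xliff", ".skeleton", ".format"};
        for (String suffix : suffixes) {
            System.out.println(outputPath("work", "manual.xml", suffix));
        }
    }
}
```

Since all three files land in baseDir next to the input file, a caller that needs only the XLIFF file name can also read it back from the generatedFileName writer after the conversion.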

convert

public ConversionStatus convert(ConversionMode mode,
                                java.util.Locale language,
                                java.lang.String phaseName,
                                int maxPhase,
                                java.nio.charset.Charset nativeEncoding,
                                FileType nativeFileType,
                                java.lang.String inputXmlFileName,
                                java.lang.String baseDir,
                                Notifier notifier,
                                SegmentBoundary boundary,
                                java.io.StringWriter generatedFileName)
                         throws ConversionException
Convert an XML file to XLIFF. Additionally create skeleton and format files. (The skeleton and format files are used to export translated targets back to the original XML format.)

Specified by:
convert in interface Converter
Parameters:
mode - The mode of conversion (to or from XLIFF). The value must be TO_XLIFF.
language - The language of the XML file to be imported.
phaseName - The name of the phase to convert. (This parameter is currently ignored by this importer.)
maxPhase - The maximum phase number. This value is currently ignored.
nativeEncoding - The encoding of the input XML file. This value is currently ignored, allowing the SAX parser to interpret any byte order marks and encoding specified in the input file.
nativeFileType - The type of the input file. Must be XML.
inputXmlFileName - The name of the input XML file.
baseDir - The directory that contains the input XML file, from which the input file is read. The output XLIFF, skeleton and format files are also written to this directory, named as follows:
  • <original_file_name>.xliff
  • <original_file_name>.skeleton
  • <original_file_name>.format
notifier - Instance of a class that implements the Notifier interface (to send notifications in case of conversion error).
boundary - The boundary on which to segment translation units (e.g., on paragraph or sentence boundaries).
generatedFileName - If non-null, the converter will write the name of the file (without parent directories) to which the generated XLIFF file was written.
Returns:
Indicator of the status of the conversion.
Throws:
ConversionException - If a conversion exception is encountered.

convert

@Deprecated
public ConversionStatus convert(ConversionMode mode,
                                           java.util.Locale language,
                                           java.lang.String phaseName,
                                           int maxPhase,
                                           java.nio.charset.Charset nativeEncoding,
                                           FileType nativeFileType,
                                           java.lang.String inputXmlFileName,
                                           java.lang.String baseDir,
                                           Notifier notifier)
                         throws ConversionException
Deprecated. 

Convert an XML file to XLIFF. Additionally create skeleton and format files. (The skeleton and format files are used to export translated targets back to the original XML format.)

Specified by:
convert in interface Converter
Parameters:
mode - The mode of conversion (to or from XLIFF). The value must be TO_XLIFF.
language - The language of the XML file to be imported.
phaseName - The name of the phase to convert. (This parameter is currently ignored by this importer.)
maxPhase - The maximum phase number. This value is currently ignored.
nativeEncoding - The encoding of the input XML file. This value is currently ignored, allowing the SAX parser to interpret any byte order marks and encoding specified in the input file.
nativeFileType - The type of the input file. Must be XML.
inputXmlFileName - The name of the input XML file.
baseDir - The directory that contains the input XML file, from which the input file is read. The output XLIFF, skeleton and format files are also written to this directory, named as follows:
  • <original_file_name>.xliff
  • <original_file_name>.skeleton
  • <original_file_name>.format
notifier - Instance of a class that implements the Notifier interface (to send notifications in case of conversion error).
Returns:
Indicator of the status of the conversion.
Throws:
ConversionException - If a conversion exception is encountered.

setDocumentLocator

public void setDocumentLocator(org.xml.sax.Locator locator)
Method called by the SAX parser before it calls startDocument. (The locator provides access to the line and column number where a start tag occurs.)

Specified by:
setDocumentLocator in interface org.xml.sax.ContentHandler
Overrides:
setDocumentLocator in class org.xml.sax.helpers.DefaultHandler
Parameters:
locator - A reference to a document locator
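
How a handler might keep the locator and consult it at each start tag can be sketched as below. LocatorDemo and locate are hypothetical names chosen for this illustration; the sketch shows the general SAX pattern, not the importer's own code.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for illustration only; not part of file2xliff4j.
class LocatorDemo extends DefaultHandler {
    private Locator locator;                              // supplied by the parser
    private final List<String> seen = new ArrayList<>();  // "name@line" entries

    @Override
    public void setDocumentLocator(Locator locator) {
        // The parser calls this before startDocument; keep the reference.
        this.locator = locator;
    }

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qualifiedName, Attributes atts) {
        // A parser is not required to supply a locator, so guard against null.
        int line = (locator == null) ? -1 : locator.getLineNumber();
        seen.add(qualifiedName + "@" + line);
    }

    // Parses the given XML text and returns one "name@line" entry per start tag.
    static List<String> locate(String xml) {
        LocatorDemo handler = new LocatorDemo();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
        return handler.seen;
    }

    public static void main(String[] args) {
        System.out.println(locate("<doc>\n  <p>hi</p>\n</doc>"));
    }
}
```

The locator's values are only meaningful while a callback is running, which is why the sketch reads the line number inside startElement instead of storing the position for later.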

startDocument

public void startDocument()
                   throws org.xml.sax.SAXException
Method called by the SAX parser at the beginning of document parsing.

Specified by:
startDocument in interface org.xml.sax.ContentHandler
Overrides:
startDocument in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException - If the SAX parser needs to report errors.

startElement

public void startElement(java.lang.String namespaceURI,
                         java.lang.String localName,
                         java.lang.String qualifiedName,
                         org.xml.sax.Attributes atts)
                  throws org.xml.sax.SAXException
Method called whenever a start element is encountered.

Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.DefaultHandler
Parameters:
namespaceURI - The URI of the namespace.
localName - The local name (without prefix), or the empty string if Namespace processing is not being performed.
qualifiedName - The qualified name (with prefix), or the empty string if qualified names are not available.
atts - The specified or defaulted attributes.
Throws:
org.xml.sax.SAXException - If the SAX parser needs to report errors.
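
The relationship between the three name parameters can be illustrated with a small handler. ElementNameDemo and elementNames are hypothetical names used only here, and the fallback to the qualified name is one common way to cope with a parser that is not namespace aware, not necessarily what this importer does.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for illustration only; not part of file2xliff4j.
class ElementNameDemo extends DefaultHandler {
    private final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qualifiedName, Attributes atts) {
        // localName is empty when namespace processing is off,
        // so fall back to the qualified name in that case.
        String name = localName.isEmpty() ? qualifiedName : localName;
        names.add("{" + namespaceURI + "}" + name);
    }

    // Parses the XML text and returns "{uri}name" for each start tag.
    static List<String> elementNames(String xml, boolean namespaceAware) {
        ElementNameDemo handler = new ElementNameDemo();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        String xml = "<x:doc xmlns:x=\"urn:example\"><x:p/></x:doc>";
        System.out.println(elementNames(xml, true));
    }
}
```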

endElement

public void endElement(java.lang.String namespaceURI,
                       java.lang.String localName,
                       java.lang.String qualifiedName)
                throws org.xml.sax.SAXException
Method called whenever an end element is encountered.

Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.DefaultHandler
Parameters:
namespaceURI - The URI of the namespace.
localName - The local name (without prefix), or the empty string if Namespace processing is not being performed.
qualifiedName - The qualified name (with prefix), or the empty string if qualified names are not available.
Throws:
org.xml.sax.SAXException - If the SAX parser needs to report errors.

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws org.xml.sax.SAXException
Called whenever characters are encountered.

Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class org.xml.sax.helpers.DefaultHandler
Parameters:
ch - Array containing the characters encountered.
start - Position in the array of the first applicable character.
length - The number of characters of interest, counted from start.
Throws:
org.xml.sax.SAXException - If the SAX parser needs to report errors.
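
The meaning of ch, start and length can be shown with a minimal text collector. TextCollector and textOf are hypothetical names for this sketch. It relies on the general SAX rule that a parser may deliver one run of text in several characters calls, so the handler appends each slice instead of assuming a single call.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for illustration only; not part of file2xliff4j.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) belongs to this event;
        // the rest of the array is the parser's buffer and must be ignored.
        text.append(ch, start, length);
    }

    // Parses the XML text and returns all character data concatenated.
    static String textOf(String xml) {
        TextCollector handler = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
        return handler.text.toString();
    }

    public static void main(String[] args) {
        System.out.println(textOf("<a>one &amp; two</a>"));
    }
}
```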

endDocument

public void endDocument()
                 throws org.xml.sax.SAXException
When the end-of-document is encountered, write what follows the final translation unit.

Specified by:
endDocument in interface org.xml.sax.ContentHandler
Overrides:
endDocument in class org.xml.sax.helpers.DefaultHandler
Throws:
java.lang.IOException - If unable to flush the output streams.
org.xml.sax.SAXException

startEntity

public void startEntity(java.lang.String name)
                 throws org.xml.sax.SAXException
Method that the SAX parser calls whenever it encounters an entity (e.g. gt, lt, apos, ...). We implement this method (an implementation of the method by the same name in the LexicalHandler interface) in order to preserve the entities of the original XML as we import it into XLIFF.

The inEntity instance variable is checked by the characters method of the ContentHandler (DefaultHandler) extension (above). The SAX parser calls the characters method whenever it expands an entity, passing it *only* the expansion of the entity it just encountered. Since we want to write out the unexpanded version of the entity, this (startEntity) method writes out the entity, and characters() just returns without outputting the expansion of the entity (if inEntity is true).

Note: the endEntity method (below) sets the inEntity variable to false.

Specified by:
startEntity in interface org.xml.sax.ext.LexicalHandler
Parameters:
name - The name of the entity (e.g. "lt", "gt"), without a leading ampersand or trailing semicolon.
Throws:
org.xml.sax.SAXException - If the SAX parser needs to report errors.
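
The interplay of startEntity, characters and endEntity described above can be sketched as a stand-alone handler. EntityPreserver and preserve are hypothetical names, and the sketch is a simplified reading of the description, not the importer's source. Because some parsers may not report the predefined entities (lt, gt, ...) to the LexicalHandler by default, the sample uses an entity declared in an internal DTD subset.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for illustration only; not part of file2xliff4j.
class EntityPreserver extends DefaultHandler implements LexicalHandler {
    private final StringBuilder out = new StringBuilder();
    private boolean inEntity = false;  // true between startEntity and endEntity

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inEntity) {
            return;  // skip the expansion; startEntity already wrote the reference
        }
        out.append(ch, start, length);
    }

    @Override
    public void startEntity(String name) {
        if (isSpecial(name)) {
            return;
        }
        out.append('&').append(name).append(';');  // write the unexpanded form
        inEntity = true;
    }

    @Override
    public void endEntity(String name) {
        if (!isSpecial(name)) {
            inEntity = false;
        }
    }

    // "[dtd]" marks the external DTD subset; "%name" marks a parameter entity.
    private static boolean isSpecial(String name) {
        return name.startsWith("[") || name.startsWith("%");
    }

    // Remaining LexicalHandler callbacks are not needed for this sketch.
    @Override public void startDTD(String name, String publicId, String systemId) { }
    @Override public void endDTD() { }
    @Override public void startCDATA() { }
    @Override public void endCDATA() { }
    @Override public void comment(char[] ch, int start, int length) { }

    // Parses the XML text and returns its character data with entity references kept.
    static String preserve(String xml) {
        EntityPreserver handler = new EntityPreserver();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Standard SAX property for registering a LexicalHandler.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE doc [<!ENTITY co \"Example Corp\">]>"
                + "<doc>Made by &co;!</doc>";
        System.out.println(preserve(xml));
    }
}
```

A single boolean is enough for flat entities, as in the description above. An entity whose replacement text refers to another entity would need a depth counter instead, because the inner endEntity would clear the flag too early.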

endEntity

public void endEntity(java.lang.String name)
               throws org.xml.sax.SAXException
Method that the SAX parser calls whenever it reaches the end of an entity (e.g. gt, lt, apos, ...). See comments for startEntity (above) for more information on how this works.

Specified by:
endEntity in interface org.xml.sax.ext.LexicalHandler
Parameters:
name - The name of the entity (e.g. "lt", "gt"), without a leading ampersand or trailing semicolon.
Throws:
org.xml.sax.SAXException - If the SAX parser needs to report errors.

startDTD

public void startDTD(java.lang.String name,
                     java.lang.String publicId,
                     java.lang.String systemId)
              throws org.xml.sax.SAXException
Method defined by the LexicalHandler interface that we don't care about.

Specified by:
startDTD in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException - If the SAX parser needs to report errors.

endDTD

public void endDTD()
            throws org.xml.sax.SAXException
Method defined by the LexicalHandler interface that we don't care about.

Specified by:
endDTD in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException - If the SAX parser needs to report errors.

startCDATA

public void startCDATA()
                throws org.xml.sax.SAXException
Method called by the SAX parser when it encounters the start of a CDATA section. We will (for now, at least) treat CDATA the same way as comments (ignore them, skipping past them).

Specified by:
startCDATA in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException - If the SAX parser needs to report errors.

endCDATA

public void endCDATA()
              throws org.xml.sax.SAXException
Method called by the SAX parser when it encounters the end of a CDATA section. We will (for now, at least) treat CDATA the same way as comments (ignore them, skipping past them).

Specified by:
endCDATA in interface org.xml.sax.ext.LexicalHandler
Throws:
org.xml.sax.SAXException - If the SAX parser needs to report errors.
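
One way to "skip past" CDATA sections is to raise a flag in startCDATA, clear it in endCDATA, and have characters return early while it is set. The sketch below takes that approach. CdataSkipper and skipCdata are hypothetical names, and whether the importer uses exactly this mechanism is an assumption.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class for illustration only; not part of file2xliff4j.
class CdataSkipper extends DefaultHandler implements LexicalHandler {
    private final StringBuilder out = new StringBuilder();
    private boolean inCdata = false;  // true between startCDATA and endCDATA

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!inCdata) {
            out.append(ch, start, length);  // keep only text outside CDATA
        }
    }

    @Override public void startCDATA() { inCdata = true; }
    @Override public void endCDATA() { inCdata = false; }

    // Remaining LexicalHandler callbacks are not needed for this sketch.
    @Override public void startDTD(String name, String publicId, String systemId) { }
    @Override public void endDTD() { }
    @Override public void startEntity(String name) { }
    @Override public void endEntity(String name) { }
    @Override public void comment(char[] ch, int start, int length) { }

    // Parses the XML text and returns its character data without CDATA content.
    static String skipCdata(String xml) {
        CdataSkipper handler = new CdataSkipper();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Standard SAX property for registering a LexicalHandler.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse the sample XML", e);
        }
        return handler.out.toString();
    }

    public static void main(String[] args) {
        System.out.println(skipCdata("<doc>keep <![CDATA[skip <this>]]>me</doc>"));
    }
}
```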

comment

public void comment(char[] text,
                    int start,
                    int length)
             throws org.xml.sax.SAXException
Method defined by the LexicalHandler interface that we *probably* don't care about. If it is called, however, while we are inside a CDATA section within a text node (it is unclear whether this can actually happen), the comment is saved with the rest of the CDATA in the format file, to be referenced by an x tag.

Specified by:
comment in interface org.xml.sax.ext.LexicalHandler
Parameters:
text - Array containing the characters encountered.
start - Position in the array of the first applicable character.
length - The number of characters of interest, counted from start.
Throws:
org.xml.sax.SAXException - If the SAX parser needs to report errors.

getConversionProperty

public java.lang.Object getConversionProperty(java.lang.String property)
Return an object representing a format-specific (and converter-specific) property.

Specified by:
getConversionProperty in interface Converter
Parameters:
property - The name of the property to return.
Returns:
An Object that represents the property's value.

getFileType

public FileType getFileType()
Return the file type that this converter handles. (For importers, this means the file type that it imports to XLIFF; for exporters, it is the file type that it exports to from XLIFF.)

Specified by:
getFileType in interface Converter
Returns:
The XML file type.

setConversionProperty

public void setConversionProperty(java.lang.String property,
                                  java.lang.Object value)
                           throws ConversionException
Set a format-specific property that might affect the way that the conversion occurs.

Note: This converter needs no format-specific properties. If any are passed, they will be silently ignored.

Specified by:
setConversionProperty in interface Converter
Parameters:
property - The name of the property.
value - The value of the property.
Throws:
ConversionException - If the property isn't recognized (and if it matters).