file2xliff4j
Class PdfImporter

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by file2xliff4j.PdfImporter
All Implemented Interfaces:
Converter, org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

public class PdfImporter
extends org.xml.sax.helpers.DefaultHandler
implements Converter

The PdfImporter imports Portable Document Format to (what else?) XLIFF.

Author:
Weldon Whipple <weldon@lingotek.com>

Field Summary
 
Fields inherited from interface file2xliff4j.Converter
BLKSIZE, formatSuffix, skeletonSuffix, startXliff, stylesTSkeletonSuffix, tSkeletonSuffix, xliffSuffix, xmlDeclaration
 
Constructor Summary
PdfImporter()
          Create a PDF Importer
 
Method Summary
 boolean addTuDelimiter(java.lang.String tag)
          Add an HTML tag to the set of tags that signal the start of a translation unit in the XLIFF generated from HTML.
 ConversionStatus convert(ConversionMode mode, java.util.Locale language, java.lang.String phaseName, int maxPhase, java.nio.charset.Charset nativeEncoding, FileType nativeFileType, java.lang.String nativeFileName, java.lang.String baseDir, Notifier notifier)
          Deprecated. 
 ConversionStatus convert(ConversionMode mode, java.util.Locale language, java.lang.String phaseName, int maxPhase, java.nio.charset.Charset nativeEncoding, FileType nativeFileType, java.lang.String nativeFileName, java.lang.String baseDir, Notifier notifier, SegmentBoundary boundary, java.io.StringWriter generatedFileName)
          Convert a PDF file to XLIFF, writing an xliff file as output (and, in some future release, possibly skeleton and format files).
 ConversionStatus convert(ConversionMode mode, java.util.Locale language, java.lang.String phaseName, int maxPhase, java.nio.charset.Charset nativeEncoding, FileType nativeFileType, java.lang.String nativeFileName, java.lang.String baseDir, Notifier notifier, SegmentBoundary boundary, java.io.StringWriter generatedFileName, java.util.Set<f2xutils.XMLTuXPath> skipList)
          Convert a PDF file to XLIFF, writing an xliff file as output (and, in some future release, possibly skeleton and format files).
 java.lang.Object getConversionProperty(java.lang.String property)
          Return an object representing a format-specific (and converter-specific) property.
 FileType getFileType()
          Return the file type that this converter handles.
 java.lang.String[] getTuDelimiterList()
          Return the current list of HTML tags that signal the start of a translation unit in the XLIFF generated from HTML.
 void setConversionProperty(java.lang.String property, java.lang.Object value)
          Set a format-specific property that might affect the way that the conversion occurs.
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
characters, endDocument, endElement, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startElement, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PdfImporter

public PdfImporter()
Create a PDF Importer

Method Detail

addTuDelimiter

public boolean addTuDelimiter(java.lang.String tag)
Add an HTML tag to the set of tags that signal the start of a translation unit in the XLIFF generated from HTML.

Parameters:
tag - HTML tag to add to the set (Examples: "p", "h1", "dl", ...)
Returns:
true=the tag was added; false=not added (already present).
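
The return contract described above behaves like java.util.Set.add. The following is a minimal, self-contained sketch of that contract. TuDelimiterSketch is a hypothetical stand-in written for illustration only; it is not PdfImporter's actual implementation, and the use of an insertion-ordered set is an assumption.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of a translation-unit delimiter set (not the real PdfImporter). */
final class TuDelimiterSketch {
    // Assumption: an insertion-ordered set, so tags are listed in the order added.
    private final Set<String> tuDelimiters = new LinkedHashSet<>();

    /** Returns true if the tag was added, false if it was already present. */
    boolean addTuDelimiter(String tag) {
        return tuDelimiters.add(tag);
    }

    /** Returns a snapshot array of the current delimiter tags. */
    String[] getTuDelimiterList() {
        return tuDelimiters.toArray(new String[0]);
    }

    public static void main(String[] args) {
        TuDelimiterSketch sketch = new TuDelimiterSketch();
        System.out.println(sketch.addTuDelimiter("p"));   // first insertion of "p"
        System.out.println(sketch.addTuDelimiter("h1"));  // first insertion of "h1"
        System.out.println(sketch.addTuDelimiter("p"));   // duplicate of "p"
        System.out.println(String.join(",", sketch.getTuDelimiterList()));
    }
}
```

A caller can use the boolean result to detect duplicates without first querying the list.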

getConversionProperty

public java.lang.Object getConversionProperty(java.lang.String property)
Return an object representing a format-specific (and converter-specific) property.

Specified by:
getConversionProperty in interface Converter
Parameters:
property - The name of the property to return.
Returns:
An Object that represents the property's value.

getFileType

public FileType getFileType()
Return the file type that this converter handles. (For importers, this means the file type that it imports to XLIFF; for exporters, it is the file type that it exports to from XLIFF.)

Specified by:
getFileType in interface Converter
Returns:
the PDF file type.

getTuDelimiterList

public java.lang.String[] getTuDelimiterList()
Return the current list of HTML tags that signal the start of a translation unit in the XLIFF generated from HTML.

Returns:
an array of Strings containing the current list of tags that start a translation unit in XLIFF.

convert

public ConversionStatus convert(ConversionMode mode,
                                java.util.Locale language,
                                java.lang.String phaseName,
                                int maxPhase,
                                java.nio.charset.Charset nativeEncoding,
                                FileType nativeFileType,
                                java.lang.String nativeFileName,
                                java.lang.String baseDir,
                                Notifier notifier,
                                SegmentBoundary boundary,
                                java.io.StringWriter generatedFileName)
                         throws ConversionException
Convert a PDF file to XLIFF, writing an xliff file as output (and, in some future release, possibly skeleton and format files).

Specified by:
convert in interface Converter
Parameters:
mode - The mode of conversion (to XLIFF in this case).
language - The language of the input file.
phaseName - The target phase-name. This value is ignored.
maxPhase - The maximum phase number. This value is ignored.
nativeEncoding - The encoding of the input file. This value is ignored for PDF files.
nativeFileType - The type of the native file. This value must be "PDF". (Note: The value is stored in the datatype attribute of the XLIFF's file element.)
nativeFileName - The name of the input PDF file (without directory prefix).
baseDir - The directory that contains the input PDF file--from which we will read the input file. This is also the directory in which the output xliff, skeleton and format files will be written. The output files will be named as follows:
  • nativeFileName.xliff
  • nativeFileName.skeleton (future?)
  • nativeFileName.format (future?)
where nativeFileName is the file name specified in the nativeFileName parameter.
notifier - Instance of a class that implements the Notifier interface (to send notifications in case of conversion error).
boundary - The boundary on which to segment translation units (e.g., on paragraph or sentence boundaries).
generatedFileName - If non-null, the converter will write the name of the file (without parent directories) to which the generated XLIFF file was written.
Returns:
Indicator of the status of the conversion.
Throws:
ConversionException - If a conversion exception is encountered.
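
The naming convention described for baseDir can be sketched as follows. This code only computes the paths a caller might expect to find after a conversion; it does not call the converter. OutputNamesSketch and expectedOutputs are hypothetical helpers, and the suffix strings are copied from the list above rather than read from the Converter constants, whose values are not shown on this page.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that derives the documented output names. */
final class OutputNamesSketch {
    // Suffixes as listed in the baseDir description. The skeleton and format
    // files are marked "future?" there, so they may not exist after a run.
    private static final String[] SUFFIXES = {".xliff", ".skeleton", ".format"};

    /** Builds baseDir/nativeFileName + suffix for each documented suffix. */
    static List<Path> expectedOutputs(String baseDir, String nativeFileName) {
        List<Path> outputs = new ArrayList<>();
        for (String suffix : SUFFIXES) {
            outputs.add(Paths.get(baseDir, nativeFileName + suffix));
        }
        return outputs;
    }

    public static void main(String[] args) {
        // "docs" and "manual.pdf" are illustrative values only.
        for (Path p : expectedOutputs("docs", "manual.pdf")) {
            System.out.println(p);
        }
    }
}
```

Note that the suffix is appended to the full native file name, so the original .pdf extension is retained in the output name.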

convert

public ConversionStatus convert(ConversionMode mode,
                                java.util.Locale language,
                                java.lang.String phaseName,
                                int maxPhase,
                                java.nio.charset.Charset nativeEncoding,
                                FileType nativeFileType,
                                java.lang.String nativeFileName,
                                java.lang.String baseDir,
                                Notifier notifier,
                                SegmentBoundary boundary,
                                java.io.StringWriter generatedFileName,
                                java.util.Set<f2xutils.XMLTuXPath> skipList)
                         throws ConversionException
Convert a PDF file to XLIFF, writing an xliff file as output (and, in some future release, possibly skeleton and format files).

Specified by:
convert in interface Converter
Parameters:
mode - The mode of conversion (to XLIFF in this case).
language - The language of the input file.
phaseName - The target phase-name. This value is ignored.
maxPhase - The maximum phase number. This value is ignored.
nativeEncoding - The encoding of the input file. This value is ignored for PDF files.
nativeFileType - The type of the native file. This value must be "PDF". (Note: The value is stored in the datatype attribute of the XLIFF's file element.)
nativeFileName - The name of the input PDF file (without directory prefix).
baseDir - The directory that contains the input PDF file--from which we will read the input file. This is also the directory in which the output xliff, skeleton and format files will be written. The output files will be named as follows:
  • nativeFileName.xliff
  • nativeFileName.skeleton (future?)
  • nativeFileName.format (future?)
where nativeFileName is the file name specified in the nativeFileName parameter.
notifier - Instance of a class that implements the Notifier interface (to send notifications in case of conversion error).
boundary - The boundary on which to segment translation units (e.g., on paragraph or sentence boundaries).
generatedFileName - If non-null, the converter will write the name of the file (without parent directories) to which the generated XLIFF file was written.
skipList - (Not used by this converter.)
Returns:
Indicator of the status of the conversion.
Throws:
ConversionException - If a conversion exception is encountered.
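
The generatedFileName parameter acts as an output parameter: the caller supplies a StringWriter and reads the file name back after the call, or passes null if the name is not needed. A minimal sketch of that pattern follows. fakeConvert is a hypothetical stand-in for convert (a real call needs the file2xliff4j types and a PDF on disk), and the name it writes is assumed from the naming convention described for baseDir.

```java
import java.io.StringWriter;

/** Hypothetical illustration of the StringWriter output-parameter pattern. */
final class GeneratedNameSketch {
    /** Stand-in for convert(): reports the output name only if a writer is given. */
    static void fakeConvert(String nativeFileName, StringWriter generatedFileName) {
        if (generatedFileName != null) {   // null means the caller does not want the name
            // Assumed name: native file name plus ".xliff", without parent directories.
            generatedFileName.write(nativeFileName + ".xliff");
        }
    }

    public static void main(String[] args) {
        StringWriter generated = new StringWriter();
        fakeConvert("manual.pdf", generated);
        System.out.println(generated.toString());

        // Passing null is also valid; the name is simply not reported.
        fakeConvert("manual.pdf", null);
    }
}
```

Using a fresh StringWriter per call avoids accumulating names from earlier conversions in the same buffer.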

convert

@Deprecated
public ConversionStatus convert(ConversionMode mode,
                                           java.util.Locale language,
                                           java.lang.String phaseName,
                                           int maxPhase,
                                           java.nio.charset.Charset nativeEncoding,
                                           FileType nativeFileType,
                                           java.lang.String nativeFileName,
                                           java.lang.String baseDir,
                                           Notifier notifier)
                         throws ConversionException
Deprecated. 

Convert a PDF file to XLIFF, writing an xliff file as output (and, in some future release, possibly skeleton and format files).

Specified by:
convert in interface Converter
Parameters:
mode - The mode of conversion (to XLIFF in this case).
language - The language of the input file.
phaseName - The target phase-name. This value is ignored.
maxPhase - The maximum phase number. This value is ignored.
nativeEncoding - The encoding of the input file. This value is ignored for PDF files.
nativeFileType - The type of the native file. This value must be "PDF". (Note: The value is stored in the datatype attribute of the XLIFF's file element.)
nativeFileName - The name of the input PDF file (without directory prefix).
baseDir - The directory that contains the input PDF file--from which we will read the input file. This is also the directory in which the output xliff, skeleton and format files will be written. The output files will be named as follows:
  • nativeFileName.xliff
  • nativeFileName.skeleton (future?)
  • nativeFileName.format (future?)
where nativeFileName is the file name specified in the nativeFileName parameter.
notifier - Instance of a class that implements the Notifier interface (to send notifications in case of conversion error).
Returns:
Indicator of the status of the conversion.
Throws:
ConversionException - If a conversion exception is encountered.

setConversionProperty

public void setConversionProperty(java.lang.String property,
                                  java.lang.Object value)
                           throws ConversionException
Set a format-specific property that might affect the way that the conversion occurs.

Note: This converter needs no format-specific properties. If any are passed, they will be silently ignored.

Specified by:
setConversionProperty in interface Converter
Parameters:
property - The name of the property.
value - The value of the property.
Throws:
ConversionException - If the property isn't recognized (and if it matters).