file2xliff4j
Class HtmlHandler

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by file2xliff4j.HtmlHandler
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

public class HtmlHandler
extends org.xml.sax.helpers.DefaultHandler

HtmlHandler uses HotSAX to read and parse HTML documents through a SAX-like API, as if they were well-formed XML documents.

Author:
Weldon Whipple <weldon@lingotek.com>
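
As a rough illustration of the SAX-style callback flow this class relies on, the sketch below records the events a DefaultHandler subclass receives. It is only a sketch under stated assumptions: it uses the JDK's built-in SAX parser on well-formed input rather than HotSAX (whose API is not shown on this page), and the SaxEventTrace class and its trace method are hypothetical names, not part of file2xliff4j.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class (not part of file2xliff4j): logs each SAX callback.
class SaxEventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) belongs to this event.
        events.add("chars:" + new String(ch, start, length));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end:" + qName);
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    // Assumption: the JDK parser stands in for HotSAX here.
    static List<String> trace(String xml) throws Exception {
        SaxEventTrace handler = new SaxEventTrace();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        trace("<p>Hi</p>").forEach(System.out::println);
    }
}
```

The events always begin with startDocument and end with endDocument; the element and character events arrive in document order in between.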

Constructor Summary
HtmlHandler()
          For testing only
HtmlHandler(java.util.Set<java.lang.String> tuTags, java.io.OutputStreamWriter outXliff, java.io.OutputStreamWriter outSkeleton, java.io.OutputStreamWriter outFormat, java.util.Locale sourceLang, java.lang.String docType, java.lang.String originalFileName, SegmentBoundary boundary)
          This constructor sets up the HtmlHandler to be notified as each tag (etc.) is encountered in the HTML input stream.
 
Method Summary
 void characters(char[] ch, int start, int length)
          Called whenever characters are encountered
 void endDocument()
          When the end-of-document is encountered, save the "candidate epilog" (the characters that follow the final TU), etc.
 void endElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qualifiedName)
          Method called whenever an end element is encountered
 void setDocumentLocator(org.xml.sax.Locator locator)
          Method (inherited from DefaultHandler or one of its ancestors) that sets the "locator"--in this case the org.xml.sax.helpers.LocatorImpl class--which has methods that (among other things) return the current line and column numbers in the stream being parsed.
 void startDocument()
          Method called by SAX parser at the beginning of document parsing.
 void startElement(java.lang.String namespaceURI, java.lang.String localName, java.lang.String qualifiedName, org.xml.sax.Attributes atts)
          Method called whenever a start element is encountered
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, skippedEntity, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlHandler

public HtmlHandler()
For testing only


HtmlHandler

public HtmlHandler(java.util.Set<java.lang.String> tuTags,
                   java.io.OutputStreamWriter outXliff,
                   java.io.OutputStreamWriter outSkeleton,
                   java.io.OutputStreamWriter outFormat,
                   java.util.Locale sourceLang,
                   java.lang.String docType,
                   java.lang.String originalFileName,
                   SegmentBoundary boundary)
This constructor sets up the HtmlHandler to be notified as each tag (etc.) is encountered in the HTML input stream. This handler then does whatever is appropriate with the HTML input.

Parameters:
tuTags - The set of HTML tags that signal a translation unit (TU) break.
outXliff - Where to write the XLIFF
outSkeleton - Where to write the skeleton
outFormat - Where to write the format
sourceLang - The language of the original document
docType - The kind of document this is (probably HTML)
originalFileName - The original file's name (required as an attribute in the "file" element)
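
The sketch below shows one way the documented constructor arguments might be prepared. Everything in it is an assumption for illustration: the HtmlHandlerArgs class is hypothetical, the tag names, the "HTML" docType value, the "en-US" locale, and the file name are example values, and in-memory streams stand in for real output files. The constructor call itself is left as a comment because SegmentBoundary is not documented on this page.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of file2xliff4j): gathers example argument values.
class HtmlHandlerArgs {
    // Assumed example set of tags that signal a TU break.
    final Set<String> tuTags = new LinkedHashSet<>();

    // In-memory sinks stand in for the XLIFF, skeleton, and format files.
    final OutputStreamWriter outXliff =
            new OutputStreamWriter(new ByteArrayOutputStream(), StandardCharsets.UTF_8);
    final OutputStreamWriter outSkeleton =
            new OutputStreamWriter(new ByteArrayOutputStream(), StandardCharsets.UTF_8);
    final OutputStreamWriter outFormat =
            new OutputStreamWriter(new ByteArrayOutputStream(), StandardCharsets.UTF_8);

    final Locale sourceLang = Locale.forLanguageTag("en-US"); // assumed source language
    final String docType = "HTML";                            // assumed value
    final String originalFileName = "index.html";             // assumed file name

    HtmlHandlerArgs() {
        tuTags.add("p");
        tuTags.add("h1");
        tuTags.add("li");
    }

    public static void main(String[] args) {
        HtmlHandlerArgs prepared = new HtmlHandlerArgs();
        System.out.println(prepared.tuTags + " " + prepared.sourceLang);
        // With file2xliff4j on the classpath, and a SegmentBoundary value
        // named boundary, the call would take this shape:
        // new HtmlHandler(prepared.tuTags, prepared.outXliff, prepared.outSkeleton,
        //         prepared.outFormat, prepared.sourceLang, prepared.docType,
        //         prepared.originalFileName, boundary);
    }
}
```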

Method Detail

setDocumentLocator

public void setDocumentLocator(org.xml.sax.Locator locator)
Method (inherited from DefaultHandler or one of its ancestors) that sets the "locator"--in this case the org.xml.sax.helpers.LocatorImpl class--which has methods that (among other things) return the current line and column numbers in the stream being parsed.

Specified by:
setDocumentLocator in interface org.xml.sax.ContentHandler
Overrides:
setDocumentLocator in class org.xml.sax.helpers.DefaultHandler
Parameters:
locator - A reference to the locator implementation class
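
To show how a stored locator is typically used, here is a minimal sketch that keeps the reference passed to setDocumentLocator and queries it from a later callback. It assumes the JDK's SAX parser rather than HotSAX, and LocatorDemo and its startLines method are hypothetical names, not part of file2xliff4j.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class (not part of file2xliff4j).
class LocatorDemo extends DefaultHandler {
    private Locator locator;
    final Map<String, Integer> startLines = new LinkedHashMap<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        // The parser calls this before startDocument; keep the reference.
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // A parser is not required to supply a locator, so guard against null.
        if (locator != null) {
            startLines.put(qName, locator.getLineNumber());
        }
    }

    // Assumption: the JDK parser stands in for HotSAX here.
    static Map<String, Integer> startLines(String xml) throws Exception {
        LocatorDemo handler = new LocatorDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.startLines;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(startLines("<html>\n<body>\n<p>hi</p>\n</body>\n</html>"));
    }
}
```

The locator's values are only meaningful while a callback is running, so the handler reads them inside startElement instead of caching them up front.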

startDocument

public void startDocument()
                   throws org.xml.sax.SAXException
Method called by SAX parser at the beginning of document parsing.

Specified by:
startDocument in interface org.xml.sax.ContentHandler
Overrides:
startDocument in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException

startElement

public void startElement(java.lang.String namespaceURI,
                         java.lang.String localName,
                         java.lang.String qualifiedName,
                         org.xml.sax.Attributes atts)
                  throws org.xml.sax.SAXException
Method called whenever a start element is encountered

Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.DefaultHandler
Parameters:
namespaceURI - The URI of the namespace
localName - The local name (without prefix), or the empty string if Namespace processing is not being performed.
qualifiedName - The qualified name (with prefix), or the empty string if qualified names are not available
atts - The specified or defaulted attributes.
Throws:
org.xml.sax.SAXException
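
The parameter descriptions above can be illustrated with a short sketch that falls back to the qualified name when the local name is empty and then walks the attribute list. It assumes the JDK's SAX parser rather than HotSAX; StartElementDemo and collect are hypothetical names, not part of file2xliff4j.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class (not part of file2xliff4j).
class StartElementDemo extends DefaultHandler {
    // Element name -> (attribute name -> attribute value)
    final Map<String, Map<String, String>> seen = new LinkedHashMap<>();

    @Override
    public void startElement(String namespaceURI, String localName,
                             String qualifiedName, Attributes atts) {
        // localName is empty when namespace processing is off; use the qualified name then.
        String name = (localName == null || localName.isEmpty()) ? qualifiedName : localName;

        Map<String, String> attrs = new LinkedHashMap<>();
        for (int i = 0; i < atts.getLength(); i++) {
            attrs.put(atts.getQName(i), atts.getValue(i));
        }
        seen.put(name, attrs);
    }

    // Assumption: the JDK parser stands in for HotSAX here.
    static Map<String, Map<String, String>> collect(String xml) throws Exception {
        StartElementDemo handler = new StartElementDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect("<p><a href=\"x.html\" title=\"T\">link</a></p>"));
    }
}
```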

endElement

public void endElement(java.lang.String namespaceURI,
                       java.lang.String localName,
                       java.lang.String qualifiedName)
                throws org.xml.sax.SAXException
Method called whenever an end element is encountered

Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.DefaultHandler
Parameters:
namespaceURI - The URI of the namespace
localName - The local name (without prefix), or the empty string if Namespace processing is not being performed.
qualifiedName - The qualified name (with prefix), or the empty string if qualified names are not available
Throws:
org.xml.sax.SAXException

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws org.xml.sax.SAXException
Called whenever characters are encountered

Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class org.xml.sax.helpers.DefaultHandler
Parameters:
ch - Array containing characters encountered
start - Position in array of first applicable character
length - The number of characters of interest
Throws:
org.xml.sax.SAXException
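
A SAX parser may deliver a single run of text in several characters calls, so handlers usually append each slice to a buffer instead of treating one call as the whole text. The sketch below shows that pattern; it assumes the JDK's SAX parser rather than HotSAX, and TextCollector and textOf are hypothetical names, not part of file2xliff4j.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class (not part of file2xliff4j).
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append only the applicable slice; the rest of the array is not ours to read.
        text.append(ch, start, length);
    }

    String text() {
        return text.toString();
    }

    // Assumption: the JDK parser stands in for HotSAX here.
    static String textOf(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.text();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOf("<p>Hello <b>world</b></p>"));
    }
}
```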

endDocument

public void endDocument()
                 throws org.xml.sax.SAXException
When the end-of-document is encountered, save the "candidate epilog" (the characters that follow the final TU), etc.

Specified by:
endDocument in interface org.xml.sax.ContentHandler
Overrides:
endDocument in class org.xml.sax.helpers.DefaultHandler
Throws:
org.xml.sax.SAXException
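
One way to picture the "candidate epilog" idea is a buffer that is cleared whenever a TU-ending tag closes, so that whatever remains at endDocument is the text after the final TU. The sketch below is a loose illustration under that assumption, not the actual HtmlHandler logic: it tracks character data only, it uses the JDK's SAX parser rather than HotSAX, and EpilogDemo and epilogOf are hypothetical names.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class (not part of file2xliff4j).
class EpilogDemo extends DefaultHandler {
    private final Set<String> tuTags;
    private final StringBuilder candidateEpilog = new StringBuilder();
    private String epilog = "";

    EpilogDemo(Set<String> tuTags) {
        this.tuTags = tuTags;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        candidateEpilog.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Text seen so far belonged to (or preceded) this TU, so it is not epilog.
        if (tuTags.contains(qName)) {
            candidateEpilog.setLength(0);
        }
    }

    @Override
    public void endDocument() {
        // Whatever followed the final TU is now known to be the epilog.
        epilog = candidateEpilog.toString();
    }

    // Assumption: the JDK parser stands in for HotSAX here.
    static String epilogOf(String xml, Set<String> tuTags) throws Exception {
        EpilogDemo handler = new EpilogDemo(tuTags);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.epilog;
    }

    public static void main(String[] args) throws Exception {
        Set<String> tags = new HashSet<>(Arrays.asList("p"));
        System.out.println(epilogOf("<body><p>One</p><p>Two</p>tail</body>", tags));
    }
}
```

The text is called a candidate because the handler cannot know it is the epilog until no further TU has appeared by the end of the document.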