java.lang.Object
  file2xliff4j.TuPreener
public class TuPreener
Class to "preen" translation units--i.e. to identify the "core" text that is completely enclosed by paired bx/ex tags. (If x tags are located outside the core text, they are so identified as well.)
The class also includes methods for retrieving and updating the core text.
Field Summary | |
---|---|
static java.lang.String |
CORE_END_MRK
For the present, we can assume that a closing mrk tag (</mrk>) signals the end of core text (since the only other occurrence of the mrk tag is as an "empty" mrk element with mtype='x-mergeboundary'). |
static java.lang.String |
CORE_END_TAG
Deprecated. |
static java.lang.String |
CORE_START_MRK
Instead of lt:core elements, use XLIFF mrk with the mtype='x-coretext' attribute. |
static java.lang.String |
CORE_START_TAG
Deprecated. |
static java.lang.String |
HTML_TAGS_AS_ENTITIES
|
static java.lang.String |
NAME_SPACE_URI
The namespace URI of the lt:core tags |
static java.lang.String |
ORRED_WHITE_SPACE
|
static java.lang.String |
SECONDARY_WHITE_SPACE_CLASS
|
static java.lang.String |
WHITE_SPACE_CLASS
|
Method Summary | |
---|---|
static java.lang.String |
checkAndRepairTuTags(java.lang.String tuText)
Passed the core text of a tu that originates from a format that doesn't necessarily map to well-formed XML (non-XHTML HTML, for example), verify that the only tags present are bx, ex and x tags (for our implementation, at least). |
static file2xliff4j.SegmentInfo[] |
getCoreSegments(java.lang.String in,
SegmentBoundary bdyType,
java.util.Locale locale)
Passed a String that contains the text of a "paragraph," a segment boundary type indicator and the locale of the text in the string, divide the input string into segments, marking each segment's "cores." Return an array of segment objects. |
static file2xliff4j.SegmentInfo[] |
getCoreSegments(java.lang.String in,
SegmentBoundary bdyType,
java.util.Locale locale,
boolean preenHtmlFromXML)
Passed a String that contains the text of a "paragraph," a segment boundary type indicator and the locale of the text in the string, divide the input string into segments, marking each segment's "cores." Return an array of segment objects. |
static java.lang.String |
getCoreText(java.lang.String fullText)
Return the text between the core start and end tags |
static java.lang.String |
getPrefixText(java.lang.String fullText)
Passed the full text of a Translation Unit source or target, return the text before the core start tag |
static java.lang.String |
getSuffixText(java.lang.String fullText)
Passed the full text of a Translation Unit source or target, return the text after the core end tag |
static boolean |
isSingleton(java.lang.String tag)
Is this a singleton tag? (For now, that means an empty x tag.) |
static java.lang.String |
markCoreTu(java.lang.String in)
Mark the core text of a translation unit: Passed a string to be stored in a trans-unit source or target, determine if the string consists exclusively of white-space and/or tags. |
static java.lang.String |
markCoreTu(java.lang.String in,
SegmentBoundary segment)
Mark the core text of a translation unit: Passed a string to be stored in a trans-unit source or target, determine if the string consists exclusively of white-space and/or tags. |
static java.lang.String |
markCoreTu(java.lang.String in,
SegmentBoundary segment,
boolean preenHtmlFromXML)
Mark the core text of a translation unit: Passed a string to be stored in a trans-unit source or target, determine if the string consists exclusively of white-space and/or tags. |
static java.lang.String |
removeCoreMarks(java.lang.String fullText)
Passed the full text of a Translation Unit source or target (including core start and end markers) remove the tags and return what is left |
static java.lang.String |
removeMergerMarks(java.lang.String fullText)
Passed the text of a Translation Unit source or target (with or without core start and end marks), remove the mrk tags of mtype x-mergeboundary. |
static java.lang.String |
replaceCoreText(java.lang.String fullText,
java.lang.String newCore)
Passed the full text of a Translation Unit source or target (including core start and end markers) and new core text, replace the old core text with the new and return the new full text |
static java.lang.String |
validateAndRepairTu(java.lang.String tuText)
Passed the core text of a tu, verify that there is a one-to-one relationship between bx and ex tags (related by their rid's), and that they are properly nested. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
@Deprecated public static final java.lang.String CORE_START_TAG
"It is not possible to add non-XLIFF elements in either the <source> or <target> elements. However, the <mrk> element can be used to markup sections of the text with user-defined values assigned to the mtype attribute. You can also add non-XLIFF attributes to most of the inline elements used in <source> and <target>."
@Deprecated public static final java.lang.String CORE_END_TAG
public static final java.lang.String NAME_SPACE_URI
public static final java.lang.String CORE_START_MRK
public static final java.lang.String CORE_END_MRK
public static final java.lang.String WHITE_SPACE_CLASS
public static final java.lang.String ORRED_WHITE_SPACE
public static final java.lang.String SECONDARY_WHITE_SPACE_CLASS
public static final java.lang.String HTML_TAGS_AS_ENTITIES
Method Detail |
---|
public static java.lang.String getCoreText(java.lang.String fullText)
fullText
- The full text of the Translation Unit source or target,
complete with core text marker tags.
public static java.lang.String getPrefixText(java.lang.String fullText)
fullText
- The full text of the Translation Unit source or target,
complete with core text marker tags.
public static java.lang.String getSuffixText(java.lang.String fullText)
fullText
- The full text of the Translation Unit source or target,
complete with core text marker tags.
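The three accessors above split the full text around the core markers. The following is a minimal sketch of that idea, not the library's implementation. The marker strings, the class name CoreSplitSketch, and the behavior when no markers are present are all assumptions, since this page does not show the values of CORE_START_MRK and CORE_END_MRK.

```java
class CoreSplitSketch {
    // Assumed marker strings; the real constant values are not shown on this page.
    static final String START = "<mrk mtype='x-coretext'>";
    static final String END = "</mrk>";

    // Text before the core start marker (empty if no marker is found: an assumption).
    static String prefix(String fullText) {
        int s = fullText.indexOf(START);
        return s < 0 ? "" : fullText.substring(0, s);
    }

    // Text between the core start marker and the first closing mrk tag after it.
    static String core(String fullText) {
        int s = fullText.indexOf(START);
        if (s < 0) {
            return fullText; // assumption: unmarked text is treated as all core
        }
        int from = s + START.length();
        int e = fullText.indexOf(END, from);
        return e < 0 ? fullText.substring(from) : fullText.substring(from, e);
    }

    // Text after the core end marker (empty if the markers are missing: an assumption).
    static String suffix(String fullText) {
        int s = fullText.indexOf(START);
        if (s < 0) {
            return "";
        }
        int e = fullText.indexOf(END, s + START.length());
        return e < 0 ? "" : fullText.substring(e + END.length());
    }
}
```

With this layout, prefix + START + core + END + suffix reassembles the original string whenever both markers are present.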
public static java.lang.String removeCoreMarks(java.lang.String fullText)
fullText
- The full text of the Translation Unit source or target,
complete with core text marker tags.
public static java.lang.String removeMergerMarks(java.lang.String fullText)
fullText
- Text of the Translation Unit source or target,
with merge boundary
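As an illustration of what removing merge boundary marks might involve, here is a hedged sketch using a regular expression. The exact serialization of the empty mrk element (quote style, spacing) and the class name MergeMarkSketch are assumptions; the real method may match a different form.

```java
import java.util.regex.Pattern;

class MergeMarkSketch {
    // Assumed form: an empty mrk element whose mtype is x-mergeboundary,
    // with either single or double quotes around the attribute value.
    private static final Pattern MERGE_MARK =
            Pattern.compile("<mrk\\s+mtype=(['\"])x-mergeboundary\\1\\s*/>");

    // Remove every merge boundary mark; core start and end marks are left alone.
    static String removeMergeMarks(String text) {
        return MERGE_MARK.matcher(text).replaceAll("");
    }
}
```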
public static java.lang.String replaceCoreText(java.lang.String fullText, java.lang.String newCore)
fullText
- The full text of the Translation Unit source or target,
complete with core text marker tags.
newCore
- The new core text to substitute for the old core text.
A null value for newCore leaves fullText unchanged.
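The null handling described for newCore can be sketched as follows. This is an illustration under assumptions: the marker strings and the class name ReplaceCoreSketch are hypothetical, and returning the input unchanged when no markers are found is a guess at reasonable behavior.

```java
class ReplaceCoreSketch {
    // Assumed marker strings; the real constant values are not shown on this page.
    static final String START = "<mrk mtype='x-coretext'>";
    static final String END = "</mrk>";

    static String replaceCore(String fullText, String newCore) {
        if (newCore == null) {
            return fullText; // documented: null means "do not change"
        }
        int s = fullText.indexOf(START);
        if (s < 0) {
            return fullText; // assumption: nothing to replace without markers
        }
        int from = s + START.length();
        int e = fullText.indexOf(END, from);
        if (e < 0) {
            return fullText; // assumption: unterminated core is left alone
        }
        // Keep the prefix, both markers, and the suffix; swap only the core.
        return fullText.substring(0, from) + newCore + fullText.substring(e);
    }
}
```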
public static file2xliff4j.SegmentInfo[] getCoreSegments(java.lang.String in, SegmentBoundary bdyType, java.util.Locale locale)
in
- The input string that contains (potentially) a paragraph segment
bdyType
- Segment boundary type (e.g. paragraph, sentence)
locale
- The language of the string--used by the sentence break
iterator to break into sentences.
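The locale parameter feeds a sentence break iterator. A minimal sketch of locale-aware sentence splitting with the standard java.text.BreakIterator is shown here. It ignores bx/ex/x tags and core marking entirely, and the class name SentenceSplitSketch is hypothetical, so it illustrates only the segmentation step.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceSplitSketch {
    // Split a paragraph into sentence strings using the locale's break rules.
    static List<String> splitSentences(String paragraph, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(paragraph);
        List<String> sentences = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Each piece keeps its trailing whitespace, so the pieces
            // concatenate back to the original paragraph.
            sentences.add(paragraph.substring(start, end));
        }
        return sentences;
    }
}
```

Because the pieces concatenate back to the input, a caller could mark each segment's core afterwards without losing inter-sentence whitespace.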
public static file2xliff4j.SegmentInfo[] getCoreSegments(java.lang.String in, SegmentBoundary bdyType, java.util.Locale locale, boolean preenHtmlFromXML)
in
- The input string that contains (potentially) a paragraph segment
bdyType
- Segment boundary type (e.g. paragraph, sentence)
locale
- The language of the string--used by the sentence break
iterator to break into sentences.
preenHtmlFromXML
- If true, look for HTML-like tags that are possibly
outside the "core"--tags that represent "less-than" and "greater-than"
as entities. If found on the edges of segments, move them outside
the core.
public static java.lang.String markCoreTu(java.lang.String in)
in
- The candidate input TU text to be examined
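The method summary says markCoreTu first determines whether the string consists exclusively of white-space and/or tags. A rough sketch of such a check follows. It assumes tags look like <...> and uses String.trim() as a stand-in for the class's own white-space definitions (WHITE_SPACE_CLASS and related constants), whose values are not shown on this page. The class name TagsOnlySketch is hypothetical.

```java
class TagsOnlySketch {
    // True if nothing but tags and white-space remains in the candidate text.
    static boolean isOnlyWhitespaceAndTags(String in) {
        // Strip anything that looks like a tag, then see whether text remains.
        String withoutTags = in.replaceAll("<[^>]*>", "");
        return withoutTags.trim().isEmpty();
    }
}
```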
public static java.lang.String markCoreTu(java.lang.String in, SegmentBoundary segment)
in
- The candidate input TU text to be examined
segment
- The type of segmentation boundary. (If PARAGRAPH, markCoreTu
assumes that all tags are balanced. If SENTENCE, it will look for
bx tags without ending ex tags (which might be in a later sentence,
in the same paragraph, for example), or ex tags without start bx
tags (which might be in an earlier sentence in the same paragraph).)
public static java.lang.String markCoreTu(java.lang.String in, SegmentBoundary segment, boolean preenHtmlFromXML)
in
- The candidate input TU text to be examined
segment
- The type of segmentation boundary. (If PARAGRAPH, markCoreTu
assumes that all tags are balanced. If SENTENCE, it will look for
bx tags without ending ex tags (which might be in a later sentence,
in the same paragraph, for example), or ex tags without start bx
tags (which might be in an earlier sentence in the same paragraph).)
preenHtmlFromXML
- If true, look for HTML-like tags that are possibly
outside the "core"--tags that represent "less-than" and "greater-than"
as entities. If found on the edges of segments, move them outside
the core.
public static boolean isSingleton(java.lang.String tag)
tag
- The tag to examine for singletonness
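Since a singleton is, for now, an empty x tag, the test can be sketched with a single pattern match. The accepted serialization (self-closing <x .../> with optional attributes) and the class name SingletonSketch are assumptions.

```java
import java.util.regex.Pattern;

class SingletonSketch {
    // An empty x element: "<x", optional attributes, then "/>".
    private static final Pattern EMPTY_X = Pattern.compile("<x(\\s[^>]*)?/>");

    static boolean isSingleton(String tag) {
        return EMPTY_X.matcher(tag).matches();
    }
}
```

Under this pattern, bx and ex tags are not singletons because they do not begin with "<x".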
public static java.lang.String checkAndRepairTuTags(java.lang.String tuText)
While validating, remove non-bx/ex/x tags.
Note: Although XLIFF allows source and target elements to include tags/elements other than bx, ex and x, this particular implementation allows only those three (empty) elements. (Since the text we are passed is the core of the TU, it doesn't include our start and end mrk tags.)
tuText
- The text of the TU
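The removal of tags other than bx, ex and x can be illustrated with a regular expression. This is a simplified sketch, not the library's logic: it treats anything of the form <name ...> or </name ...> as a tag, keeps the text between tags, and does no further repair. The class name TagFilterSketch is hypothetical.

```java
import java.util.regex.Pattern;

class TagFilterSketch {
    // Match an opening or closing tag whose name is not bx, ex, or x.
    // The negative lookahead rejects the three allowed element names.
    private static final Pattern OTHER_TAG =
            Pattern.compile("</?(?!(?:bx|ex|x)\\b)[A-Za-z][^>]*>");

    // Drop disallowed tags but keep the text they enclosed.
    static String removeOtherTags(String tuText) {
        return OTHER_TAG.matcher(tuText).replaceAll("");
    }
}
```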
public static java.lang.String validateAndRepairTu(java.lang.String tuText)
While validating, also repair the TU.
tuText
- The text of the TU
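The validation half of this method (one-to-one bx/ex pairing by rid, with proper nesting) can be sketched with a stack. The sketch below only reports whether the text is balanced and makes no attempt at repair. The attribute syntax it expects and the class name NestingCheckSketch are assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestingCheckSketch {
    // Capture the element name (bx or ex) and the value of its rid attribute.
    private static final Pattern BX_EX =
            Pattern.compile("<(bx|ex)\\b[^>]*?\\brid=(['\"])(.*?)\\2[^>]*>");

    static boolean isBalanced(String coreText) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = BX_EX.matcher(coreText);
        while (m.find()) {
            String rid = m.group(3);
            if (m.group(1).equals("bx")) {
                open.push(rid); // remember the most recently opened rid
            } else if (open.isEmpty() || !open.pop().equals(rid)) {
                return false; // ex without a matching bx, or crossed nesting
            }
        }
        return open.isEmpty(); // any leftover bx has no ex
    }
}
```

In SENTENCE segmentation, an unbalanced result is not necessarily an error, since the partner tag may sit in a neighboring sentence of the same paragraph.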