PAJES 3.0.21

org.pajes.html
Class PajeParser

java.lang.Object
  extended by org.pajes.html.PajeParser

public class PajeParser
extends java.lang.Object

The PajeParser class takes an input stream and parses it into "tokens", allowing the tokens to be read one at a time. The parser can recognize tags and raw text.

This class is most often passed to one of the PajeTemplate.generate(org.pajes.html.PajeParser) methods to create a Paje instance. However, a Paje can also be generated directly from the parser with the toPaje() method.

Alternatively, an application can construct an instance of this class and then call next() in a loop until it returns the value EOF.

Note: EOF will also be returned in the case of invalid HTML code (an open-ended tag or open-ended entity).
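
The read loop described above can be sketched as follows. The PAJES jar is not assumed to be on the classpath, so StubParser is a hypothetical stand-in that mimics only the documented next() and getToken() contract; its constant values and its naive tag splitting are assumptions made for illustration, not the library's behaviour.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenLoopSketch {

    /** Hypothetical stand-in for PajeParser; only next() and getToken() are mimicked. */
    static class StubParser {
        // Arbitrary values; the real constant values are not given on this page.
        static final int TAG = 1, TEXT = 2, EOF = -1;

        private final String src;
        private int pos = 0;
        private String token = null;

        StubParser(String src) { this.src = src; }

        int next() {
            if (pos >= src.length()) { token = null; return EOF; }
            if (src.charAt(pos) == '<') {
                int end = src.indexOf('>', pos);
                if (end < 0) {               // open-ended tag: report EOF
                    token = null;
                    pos = src.length();
                    return EOF;
                }
                token = src.substring(pos + 1, end); // tag contents without < and >
                pos = end + 1;
                return TAG;
            }
            int end = src.indexOf('<', pos);
            if (end < 0) end = src.length();
            token = src.substring(pos, end);         // text between tags
            pos = end;
            return TEXT;
        }

        String getToken() { return token; }
    }

    /** Reads tokens until EOF and describes each one. */
    static List<String> describe(String html) {
        StubParser parser = new StubParser(html);
        List<String> out = new ArrayList<>();
        int type;
        while ((type = parser.next()) != StubParser.EOF) {
            out.add((type == StubParser.TAG ? "TAG:" : "TEXT:") + parser.getToken());
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : describe("<p>Hello</p>")) {
            System.out.println(line);
        }
    }
}
```

With the real class, the same loop shape would presumably apply with PajeParser in place of StubParser, plus handling for the IOException that next() declares.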

See Also:
PajeTemplate

Field Summary
static int EOF
          A constant indicating that the end of the stream has been read.
static int READY
          A constant used to indicate that no tokens have been read.
static int TAG
          A constant indicating that an HTML tag has been read.
static int TEXT
          A constant indicating that raw text has been read.
 
Constructor Summary
PajeParser(java.io.File f)
          Creates a PajeParser from a given file system file.
PajeParser(java.io.InputStream i)
          Creates a PajeParser from an input stream.
PajeParser(java.io.Reader r)
          Creates a PajeParser from a Reader.
PajeParser(java.lang.String f)
          Creates a PajeParser from a given file system file.
PajeParser(java.net.URL u)
          Creates a PajeParser from a file from the given URL.
 
Method Summary
 java.lang.String getAttribute(java.lang.String name)
          Returns the value of the specified attribute name.
 java.lang.String[][] getAttributes()
          Returns the attributes of the current token.
 int getColumnNumber()
          Returns the current column number.
 int getLineNumber()
          Returns the current line number.
 java.lang.String getToken()
          Returns the text of the current token.
static void main(java.lang.String[] args)
          Allows the PajeParser to be called from the command line to validate one or more files.
 int next()
          Reads the next token from the file.
 void pushBack()
          The pushBack() method allows you to 'unread' the last token so that the next call to next() will return the same value.
 void setDecodeCharacterEntities(boolean decode)
          Determines whether any HTML named character entities (&quot;, &amp;, &lt; and &gt;) found in TEXT are to be converted to the actual character.
 Paje toPaje()
          Parses the HTML source and returns a Paje instance.
 Paje toPaje(PajeTemplate template)
          Parses the HTML source using the specified template, and returns a Paje instance.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

EOF

public static final int EOF
A constant indicating that the end of the stream has been read. In the case of an open-ended tag or open-ended entity (because of invalid HTML), this constant will be returned.

See Also:
Constant Field Values

READY

public static final int READY
A constant used to indicate that no tokens have been read.

See Also:
Constant Field Values

TAG

public static final int TAG
A constant indicating that an HTML tag has been read.

See Also:
Constant Field Values

TEXT

public static final int TEXT
A constant indicating that raw text has been read.

See Also:
Constant Field Values
Constructor Detail

PajeParser

public PajeParser(java.io.File f)
           throws java.io.IOException
Creates a PajeParser from a given file system file.

Parameters:
f - the file to be read.
Throws:
java.io.IOException - if an exception occurs parsing the file.
java.io.FileNotFoundException - if the specified file does not exist or is a directory.

PajeParser

public PajeParser(java.io.InputStream i)
           throws java.io.IOException
Creates a PajeParser from an input stream.

Parameters:
i - the InputStream from which the file will be read.
Throws:
java.io.IOException - if an exception occurs parsing the file.
java.io.FileNotFoundException - if the specified InputStream is null.

PajeParser

public PajeParser(java.io.Reader r)
           throws java.io.IOException
Creates a PajeParser from a Reader.

Parameters:
r - the Reader from which the file will be read.
Throws:
java.io.IOException - if an exception occurs parsing the file.

PajeParser

public PajeParser(java.lang.String f)
           throws java.io.IOException
Creates a PajeParser from a given file system file.

Parameters:
f - the name of the file to be read.
Throws:
java.io.IOException - if an exception occurs parsing the file.
java.io.FileNotFoundException - if the specified file cannot be located.

PajeParser

public PajeParser(java.net.URL u)
           throws java.net.MalformedURLException,
                  java.io.IOException
Creates a PajeParser from a file from the given URL.

Parameters:
u - the URL from which the file will be read.
Throws:
java.net.MalformedURLException - if a null URL is specified.
java.io.IOException - if an exception occurs parsing the file.
Method Detail

main

public static void main(java.lang.String[] args)
Allows the PajeParser to be called from the command line to validate one or more files.

Parameters:
args - the first argument is a file to be parsed and verified, or a directory in which all files ending in .htm or .html will be verified.

getAttribute

public java.lang.String getAttribute(java.lang.String name)
Returns the value of the specified attribute name.

Parameters:
name - The attribute name to be located.
Returns:
the value of the specified attribute name, or -1 if it was not found or the current type is not TAG.
See Also:
getAttributes()

getAttributes

public java.lang.String[][] getAttributes()
Returns the attributes of the current token. If the type is TAG, an array of the tag attributes will be returned. Otherwise, null will be returned.

The array is laid out as follows:
  • Index [0][0] always holds the tag name (e.g. input or a).
  • Index [0][1] is always null, unless the TAG is empty (i.e. it has the form <tag_name/>), in which case it contains "/".
  • The attribute name (element 0 in the second array dimension) is always returned in lower case.
  • The attribute value (element 1 in the second array dimension) always has any surrounding single or double quotes removed. For attributes with no value (e.g. mayscript or nowrap), the value is always null.
  • A third element (element 2 in the second array dimension) is always returned as null, and is available for whatever use the caller wishes to make of it.

Returns:
The attributes array.
See Also:
getAttribute(java.lang.String)
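
As an illustration of the layout described above, the following sketch walks a hand-built array shaped the way getAttributes() is documented to return it for a tag such as <input type="text" nowrap/>. The class name and the lookup and isEmptyTag helpers are hypothetical and are not part of PAJES; returning null for a missing attribute is a choice made for this sketch only.

```java
import java.util.Locale;

public class AttributeLayoutSketch {

    /** Looks up an attribute value in an array that follows the documented layout. */
    static String lookup(String[][] attrs, String name) {
        if (attrs == null) return null;               // documented result for non-TAG tokens
        String key = name.toLowerCase(Locale.ROOT);   // names are documented as lower case
        for (int i = 1; i < attrs.length; i++) {      // row 0 holds the tag name, not an attribute
            if (key.equals(attrs[i][0])) return attrs[i][1];
        }
        return null;                                  // sketch-only convention for "not found"
    }

    /** True when row 0 carries the "/" marker of an empty tag. */
    static boolean isEmptyTag(String[][] attrs) {
        return attrs != null && "/".equals(attrs[0][1]);
    }

    /** Hand-built array for <input type="text" nowrap/>, following the documented layout. */
    static String[][] sample() {
        return new String[][] {
            {"input", "/", null},     // tag name, empty-tag marker, spare element
            {"type", "text", null},   // quotes already stripped from the value
            {"nowrap", null, null},   // attribute with no value
        };
    }

    public static void main(String[] args) {
        String[][] attrs = sample();
        System.out.println("tag = " + attrs[0][0]);
        System.out.println("type = " + lookup(attrs, "TYPE"));
        System.out.println("empty = " + isEmptyTag(attrs));
    }
}
```

Note that in this sketch a valueless attribute such as nowrap and a missing attribute both yield null, so a caller that needs to tell them apart would have to scan the rows directly.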

getColumnNumber

public int getColumnNumber()
Returns the current column number.

Returns:
The column number.

getLineNumber

public int getLineNumber()
Returns the current line number.

Returns:
The line number.

getToken

public java.lang.String getToken()
Returns the text of the current token. If the type is TEXT or TAG, the corresponding text will be returned. Otherwise, null will be returned.

For type TEXT, the text returned will be the text between tags. For TAG, the contents of the tag (without the < and > characters) will be returned.

Returns:
The token value.

next

public int next()
         throws java.io.IOException
Reads the next token from the file.

Returns:
the type of the token just read. The value returned is one of the following:
  • EOF indicates that the end of the input stream has been reached.
  • TAG indicates that the token is an HTML tag.
  • TEXT indicates that the token is text between HTML tags.
To get the string of the token, use getToken(). To get the attributes of a TAG token, use getAttributes() or getAttribute(java.lang.String).
Throws:
java.io.IOException - if an exception occurs parsing the file.
PajeParserException - if an unmatched quote is detected.

pushBack

public void pushBack()
              throws java.io.IOException
The pushBack() method allows you to 'unread' the last token so that the next call to next() will return the same value.

Throws:
java.io.IOException - if an exception occurs 'unreading' the token.
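
One common use of an 'unread' operation is peeking at the next token type without consuming it. The sketch below shows that pattern against a hypothetical stand-in, StubParser, which replays a fixed sequence of token types; the stand-in, its constant values, and the peek helper are assumptions for illustration and are not part of PAJES. Only a single step of push-back is mimicked, since this page does not say what repeated calls do.

```java
public class PushBackSketch {

    /** Hypothetical stand-in that replays a fixed sequence of token types. */
    static class StubParser {
        static final int TAG = 1, TEXT = 2, EOF = -1; // arbitrary values

        private final int[] types;
        private int index = 0;

        StubParser(int... types) { this.types = types; }

        // Returns the next type, or EOF once the sequence is exhausted.
        int next() {
            int type = index < types.length ? types[index] : EOF;
            index++;
            return type;
        }

        // Steps back one token so the following next() repeats the last result.
        void pushBack() {
            if (index > 0) index--;
        }
    }

    /** Peeks at the type of the upcoming token without consuming it. */
    static int peek(StubParser parser) {
        int type = parser.next();
        parser.pushBack();
        return type;
    }

    public static void main(String[] args) {
        StubParser parser = new StubParser(StubParser.TAG, StubParser.TEXT);
        System.out.println(peek(parser) == StubParser.TAG);        // upcoming token is a TAG
        System.out.println(parser.next() == StubParser.TAG);       // still delivered by next()
        System.out.println(parser.next() == StubParser.TEXT);
        System.out.println(peek(parser) == StubParser.EOF);
    }
}
```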

setDecodeCharacterEntities

public void setDecodeCharacterEntities(boolean decode)
Determines whether any HTML named character entities (&quot;, &amp;, &lt; and &gt;) found in TEXT are to be converted to the actual character.

Parameters:
decode - if true, any named character entities found in a TEXT token will be returned as their actual character, or if false (the default), they will be returned as named entities.
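
To make the effect of the decode flag concrete, the following sketch converts the four named entities listed above into their characters. It only illustrates the documented result for a TEXT token; how PajeParser performs the conversion internally is not stated on this page, and the class and method names here are hypothetical.

```java
public class EntityDecodeSketch {

    /** Replaces the four named entities with the characters they stand for. */
    static String decode(String text) {
        return text.replace("&quot;", "\"")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&amp;", "&"); // done last so "&amp;lt;" yields "&lt;", not "<"
    }

    public static void main(String[] args) {
        String raw = "a &lt; b &amp;&amp; &quot;c&quot;";
        System.out.println("decode=false: " + raw);
        System.out.println("decode=true:  " + decode(raw));
    }
}
```

Replacing &amp; last is a deliberate choice in this sketch: it avoids decoding a sequence twice when the source text contains an escaped entity.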

toPaje

public Paje toPaje()
            throws java.io.IOException
Parses the HTML source and returns a Paje instance.

Returns:
the Paje representing the HTML source file.
Throws:
java.io.IOException - if an error occurs reading the HTML source.

toPaje

public Paje toPaje(PajeTemplate template)
            throws java.io.IOException
Parses the HTML source using the specified template, and returns a Paje instance.

Parameters:
template - the PajeTemplate that will be applied to the HTML source to generate the Paje.
Returns:
the Paje representing the HTML source file.
Throws:
java.io.IOException - if an error occurs reading the HTML source.


Copyright 1998-2007 Viridian Pty Limited. All Rights Reserved.