Ionflux::Tools::Tokenizer Class Reference
[String tokenizer]

Generic byte string tokenizer.

#include <Tokenizer.hpp>

Public Member Functions

 Tokenizer ()
 Constructor.
 Tokenizer (const std::string &initInput)
 Constructor.
 Tokenizer (const std::vector< TokenType > &initTokenTypes)
 Constructor.
 Tokenizer (const std::vector< TokenType > &initTokenTypes, const std::string &initInput)
 Constructor.
virtual ~Tokenizer ()
 Destructor.
virtual void clearTokenTypes ()
 Clear token types.
virtual void useDefaultTokenTypes ()
 Use default token types.
virtual void setTokenTypes (const std::vector< TokenType > &newTokenTypes)
 Set token types.
virtual void addTokenType (const TokenType &newTokenType)
 Add a token type.
virtual void addTokenTypes (const std::vector< TokenType > &newTokenTypes)
 Add token types.
virtual void setInput (const std::string &newInput)
 Set input.
virtual Token nextToken ()
 Get next token.
virtual Token getNextToken (const TokenTypeMap &otherMap)
 Get next token.
virtual Token getNextToken ()
 Get next token.
virtual Token getCurrentToken ()
 Get current token.
virtual int getCurrentTokenType ()
 Get type of current token.
virtual void reset ()
 Reset the parser.
virtual void setTokenTypeAnything ()
 Set special token type TT_ANYTHING.
virtual void setExtractQuoted (bool newExtractQuoted)
 Set quoted string extraction flag.
virtual void setExtractEscaped (bool newExtractEscaped)
 Set escaped character extraction flag.
virtual unsigned int getCurrentPos ()
 Get current position.
virtual unsigned int getCurrentTokenPos ()
 Get position of current token.
virtual char getQuoteChar ()
 Get quote character.

Static Public Member Functions

static bool isOneOf (char c, const std::string &testChars, bool invert)
 Check type of a character.
static bool isValid (Token &token)
 Check whether a token is valid.

Public Attributes

TokenType TT_ANYTHING
 Token type: Anything. (special).

Static Public Attributes

static const TokenType TT_INVALID = {-1, "", false, 0}
 Token type: Invalid token. (special).
static const TokenType TT_NONE = {0, "", false, 0}
 Token type: No token. (special).
static const TokenType TT_QUOTED = {2, "", false, 0}
 Token type: Quoted string. (special).
static const TokenType TT_ESCAPED = {3, "", false, 0}
 Token type: Escaped character. (special).
static const TokenType TT_WHITESPACE = {4, " \t", false, 0}
 Token type: Linear whitespace.
static const TokenType TT_LINETERM = {5, "\n\r", false, 1}
 Token type: Line terminator.
static const TokenType TT_NUMBER = {7, "0123456789", false, 0}
 Token type: Number.
static const TokenType TT_ALPHA
 Token type: Alpha (latin).
static const TokenType TT_DEFAULT_SEP = {7, "_-.", false, 0}
 Token type: Default separator characters.
static const TokenType TT_IDENTIFIER
 Token type: Identifier.
static const Token TOK_INVALID = {Tokenizer::TT_INVALID.typeID, ""}
 Token: Invalid token. (special).
static const Token TOK_NONE = {Tokenizer::TT_NONE.typeID, ""}
 Token: No token. (special).
static const int TT_ANYTHING_TYPE_ID = 1
 Type ID of the TT_ANYTHING token type.
static const std::string QUOTE_CHARS = "\"'"
 Quote characters.
static const char ESCAPE_CHAR = '\\'
 Escape character.

Protected Attributes

std::string theInput
 The input string to be parsed.
unsigned int currentPos
 Current parsing position in the input string.
unsigned int currentTokenPos
 Position of current token in the input string.
Token currentToken
 Current token.
bool extractQuoted
 Extract quoted strings flag.
char currentQuoteChar
 Quote character.
bool extractEscaped
 Extract escaped characters flag.
TokenTypeMap * typeMap
 Token type map.

Detailed Description

Generic byte string tokenizer.

A generic tokenizer for parsing byte strings. To set up a tokenizer, first create a Tokenizer object. It is initialized with the default token types Tokenizer::TT_WHITESPACE, Tokenizer::TT_LINETERM and Tokenizer::TT_IDENTIFIER. You may then add your own custom token types and optionally set up the Tokenizer::TT_ANYTHING token type, which matches anything not matched by the previously defined token types. To enable extraction of quoted strings and escaped characters, call Tokenizer::setExtractQuoted() with true as the argument.
To get a token from the token stream, call Tokenizer::getNextToken(). Make sure your code handles the special token types Tokenizer::TT_NONE and Tokenizer::TT_INVALID, which cannot be disabled: Tokenizer::getNextToken() always returns a token of type Tokenizer::TT_NONE at the end of the token stream and a token of type Tokenizer::TT_INVALID if an invalid token is encountered.
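The token loop described above can be sketched without the library itself. The following is a minimal stand-in, not the iftools implementation: MiniTokenizer, Result and tokenizeAll are hypothetical names, the member names validChars, invert, maxChars and value are assumptions (only typeID appears in this reference), and trying the token types in order is likewise an assumption. It only illustrates why calling code has to distinguish a clean end of stream (TT_NONE) from an error (TT_INVALID).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Stand-ins for the library types. Only the member name typeID appears in
// the reference; validChars, invert, maxChars and value are assumed names.
struct TokenType {
    int typeID;
    std::string validChars;
    bool invert;
    int maxChars;  // assumed meaning: 0 = unlimited token length
};

struct Token {
    int typeID;
    std::string value;
};

// Type IDs and character sets copied from the static attributes listed above.
const int ID_INVALID = -1;
const int ID_NONE = 0;
const TokenType WHITESPACE = {4, " \t", false, 0};
const TokenType LINETERM = {5, "\n\r", false, 1};
const TokenType IDENTIFIER = {6,
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_",
    false, 0};

// Toy tokenizer: tries each token type in order at the current position.
class MiniTokenizer {
public:
    MiniTokenizer(const std::vector<TokenType>& types, const std::string& input)
        : types_(types), input_(input), pos_(0) {}

    Token getNextToken() {
        assert(pos_ <= input_.size());  // invariant: never past the end
        if (pos_ >= input_.size())
            return Token{ID_NONE, ""};  // end of the token stream
        for (const TokenType& t : types_) {
            std::size_t len = 0;
            while (pos_ + len < input_.size()
                   && matches(t, input_[pos_ + len])
                   && (t.maxChars == 0
                       || len < static_cast<std::size_t>(t.maxChars)))
                ++len;
            if (len > 0) {
                Token tok{t.typeID, input_.substr(pos_, len)};
                pos_ += len;
                return tok;
            }
        }
        return Token{ID_INVALID, ""};  // no token type matched
    }

private:
    static bool matches(const TokenType& t, char c) {
        bool found = t.validChars.find(c) != std::string::npos;
        return t.invert ? !found : found;
    }
    std::vector<TokenType> types_;
    std::string input_;
    std::size_t pos_;
};

// Collected tokens plus the reason the loop stopped.
struct Result {
    std::vector<Token> tokens;
    int endType;  // ID_NONE on a clean end, ID_INVALID on an error
};

// Drains the token stream using the three default token types.
Result tokenizeAll(const std::string& input) {
    MiniTokenizer tok({WHITESPACE, LINETERM, IDENTIFIER}, input);
    Result r;
    Token t = tok.getNextToken();
    while (t.typeID != ID_NONE && t.typeID != ID_INVALID) {
        r.tokens.push_back(t);
        t = tok.getNextToken();
    }
    r.endType = t.typeID;
    return r;
}

}  // namespace sketch
```

With the real class, the equivalent loop would call getNextToken() until the returned token has type TT_NONE or TT_INVALID, and would treat the second case as a parse error.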


Constructor & Destructor Documentation

Ionflux::Tools::Tokenizer::Tokenizer ()
 

Constructor.

Construct new Tokenizer object.

Ionflux::Tools::Tokenizer::Tokenizer (const std::string &initInput)
 

Constructor.

Construct new Tokenizer object.

Parameters:
initInput The input string to be parsed.

Ionflux::Tools::Tokenizer::Tokenizer (const std::vector< TokenType > &initTokenTypes)
 

Constructor.

Construct new Tokenizer object.

Parameters:
initTokenTypes Token types this tokenizer recognizes.

Ionflux::Tools::Tokenizer::Tokenizer (const std::vector< TokenType > &initTokenTypes, const std::string &initInput)
 

Constructor.

Construct new Tokenizer object.

Parameters:
initTokenTypes Token types this tokenizer recognizes.
initInput The input string to be parsed.

Ionflux::Tools::Tokenizer::~Tokenizer () [virtual]
 

Destructor.

Destruct Tokenizer object.


Member Function Documentation

void Ionflux::Tools::Tokenizer::addTokenType (const TokenType &newTokenType) [virtual]
 

Add a token type.

Adds a token type (possibly user defined) to the set of token types recognized by this tokenizer.

Parameters:
newTokenType Token type to be added.

void Ionflux::Tools::Tokenizer::addTokenTypes (const std::vector< TokenType > &newTokenTypes) [virtual]
 

Add token types.

Adds token types (possibly user defined) to the set of token types recognized by this Tokenizer.

Parameters:
newTokenTypes Set of token types to be added.

void Ionflux::Tools::Tokenizer::clearTokenTypes () [virtual]
 

Clear token types.

Removes all token types from the set of recognized token types.

Note:
Special token types will still be available to the tokenizer. You can always restore the default set of token types with useDefaultTokenTypes().
See also:
useDefaultTokenTypes()

unsigned int Ionflux::Tools::Tokenizer::getCurrentPos () [virtual]
 

Get current position.

Get the current parsing position relative to the first character of the input string.

Returns:
Current parsing position.

Token Ionflux::Tools::Tokenizer::getCurrentToken () [virtual]
 

Get current token.

Get the current token.

Returns:
The current token.

unsigned int Ionflux::Tools::Tokenizer::getCurrentTokenPos () [virtual]
 

Get position of current token.

Get the position of the current token relative to the first character of the input string.

Returns:
Position of current token.

int Ionflux::Tools::Tokenizer::getCurrentTokenType () [virtual]
 

Get type of current token.

Get the type of the current token.

Returns:
Type ID of the current token.

Token Ionflux::Tools::Tokenizer::getNextToken () [virtual]
 

Get next token.

Parse the input string and get the next token.

Returns:
The next token from the current input.

Token Ionflux::Tools::Tokenizer::getNextToken (const TokenTypeMap &otherMap) [virtual]
 

Get next token.

Parse the input string and get the next token.

Parameters:
otherMap Token type map to be used for extracting the next token.
Returns:
The next token from the current input.

char Ionflux::Tools::Tokenizer::getQuoteChar () [virtual]
 

Get quote character.

Get the quote character of a quoted string.

Returns:
Quote character of the current token if it is a quoted string, or 0 if it is not.

bool Ionflux::Tools::Tokenizer::isOneOf (char c, const std::string &testChars, bool invert) [static]
 

Check type of a character.

Returns true if the character c is one of the characters of testChars (if invert is false). If you pass true to invert, the return value is inverted, i.e. the function returns true if c is not one of the characters of testChars.

Deprecated:
You should not use this, since it is obsolete and may be removed in future versions. Use Ionflux::Tools::isOneOf() instead. This function is provided for backward compatibility only.
Parameters:
c Character to be checked.
testChars String of characters.
invert Whether to invert the result.
Returns:
true if the character is one of testChars, false otherwise. The result is inverted if true is passed to invert.
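The documented behaviour of isOneOf() can be restated in a few lines. This is a stand-alone sketch written from the description above, not the library's source; isOneOfSketch and isOneOfExamples are hypothetical names.

```cpp
#include <cassert>
#include <string>

// Returns true if c occurs in testChars; the result is flipped when
// invert is true. Re-statement of the documented contract only.
bool isOneOfSketch(char c, const std::string& testChars, bool invert) {
    bool found = testChars.find(c) != std::string::npos;
    return invert ? !found : found;
}

// Usage: the same characters tested with and without inversion.
inline void isOneOfExamples() {
    assert(isOneOfSketch('a', "abc", false));   // 'a' is in the set
    assert(!isOneOfSketch('a', "abc", true));   // inverted: reported as absent
    assert(!isOneOfSketch('x', "abc", false));  // 'x' is not in the set
    assert(isOneOfSketch('x', "abc", true));    // inverted: reported as present
}
```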

bool Ionflux::Tools::Tokenizer::isValid (Token &token) [static]
 

Check whether a token is valid.

Check whether a token is a valid and well defined token (i.e., not TT_NONE or TT_INVALID).

Parameters:
token Token to be checked.
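As a rough illustration of the check described above, a token can be tested against the two special type IDs listed in this reference (-1 for TT_INVALID, 0 for TT_NONE). TokenSketch and isValidSketch are hypothetical stand-ins, and the member name value is an assumption; the library's own implementation may differ.

```cpp
#include <cassert>
#include <string>

// Stand-in for the library's Token; the member name value is assumed.
struct TokenSketch {
    int typeID;
    std::string value;
};

// Type IDs taken from TT_INVALID and TT_NONE in this reference.
const int TT_INVALID_ID = -1;
const int TT_NONE_ID = 0;

// A token counts as valid when it is neither "no token" nor "invalid token".
bool isValidSketch(const TokenSketch& token) {
    return token.typeID != TT_NONE_ID && token.typeID != TT_INVALID_ID;
}

// Usage: an identifier token is valid, the two special tokens are not.
inline void isValidExamples() {
    assert(isValidSketch(TokenSketch{6, "foo"}));
    assert(!isValidSketch(TokenSketch{TT_NONE_ID, ""}));
    assert(!isValidSketch(TokenSketch{TT_INVALID_ID, ""}));
}
```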

Token Ionflux::Tools::Tokenizer::nextToken () [virtual]
 

Get next token.

Parse the input string and get the next token.

Deprecated:
You should not use this function because its name is inconsistent with the interface. Use getNextToken() instead. This function is provided for backward compatibility only.
Returns:
The next token from the current input.
See also:
getNextToken()

void Ionflux::Tools::Tokenizer::reset () [virtual]
 

Reset the parser.

Reset the parser so the input can be parsed again from the beginning.

void Ionflux::Tools::Tokenizer::setExtractEscaped (bool newExtractEscaped) [virtual]
 

Set escaped character extraction flag.

Pass true to this function to enable extraction of escaped characters, or disable this feature by passing false.

Note:
If you enable extraction of escaped characters, you should make sure that your code handles the TT_ESCAPED special token type. If you have enabled quoted string extraction, escaped character extraction will also be enabled by default (and cannot be disabled).
Parameters:
newExtractEscaped Whether to extract escaped characters.

void Ionflux::Tools::Tokenizer::setExtractQuoted (bool newExtractQuoted) [virtual]
 

Set quoted string extraction flag.

Pass true to this function to enable extraction of quoted strings (and escaped characters), or disable this feature by passing false.

Note:
If you enable extraction of quoted strings, you should make sure that your code handles the TT_QUOTED and TT_ESCAPED special token types.
Parameters:
newExtractQuoted Whether to extract quoted strings.
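To make the interplay of quote and escape characters concrete, here is a small stand-alone sketch of quoted string extraction. It uses the documented QUOTE_CHARS and ESCAPE_CHAR values, but everything else is an assumption: extractQuotedSketch and QuotedResult are hypothetical names, and the reference does not say whether the library strips the surrounding quotes or resolves escapes inside the token value. The sketch strips the quotes and keeps each escaped character literally.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Values as documented for Tokenizer::QUOTE_CHARS and Tokenizer::ESCAPE_CHAR.
const std::string QUOTE_CHARS = "\"'";
const char ESCAPE_CHAR = '\\';

struct QuotedResult {
    bool found;           // true if a complete quoted string was read
    char quoteChar;       // the quote character used, or 0 if none
    std::string value;    // contents without the surrounding quotes
    std::size_t nextPos;  // position just after the closing quote
};

// Tries to read a quoted string that starts at input[pos].
QuotedResult extractQuotedSketch(const std::string& input, std::size_t pos) {
    assert(pos <= input.size());  // precondition of this sketch
    QuotedResult r = {false, 0, "", pos};
    if (pos >= input.size()
        || QUOTE_CHARS.find(input[pos]) == std::string::npos)
        return r;  // not positioned at an opening quote
    char quote = input[pos];
    std::string value;
    for (std::size_t i = pos + 1; i < input.size(); ++i) {
        char c = input[i];
        if (c == ESCAPE_CHAR && i + 1 < input.size()) {
            value += input[++i];  // keep the escaped character literally
        } else if (c == quote) {
            r.found = true;       // matching closing quote reached
            r.quoteChar = quote;
            r.value = value;
            r.nextPos = i + 1;
            return r;
        } else {
            value += c;
        }
    }
    return r;  // unterminated quoted string: nothing extracted
}
```

The quoteChar field mirrors what getQuoteChar() is documented to report: the quote character for a quoted string, and 0 otherwise.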

void Ionflux::Tools::Tokenizer::setInput (const std::string &newInput) [virtual]
 

Set input.

Sets the input string to be parsed.

Parameters:
newInput The input string to be parsed.

void Ionflux::Tools::Tokenizer::setTokenTypeAnything () [virtual]
 

Set special token type TT_ANYTHING.

This sets up a special token type TT_ANYTHING that will match any characters not matched by any of the previously defined token types.

Note:
You may call this again to update TT_ANYTHING if you add further token types after a call to setTokenTypeAnything().
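One plausible way to obtain a catch-all type is to collect the characters of all known token types and invert the match. The sketch below does exactly that; it is an assumption about the mechanism, not the library's code. TokenTypeSketch, makeAnything and matchesSketch are hypothetical names, and the member names other than typeID are assumed. Under this reading, the catch-all set is a snapshot, which would explain why the function has to be called again after adding further token types.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the library's TokenType; member names other than typeID
// are assumptions.
struct TokenTypeSketch {
    int typeID;
    std::string validChars;
    bool invert;
    int maxChars;
};

const int ANYTHING_TYPE_ID = 1;  // value of TT_ANYTHING_TYPE_ID

// Builds a catch-all type: the union of all known characters, inverted.
// This sketch only handles input types that are not inverted themselves.
TokenTypeSketch makeAnything(const std::vector<TokenTypeSketch>& types) {
    TokenTypeSketch anything = {ANYTHING_TYPE_ID, "", true, 0};
    for (const TokenTypeSketch& t : types) {
        assert(!t.invert);  // precondition of this sketch
        anything.validChars += t.validChars;
    }
    return anything;
}

// True if c belongs to the token type, honouring the invert flag.
bool matchesSketch(const TokenTypeSketch& t, char c) {
    bool found = t.validChars.find(c) != std::string::npos;
    return t.invert ? !found : found;
}
```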

void Ionflux::Tools::Tokenizer::setTokenTypes (const std::vector< TokenType > &newTokenTypes) [virtual]
 

Set token types.

Sets the token types recognized by this tokenizer.

Note:
The special token types are always available, regardless of whether they are added or not.
Parameters:
newTokenTypes Set of token types.

void Ionflux::Tools::Tokenizer::useDefaultTokenTypes () [virtual]
 

Use default token types.

Initializes the set of recognized token types with the default token types.


Member Data Documentation

unsigned int Ionflux::Tools::Tokenizer::currentPos [protected]
 

Current parsing position in the input string.

char Ionflux::Tools::Tokenizer::currentQuoteChar [protected]
 

Quote character.

Token Ionflux::Tools::Tokenizer::currentToken [protected]
 

Current token.

unsigned int Ionflux::Tools::Tokenizer::currentTokenPos [protected]
 

Position of current token in the input string.

const char Ionflux::Tools::Tokenizer::ESCAPE_CHAR = '\\' [static]
 

Escape character.

bool Ionflux::Tools::Tokenizer::extractEscaped [protected]
 

Extract escaped characters flag.

bool Ionflux::Tools::Tokenizer::extractQuoted [protected]
 

Extract quoted strings flag.

const std::string Ionflux::Tools::Tokenizer::QUOTE_CHARS = "\"'" [static]
 

Quote characters.

std::string Ionflux::Tools::Tokenizer::theInput [protected]
 

The input string to be parsed.

const Token Ionflux::Tools::Tokenizer::TOK_INVALID = {Tokenizer::TT_INVALID.typeID, ""} [static]
 

Token: Invalid token. (special).

const Token Ionflux::Tools::Tokenizer::TOK_NONE = {Tokenizer::TT_NONE.typeID, ""} [static]
 

Token: No token. (special).

const TokenType Ionflux::Tools::Tokenizer::TT_ALPHA [static]
 

Initial value:

 {8, 
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ", 
    false, 0}
Token type: Alpha (latin).

TokenType Ionflux::Tools::Tokenizer::TT_ANYTHING
 

Token type: Anything. (special).

const int Ionflux::Tools::Tokenizer::TT_ANYTHING_TYPE_ID = 1 [static]
 

Type ID of the TT_ANYTHING token type.

const TokenType Ionflux::Tools::Tokenizer::TT_DEFAULT_SEP = {7, "_-.", false, 0} [static]
 

Token type: Default separator characters.

const TokenType Ionflux::Tools::Tokenizer::TT_ESCAPED = {3, "", false, 0} [static]
 

Token type: Escaped character. (special).

const TokenType Ionflux::Tools::Tokenizer::TT_IDENTIFIER [static]
 

Initial value:

 {6, 
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_", 
    false, 0}
Token type: Identifier.

const TokenType Ionflux::Tools::Tokenizer::TT_INVALID = {-1, "", false, 0} [static]
 

Token type: Invalid token. (special).

const TokenType Ionflux::Tools::Tokenizer::TT_LINETERM = {5, "\n\r", false, 1} [static]
 

Token type: Line terminator.

const TokenType Ionflux::Tools::Tokenizer::TT_NONE = {0, "", false, 0} [static]
 

Token type: No token. (special).

const TokenType Ionflux::Tools::Tokenizer::TT_NUMBER = {7, "0123456789", false, 0} [static]
 

Token type: Number.

const TokenType Ionflux::Tools::Tokenizer::TT_QUOTED = {2, "", false, 0} [static]
 

Token type: Quoted string. (special).

const TokenType Ionflux::Tools::Tokenizer::TT_WHITESPACE = {4, " \t", false, 0} [static]
 

Token type: Linear whitespace.

TokenTypeMap* Ionflux::Tools::Tokenizer::typeMap [protected]
 

Token type map.


Generated on Tue Mar 14 21:11:19 2006 for Ionflux Tools Class Library (iftools) by  doxygen 1.4.6