#include <Tokenizer.hpp>
Collaboration diagram for Ionflux::Tools::Tokenizer:

Public Member Functions | |
| Tokenizer () | |
| Constructor. | |
| Tokenizer (const std::string &initInput) | |
| Constructor. | |
| Tokenizer (const std::vector< TokenType > &initTokenTypes) | |
| Constructor. | |
| Tokenizer (const std::vector< TokenType > &initTokenTypes, const std::string &initInput) | |
| Constructor. | |
| virtual | ~Tokenizer () |
| Destructor. | |
| virtual void | clearTokenTypes () |
| Clear token types. | |
| virtual void | useDefaultTokenTypes () |
| Use default token types. | |
| virtual void | setTokenTypes (const std::vector< TokenType > &newTokenTypes) |
| Set token types. | |
| virtual void | addTokenType (const TokenType &newTokenType) |
| Add a token type. | |
| virtual void | addTokenTypes (const std::vector< TokenType > &newTokenTypes) |
| Add token types. | |
| virtual void | setInput (const std::string &newInput) |
| Set input. | |
| virtual Token | nextToken () |
| Get next token. | |
| virtual Token | getNextToken (const TokenTypeMap &otherMap) |
| Get next token. | |
| virtual Token | getNextToken () |
| Get next token. | |
| virtual Token | getCurrentToken () |
| Get current token. | |
| virtual int | getCurrentTokenType () |
| Get type of current token. | |
| virtual void | reset () |
| Reset the parser. | |
| virtual void | setTokenTypeAnything () |
| Set special token type TT_ANYTHING. | |
| virtual void | setExtractQuoted (bool newExtractQuoted) |
| Set quoted string extraction flag. | |
| virtual void | setExtractEscaped (bool newExtractEscaped) |
| Set escaped character extraction flag. | |
| virtual unsigned int | getCurrentPos () |
| Get current position. | |
| virtual unsigned int | getCurrentTokenPos () |
| Get position of current token. | |
| virtual char | getQuoteChar () |
| Get quote character. | |
Static Public Member Functions | |
| static bool | isOneOf (char c, const std::string &testChars, bool invert) |
| Check type of a character. | |
| static bool | isValid (Token &token) |
| Check whether a token is valid. | |
Public Attributes | |
| TokenType | TT_ANYTHING |
| Token type: Anything. (special). | |
Static Public Attributes | |
| static const TokenType | TT_INVALID = {-1, "", false, 0} |
| Token type: Invalid token. (special). | |
| static const TokenType | TT_NONE = {0, "", false, 0} |
| Token type: No token. (special). | |
| static const TokenType | TT_QUOTED = {2, "", false, 0} |
| Token type: Quoted string. (special). | |
| static const TokenType | TT_ESCAPED = {3, "", false, 0} |
| Token type: Escaped character. (special). | |
| static const TokenType | TT_WHITESPACE = {4, " \t", false, 0} |
| Token type: Linear whitespace. | |
| static const TokenType | TT_LINETERM = {5, "\n\r", false, 1} |
| Token type: Line terminator. | |
| static const TokenType | TT_NUMBER = {7, "0123456789", false, 0} |
| Token type: Number. | |
| static const TokenType | TT_ALPHA |
| Token type: Alpha (latin). | |
| static const TokenType | TT_DEFAULT_SEP = {7, "_-.", false, 0} |
| Token type: Default separator characters. | |
| static const TokenType | TT_IDENTIFIER |
| Token type: Identifier. | |
| static const Token | TOK_INVALID = {Tokenizer::TT_INVALID.typeID, ""} |
| Token: Invalid token. (special). | |
| static const Token | TOK_NONE = {Tokenizer::TT_NONE.typeID, ""} |
| Token: No token. (special). | |
| static const int | TT_ANYTHING_TYPE_ID = 1 |
| Type ID of the TT_ANYTHING token type. | |
| static const std::string | QUOTE_CHARS = "\"'" |
| Quote characters. | |
| static const char | ESCAPE_CHAR = '\\' |
| Escape character. | |
Protected Attributes | |
| std::string | theInput |
| The input string to be parsed. | |
| unsigned int | currentPos |
| Current parsing position in the input string. | |
| unsigned int | currentTokenPos |
| Position of current token in the input string. | |
| Token | currentToken |
| Current token. | |
| bool | extractQuoted |
| Extract quoted strings flag. | |
| char | currentQuoteChar |
| Quote character. | |
| bool | extractEscaped |
| Extract escaped characters flag. | |
| TokenTypeMap * | typeMap |
| Token type map. | |
A generic tokenizer for parsing byte strings. To set up a tokenizer, first create a Tokenizer object. This will be set up using the default token types Tokenizer::TT_WHITESPACE, Tokenizer::TT_LINETERM and Tokenizer::TT_IDENTIFIER. You may then add your own custom token types and optionally set up the Tokenizer::TT_ANYTHING token type (which will match anything not matched by previously defined token types). To enable extraction of quoted strings and escaped characters, call Tokenizer::setExtractQuoted() with true as an argument.
To get a token from the token stream, call Tokenizer::getNextToken(). Make sure your code handles the Tokenizer::TT_NONE and Tokenizer::TT_INVALID special token types (which cannot be disabled). Tokenizer::getNextToken() will always return Tokenizer::TT_NONE at the end of the token stream and Tokenizer::TT_INVALID if an invalid token is encountered.
|
|
Constructor. Construct new Tokenizer object. |
|
|
Constructor. Construct new Tokenizer object.
|
|
|
Constructor. Construct new Tokenizer object.
|
|
||||||||||||
|
Constructor. Construct new Tokenizer object.
|
|
|
Destructor. Destruct Tokenizer object. |
|
|
Add a token type. Adds a token type (possibly user defined) to the set of token types recognized by this tokenizer.
|
|
|
Add token types. Adds token types (possibly user defined) to the set of token types recognized by this Tokenizer.
|
|
|
Clear token types. Removes all token types from the set of recognized token types.
|
|
|
Get current position. Get the current parsing position relative to the first character of the input string.
|
|
|
Get current token. Get the current token.
|
|
|
Get position of current token. Get the position of the current token relative to the first character of the input string.
|
|
|
Get type of current token. Get the type of the current token.
|
|
|
Get next token. Parse the input string and get the next token.
|
|
|
Get next token. Parse the input string and get the next token.
|
|
|
Get quote character. Get the quote character of a quoted string.
|
|
||||||||||||||||
|
Check type of a character.
Returns
|
|
|
Check whether a token is valid. Check whether a token is a valid and well defined token (i.e., not TT_NONE or TT_INVALID).
|
|
|
Get next token. Parse the input string and get the next token.
|
|
|
Reset the parser. Reset the parser so the input can be parsed again from the beginning. |
|
|
Set escaped character extraction flag.
Pass
|
|
|
Set quoted string extraction flag.
Pass
|
|
|
Set input. Sets the input string to be parsed.
|
|
|
Set special token type TT_ANYTHING. This sets up a special token type TT_ANYTHING that will match any characters not matched by any of the previously defined token types.
|
|
|
Set token types.
Set the set of token types recognized by this tokenizer.
|
|
|
Use default token types. Initializes the set of recognized token types with the default token types. |
|
|
Current parsing position in the input string.
|
|
|
Quote character.
|
|
|
Current token.
|
|
|
Position of current token in the input string.
|
|
|
Escape character.
|
|
|
Extract escaped characters flag.
|
|
|
Extract quoted strings flag.
|
|
|
Quote characters.
|
|
|
The input string to be parsed.
|
|
|
Token: Invalid token. (special).
|
|
|
Token: No token. (special).
|
|
|
Initial value: {8,
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ",
false, 0}
|
|
|
Token type: Anything. (special).
|
|
|
Type ID of the TT_ANYTHING token type.
|
|
|
Token type: Default separator characters.
|
|
|
Token type: Escaped character. (special).
|
|
|
Initial value: {6,
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_",
false, 0}
|
|
|
Token type: Invalid token. (special).
|
|
|
Token type: Line terminator.
|
|
|
Token type: No token. (special).
|
|
|
Token type: Number.
|
|
|
Token type: Quoted string. (special).
|
|
|
Token type: Linear whitespace.
|
|
|
Token type map.
|
1.4.6