The following tables list common elements of lexemes.
Concept | Rule | Representation | Description |
---|---|---|---|
Decimal digit | DG | [0-9] | One character from '0'..'9'. |
Octal digit | OC | [0-7] | One character from '0'..'7'. |
Hexadecimal digit | HX | [0-9a-fA-F] | Any of the characters '0'..'9' and any of the letters 'A'..'F' and 'a'..'f'. |
Single letter | LT | [A-Za-z_$] | Any of the characters 'A'..'Z', 'a'..'z', and the underscore (_) and dollar sign ($) characters. |
Single letter from the International Character Set | LT18N | [A-Za-z_$\200-\377] | Any of the characters 'A'..'Z', 'a'..'z', the underscore (_) and dollar sign ($) characters, and any character in the top half of the 8-bit character set. |
Shell 'word' | WD | [^ \t;\n'"] | Any character except space, tab, semicolon (;), linefeed, less than (<), greater than (>), and quotes (' or "). |
File name | FL | [^ \t\n\}\;\>\<] | Any character except space, tab, semicolon (;), linefeed, right brace (}), less than (<), greater than (>), and tick (`). |
Optional exponent | Exponent | [eE][+-]?{DG}+ | Numbers often allow an optional exponent. It is represented as an 'e' or 'E' followed by an optional plus (+) or minus (-), and then one or more decimal digits. |
Whitespace | Whitespace | [ \t]+ | Whitespace is often used to separate two lexemes that would otherwise be misconstrued as a single lexeme. For example, stop in is two keywords, but stopin is an identifier. Apart from this separating property, Whitespace is usually ignored. Whitespace is a sequence of one or more tabs or spaces. |
String literal | stringChar | ([^"\\\n]|([\\]({simpleEscape}|{octalEscape}|{hexEscape}))) | Any character except the terminating quote character ("), or a newline (\n). If the character is a backslash (\), it is followed by an escaped sequence of characters. |
Character literal | charChar | ([^'\\\n]|([\\]({simpleEscape}|{octalEscape}|{hexEscape}))) | Any character except the terminating quote (') character, or a newline (\n). If the character is a backslash (\), it is followed by an escaped sequence of characters. |
Environment variable identifier | EID | [^ \t\n;='"&\|] | Any character except space, tab, linefeed, less-than (<), greater-than (>), semicolon (;), equal sign (=), quotes (' or "), ampersand (&), backslash (\), and bar (|). |
Universal character name | UCN | \\u{HX}{4}|\\U{HX}{8} | A universal character name is a backslash (\) followed by either a lowercase 'u' and 4 hexadecimal digits, or an uppercase 'U' and 8 hexadecimal digits. |
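
As an illustration only, a few of the rules above can be tried out with Python's `re` module. This is a minimal sketch, not part of the grammar itself: the constant names and the `matches` helper are assumptions made for this example, and because Python has no `{NAME}` substitution, the named fragments are spliced in with f-strings instead.

```python
import re

# Fragments copied from the Representation column of the table.
DG = r"[0-9]"
OC = r"[0-7]"
HX = r"[0-9a-fA-F]"
LT = r"[A-Za-z_$]"
WHITESPACE = r"[ \t]+"

# Rules that reference other rules. In an f-string, {DG} is replaced by
# the fragment, and doubled braces {{4}} produce a literal {4} quantifier.
EXPONENT = rf"[eE][+-]?{DG}+"
UCN = rf"\\u{HX}{{4}}|\\U{HX}{{8}}"


def matches(rule: str, text: str) -> bool:
    """Return True if the whole of text matches the rule (hypothetical helper)."""
    return re.fullmatch(rule, text) is not None


# Whitespace separates lexemes: "stop in" yields two words, "stopin" one.
two_words = re.split(WHITESPACE, "stop in")   # ['stop', 'in']
one_word = re.split(WHITESPACE, "stopin")     # ['stopin']

# An exponent needs at least one digit after the optional sign.
has_exponent = matches(EXPONENT, "e+10")      # True
bare_e = matches(EXPONENT, "e")               # False
```

`re.fullmatch` is used so that a rule must account for the entire input, which is closer to how a single lexeme is recognized than a substring search would be.
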
The escaped sequence of characters can be one of the following three forms: