Lexical elements

This section describes the lexical elements of Q, that is, the tokens of the language and the regular expressions that define them. Comments, spaces, tabs, carriage returns, and new lines between tokens are ignored. A comment is a sequence of characters delimited by (* and *); it can span more than one line.
Note: Comments cannot be nested.
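
The comment rule can be illustrated with a short Python sketch. It is not part of Q; the names COMMENT and strip_comments are hypothetical, and the sketch assumes that the text contains no string literal that happens to include (* or *). Because comments cannot be nested, a comment ends at the first *) that follows its opening (*.

```python
import re

# "(*" up to the nearest "*)"; DOTALL lets "." cross line boundaries,
# so a comment may span more than one line.
COMMENT = re.compile(r"\(\*.*?\*\)", re.DOTALL)


def strip_comments(source: str) -> str:
    """Replace every (* ... *) comment with a single space."""
    return COMMENT.sub(" ", source)


# Not nested: the comment closes at the first "*)", so "y *)" stays in the text.
remaining = strip_comments("x (* outer (* inner *) y *) z").split()
```

In this sketch, remaining holds the four pieces x, y, *), and z, which shows why an attempt to nest comments leaves a stray *) behind.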

Punctuation

The following are the punctuation symbols:

= <> ~= ~<> < <= > >= + - * / -> ( ) { } [ ] " => |=> , $

Identifiers (ID)

An identifier is a letter or an underscore followed by any number of letters, digits, or underscores. The case of an identifier is significant.

Keywords

The following are the keywords. Keywords cannot be used as names of variables or functions.

Table 1. Lexical element keywords
and       in       real
Boolean   integer  sort
current   let      string
else      map      then
false     model    there_exists
filter    not      this
for_all   of       traverse
if        or       true
implies   over
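
The identifier and keyword rules can be sketched in Python as follows. The names KEYWORDS, IDENTIFIER, and classify are hypothetical. The sketch assumes that "letter" means an ASCII letter and that keywords, like identifiers, are case-sensitive; neither point is stated explicitly above.

```python
import re

# The 26 keywords from Table 1, spelled exactly as listed (note the capital B).
KEYWORDS = frozenset(
    """and in real Boolean integer sort current let string else map then
    false model there_exists filter not this for_all of traverse if or true
    implies over""".split()
)

# A letter or underscore, then any number of letters, digits, or underscores.
# Assumption: only ASCII letters count as letters.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def classify(word: str) -> str:
    """Return "KEYWORD", "ID", or "INVALID" for a single word."""
    if word in KEYWORDS:
        return "KEYWORD"
    if IDENTIFIER.fullmatch(word):
        return "ID"
    return "INVALID"
```

Under these assumptions, classify("for_all") returns "KEYWORD", whereas classify("For_all") returns "ID" because case is significant.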

Integer literals (INTEGER_LITERAL)

An integer literal is an optional minus sign followed by one or more decimal digits.

Real literals (REAL_LITERAL)

A real literal is an optional minus sign followed by one or more decimal digits, a decimal point, and one or more decimal digits.
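
As an illustration, the two numeric literal rules translate directly into Python regular expressions. The names INTEGER_LITERAL, REAL_LITERAL, and literal_kind are hypothetical helpers for this sketch, not part of Q.

```python
import re

# Optional minus sign, then one or more decimal digits.
INTEGER_LITERAL = re.compile(r"-?[0-9]+")

# Optional minus sign, digits, a decimal point, and more digits.
# Digits are required on both sides of the point.
REAL_LITERAL = re.compile(r"-?[0-9]+\.[0-9]+")


def literal_kind(text: str):
    """Return the token name for a numeric literal, or None if it is neither."""
    if REAL_LITERAL.fullmatch(text):
        return "REAL_LITERAL"
    if INTEGER_LITERAL.fullmatch(text):
        return "INTEGER_LITERAL"
    return None
```

With these patterns, -42 is an integer literal and 3.14 is a real literal, while 3. and .5 are neither because digits are required on both sides of the decimal point.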

Boolean literals (BOOLEAN_LITERAL)

The Boolean literals are true and false.

Association literals (ASSOC_LITERAL)

Association literals are essentially string literals with square brackets ([]) in place of quotation marks. The sequence \] escapes the closing delimiter.
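
The following Python sketch shows one way to recognize association literals in a piece of text. The names ASSOC_LITERAL and find_assoc_literals are hypothetical. The sketch assumes that, as with string literals, a backslash escapes the character that follows it and that a literal cannot contain an unescaped new line.

```python
import re

# "[", then any mix of escape pairs (backslash + one character) and ordinary
# characters other than "]", backslash, or new line, then the closing "]".
ASSOC_LITERAL = re.compile(r"\[(?:\\.|[^\]\\\n])*\]")


def find_assoc_literals(text: str) -> list:
    """Return every association literal found in text, delimiters included."""
    return ASSOC_LITERAL.findall(text)
```

Because \] is consumed as an escape pair, the literal [a\]b] is recognized as a single token rather than ending at the first closing bracket.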

String literals (STRING_LITERAL)

A string literal is a sequence of characters enclosed in quotation marks. A string cannot extend across lines (that is, a string cannot contain an unescaped new line). The supported escape sequences are listed in the following table.

Escape Sequence   Replacement Text
\"                "
\\                \
\t                tab
\r                carriage return
\n                new line
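
The escape sequences can be illustrated with a small Python decoder. The names ESCAPES and unescape_string_literal are hypothetical. The sketch assumes that \" stands for a quotation mark and that any escape sequence not listed in the table is an error; the text above does not say how unlisted sequences are handled.

```python
# Character after the backslash -> replacement text.
ESCAPES = {'"': '"', "\\": "\\", "t": "\t", "r": "\r", "n": "\n"}


def unescape_string_literal(literal: str) -> str:
    """Decode a string literal, including its enclosing quotation marks."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("string literal must be enclosed in quotation marks")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            # Assumption: an unlisted or incomplete escape sequence is rejected.
            if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
                raise ValueError("unsupported escape sequence")
            out.append(ESCAPES[body[i + 1]])
            i += 2
        elif ch == '"' or ch == "\n":
            # A string cannot contain an unescaped quotation mark or new line.
            raise ValueError("unescaped quotation mark or new line")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, the six-character literal "a\tb" decodes to a, a tab character, and b.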

Regular expression literals (REGEXP_LITERAL)

A regular expression literal is a sequence of characters enclosed in back quotes. The characters ., \, [, ], ?, *, +, ^, and $ have special meanings within a regular expression literal unless they are preceded by a backslash. A regular expression literal can also contain the character escape sequences \t, \r, \n, and \xdd, which stand for a tab, a carriage return, a new line, and the character whose ASCII code is the hexadecimal number dd.
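
When plain text must be matched literally, each special character needs a preceding backslash. The Python sketch below shows the idea. The names SPECIAL and escape_regexp_text are hypothetical, and the sketch handles only the nine special characters listed above; it does not deal with back quotes or control characters in the text.

```python
# The nine characters that have a special meaning in a regular expression literal.
SPECIAL = set(".\\[]?*+^$")


def escape_regexp_text(text: str) -> str:
    """Put a backslash before every special character in text."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in text)
```

For example, escape_regexp_text("a.b*") produces a\.b\*, in which the period and the asterisk match only themselves.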

Regular expressions have a recursive structure. They are formed from smaller subexpressions. The building blocks of all regular expressions are the expressions to match a single character. These fundamental expressions have the following three forms:

  • Period (.) is a regular expression that matches any single character. For example, `.` matches a, 5, #, \n, ", and so on.
  • Any character other than a special character, or a special character preceded by a backslash, is a regular expression that matches that character. For example, `a`, `5`, and `\*` match a, 5, and *, respectively.
  • A set of characters enclosed in square brackets is a regular expression that matches any one character in the set. For example, `[abc]` matches any one of a, b, or c. If the first character in the set is a caret (^), then the regular expression matches the complement of the given set of characters. So `[^abc]` matches any character other than a, b, and c. As a matter of convenience, a range of characters can be specified with a dash. The range includes all characters between the lower and upper bounds, inclusive. For example, `[a-zA-Z0-9]` matches any letter or digit.
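
The three single-character forms can be tried out with Python's re module, which happens to use the same notation for this subset. This is only an approximation of Q's behavior, and matches_char is a hypothetical helper. The re.DOTALL flag is passed so that the period also matches a new line, as described above; without it, Python's period does not match a new line.

```python
import re


def matches_char(pattern: str, ch: str) -> bool:
    """Report whether pattern matches the single character ch."""
    # DOTALL makes "." match a new line as well, mirroring the description above.
    return re.fullmatch(pattern, ch, re.DOTALL) is not None


checks = [
    matches_char(".", "#"),             # period matches any single character
    matches_char(".", "\n"),            # ... including a new line
    matches_char(r"\*", "*"),           # escaped special character matches itself
    matches_char("[abc]", "b"),         # one character from the set
    matches_char("[^abc]", "d"),        # one character from the complement
    matches_char("[a-zA-Z0-9]", "Q"),   # ranges inside a set
]
```

Every entry in checks is True, whereas matches_char("[^abc]", "b") and matches_char("[a-zA-Z0-9]", "#") are False.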

From these building blocks, larger regular expressions can be formed in the following way:

  • If re1 and re2 are regular expressions, then re1 re2 (concatenation) is a regular expression that matches all strings of the form s1s2, where s1 is matchable by re1 and s2 is matchable by re2. For example, `[ab][01]` matches a0, a1, b0, and b1.
  • If re is a regular expression, then re? is a regular expression that matches zero or one occurrence of re. For example, `ab?` matches a or ab, and `a[01]?` matches a, a0, or a1.
  • If re is a regular expression, then re* is a regular expression that matches zero or more occurrences of re. For example, `ab*` matches a, ab, abb, abbb, …, and `a[01]*` matches a, a0, a1, a00, a01, a10, a11, ….
  • If re is a regular expression, then re+ is a regular expression that matches one or more occurrences of re. For example, `ab+` matches ab, abb, abbb, …, and `a[01]+` matches a0, a1, a00, a01, a10, a11, ….
    Note: The postfix operators ?, *, and + bind more tightly than concatenation. Therefore, ab* means a(b*) and not (ab)*.
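
The composition rules and the binding of the postfix operators can be checked with Python's re module, which treats concatenation, ?, *, and + the same way for these examples. This is an approximation used for illustration, not Q itself, and full is a hypothetical helper that tests whether a pattern matches an entire string.

```python
import re


def full(pattern: str, text: str) -> bool:
    """Report whether pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None


concatenation = [s for s in ("a0", "a1", "b0", "b1", "ab", "01") if full("[ab][01]", s)]
optional = [s for s in ("a", "ab", "abb") if full("ab?", s)]
zero_or_more = [s for s in ("a", "ab", "abbb", "abab") if full("ab*", s)]
one_or_more = [s for s in ("a", "ab", "abbb", "abab") if full("ab+", s)]
```

In this sketch, abab is absent from both zero_or_more and one_or_more, which reflects the note above: the operator applies only to b, not to the sequence ab.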

A complete regular expression can be anchored to the beginning or end of a string with ^ and $, respectively. If re is a regular expression, then ^re is a regular expression that matches all strings matchable by re but only if they occur at the beginning of a string. Similarly, re$ is a regular expression that matches all strings matchable by re but only if they occur at the end of a string. For example, `^[01]+` matches 0 and 0110 but not a0 or a0110; and `[01]+$` matches 0 and 0110 but not 0a or 0110a.
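
Anchoring can be illustrated in the same way. The sketch below uses Python's re.search, which looks for a match anywhere in the string unless the pattern is anchored. This is an assumption about how the examples above are meant to be read, not a statement about Q's matching functions, and found is a hypothetical helper.

```python
import re


def found(pattern: str, text: str) -> bool:
    """Report whether pattern matches somewhere in text."""
    return re.search(pattern, text) is not None


start_anchored = [s for s in ("0", "0110", "a0", "a0110") if found("^[01]+", s)]
end_anchored = [s for s in ("0", "0110", "0a", "0110a") if found("[01]+$", s)]
unanchored = [s for s in ("a0", "0a", "abc") if found("[01]+", s)]
```

Here start_anchored and end_anchored each contain only 0 and 0110, while the unanchored pattern also finds the digit in a0 and 0a.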

