When used in the context of user-defined words, the term multibyte refers to the following types of words:
- Words formed of DBCS (double-byte character set) characters, possibly combined with single-byte characters
- Words formed of UTF-8 characters, each composed of one or more bytes
- Words formed of EUC (Extended UNIX Code) characters, each composed of one or more bytes
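The number of bytes a character occupies depends on the encoding. This Python sketch counts the bytes a few characters take in three encodings. Python codec names (`shift_jis`, `utf-8`, `euc_jp`) stand in for code pages here. Shift_JIS is only an assumed example of a mixed single-byte/double-byte (DBCS) code page; the code pages that COBOL for AIX actually supports may differ. The helper name `byte_widths` is hypothetical.

```python
# Illustration only: Python codec names stand in for code pages.
# Shift_JIS is used as an assumed example of a mixed single-byte/double-byte
# (DBCS) code page; the code pages COBOL for AIX accepts may differ.

ENCODINGS = ("shift_jis", "utf-8", "euc_jp")


def byte_widths(ch: str) -> dict[str, int]:
    """Map each encoding name to the number of bytes `ch` takes in it."""
    return {enc: len(ch.encode(enc)) for enc in ENCODINGS}


if __name__ == "__main__":
    for ch in ("A", "名", "ｱ"):
        print(ch, byte_widths(ch))
    # A {'shift_jis': 1, 'utf-8': 1, 'euc_jp': 1}
    # 名 {'shift_jis': 2, 'utf-8': 3, 'euc_jp': 2}
    # ｱ {'shift_jis': 1, 'utf-8': 3, 'euc_jp': 2}
```

The half-width katakana ｱ is single-byte in Shift_JIS but multibyte in UTF-8 and EUC-JP. A rule that depends on whether a character is single-byte can therefore give different results for the same word in different code pages.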
The following are the rules for forming user-defined words with multibyte
characters:
- Contained characters
- A user-defined word can consist of both single-byte and multibyte characters. If a character exists in both single-byte and multibyte forms, its single-byte and multibyte representations are not equivalent.
Single-byte characters in the user-defined word are limited to the following:
  - Uppercase Latin letters A through Z
  - Lowercase Latin letters a through z
  - Digits 0 through 9
  - - (hyphen)
  - _ (underscore)
A single-byte hyphen cannot be the first or last character of the word.
A single-byte underscore cannot be the first character of the word.
- Uppercase and lowercase letters
- In COBOL words, each single-byte lowercase letter "a" through "z" is equivalent to its corresponding single-byte uppercase letter. Multibyte uppercase and lowercase letters are not equivalent.
- Value range
- Valid value ranges for multibyte
characters depend on the specific code page being used.
- Maximum length
- 30 bytes. The number of characters that fit in 30 bytes depends on the source code page and on the characters used in the user-defined word.
- Continuation
- Words formed with multibyte characters cannot be continued across lines.
- Use of shift-out and shift-in characters
- Shift-out and shift-in characters apply only when the dummy shift-out/shift-in (SOSI) compiler option is in effect. See SOSI in the COBOL for AIX Programming Guide for details of the SOSI compiler option.
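The contained-character and maximum-length rules can be approximated by a checker like the following Python sketch. Several parts are assumptions for illustration, not compiler behavior: the function name `check_word`, the use of Python codec names for source code pages, and the acceptance of any multibyte character the codec can encode. The real valid ranges depend on the code page, and shift-out/shift-in handling is not modeled.

```python
# Hypothetical checker for the multibyte user-defined word rules.
# Assumptions (not from the COBOL for AIX documentation):
#   - the source code page is modeled by a Python codec name that is
#     ASCII-compatible and stateless, so a word's byte length is the sum
#     of its characters' byte widths;
#   - any character that encodes to more than one byte is accepted,
#     although the real valid ranges depend on the code page.

SINGLE_BYTE_ALLOWED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "-_"
)
MAX_BYTES = 30


def check_word(word: str, encoding: str = "utf-8") -> list[str]:
    """Return the rule violations found in `word`; an empty list means none."""
    if not word:
        return ["the word is empty"]
    problems = []
    total = 0
    for pos, ch in enumerate(word):
        try:
            width = len(ch.encode(encoding))
        except UnicodeEncodeError:
            problems.append(f"{ch!r} at position {pos} is not in {encoding}")
            continue
        total += width
        # Single-byte characters are restricted to letters, digits, - and _.
        if width == 1 and ch not in SINGLE_BYTE_ALLOWED:
            problems.append(f"single-byte {ch!r} at position {pos} is not allowed")
    # Only the single-byte hyphen and underscore are checked here;
    # multibyte look-alikes are different characters.
    if word[0] == "-" or word[-1] == "-":
        problems.append("a single-byte hyphen cannot be first or last")
    if word[0] == "_":
        problems.append("a single-byte underscore cannot be first")
    if total > MAX_BYTES:
        problems.append(f"{total} bytes exceeds the {MAX_BYTES}-byte limit")
    return problems


if __name__ == "__main__":
    for w in ["CUSTOMER-名前", "-名前", "名前_", "名" * 11]:
        print(w, check_word(w) or "OK")
```

For example, `check_word("名" * 10)` passes in UTF-8 because the word is exactly 30 bytes. `check_word("名" * 16, "euc_jp")` fails because that word needs 32 bytes.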
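Under the uppercase and lowercase rule, comparing two user-defined words folds only the single-byte letters. The Python sketch below illustrates this with the hypothetical helpers `fold_case` and `same_word`. It assumes the words have already been decoded into Unicode strings, so single-byte letters correspond to ASCII a through z.

```python
# Sketch of the case rule, assuming Unicode text input: only the
# single-byte letters a-z are folded to A-Z. Python's str.upper() is
# deliberately not used, because it would also fold multibyte letters
# such as full-width "ａ" (U+FF41), which the rule keeps distinct.


def fold_case(word: str) -> str:
    """Uppercase single-byte a-z; leave every other character as-is."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in word)


def same_word(a: str, b: str) -> bool:
    """True if two user-defined words are equivalent under the case rule."""
    return fold_case(a) == fold_case(b)


if __name__ == "__main__":
    print(same_word("total-名前", "TOTAL-名前"))  # True
    print(same_word("ａ名前", "Ａ名前"))  # False: full-width letters are not folded
```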