You can parse XML documents that are encoded in Unicode UTF-8 in a manner similar to parsing other XML documents. However, some additional requirements apply.
To parse a UTF-8 XML document, code the XML PARSE statement as you would normally for parsing XML documents:
XML PARSE xml-document
    PROCESSING PROCEDURE xml-event-handler
  . . .
END-XML
The parser returns the XML document fragments in the alphanumeric special register XML-TEXT.
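The following is a minimal sketch of a processing procedure that reads each fragment from XML-TEXT. The program name, the document content, and the paragraph names are hypothetical, and the sketch assumes that the document declares its encoding as UTF-8 in the XML declaration. The accented character is coded as a hexadecimal literal so that the source does not depend on the editor's code page.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8PARS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical document buffer. The e-acute character is
      * written as its 2-byte UTF-8 encoding, X'C3A9'.
       01  XML-DOCUMENT.
           05  FILLER  PIC X(38)
               VALUE '<?xml version="1.0" encoding="UTF-8"?>'.
           05  FILLER  PIC X(9)  VALUE '<name>Ren'.
           05  FILLER  PIC X(2)  VALUE X'C3A9'.
           05  FILLER  PIC X(7)  VALUE '</name>'.
       PROCEDURE DIVISION.
       MAIN-PARA.
           XML PARSE XML-DOCUMENT
               PROCESSING PROCEDURE XML-EVENT-HANDLER
               ON EXCEPTION
                   DISPLAY 'XML error, XML-CODE = ' XML-CODE
               NOT ON EXCEPTION
                   DISPLAY 'Parse completed'
           END-XML
           GOBACK.

       XML-EVENT-HANDLER.
           EVALUATE XML-EVENT
               WHEN 'START-OF-ELEMENT'
                   DISPLAY 'Element: ' XML-TEXT
               WHEN 'CONTENT-CHARACTERS'
      *            Use the fragment whole. LENGTH returns a
      *            count of bytes, not of characters.
                   DISPLAY 'Content: ' XML-TEXT
                   DISPLAY 'Bytes:   ' FUNCTION LENGTH(XML-TEXT)
               WHEN OTHER
                   CONTINUE
           END-EVALUATE.
```

For a UTF-8 document, the length of XML-TEXT is a byte count, so it can be larger than the number of characters in the fragment. Whether the displayed content is readable depends on the code page of the terminal.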
UTF-8 characters are encoded using a variable number of bytes (1 to 4) per character. Most COBOL operations on alphanumeric data assume a single-byte encoding, in which each character is encoded in 1 byte. When you operate on UTF-8 characters as alphanumeric data, you must ensure that the data is processed correctly. Avoid operations (such as reference modification and moves that involve truncation) that can split a multibyte character between its bytes. You cannot reliably use statements such as INSPECT to process multibyte characters in alphanumeric data.
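One way to avoid splitting a character when a UTF-8 value must be shortened is to move the cut point back to a character boundary first. The following sketch is an illustration only, not a documented technique. All data names are hypothetical. It assumes that alphanumeric comparisons use a binary collating sequence, so that the hexadecimal literals compare by byte value, and that the source value is longer than the target. It relies on the fact that every UTF-8 continuation byte is in the range X'80' through X'BF'.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8CUT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * 'A', e-acute (C3 A9), 'B', euro sign (E2 82 AC): 7 bytes.
       01  SRC-TEXT   PIC X(7) VALUE X'41C3A942E282AC'.
       01  TARGET     PIC X(5) VALUE SPACES.
       01  CUT-LEN    PIC 9(4) COMP.
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE LENGTH OF TARGET TO CUT-LEN
      *    Back up while the first byte that would be dropped is
      *    a UTF-8 continuation byte (X'80' through X'BF').
      *    Assumes a binary collating sequence for comparisons.
           PERFORM UNTIL CUT-LEN = 0
                   OR SRC-TEXT(CUT-LEN + 1:1) < X'80'
                   OR SRC-TEXT(CUT-LEN + 1:1) > X'BF'
               SUBTRACT 1 FROM CUT-LEN
           END-PERFORM
      *    Reference modification is safe here because CUT-LEN
      *    now ends on a character boundary.
           IF CUT-LEN > 0
               MOVE SRC-TEXT(1:CUT-LEN) TO TARGET
           END-IF
           DISPLAY 'Bytes kept: ' CUT-LEN
           GOBACK.
```

In this example a plain 5-byte truncation would keep only the first byte of the 3-byte euro sign. The loop moves the cut point back to 4 bytes, so that the target receives the three complete characters followed by a padding space.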
Related concepts
XML-TEXT and XML-NTEXT

Related references
CHAR
The encoding of XML documents
XML PARSE statement (COBOL for AIX Language Reference)