XML input document encoding

An XML document that you parse by using the XML PARSE statement must be encoded in a supported encoding.

The supported encodings for a given parse operation depend on the type of the data item that contains the XML document. The parser supports the following types of data items and encodings:

The supported code pages are described in the related reference about the encoding of XML documents.

The parser determines the actual document encoding by examining the first few bytes of the XML document. If the actual document encoding is ASCII or EBCDIC, the parser also needs specific code-page information to parse the document correctly. It obtains this information from the document encoding declaration or, if there is none, from the external code-page information.

The document encoding declaration is an optional part of the XML declaration at the beginning of the document, for example the encoding="ibm-1140" attribute in <?xml version="1.0" encoding="ibm-1140"?>. For details, see the related task about specifying the encoding.

The external code page for ASCII XML documents (the external ASCII code page) is the code page indicated by the current runtime locale. The external code page for EBCDIC XML documents (the external EBCDIC code page) is one of these:

If the specified encoding is not one of the supported coded character sets, the parser signals an XML exception event before beginning the parse operation. If the actual document encoding does not match the specified encoding, the parser signals an appropriate XML exception after beginning the parse operation.
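As a minimal sketch of how a program might observe either kind of exception, the following example reports XML-CODE both in the processing procedure and in the ON EXCEPTION phrase. The program name, data names, and the encoding name in the sample document are hypothetical and chosen only for illustration; which exception code the parser actually returns depends on the platform and the document.

```cobol
       Identification division.
       Program-id. XMLEXC.
       Data division.
       Working-storage section.
      * Hypothetical document: its encoding declaration names a
      * code page that is assumed not to be supported.
       01 xml-doc pic x(80) value
           '<?xml version="1.0" encoding="no-such-cp"?><a>hi</a>'.
       Procedure division.
           Xml parse xml-doc
               processing procedure handle-event
             on exception
               Display 'Parse ended with XML-CODE ' XML-CODE
             not on exception
               Display 'Parse completed normally'
           End-xml
           Goback.
       handle-event.
      *    This sketch only reports the exception; it does not
      *    try to recover from it.
           If XML-EVENT = 'EXCEPTION'
             Display 'Exception event, XML-CODE = ' XML-CODE
           End-if.
```

The related task about handling XML PARSE exceptions describes which exceptions allow processing to continue and how to request that.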

To parse an XML document that is encoded in an unsupported code page, first convert the document to national character data (UTF-16) by using the NATIONAL-OF intrinsic function, and then parse the resulting national data item. The parser passes the individual pieces of document text to the processing procedure in the XML-NTEXT special register; you can convert them back to the original code page by using the DISPLAY-OF intrinsic function.
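The conversion round trip described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the program name, data names, and field sizes are hypothetical, and CCSID 00875 merely stands in for whatever code page the document really uses. The sketch also assumes that the document has no encoding declaration that would conflict with the UTF-16 data.

```cobol
       Identification division.
       Program-id. XMLNAT.
       Data division.
       Working-storage section.
      * Hypothetical names and sizes. CCSID 00875 is only a
      * stand-in for the original code page of the document.
       01 xml-doc-orig pic x(200).
       01 xml-doc-nat  pic n(200) usage national.
       01 text-orig    pic x(200).
       Procedure division.
      *    Assume xml-doc-orig was already filled, for example
      *    by reading the document from a file.
           Move function national-of(xml-doc-orig, 00875)
             to xml-doc-nat
           Xml parse xml-doc-nat
               processing procedure handle-event
             on exception
               Display 'XML-CODE = ' XML-CODE
           End-xml
           Goback.
       handle-event.
           If XML-EVENT = 'CONTENT-CHARACTERS'
      *      XML-NTEXT holds this fragment as UTF-16; convert
      *      it back to the original code page.
             Move function display-of(XML-NTEXT, 00875)
               to text-orig
             Display 'Content: ' text-orig
           End-if.
```

Because the parsed item is national, the text fragments arrive in XML-NTEXT rather than XML-TEXT, which is why the processing procedure converts from XML-NTEXT.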

XML declaration and white space:

XML documents can begin with white space only if they do not have an XML declaration:

White-space characters have the hexadecimal values shown in the following table.

Table 1. Hexadecimal values of white-space characters
White-space character    EBCDIC    Unicode / ASCII
Space                    X'40'     X'20'
Horizontal tabulation    X'05'     X'09'
Carriage return          X'0D'     X'0D'
Line feed                X'25'     X'0A'
New line / next line     X'15'     X'85'

related tasks  
Converting to or from national (Unicode) representation  
Specifying the encoding
Parsing XML documents encoded in UTF-8  
Handling XML PARSE exceptions

related references  
Locales and code pages that are supported  
The encoding of XML documents  
EBCDIC code-page-sensitive characters in XML markup  
XML PARSE exceptions