XML GENERATE statement

The XML GENERATE statement converts data to XML format.

Format

>>-XML GENERATE--identifier-1--FROM--identifier-2--------------->

>--+-----------------------------+------------------------------>
   '-COUNT--+----+--identifier-3-'   
            '-IN-'                   

>--+------------------------------+----------------------------->
   '-+------+--ENCODING--codepage-'   
     '-WITH-'                         

>--+---------------------------+--+----------------------+------>
   '-+------+--XML-DECLARATION-'  '-+------+--ATTRIBUTES-'   
     '-WITH-'                       '-WITH-'                 

>--+-------------------------------------------------------------------------------------+-->
   '-NAMESPACE--+----+--+-identifier-4-+--+--------------------------------------------+-'   
                '-IS-'  '-literal-4----'  '-NAMESPACE-PREFIX--+----+--+-identifier-5-+-'     
                                                              '-IS-'  '-literal-5----'       

>--+-------------------------------------------+---------------->
   '-+----+--EXCEPTION--imperative-statement-1-'   
     '-ON-'                                        

>--+------------------------------------------------+----------->
   '-NOT--+----+--EXCEPTION--imperative-statement-2-'   
          '-ON-'                                        

>--+---------+-------------------------------------------------><
   '-END-XML-'   

identifier-1
The receiving area for a generated XML document. identifier-1 must reference one of the following:
  • An elementary data item of category alphanumeric
  • An alphanumeric group item
  • An elementary data item of category national
  • A national group item

When identifier-1 references a national group item, identifier-1 is processed as an elementary data item of category national. When identifier-1 references an alphanumeric group item, identifier-1 is treated as though it were an elementary data item of category alphanumeric.

identifier-1 must not be described with the JUSTIFIED clause, and cannot be a function identifier. identifier-1 can be subscripted or reference modified.

identifier-1 must not overlap identifier-2, identifier-3, codepage (if an identifier), identifier-4, or identifier-5.

If any of the following statements are true, then either identifier-1 must reference a data item of category national, or identifier-1 must be of category alphanumeric and the document encoding must be Unicode UTF-8:

  • identifier-4 or identifier-5 references a data item of category national.
  • literal-4 or literal-5 is of category national.
  • codepage is a national literal or references a data item of category national.
  • The generated XML includes data from identifier-2 for:
    • Any data item of class national or class DBCS
    • Any data item with a multibyte name (that is, a data item whose name consists of multibyte characters)
    • Any data item of class alphanumeric that contains multibyte characters

The encoding of the generated XML output is described further in the documentation of the ENCODING phrase below.

identifier-1 must be large enough to contain the generated XML document. Typically, it should be from five to 10 times the size of identifier-2, depending on the length of the data-name or data-names within identifier-2. If identifier-1 is not large enough, an error condition exists at the end of the XML GENERATE statement.

identifier-2
The group or elementary data item to be converted to XML format.

If identifier-2 references a national group item, identifier-2 is processed as a group item. When identifier-2 includes a subordinate national group item, that subordinate item is processed as a group item.

identifier-2 cannot be a function identifier or be reference modified, but it can be subscripted.

identifier-2 must not overlap identifier-1 or identifier-3.

identifier-2 must not specify the RENAMES clause.

The following data items specified by identifier-2 are ignored by the XML GENERATE statement:

  • Any subordinate unnamed elementary data items or elementary FILLER data items
  • Any slack bytes inserted for SYNCHRONIZED items
  • Any data item subordinate to identifier-2 that is described with the REDEFINES clause or that is subordinate to such a redefining item
  • Any data item subordinate to identifier-2 that is described with the RENAMES clause
  • Any group data item all of whose subordinate data items are ignored

All data items specified by identifier-2 that are not ignored according to the rules above must satisfy the following conditions:

  • Each elementary data item must either have class alphabetic, alphanumeric, numeric, or national, or be an index data item. (That is, no elementary data item can be described with the USAGE POINTER, USAGE FUNCTION-POINTER, USAGE PROCEDURE-POINTER, or USAGE OBJECT REFERENCE phrase.)
  • There must be at least one such elementary data item.
  • Each non-FILLER data-name must be unique within any immediately superordinate group data item.
  • Any multibyte data-names, when converted to Unicode, must be legal as names in the XML specification, version 1.0. For details about the XML specification, see XML specification.
  • The data items must not specify the DATE FORMAT clause, or the DATEPROC compiler option must not be in effect.

For example, consider the following data declaration:

01 STRUCT.
  02 STAT PIC X(4).
  02 IN-AREA PIC X(100).
  02 OK-AREA REDEFINES IN-AREA.
    03 FLAGS PIC X.
    03 PIC X(3).
    03 COUNTER USAGE COMP-5 PIC S9(9).
    03 ASFNPTR REDEFINES COUNTER USAGE FUNCTION-POINTER.
    03 UNREFERENCED PIC X(92).
  02 NG-AREA1 REDEFINES IN-AREA.
    03 FLAGS PIC X.
    03 PIC X(3).
    03 PTR USAGE POINTER.
    03 ASNUM REDEFINES PTR USAGE COMP-5 PIC S9(9).
    03 PIC X(92).
  02 NG-AREA2 REDEFINES IN-AREA.
    03 FN-CODE PIC X.
    03 UNREFERENCED PIC X(3).
    03 QTYONHAND USAGE BINARY PIC 9(5).
    03 DESC USAGE NATIONAL PIC N(40).
    03 UNREFERENCED PIC X(12).

The following data items from the example above can be specified as identifier-2:

  • STRUCT, of which subordinate data items STAT and IN-AREA would be converted to XML format. (OK-AREA, NG-AREA1, and NG-AREA2 are ignored because they specify the REDEFINES clause.)
  • OK-AREA, of which subordinate data items FLAGS, COUNTER, and UNREFERENCED would be converted. (The item whose data description entry specifies 03 PIC X(3) is ignored because it is an elementary FILLER data item. ASFNPTR is ignored because it specifies the REDEFINES clause.)
  • Any of the elementary data items that are subordinate to STRUCT except:
    • ASFNPTR or PTR (disallowed usage)
    • UNREFERENCED OF NG-AREA2 (nonunique names for data items that are otherwise eligible)
    • Any FILLER data items

The following data items cannot be specified as identifier-2:

  • NG-AREA1, because subordinate data item PTR specifies USAGE POINTER but does not specify the REDEFINES clause. (PTR would be ignored if it specified the REDEFINES clause.)
  • NG-AREA2, because subordinate elementary data items have the nonunique name UNREFERENCED.
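
The ignore rules and eligibility conditions above can be checked mechanically. The following Python model (not COBOL, and not IBM's implementation) represents each data description entry as a hypothetical `Item` record that carries only the attributes these rules consult, and applies the rules to the example declaration. It does not model RENAMES, SYNCHRONIZED slack bytes, or DATE FORMAT.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional

# USAGE phrases that make an elementary item ineligible.
DISALLOWED = {"POINTER", "FUNCTION-POINTER", "PROCEDURE-POINTER", "OBJECT REFERENCE"}


@dataclass
class Item:
    name: Optional[str]               # None models FILLER or an unnamed item
    usage: str = "DISPLAY"
    redefines: bool = False
    children: List[Item] = field(default_factory=list)


def kept_children(group: Item) -> List[Item]:
    """Subordinate items that are not ignored by the rules above."""
    kept = []
    for child in group.children:
        if child.redefines:
            continue                  # redefining item and all its subordinates
        if not child.children:
            if child.name is not None:    # skip unnamed/FILLER elementary items
                kept.append(child)
        elif kept_children(child):        # keep a group only if something survives
            kept.append(child)
    return kept


def _collect(group: Item, out: List[str]) -> None:
    kept = kept_children(group)
    names = [c.name for c in kept if c.name is not None]
    if len(names) != len(set(names)):
        raise ValueError(f"nonunique data-names under {group.name}")
    for child in kept:
        if child.children:
            _collect(child, out)
        elif child.usage in DISALLOWED:
            raise ValueError(f"{child.name} has USAGE {child.usage}")
        else:
            out.append(child.name)


def converted_items(ident2: Item) -> List[str]:
    """Names of the elementary items converted when ident2 is identifier-2."""
    if not ident2.children:
        if ident2.usage in DISALLOWED:
            raise ValueError(f"{ident2.name} has USAGE {ident2.usage}")
        return [ident2.name]
    out: List[str] = []
    _collect(ident2, out)
    if not out:
        raise ValueError("no eligible elementary data item")
    return out


# The example declaration, reduced to the attributes the rules use.
OK_AREA = Item("OK-AREA", redefines=True, children=[
    Item("FLAGS"), Item(None), Item("COUNTER", usage="COMP-5"),
    Item("ASFNPTR", usage="FUNCTION-POINTER", redefines=True), Item("UNREFERENCED")])
NG_AREA1 = Item("NG-AREA1", redefines=True, children=[
    Item("FLAGS"), Item(None), Item("PTR", usage="POINTER"),
    Item("ASNUM", usage="COMP-5", redefines=True), Item(None)])
NG_AREA2 = Item("NG-AREA2", redefines=True, children=[
    Item("FN-CODE"), Item("UNREFERENCED"), Item("QTYONHAND", usage="BINARY"),
    Item("DESC", usage="NATIONAL"), Item("UNREFERENCED")])
STRUCT = Item("STRUCT", children=[Item("STAT"), Item("IN-AREA"), OK_AREA, NG_AREA1, NG_AREA2])

print(converted_items(STRUCT))    # ['STAT', 'IN-AREA']
print(converted_items(OK_AREA))   # ['FLAGS', 'COUNTER', 'UNREFERENCED']
```

Calling `converted_items` on NG_AREA1 or NG_AREA2 raises `ValueError`, matching the two items that cannot be specified as identifier-2.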

COUNT IN phrase
If the COUNT IN phrase is specified, identifier-3 contains (after execution of the XML GENERATE statement) the count of generated XML character encoding units. If identifier-1 (the receiver) has category national, the count is in UTF-16 character encoding units. For all other encodings (including UTF-8), the count is in bytes.
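
As a rough illustration of the units that COUNT IN reports, this Python sketch counts a generated document either in UTF-16 encoding units (national receiver) or in bytes of the document encoding. The function name `count_in` and the use of Python codecs are assumptions for illustration only.

```python
def count_in(document: str, national: bool, codec: str = "utf-8") -> int:
    """Value a COUNT IN field would receive for this generated document."""
    if national:
        # National receiver: UTF-16 encoding units (2 bytes each).
        return len(document.encode("utf-16-be")) // 2
    # Any other encoding, including UTF-8: bytes in the document encoding.
    return len(document.encode(codec))


doc = "<n>\u00e9</n>"
print(count_in(doc, national=True))    # 8  (UTF-16 encoding units)
print(count_in(doc, national=False))   # 9  (the accented letter is 2 bytes in UTF-8)
```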

identifier-3
The data count field. Must be an integer data item defined without the symbol P in its picture string.

identifier-3 must not overlap identifier-1, identifier-2, codepage (if an identifier), identifier-4, or identifier-5.

ENCODING phrase
The ENCODING phrase, if specified, determines the encoding of the generated XML document.

codepage
Must be an unsigned integer data item, an unsigned integer literal, an alphanumeric or national literal, or an identifier that references a data item of category alphanumeric or national. If codepage is a literal, it must not be a figurative constant.

If codepage is of class alphanumeric or national, it must identify a primary or alias code-page name that is supported by ICU conversion libraries (see International Components for Unicode: Converter Explorer). If codepage is an integer, the integer must be a valid CCSID number.

If identifier-1 references a data item of category national, codepage must identify UTF-16 in big-endian format.

If identifier-1 references a data item of category alphanumeric, codepage must identify UTF-8 or a single-byte ASCII or EBCDIC code page:
  • If the CHAR(EBCDIC) compiler option is not in effect or the data description entry for identifier-1 contains the NATIVE phrase, codepage must identify UTF-8 or a single-byte ASCII code page.
  • If CHAR(EBCDIC) is in effect and the data description entry for identifier-1 does not contain the NATIVE phrase, codepage must identify a single-byte EBCDIC code page.

If codepage is an identifier, it must not overlap identifier-1 or identifier-3.

If the ENCODING phrase is omitted and identifier-1 is of category national, the document encoding is Unicode UTF-16 in big-endian format.

If the ENCODING phrase is omitted and identifier-1 is of category alphanumeric, then:
  • If CHAR(EBCDIC) is not in effect or the data description entry for identifier-1 contains the NATIVE phrase, the XML document is encoded using the code page from the runtime locale.
  • If the CHAR(EBCDIC) option is in effect and the data description entry for identifier-1 does not contain the NATIVE phrase, the XML document is encoded using the code page from the EBCDIC_CODEPAGE environment variable. If EBCDIC_CODEPAGE is not set, the encoding is the default EBCDIC code page associated with the runtime locale.
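
The encoding rules above amount to a small decision procedure. This Python sketch encodes them; all parameters are hypothetical stand-ins for compiler options, the data description, and the runtime environment, and the locale-derived code pages are passed in as placeholders rather than looked up.

```python
from typing import Mapping, Optional


def document_encoding(*, national: bool, char_ebcdic: bool, native: bool,
                      locale_codepage: str, locale_ebcdic_default: str,
                      env: Mapping[str, str],
                      encoding_phrase: Optional[str] = None) -> str:
    """Encoding of the generated document (hypothetical model of the rules)."""
    if encoding_phrase is not None:
        # ENCODING phrase specified: it determines the encoding.
        # (Whether the code page is allowed for the receiver is not checked here.)
        return encoding_phrase
    if national:
        return "UTF-16BE"                 # UTF-16 in big-endian format
    if not char_ebcdic or native:
        return locale_codepage            # code page from the runtime locale
    if "EBCDIC_CODEPAGE" in env:
        return env["EBCDIC_CODEPAGE"]     # environment variable, if set
    return locale_ebcdic_default          # default EBCDIC page for the locale


# Placeholder code-page names; real values depend on the runtime locale.
print(document_encoding(national=False, char_ebcdic=True, native=False,
                        locale_codepage="LOCALE-CP",
                        locale_ebcdic_default="LOCALE-EBCDIC-CP",
                        env={"EBCDIC_CODEPAGE": "ENV-EBCDIC-CP"}))  # ENV-EBCDIC-CP
```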

XML-DECLARATION phrase
If the XML-DECLARATION phrase is specified, the generated XML document starts with an XML declaration that includes the XML version information and an encoding declaration.

If identifier-1 is of category national, the encoding declaration has the value UTF-16 (encoding="UTF-16").

If identifier-1 is of category alphanumeric, the encoding declaration is derived from the ENCODING phrase, if specified, or from the runtime locale or the EBCDIC_CODEPAGE environment variable if the ENCODING phrase is not specified. See the description of the ENCODING phrase for further details.

For an example of the effect of coding the XML-DECLARATION phrase, see Generating XML output in the COBOL for AIX Programming Guide.

If the XML-DECLARATION phrase is omitted, the generated XML document does not include an XML declaration.

ATTRIBUTES phrase
If the ATTRIBUTES phrase is specified, each eligible item included in the generated XML document is expressed as an attribute of the XML element that corresponds to the data item immediately superordinate to that eligible item, rather than as a child element of the XML element. To be eligible, a data item must be elementary, must have a name other than FILLER, and must not specify an OCCURS clause in its data description entry.

For an example of the effect of the ATTRIBUTES phrase, see Generating XML output in the COBOL for AIX Programming Guide.

NAMESPACE and NAMESPACE-PREFIX phrases
Use the NAMESPACE phrase to identify a namespace for the generated XML document. If the NAMESPACE phrase is not specified, or if identifier-4 has length zero or contains all spaces, the element names of XML documents produced by the XML GENERATE statement are not in any namespace.

Use the NAMESPACE-PREFIX phrase to qualify the start and end tag of each element in the generated XML document with a prefix.

If the NAMESPACE-PREFIX phrase is not specified, or if identifier-5 is of length zero or contains all spaces, the namespace specified by the NAMESPACE phrase specifies the default namespace for the document. In this case, the namespace declared on the root element applies by default to each element name in the document, including that of the root element. (Default namespace declarations do not apply directly to attribute names.)

If the NAMESPACE-PREFIX phrase is specified, and identifier-5 is not of length zero and does not contain all spaces, then the start and end tag of each element in the generated document is qualified with the specified prefix, so the prefix should be short. When the XML GENERATE statement is executed, the prefix must be a valid XML name that does not contain a colon (:), as defined in Namespaces in XML 1.0. The prefix can have trailing spaces, which are removed before use.
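
A minimal Python sketch of how the NAMESPACE and NAMESPACE-PREFIX values could shape the root start tag and the qualified element names, assuming the namespace is declared with the standard `xmlns` or `xmlns:prefix` attribute from Namespaces in XML 1.0. The function names are hypothetical, the URI is inserted unchanged, and the exact tags the compiler emits may differ.

```python
def qualify(name: str, prefix: str) -> str:
    """Element name as it appears in start and end tags."""
    p = prefix.rstrip(" ")                # trailing spaces are removed before use
    return f"{p}:{name}" if p else name


def root_start_tag(name: str, namespace: str = "", prefix: str = "") -> str:
    if namespace.strip(" ") == "":
        # Not in any namespace. (A prefix without a namespace is not covered
        # by the rules above; this sketch simply ignores it.)
        return f"<{name}>"
    p = prefix.rstrip(" ")
    if not p:
        return f'<{name} xmlns="{namespace}">'       # default namespace
    return f'<{p}:{name} xmlns:{p}="{namespace}">'   # prefixed namespace


print(root_start_tag("Msg", "urn:example"))          # <Msg xmlns="urn:example">
print(root_start_tag("Msg", "urn:example", "p   "))  # <p:Msg xmlns:p="urn:example">
```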

identifier-4, literal-4; identifier-5, literal-5
identifier-4, literal-4: The namespace identifier, which must be a valid Uniform Resource Identifier (URI) as defined in Uniform Resource Identifier (URI): Generic Syntax.

identifier-5, literal-5: The namespace prefix, which serves as an alias for the namespace identifier.

identifier-4 and identifier-5 must reference data items of category alphanumeric or national.

identifier-4 and identifier-5 must not overlap identifier-1 or identifier-3.

literal-4 and literal-5 must be of category alphanumeric or national, and must not be figurative constants.

For full details about namespaces, see Namespaces in XML 1.0.

For examples that show the use of the NAMESPACE and NAMESPACE-PREFIX phrases, see Generating XML output in the COBOL for AIX Programming Guide.

ON EXCEPTION phrase
An exception condition exists when an error occurs during generation of the XML document, for example if identifier-1 is not large enough to contain the generated XML document. In this case, XML generation stops and the content of the receiver, identifier-1, is undefined. If the COUNT IN phrase is specified, identifier-3 contains the number of character positions that were generated, which can range from 0 to the length of identifier-1.

If the ON EXCEPTION phrase is specified, control is transferred to imperative-statement-1. Otherwise, control is transferred to the end of the XML GENERATE statement. In either case, the NOT ON EXCEPTION phrase, if specified, is ignored. Special register XML-CODE contains an exception code, as detailed in Handling XML GENERATE exceptions in the COBOL for AIX Programming Guide.

NOT ON EXCEPTION phrase
If an exception condition does not occur during generation of the XML document, control is passed to imperative-statement-2, if specified, otherwise to the end of the XML GENERATE statement. The ON EXCEPTION phrase, if specified, is ignored. Special register XML-CODE contains zero after execution of the XML GENERATE statement.

END-XML phrase
This explicit scope terminator delimits the scope of XML GENERATE or XML PARSE statements. END-XML permits a conditional XML GENERATE or XML PARSE statement (that is, an XML GENERATE or XML PARSE statement that specifies the ON EXCEPTION or NOT ON EXCEPTION phrase) to be nested in another conditional statement.

The scope of a conditional XML GENERATE or XML PARSE statement can be terminated by:

  • An END-XML phrase at the same level of nesting
  • A separator period

END-XML can also be used with an XML GENERATE or XML PARSE statement that does not specify either the ON EXCEPTION or the NOT ON EXCEPTION phrase.

For more information on explicit scope terminators, see Delimited scope statements.

Nested XML GENERATE or XML PARSE statements

When a given XML GENERATE or XML PARSE statement appears as imperative-statement-1 or imperative-statement-2, or as part of imperative-statement-1 or imperative-statement-2 of another XML GENERATE or XML PARSE statement, that given XML GENERATE or XML PARSE statement is a nested XML GENERATE or XML PARSE statement.

Nested XML GENERATE or XML PARSE statements are considered to be matched XML GENERATE and END-XML combinations, or XML PARSE and END-XML combinations, proceeding from left to right. Thus, any END-XML phrase that is encountered is matched with the nearest preceding XML GENERATE or XML PARSE statement that has not been implicitly or explicitly terminated.
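
The left-to-right matching rule is the usual stack discipline. This Python sketch pairs each END-XML with the nearest preceding unterminated XML GENERATE or XML PARSE, treating a separator period as ending all open scopes; the token list is a hypothetical simplification of COBOL source.

```python
from typing import List, Sequence, Tuple


def match_end_xml(tokens: Sequence[str]) -> List[Tuple[int, int]]:
    """Pair each END-XML index with the index of the statement it terminates."""
    open_stmts: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, tok in enumerate(tokens):
        if tok in ("XML GENERATE", "XML PARSE"):
            open_stmts.append(i)
        elif tok == "END-XML":
            if not open_stmts:
                raise ValueError(f"END-XML at position {i} has no matching statement")
            pairs.append((open_stmts.pop(), i))   # nearest unterminated statement
        elif tok == ".":
            open_stmts.clear()                    # a separator period ends all scopes
    return pairs


print(match_end_xml(["XML GENERATE", "XML PARSE", "END-XML", "END-XML"]))
# [(1, 2), (0, 3)]
```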

Operation of XML GENERATE

The content of each eligible elementary data item within identifier-2 is converted to character format as described under Format conversion of elementary data and Trimming of generated XML data. Only the first definition of each storage area is processed. Redefinitions of data items are not included. Data items that are effectively defined by the RENAMES clause are also not included.

The converted content is then inserted as element character content, or, if the ATTRIBUTES phrase is specified and the data item is eligible to be expressed as an attribute, as the value of the attribute, in the generated XML document.

The XML element names and attribute names are derived from the data-names within identifier-2 as described under XML element name and attribute name formation. The names of group items that contain the selected elementary items are retained as parent elements. If the NAMESPACE-PREFIX phrase is specified, the prefix value, minus any trailing spaces, is used to qualify the start and end tag of each element.

No extra white space (new lines, indentation, and so forth) is inserted to make the generated XML more readable. An XML declaration is generated if the XML-DECLARATION phrase is specified.

If the receiving area specified by identifier-1 is not large enough to contain the resulting XML document, an error condition exists. See the description of the ON EXCEPTION phrase above for details.

If identifier-1 is longer than the generated XML document, only that part of identifier-1 in which XML is generated is changed. The rest of identifier-1 contains the data that was present before this execution of the XML GENERATE statement. To avoid referring to that data, either initialize identifier-1 to spaces before the XML GENERATE statement or specify the COUNT IN phrase.

If the COUNT IN phrase is specified, identifier-3 contains (after execution of the XML GENERATE statement) the total number of character positions (UTF-16 encoding units or bytes) that were generated. You can use identifier-3 as a reference modification length field to refer to the part of identifier-1 that contains the generated XML document.
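
To illustrate why COUNT IN matters when the receiver is longer than the document, this Python sketch models the receiver as a fixed-length string of single-byte character positions (an assumption; a national receiver would be counted in UTF-16 units). Only the leading positions are overwritten, so slicing with the count isolates the document. The function name is hypothetical.

```python
from typing import Tuple


def generate_into(receiver: str, document: str) -> Tuple[str, int]:
    """Return (new receiver contents, COUNT IN value)."""
    if len(document) > len(receiver):
        # Exception condition: the receiver's content would be undefined.
        raise OverflowError("receiver too small for generated document")
    # Only the leading positions change; the rest keeps its previous data.
    return document + receiver[len(document):], len(document)


area, count = generate_into("*" * 12, "<a>1</a>")
print(area)           # <a>1</a>****
print(area[:count])   # <a>1</a>   (like identifier-1(1:identifier-3) in COBOL)
```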

After execution of the XML GENERATE statement, special register XML-CODE contains either zero, which indicates successful completion, or a nonzero exception code. For details, see Handling XML GENERATE exceptions in the COBOL for AIX Programming Guide.

The XML PARSE statement also uses special register XML-CODE. Therefore, if you code an XML GENERATE statement in the processing procedure of an XML PARSE statement, save the value of XML-CODE before that XML GENERATE statement executes and restore the saved value after the XML GENERATE statement terminates.

A byte order mark is not generated for XML documents that have Unicode encoding.

Format conversion of elementary data

Elementary data items within identifier-2 are converted in a sequence of several steps, some of them optional, as described below.

Conversion to character format:

Elementary data items are converted to character format depending on the type of the data item:

  • Data items of category alphabetic, alphanumeric, alphanumeric-edited, DBCS, external floating-point, national, national-edited, and numeric-edited are not converted.
  • Fixed-point numeric data items other than COMPUTATIONAL-5 (COMP-5) binary data items or binary data items compiled with the TRUNC(BIN) compiler option are converted as if they were moved to a numeric-edited item that has:
    • As many integer positions as the numeric item has, but with at least one integer position
    • An explicit decimal point, if the numeric item has at least one decimal position
    • The same number of decimal positions as the numeric item has
    • A leading '-' picture symbol if the data item is signed (has an S in its PICTURE clause)
  • COMPUTATIONAL-5 (COMP-5) binary data items or binary data items compiled with the TRUNC(BIN) compiler option are converted in the same way as the other fixed-point numeric items, except for the number of integer positions. The number of integer positions is computed depending on the number of '9' symbols in the picture character string as follows:
    • 5 minus the number of decimal places, if the data item has 1 to 4 '9' picture symbols
    • 10 minus the number of decimal places, if the data item has 5 to 9 '9' picture symbols
    • 20 minus the number of decimal places, if the data item has 10 to 18 '9' picture symbols
  • Internal floating-point data items are converted as if they were moved to a data item as follows:
    • For COMP-1: an external floating-point data item with PICTURE -9.9(8)E+99
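    • For COMP-2: an external floating-point data item with PICTURE -9.9(17)E+99. (This PICTURE character-string would not be valid in an actual data description entry because it has too many digit positions; it only describes the format of the converted value.)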

    For native (IEEE) floating-point items, the special values positive infinity, negative infinity, and NaN (not-a-number) are represented as 'INF', '-INF', and 'NaN', respectively.

  • Index data items are converted as if they were declared USAGE COMP-5 PICTURE S9(9).
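
The fixed-point rules above can be modeled as a function that produces the untrimmed character form of a value. In this Python sketch, `nines` is the number of '9' symbols in the PICTURE string (integer and decimal positions together) and `full_binary` stands for COMP-5 or binary under TRUNC(BIN); these names are hypothetical, and floating-point and index items are not covered.

```python
from decimal import Decimal


def integer_positions(nines: int, decimals: int, full_binary: bool) -> int:
    """Integer positions of the intermediate numeric-edited item."""
    if not full_binary:
        return max(nines - decimals, 1)   # at least one integer position
    # COMP-5, or binary with TRUNC(BIN): width from the count of '9' symbols.
    if 1 <= nines <= 4:
        width = 5
    elif 5 <= nines <= 9:
        width = 10
    elif 10 <= nines <= 18:
        width = 20
    else:
        raise ValueError("expected 1 to 18 '9' symbols")
    return width - decimals


def to_edited(value: Decimal, nines: int, decimals: int, signed: bool,
              full_binary: bool = False) -> str:
    """Character form of a fixed-point value, before trimming."""
    ipos = integer_positions(nines, decimals, full_binary)
    digits = str(int(abs(value).scaleb(decimals))).zfill(ipos + decimals)
    int_part = digits[: len(digits) - decimals]
    frac_part = digits[len(digits) - decimals:]
    text = int_part + ("." + frac_part if decimals else "")
    if signed:
        text = ("-" if value < 0 else " ") + text   # leading '-' picture symbol
    return text


print(repr(to_edited(Decimal("-12.34"), nines=6, decimals=3, signed=True)))  # '-012.340'
print(repr(to_edited(Decimal("13"), nines=4, decimals=0, signed=True,
                     full_binary=True)))                                      # ' 00013'
```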

Trimming:

After any conversion to character format, leading and trailing spaces and leading zeroes are eliminated, as described under Trimming of generated XML data.

Conversion to the document encoding:

All values are converted as necessary to the encoding of the document, as follows. If identifier-1 is:
  • Category alphanumeric, and either the NATIVE phrase was specified in the data description or CHAR(NATIVE) was in effect, values are converted to UTF-8 or to the chosen ASCII code page.
  • Category alphanumeric, and the NATIVE phrase was not specified in the data description, and CHAR(EBCDIC) was in effect, values are converted to the chosen EBCDIC code page.
  • Category national: Any nonnational values are converted to national format.

Conversion of special characters to XML references:

Any remaining instances of the five characters & (ampersand), ' (apostrophe), > (greater-than sign), < (less-than sign), and " (quotation mark) are converted into the equivalent XML references '&amp;', '&apos;', '&gt;', '&lt;', and '&quot;', respectively.
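
A per-character substitution is enough to apply this rule; doing it in a single pass ensures that the '&' introduced by a reference is not escaped a second time. A Python sketch (function name hypothetical):

```python
_XML_REFS = {"&": "&amp;", "'": "&apos;", ">": "&gt;", "<": "&lt;", '"': "&quot;"}


def escape_specials(text: str) -> str:
    # One pass, character by character, so no reference is re-escaped.
    return "".join(_XML_REFS.get(ch, ch) for ch in text)


print(escape_specials('R&D <"A\'s">'))   # R&amp;D &lt;&quot;A&apos;s&quot;&gt;
```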

Replacement of out-of-range Unicode characters:

Any remaining Unicode character that has a Unicode scalar value greater than x'FFFF' is replaced by an XML character reference. For example, a character with Unicode scalar value x'10813' is represented in UTF-16 by the surrogate pair (NX'D802', NX'DC13'), which is replaced by the reference '&#x10813;'. In a document encoded in UTF-8, the same character is the byte sequence X'F090A093', which is likewise replaced by the reference '&#x10813;'.
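
The arithmetic behind the x'10813' example can be checked with a short Python sketch: the surrogate pair is computed from the scalar value, the UTF-8 bytes come from Python's codec, and characters above x'FFFF' are replaced by hexadecimal character references. The function names are illustrative.

```python
from typing import Tuple


def surrogate_pair(scalar: int) -> Tuple[int, int]:
    """UTF-16 surrogate pair for a scalar value above x'FFFF'."""
    v = scalar - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def replace_supplementary(text: str) -> str:
    """Replace characters above x'FFFF' with hexadecimal character references."""
    return "".join(f"&#x{ord(ch):X};" if ord(ch) > 0xFFFF else ch for ch in text)


ch = chr(0x10813)
print([hex(u) for u in surrogate_pair(0x10813)])   # ['0xd802', '0xdc13']
print(ch.encode("utf-8").hex().upper())            # F090A093
print(replace_supplementary("a" + ch + "b"))       # a&#x10813;b
```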

Trimming of generated XML data

Trimming is performed on data values after their conversion to character format. (Conversion is described under Format conversion of elementary data.)

For values converted from signed numeric values, the leading space is removed if the value is positive.

For values converted from numeric items, leading zeroes (after any initial minus sign) up to but not including the digit immediately before the actual or implied decimal point are eliminated. Trailing zeroes after a decimal point are retained. For example:

  • -012.340 becomes -12.340.
  • 0000.45 becomes 0.45.
  • 0013 becomes 13.
  • 0000 becomes 0.

Character values from data items of class alphabetic, alphanumeric, DBCS, and national have either trailing or leading spaces removed, depending on whether the corresponding data items have left (default) or right justification, respectively. That is, trailing spaces are removed from values whose corresponding data items do not specify the JUSTIFIED clause. Leading spaces are removed from values whose data items do specify the JUSTIFIED clause. If a character value consists solely of spaces, one space remains as the value after trimming is finished.
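
The trimming rules can be expressed as two small Python functions, shown here as a sketch that operates on the untrimmed character form of a value. The function names and the `justified` flag (standing for the JUSTIFIED clause) are hypothetical.

```python
def trim_numeric(edited: str) -> str:
    s = edited.lstrip(" ")            # a positive signed value has a leading space
    sign = ""
    if s.startswith("-"):
        sign, s = "-", s[1:]
    int_part, dot, frac_part = s.partition(".")
    # Drop leading zeroes but keep the digit just before the decimal point;
    # trailing zeroes after the decimal point are kept.
    return sign + (int_part.lstrip("0") or "0") + dot + frac_part


def trim_character(value: str, justified: bool = False) -> str:
    trimmed = value.lstrip(" ") if justified else value.rstrip(" ")
    return trimmed or " "             # an all-space value keeps one space


for v in ("-012.340", "0000.45", "0013", "0000"):
    print(trim_numeric(v))            # -12.340, 0.45, 13, 0 (one per line)
```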

XML element name and attribute name formation

In the XML documents that are generated from identifier-2, the XML element names and attribute names are derived from the names of the data item specified by identifier-2 and from any eligible data-names that are subordinate to identifier-2 as follows:

  • The exact mixed-case spelling of data-names from the data description entry is retained. The spellings from any references to data items (for example, in an OCCURS DEPENDING ON clause) are not used.
  • Data-names that start with a digit are prefixed by an underscore. For example, the data-name '3D' becomes XML tag or attribute name '_3D'.
  • Data-names that start with the characters 'xml', in any combination of uppercase and lowercase, are prefixed by an underscore. For example, the data-name 'Xml' becomes XML tag or attribute name '_Xml'.
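
The name-formation rules add at most a leading underscore and otherwise keep the data-name's exact spelling. A Python sketch with a hypothetical function name (validation of multibyte names is not modeled):

```python
def xml_name(data_name: str) -> str:
    """XML element or attribute name derived from a COBOL data-name."""
    first = data_name[:1]
    starts_with_digit = first != "" and first in "0123456789"
    starts_with_xml = data_name[:3].lower() == "xml"
    # Spelling and case are kept; only an underscore prefix may be added.
    return "_" + data_name if starts_with_digit or starts_with_xml else data_name


print(xml_name("3D"), xml_name("Xml"), xml_name("Cust-Name"))   # _3D _Xml Cust-Name
```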

Multibyte data-names, when translated to Unicode, must be legal as names in the XML specification, version 1.0. For details about the XML specification, see XML specification.