Use the Big Data loader utility, pr0hdfs, to load converted files into a Hadoop Distributed File System (HDFS) cluster. The utility can be invoked from the command line or used by convert actions as part of a convert request that uses the HDFS target file format.
The utility requires the HDFS WebHDFS REST Web service interface, which is usually deployed on the HDFS NameNode (on port 50070) and on the DataNodes (on port 50075). The HDFS administrator must enable WebHDFS, as it is not enabled by default.
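Before running the loader, it can help to confirm that WebHDFS answers on the NameNode. The following Python sketch only builds the REST URL for such a check; it is not part of pr0hdfs. The function name `webhdfs_url` is hypothetical, and the `/webhdfs/v1` prefix and the `op` and `user.name` query parameters come from the standard WebHDFS REST API, not from this utility.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(namenode_url, hdfs_path, op, user=None):
    """Build a WebHDFS REST URL (hypothetical helper, not part of pr0hdfs)."""
    base = namenode_url.rstrip("/")          # e.g. http://hostname:50070
    path = "/" + hdfs_path.lstrip("/")       # WebHDFS paths are absolute
    params = {"op": op}
    if user:
        params["user.name"] = user           # standard WebHDFS user parameter
    return f"{base}/webhdfs/v1{quote(path)}?{urlencode(params)}"


if __name__ == "__main__":
    # An HTTP GET on this URL should return JSON if WebHDFS is enabled.
    print(webhdfs_url("http://hostname:50070", "/", "GETFILESTATUS", user="hadoop1"))
```

If a GET request to the resulting URL fails or is refused, WebHDFS is probably not enabled or the port differs from the usual default.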
For Windows, the utility uses the following files that are in the rt\bin installation directory: pr0hdfs.exe (the executable) and pr0hdfs.jar. For UNIX and Linux, the utility uses the following files that are in the rt/bin installation directory: pr0hdfs (the executable) and pr0hdfs.jar. The executable and .jar files must be located within the same directory and use the same file names.
The utility uses both long parameter names that are supplied with two dashes (--url) and one-character parameter names that are supplied with a single dash (-u). Both formats can be used together. Enter the parameters in any order.
Parameter (long form/short form) | Description |
---|---|
--url/-u | Required. The HTTP URL of the HDFS NameNode. For example, -u http://hostname:50070. |
--source/-s | Required. The source file to load into HDFS. If a relative path is specified, the path is resolved relative to the current Optim™ working directory. From the Convert Request Editor, this parameter can be used with the OPTIM_FILE_NAME variable. Use this variable to reference multiple files that are specified in the HDFS Comma Separated Value (CSV) tab. The source value that the OPTIM_FILE_NAME variable supplies depends on the convert action. For example, -s c:\data\sales.csv or -s :OPTIM_FILE_NAME. |
--destination/-d | Required. The destination directory in HDFS. The file name is derived from the source file name. If a relative path is specified, the HDFS home directory of the user is automatically obtained from Hadoop and prepended. The user name is the value of the userName parameter or, if that parameter is not specified, the operating system user name of the user who ran the loader. For example, -d /user/user_name/input or -d input. |
--log/-l | The name and location of the log file. If the path is invalid or the log file is not writable, the utility stops processing. For example, -l C:\temp\log_files\example.txt. To include the log file information in the Convert Process Report, use the processing options in the Convert Request Editor to define the corresponding parameters. |
--overwrite/-o | A yes/no or true/false indication of whether the destination file can be overwritten. The default is no (false). For example, -o yes. |
--userName/-n | The name of the HDFS user. If a name is not specified, the operating system user name is used. For example, -n hadoop1. |
--table/-t | The name of the table from which the CSV file was created. This parameter is intended for use with the pr0hdfs log file. For example, -t CUSTOMERS. |
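
When the loader is called from a script, the parameters in the table can be assembled programmatically. The following Python sketch builds an argument list with the short parameter forms. The function name `build_pr0hdfs_command` and its keyword arguments are hypothetical, and the sketch assumes that the pr0hdfs executable is on the search path.

```python
def build_pr0hdfs_command(url, source, destination, log=None,
                          overwrite=None, user_name=None, table=None,
                          executable="pr0hdfs"):
    """Return a pr0hdfs argument list (hypothetical helper for illustration)."""
    # The three required parameters always come first here, although the
    # utility accepts parameters in any order.
    cmd = [executable, "-u", url, "-s", source, "-d", destination]
    if log is not None:
        cmd += ["-l", log]
    if overwrite is not None:
        cmd += ["-o", "yes" if overwrite else "no"]
    if user_name is not None:
        cmd += ["-n", user_name]
    if table is not None:
        cmd += ["-t", table]
    return cmd


if __name__ == "__main__":
    # The list can be passed to subprocess.run(cmd) to start the loader.
    print(build_pr0hdfs_command("http://hostname:50070",
                                r"c:\data\sales.csv", "input",
                                overwrite=True, table="CUSTOMERS"))
```

Passing the arguments as a list avoids shell quoting problems with Windows paths that contain backslashes or spaces.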
The following example includes only the required parameters.
pr0hdfs -u http://qrh6032a:50070 -s C:\data\customers.csv -d input
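In this example, the destination input is a relative path, so the loader prepends the HDFS home directory. The following Python sketch illustrates that resolution. It assumes the home directory has the common form /user/user_name, whereas the utility obtains the actual value from Hadoop. The function name `resolve_destination` and the `home_root` parameter are hypothetical.

```python
import posixpath


def resolve_destination(destination, user_name, home_root="/user"):
    """Resolve a --destination value to an absolute HDFS directory.

    Assumption: the HDFS home directory is <home_root>/<user_name>;
    pr0hdfs itself asks Hadoop for the real home directory.
    """
    if destination.startswith("/"):
        return destination                   # absolute paths are used as-is
    return posixpath.join(home_root, user_name, destination)


if __name__ == "__main__":
    print(resolve_destination("input", "hadoop1"))
    print(resolve_destination("/user/user_name/input", "hadoop1"))
```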
The following example uses the OPTIM_FILE_NAME variable for the source parameter.
pr0hdfs -u http://qrh6032a:50070 -s :OPTIM_FILE_NAME -d input
The following example includes log file information.
pr0hdfs -u http://qrh6032a:50070 -s C:\data\customers.csv -d input -l C:\temp\log_files\example.txt