Use the Big Data loader utility, pr0hdfs, to load converted files into a Hadoop Distributed File System (HDFS) cluster. The utility can be invoked from the command line or used by convert actions as part of a convert service that uses the HDFS target file format.
The utility requires the WebHDFS REST interface, which is usually deployed on the HDFS NameNode (port 50070) and on the DataNodes (port 50075). WebHDFS is not enabled by default; the HDFS administrator must enable it.
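As a rough sketch (the property name and port assume a classic Hadoop deployment where WebHDFS defaults to off, and the host name is a placeholder), the administrator typically enables WebHDFS in hdfs-site.xml:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

Once enabled, the interface can be verified with a simple request against the WebHDFS REST endpoint, for example:

curl -i "http://hostname:50070/webhdfs/v1/?op=LISTSTATUS"

A JSON FileStatuses response indicates that the NameNode is reachable and WebHDFS is active.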
For Windows, the utility uses the following files located in the rt\bin installation directory: pr0hdfs.exe (the executable) and pr0hdfs.jar. For UNIX and Linux, the utility uses the following files located in the rt/bin installation directory: pr0hdfs (the executable) and pr0hdfs.jar. The executable and .jar files must be located in the same directory and share the same base file name.
The utility accepts long parameter names prefixed with two dashes (--url) and one-character parameter names prefixed with a single dash (-u). The two forms can be mixed in a single command, and the parameters can be entered in any order.
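For instance, the following invocation (host name and paths are illustrative) mixes long and short forms:

pr0hdfs --url http://hostname:50070 -s C:\data\sales.csv --destination input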
| Parameter (long form/short form) | Description |
|---|---|
| --url/-u | Required. The HTTP URL of the HDFS NameNode. For example, -u http://hostname:50070. |
| --source/-s | Required. The source file to load into HDFS. If a relative path is specified, the path is resolved against the current Optim™ working directory. From the Convert Service Editor, this parameter can be used with the OPTIM_FILE_NAME variable to reference multiple files specified in the Object Files tab; the source value that the variable resolves to depends on the convert action. For example, -s c:\data\sales.csv or -s :OPTIM_FILE_NAME. |
| --destination/-d | Required. The destination directory in HDFS. The destination file name is derived from the source file name. If a relative path is specified, the HDFS home directory is automatically obtained from Hadoop and prepended; the home directory is based on the user name given in the userName parameter or, if none is given, the operating system user name of the user running the loader. For example, -d /user/user_name/input or -d input. |
| --log/-l | The name and location of the log file. If the path is invalid or the log file is not writable, the utility stops processing. To include the log file information in the Convert Process Report, use the processing options in the Convert Service Editor to define the log parameter. For example, -l C:\temp\log_files\example.txt. |
| --overwrite/-o | Indicates whether the destination file may be overwritten (yes/no or true/false). The default is no (false). For example, -o yes. |
| --userName/-n | The name of the HDFS user. If a name is not specified, the operating system user name is used. For example, -n hadoop1. |
| --table/-t | The name of the table from which the CSV file was created. For use with the pr0hdfs log file. For example, -t CUSTOMERS. |
The following example includes only the required parameters.
pr0hdfs -u http://qrh6032a:50070 -s C:\data\customers.csv -d input
The following example uses the OPTIM_FILE_NAME variable for the source parameter.
pr0hdfs -u http://qrh6032a:50070 -s :OPTIM_FILE_NAME -d input
The following example includes log file information.
pr0hdfs -u http://qrh6032a:50070 -s C:\data\customers.csv -d input -l C:\temp\log_files\example.txt
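The following example is a composite sketch built from the sample values shown in the parameter table above; it combines the optional overwrite, userName, table, and log parameters in a single command.

pr0hdfs -u http://qrh6032a:50070 -s C:\data\customers.csv -d input -o yes -n hadoop1 -t CUSTOMERS -l C:\temp\log_files\example.txt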