Big Data loader utility

Use the Big Data loader utility, pr0hdfs, to load converted files into a Hadoop Distributed File System (HDFS) cluster. The utility can be invoked from the command line or used by convert actions as part of a convert service that uses the HDFS target file format.

The utility requires the HDFS WebHDFS REST Web service interface, usually deployed on the HDFS NameNode (on port 50070) and on the DataNodes (on port 50075). The HDFS administrator must enable WebHDFS, as it is not enabled by default.
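WebHDFS exposes HDFS file operations over HTTP using a fixed URL scheme (/webhdfs/v1/&lt;path&gt;?op=OPERATION). As a minimal illustration of the interface the loader depends on (this sketch is not part of the utility, and the host name is a placeholder), the REST URL for a given path and operation can be built like this:

```python
from urllib.parse import quote

def webhdfs_url(namenode, path, op, user=None):
    """Build a WebHDFS REST URL for an HDFS path and operation.

    namenode: e.g. "http://hostname:50070" (placeholder host)
    path:     absolute HDFS path, e.g. "/user/hadoop1/input"
    op:       WebHDFS operation name, e.g. "LISTSTATUS" or "GETFILESTATUS"
    user:     optional HDFS user name, sent as the user.name query parameter
    """
    url = f"{namenode}/webhdfs/v1{quote(path)}?op={op}"
    if user:
        url += f"&user.name={user}"
    return url

# Verifying that WebHDFS is enabled amounts to an HTTP GET on a URL like:
print(webhdfs_url("http://hostname:50070", "/", "LISTSTATUS"))
```

If this GET fails with a connection error or an HTTP error, WebHDFS is likely not enabled on the NameNode and the loader cannot proceed.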

For Windows, the utility uses the following files, located in the rt\bin installation directory: pr0hdfs.exe (the executable) and pr0hdfs.jar. For UNIX and Linux, the utility uses the following files, located in the rt/bin installation directory: pr0hdfs (the executable) and pr0hdfs.jar. The executable and the .jar file must be located in the same directory and share the same base file name.

The utility accepts long parameter names supplied with two dashes (--url) and one-character parameter names supplied with a single dash (-u). Long and short forms can be mixed on the same command line, and parameters can be entered in any order.

The following parameters are available (long form/short form):
--url/-u

Required. The HTTP URL of the HDFS NameNode.

For example, -u http://hostname:50070.

--source/-s

Required. The source file to load into HDFS.

If a relative path is specified, the path is resolved to the current Optim™ working directory.

From the Convert Service Editor, this parameter can be used with the OPTIM_FILE_NAME variable. Use this variable to reference multiple files specified in the Object Files tab. Depending on the convert action, the OPTIM_FILE_NAME variable uses the following source value:
  • For the start of convert process and end of convert process actions, the source value is taken from the Target file name field in the Target File Options tab of the Convert Service Editor.
  • For the start of table and end of table actions, the source value is taken from the File Name column for objects in the Object Files tab. If a file name is not specified for an object in the Object Files tab, the source value is taken from the Target file name field in the Target File Options tab of the Convert Service Editor.

For example, -s c:\data\sales.csv or -s :OPTIM_FILE_NAME.

--destination/-d

Required. The destination directory in HDFS. The file name is derived from the source file name.

If a relative path is specified, the HDFS home directory is automatically obtained from Hadoop and prepended. The home directory is determined by the user name specified in the --userName parameter or, if that parameter is omitted, by the operating system user name of the user running the loader.

For example, -d /user/user_name/input or -d input.
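The resolution rules above can be sketched as follows. This is an illustrative model of the assumed behavior, not the utility's actual code; it relies on the Hadoop convention that home directories live under /user/&lt;name&gt;:

```python
import ntpath  # handles both Windows (\) and POSIX (/) separators in source paths

def resolve_destination(dest, source, hdfs_user):
    """Sketch of how the final HDFS path is derived (assumed behavior).

    The file name is always taken from the source file; a relative
    destination is resolved against the HDFS home directory /user/<name>.
    """
    filename = ntpath.basename(source)
    if not dest.startswith("/"):           # relative: prepend the HDFS home directory
        dest = f"/user/{hdfs_user}/{dest}"
    return f"{dest.rstrip('/')}/{filename}"

print(resolve_destination("input", r"c:\data\sales.csv", "hadoop1"))
# /user/hadoop1/input/sales.csv
```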

--log/-l

The name and location of the log file. If the path is invalid or the log file is not writable, the utility will stop processing.

For example, -l C:\temp\log_files\example.txt.

To include the log file information in the Convert Process Report, use the processing options in the Convert Service Editor to define the following parameters:
  • Keyword: --log/-l
  • Value: The name and location of the log file.
  • Classification: Log/Report File Name

If the log file information is entered in the processing options, it is appended to the command line statement.
--overwrite/-o

A yes/no or true/false indication of whether the destination file may be overwritten. The default is no (false).

For example, -o yes.

--userName/-n

The name of the HDFS user. If a name is not specified, the operating system user name will be used.

For example, -n hadoop1.

--table/-t

The name of the table from which the CSV file was created. For use with the pr0hdfs log file.

For example, -t CUSTOMERS.

Examples

The following example includes only the required parameters.

pr0hdfs -u http://qrh6032a:50070 -s C:\data\customers.csv -d input

The following example uses the OPTIM_FILE_NAME variable for the source parameter.

pr0hdfs -u http://qrh6032a:50070 -s :OPTIM_FILE_NAME -d input

The following example includes log file information.

pr0hdfs -u http://qrh6032a:50070 -s C:\data\customers.csv -d input -l C:\temp\log_files\example.txt
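When loading many files from a script, the documented parameters can be assembled programmatically. The sketch below only builds the argument list from the parameters described above; actually running it (for example with subprocess) requires a reachable NameNode and the rt/bin directory on the PATH, and the host and file names here are placeholders:

```python
def build_command(url, source, dest, log=None, overwrite=False, table=None):
    """Assemble a pr0hdfs invocation from the documented parameters.

    url, source, dest map to the required -u, -s, -d parameters;
    log, overwrite, table map to the optional -l, -o, -t parameters.
    """
    args = ["pr0hdfs", "-u", url, "-s", source, "-d", dest]
    if log:
        args += ["-l", log]
    if overwrite:
        args += ["-o", "yes"]
    if table:
        args += ["-t", table]
    return args

cmd = build_command("http://hostname:50070", r"C:\data\customers.csv", "input",
                    table="CUSTOMERS")
print(" ".join(cmd))
```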
