HStreaming provides a command shell similar to the Hadoop shell. The HStreaming Shell is invoked with the command hstreaming. The following is its help output:
$ hstreaming
hstreaming COMMAND
where COMMAND is one of:
  jar <jarFile> [mainClass] args...                  run a jar file
  list <comma separated stream tags>...              list stream endpoint addresses
  streamgen <apachelog|netflow> args...              run sample stream generator
  example <twitterwordcount|wordcount|grep> args...  run example
The shell simplifies launching streaming processes and lets you interrogate running streaming jobs.
The jar command is similar to the hadoop jar command and launches a class from a provided jar file. The HStreaming Shell automatically adds the streaming libraries to the classpath, so a streaming job can be launched from the command line without the libraries being deployed in the cluster.
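A launch might look like the following sketch; the jar name myjob.jar and the class com.example.MyStreamingJob are hypothetical placeholders, and the guard only keeps the sketch runnable on machines where HStreaming is not installed:

```shell
#!/bin/sh
# Sketch: launch a streaming job from a local jar via the HStreaming Shell.
# myjob.jar and com.example.MyStreamingJob are hypothetical placeholders.
CMD="hstreaming jar myjob.jar com.example.MyStreamingJob"
if command -v hstreaming >/dev/null 2>&1; then
  # The shell adds the streaming libraries to the classpath automatically.
  $CMD myinput || true
else
  echo "hstreaming not on PATH; invocation would be: $CMD myinput" >&2
fi
```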
The list command displays the streaming endpoints for stream tags (see also: Stream Endpoints). Typically, the stream tag is the URL specified when adding an output stream or an inbound server stream. Stream tags can be specified as a comma-separated list, as in the following example:
$ hstreaming list http://localhost:40000
Listing streams for user mapred
job_201112120652_0013:
  OUTBOUND stream [http://localhost:40000]:
    TASK attempt_201112120652_0013_r_000000_0, ENDPOINT http://node1:40000
If the stream tag is omitted, the command returns all streams for the current user:
$ hstreaming list
Listing streams for user mapred
job_201112120652_0013:
  OUTBOUND stream:
    TASK attempt_201112120652_0013_r_000000_0, ENDPOINT http://node1:40000
job_201112120652_0014:
  INBOUND stream:
    TASK attempt_201112120652_0014_m_000000_0, ENDPOINT udp://node1:10000
  OUTBOUND stream:
    TASK attempt_201112120652_0014_r_000000_0, ENDPOINT tcp://node1:50000
The parameters -j <jobID> and -u <username> restrict the listing to a given job or select the user whose streams are displayed, respectively:
$ hstreaming list -u mapred -j job_201112120652_0013
Listing streams for user mapred and job job_201112120652_0013
job_201112120652_0013:
  OUTBOUND stream:
    TASK attempt_201112120652_0013_r_000000_0, ENDPOINT http://node1:40000
The streamgen command generates a synthetic workload, e.g., to drive the examples. Streamgen replays various data streams, including Apache log files and NetFlow files.
$ hstreaming streamgen
hstreaming streamgen <apachelog|netflow|streamdefinition> [hostname]
  apachelog         feed stream from apachelog sample file
  netflow           feed stream from netflow sample file
  streamdefinition  use custom stream definition (see below)
  hostname          host to stream to/from (only valid for apachelog/netflow)
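A typical invocation, feeding the bundled Apache log sample to the local host, might look as follows; the arguments are taken from the usage text above, and the guard keeps this sketch runnable where HStreaming is absent:

```shell
#!/bin/sh
# Sketch: replay the bundled Apache log sample as a stream to localhost.
GEN_CMD="hstreaming streamgen apachelog localhost"
if command -v hstreaming >/dev/null 2>&1; then
  $GEN_CMD || true
else
  echo "hstreaming not on PATH; invocation would be: $GEN_CMD" >&2
fi
```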
The source code for the stream generator is installed in /usr/share/doc/hstreaming/examples and can be compiled using ant.
HStreaming comes with a variety of examples derived from the standard Hadoop examples library. The examples can be used to test the streaming system and to get familiar with coding against the native MapReduce API. The following command runs the Twitter wordcount example:
$ hstreaming example twitterwordcount USERNAME PASSWORD http://twitterfeed:50000
Please replace USERNAME and PASSWORD with your Twitter credentials. The result of the word count is exported under the stream identifier “twitterfeed” on port 50000. If you run Hadoop in pseudo-distributed mode, you can connect to port 50000 with a web browser (e.g., http://localhost:50000). On a multi-node cluster, you need to determine the node that runs the job, either via the JobTracker UI or with the HStreaming Shell command:
$ hstreaming list http://twitterfeed:50000
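Once the endpoint is known, the HTTP stream can also be read with any command-line HTTP client instead of a browser. A sketch, where node1:50000 is an assumed endpoint as reported by hstreaming list (adjust it to your own listing):

```shell
#!/bin/sh
# Sketch: read the exported word-count stream over HTTP.
# node1:50000 is the endpoint reported by "hstreaming list"; adjust to yours.
ENDPOINT="http://node1:50000"
if command -v curl >/dev/null 2>&1; then
  # --max-time bounds the read, since the stream is continuous.
  curl --silent --max-time 5 "$ENDPOINT" || true
else
  echo "curl not available; endpoint would be $ENDPOINT" >&2
fi
```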
Alternatively, if you installed the JobTracker plugin, the JobTracker web UI will display a clickable link to the endpoint. Please refer to the HStreaming Installation Guide for instructions on installing the user interface extensions.
The source code for all examples is installed in /usr/share/doc/hstreaming/examples and can be compiled using ant.
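Building the examples might look like the following sketch; the path is the install location stated above, and the guard skips gracefully where the package or ant is missing:

```shell
#!/bin/sh
# Sketch: build the bundled examples with ant from their install directory.
EXAMPLES_DIR=/usr/share/doc/hstreaming/examples
if [ -d "$EXAMPLES_DIR" ] && command -v ant >/dev/null 2>&1; then
  (cd "$EXAMPLES_DIR" && ant) || true
else
  echo "examples or ant not present; would run: (cd $EXAMPLES_DIR && ant)" >&2
fi
```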