HStreaming Frequently Asked Questions

General

Q: What is HStreaming?

HStreaming is the most scalable real-time analytics platform powered by Hadoop. HStreaming’s technology combines complex event processing (CEP) capabilities with MapReduce. It supports real-time data processing and analytics as well as standard batch processing, ETL, storage, and archival on one consolidated platform. Real-time analytics results can be quickly turned into rich visualizations and real-time dashboards.

Q: What is HStreaming Cloud?

HStreaming Cloud is a fully-managed cluster installation of HStreaming’s technology provided as a pay-as-you-go cloud service hosted on Amazon Web Services.

Q: What is HStreaming Enterprise?

HStreaming Enterprise Edition is a software license for the HStreaming Real-time Analytics Platform for customers who manage their own infrastructure or have existing Hadoop deployments. HStreaming offers perpetual licenses, support, consulting, and integration services.

Q: What is HStreaming Community Edition?

HStreaming Community Edition is a feature-limited version of HStreaming Enterprise Edition. It showcases the functionality and can be used for experimenting with stream processing and for limited production use. Community Edition is limited to 3 hours of processing or 1 million events for any single stream process, whichever comes first.
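The "whichever comes first" rule can be sketched in a few lines of Python. This is only an illustration of the stated limits, not HStreaming's actual enforcement code; the function name `limit_reached` and the choice to treat the limits as inclusive are assumptions.

```python
# Community Edition limits as stated in this FAQ.
MAX_SECONDS = 3 * 60 * 60   # 3 hours of processing
MAX_EVENTS = 1_000_000      # 1 million events


def limit_reached(elapsed_seconds, events_processed):
    """Return True once either limit is hit (hypothetical helper).

    Treating a value exactly at the limit as "reached" is an assumption.
    """
    return elapsed_seconds >= MAX_SECONDS or events_processed >= MAX_EVENTS
```

For example, a stream process that has run for one hour but has already handled 1 million events has reached the limit, even though the time limit is still far away.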

Q: What is the difference between HStreaming Cloud and HStreaming Enterprise?

HStreaming Cloud is a hosted service that runs on Amazon Web Services. To use HStreaming Cloud, you only need to sign up and you are ready to run streaming analytics jobs. HStreaming Enterprise is for customers with their own IT department, existing Hadoop installations, or requirements that don’t fit into the Software-as-a-Service (SaaS) model of HStreaming Cloud.

Q: What is Amazon Web Services (AWS)?

Amazon Web Services is a service by Amazon that provides resizable compute capacity in the cloud. You can read more at http://aws.amazon.com.

HStreaming Compared to Other Solutions

Q: What is the difference between HStreaming and Storm and S4?

Storm and S4 have been designed as actor-based systems that handle stream events in a distributed fabric. As such, both systems provide their own programming models, failure models, and cluster management solutions. HStreaming has been built for Hadoop and MapReduce and is tightly integrated with Hadoop’s processing model. This design enables compatibility with high-level languages such as Pig, since the fundamental processing model does not change. A user who understands Hadoop, MapReduce, or Pig understands HStreaming.

Q: What is the difference between HStreaming and other CEP solutions?

Traditional CEP solutions have been built for single-node systems with a strong focus on scale-up. Scale-out solutions usually require hand-tuning of the communication relations between multiple operators; each node is treated as an independent entity, and cluster management is left to the user and administrator. HStreaming is built from the ground up as a distributed scale-out system that leverages the scalability of MapReduce. The system is by nature a multi-node cluster environment and can scale to hundreds of nodes.

Billing & Pricing

Q: What is the pricing of HStreaming Cloud?

You can find pricing here. Since HStreaming Cloud is hosted on Amazon Web Services (AWS), AWS bills you for your usage of HStreaming Cloud as part of your regular AWS bill.

Q: What is the pricing of HStreaming Enterprise?

Please contact our sales team for a quote.

Q: What is the pricing of HStreaming Community Edition?

HStreaming Community Edition is free for download.

HStreaming Cloud

Q: How can I get started with HStreaming Cloud?

Follow the HStreaming Cloud sign-up process.

Q: What are AWS access keys?

Amazon provides AWS access keys as a secure mechanism to access Amazon Web Services (AWS). The AWS access keys can be compared to a username and password that allow applications to authenticate requests to Amazon Web Services.

Q: Where do I find my AWS keys?

The AWS access keys are two distinct strings of characters that can be found on the AWS Account page. Go to the section “Access Credentials” and copy the “Access Key ID” and the “Secret Access Key”. A sample Access Key ID could be “ABCDEFGHIJK1L2MN3OPQ” and a sample Secret Access Key could be “aBcDEfgHiJKLm123NopQ456rSTuVXyZ78aBCdef”.

Q: Why do I need to share AWS keys?

HStreaming Cloud is hosted on Amazon Web Services (AWS). HStreaming Cloud provides the software solution and an easy-to-use console, while AWS provides the cloud infrastructure. HStreaming Cloud uses your AWS keys as an authentication mechanism to securely connect to AWS to launch, shut down, and monitor the machine instances of your Hadoop cluster when you request it. That also means that your activity on HStreaming Cloud is visible under your AWS account.

Q: Where do I subscribe to HStreaming’s AMI?

HStreaming’s AMI is a product listing on AWS that you will be guided to subscribe to during the sign-up process on HStreaming Cloud. There is no fee for the subscription beyond the normal hourly usage charges.

Q: Can I run HStreaming’s AMI directly on Amazon Web Services?

No, you cannot run HStreaming directly from Amazon Web Services. When our system launches the cluster, it performs post-startup configuration and installation steps that depend on your job specification. The AMI only contains a set of pre-installed libraries and programs and is not a functional Hadoop installation.

Q: How many instances can I run on HStreaming Cloud?

We currently limit the number of machines in one cluster to ten. If you need more than ten machines, please contact us at support@hstreaming.com. You are also limited by the instance limit of your AWS account, which is typically 20 instances.

Q: What regions and availability zones can I choose on HStreaming Cloud?

HStreaming Cloud offers its customers the choice to place machine instances in the US East or US West region as defined by Amazon Web Services (AWS).

Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones as defined by Amazon Web Services (AWS). Currently, HStreaming does not support specific placement into Availability Zones but relies on AWS’s best-effort allocation.

Q: What is the difference between running a job on HStreaming Cloud and Amazon’s EMR?

HStreaming Cloud provides both real-time streaming and batch-processing capabilities, whereas Amazon EMR provides only batch-processing capabilities. HStreaming Cloud runs on HStreaming’s platform, which includes Hadoop, whereas Amazon EMR runs standard Apache Hadoop.

Q: How can I access HStreaming Cloud?

HStreaming Cloud offers a user interface to quickly launch jobs from a library of job templates. Alternatively, you can log into the cluster with an SSH client (such as PuTTY or OpenSSH) as the user “hadoop”, using the SSH key you specified. From the command line you can run “hadoop” or “pig” commands.
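As a rough sketch of the SSH route, the following Python helper assembles an OpenSSH command line for the “hadoop” user. The helper name `build_ssh_command`, the host name, and the key path are placeholders chosen for illustration; only the user name comes from this FAQ, and `-i` is the standard OpenSSH option for selecting a private key file.

```python
def build_ssh_command(host, key_path, user="hadoop"):
    """Build an OpenSSH command line for logging in to a cluster node.

    `host` and `key_path` are placeholders: use the address shown for your
    cluster and the private key that matches the SSH key you specified.
    """
    return ["ssh", "-i", key_path, f"{user}@{host}"]
```

Passing the returned list to `subprocess.run` opens an interactive session, from which “hadoop” and “pig” commands can be run.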

Q: What kind of hardware will my application stack run on?

HStreaming is a memory-intensive workload and is therefore limited to machine instances with a sufficient amount of memory. HStreaming is available for the following AWS EC2 instance types:

  • Large (m1.large)
  • Extra Large (m1.xlarge)
  • High CPU Medium (c1.medium)
  • High CPU Extra Large (c1.xlarge)
  • High Memory Extra Large (m2.xlarge)
  • High Memory Double Extra Large (m2.2xlarge)
  • High Memory Quadruple Extra Large (m2.4xlarge)

Q: Can I run on reserved or spot instances?

Right now Amazon Web Services (AWS) does not support running paid instances on reserved or spot instances. For alternative usage models, please contact our sales team.

Security and Privacy

Q: What is HStreaming’s Privacy Policy?

You can find HStreaming’s privacy policy under Legal Terms.

Q: What information is exchanged between Amazon AWS and HStreaming?

HStreaming only performs operations that you request through our HStreaming Cloud interface or our programmatic API. The key functions we perform are instance launch, instance interrogation, instance shutdown, and querying the available SSH public keys. Private keys are not stored on Amazon Web Services (AWS). We log every interaction we perform on your behalf, and the log is available on your account page under AWS Activity Log File.

Q: Do you delete all my private data when I close my account?

If you decide to close your account, we delete all data from our database. Please note that we maintain backup copies and it may take some time until all data is purged from our archives.

HStreaming Library

Q: What is HStreaming Library and where do I find it?

HStreaming Library is a growing collection of ready-made, configurable job templates and data stream connectors for common use cases. HStreaming Library can be accessed from the HStreaming Cloud console. The left navigation bar contains a list of jobs that can be launched directly from the console, either on a new cluster or on a cluster that is already running.

Q: Can I see the source code of the job templates in HStreaming Library?

Yes, each job has a link to its source code. You can consult or modify the code at your convenience for your use on HStreaming Cloud.

Q: How often do you update HStreaming Library?

We are building out HStreaming Library based on feedback from our customers. If you have an interesting script (stream or batch) that you would like to share on our web page, please email us at support@hstreaming.com.

HStreaming Runtime Errors

Q: My job step failed. What can I do?

HStreaming’s runtime system saves log and debugging information in several places:

  • In the JobTracker and HDFS Web User Interface. You can reach them via the HStreaming Cloud user interface, where links to the JobTracker and HDFS web UI are posted as soon as the cluster is fully running. The job and task logs in the Hadoop Web UI are the primary source of debug and informational output for jobs that are already running.
  • On the master node itself. If a job step could not be started on the HStreaming cluster at all (for instance, when a script has syntax errors and Pig bails out early), a second source of information is the log files on the master node:
    • /mnt/var/log/hstreaming/ contains information about each job step.
    • /mnt/var/log/hadoop/ contains default Hadoop log files.
    • /var/log/hstreaming/ contains log files related to the HStreaming Cloud service.
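
When logged in to the master node, a short script can show which files in these directories changed most recently. This is a minimal sketch using only the Python standard library; the helper name `recent_logs` is hypothetical, the directory paths are the ones listed above, and it assumes the log files sit directly inside those directories rather than in subdirectories.

```python
from pathlib import Path

# Log directories on the master node, as listed in this FAQ.
LOG_DIRS = [
    "/mnt/var/log/hstreaming/",
    "/mnt/var/log/hadoop/",
    "/var/log/hstreaming/",
]


def recent_logs(directories, limit=5):
    """Map each directory to its most recently modified files (newest first).

    Directories that do not exist map to an empty list.
    """
    result = {}
    for name in directories:
        directory = Path(name)
        if not directory.is_dir():
            result[name] = []
            continue
        files = [p for p in directory.iterdir() if p.is_file()]
        files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
        result[name] = files[:limit]
    return result
```

Calling `recent_logs(LOG_DIRS)` right after a failed job step narrows the search to the files that were written around the time of the failure.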

Q: The library scripts refuse to start with a “Job launch returned with error code 8” error.

HStreaming’s distribution of Pig bails out with this error code if it cannot read the scripts. As our library scripts are hosted publicly on Amazon S3, the most probable cause is that your account or sub-account lacks the credentials to access S3 buckets (you can test whether you have sufficient credentials using popular S3 tools such as “s3cmd”). As a solution, enable S3 access for your account and retry.

Q: I am running a script and am getting an “Unrecoverable stream connection error”.

HStreaming’s stream connector will bail out with such an exception under the following circumstances:

  • You are trying to connect to a web site and get authorization errors (HTTP error codes 401, 403). If you look at the task log file in the Hadoop User Interface, you will see a log entry beginning with “HTTP error”. Such an error occurs, for instance, if you enter a wrong authentication user name or password.
  • You are trying to connect to (inbound) or write to (outbound) a stream URL that has been specified in an illegal or inappropriate manner. Such an error occurs, for instance, if you enter an invalid TCP port.
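
Both causes can be checked before launching a job. The sketch below uses Python's standard `urllib.parse` module to catch a malformed stream URL or an out-of-range TCP port, and a small helper for the two HTTP status codes named above. The function names and the example URLs are hypothetical, and the checks only approximate whatever validation HStreaming's stream connector performs.

```python
from urllib.parse import urlsplit


def check_stream_url(url):
    """Return a list of problems found in a stream URL (empty if none)."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError if non-numeric or out of range
    except ValueError:
        return ["invalid TCP port or malformed URL"]
    problems = []
    if not parts.scheme:
        problems.append("missing scheme")
    if not parts.hostname:
        problems.append("missing host")
    if port == 0:
        problems.append("TCP port 0 is not usable")
    return problems


def is_authorization_error(status_code):
    """True for the HTTP status codes named in this FAQ (401 and 403)."""
    return status_code in (401, 403)
```

For instance, `check_stream_url("tcp://stream.example.com:99999")` reports a problem because the port is larger than 65535, while `check_stream_url("tcp://stream.example.com:9000")` returns an empty list.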
Last update: 2011-12-20 07:11