Most scalable real-time data processing platform.
HStreaming lets you perform advanced analytics in real time, create live dashboards, identify patterns within one data stream or across multiple data streams, and trigger actions based on predefined rules or heuristics. HStreaming can handle even the most challenging data volumes, up to hundreds of millions of events per second, and complex analytical problems by leveraging the scalability of Hadoop and MapReduce.
High availability and fault tolerance.
HStreaming’s patented communication fabric provides high availability and fault tolerance in the event of hardware failures, software failures, and transient network outages. It is the fundamental building block for large-scale data processing and low-cost operations on commodity hardware. Fault recovery is fully transparent to the application, which reduces the complexity of the software stack built on top of HStreaming.
Single consolidated platform for full data life-cycle management.
HStreaming adds real-time processing and ETL capabilities to Hadoop and thus enables customers to manage the full data life-cycle of pre-processing, ETL, storage, post-processing, and archival on a single consolidated platform. This single-platform approach substantially simplifies development and maintenance, eliminates cross-platform integration between streaming and batch processing systems, and allows resources to be re-allocated dynamically. As a result, HStreaming’s single platform for Hadoop substantially reduces total cost of ownership.
Common code base and tooling for real-time and batch processing.
HStreaming lets you use the same MapReduce and Apache Pig algorithms and functions for both real-time and batch processing. Existing code such as user-defined functions (UDFs) can be migrated to stream processing with minimal or no changes. This gives your business a rapid development cycle and the agility to adapt quickly to changing business requirements.
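The idea of sharing one function between batch and stream processing can be sketched in plain Python. This is only an illustration and does not use the HStreaming, MapReduce, or Pig APIs; the function names, the log format, and the sample data are all assumptions made for the example.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def extract_status(line: str) -> str:
    """Hypothetical UDF: return the last field of a simplified log line."""
    return line.split()[-1]


def batch_count(lines: List[str]) -> Counter:
    """Batch mode: count status codes over a complete data set."""
    return Counter(extract_status(line) for line in lines)


def stream_count(lines: Iterable[str]) -> Iterator[Counter]:
    """Stream mode: emit an updated running count after every event."""
    running: Counter = Counter()
    for line in lines:
        running[extract_status(line)] += 1
        yield running.copy()


# Assumed sample input, not real HStreaming data.
LOG = ["GET /a 200", "GET /b 404", "GET /c 200"]

batch_result = batch_count(LOG)
stream_results = list(stream_count(iter(LOG)))
```

Both modes call the same `extract_status` function unchanged. Once the stream has consumed every event, its last running count matches the batch result.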
Integrated with Hadoop ecosystem.
HStreaming is built upon Hadoop and is compatible with all major Hadoop distributions, including Apache Hadoop, Cloudera, MapR, Amazon EMR, Hortonworks, EMC, and IBM. It also integrates seamlessly with related Apache Hadoop technologies such as Pig, HDFS, HBase, and ZooKeeper. HStreaming works closely with its customers to help them leverage the fast-growing Hadoop ecosystem and custom-build exactly the solution they need.
Low hardware and operational cost.
HStreaming brings the benefit of existing low-cost commodity hardware and networks to real-time processing. It does not require anything beyond a Hadoop installation. In addition, HStreaming runs real-time data processing and analytics before any data reaches the storage system, which can drastically reduce the volume of data stored. This early data reduction can substantially decrease the number of machines and disks required in your data center.
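As a rough illustration of early data reduction, the sketch below collapses raw events into one summary row per time window and sensor before anything is written to storage. It is plain Python, not HStreaming code; the event schema, the 60-second window, and the function name are assumptions. For brevity it processes a finite list, whereas a streaming job would emit each window as it closes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event schema: (timestamp in seconds, sensor id, reading).
Event = Tuple[int, str, float]


def reduce_before_storage(events: Iterable[Event], window_s: int = 60) -> List[dict]:
    """Collapse raw events into one summary row per (window, sensor)."""
    readings: Dict[Tuple[int, str], List[float]] = defaultdict(list)
    for ts, sensor, value in events:
        window_start = ts - ts % window_s  # align to the start of the window
        readings[(window_start, sensor)].append(value)
    return [
        {"window_start": w, "sensor": s, "count": len(v), "mean": sum(v) / len(v)}
        for (w, s), v in sorted(readings.items())
    ]


# Assumed sample data: four raw events become three stored rows.
raw = [(0, "a", 1.0), (10, "a", 3.0), (59, "b", 5.0), (60, "a", 7.0)]
summary = reduce_before_storage(raw)
```

The saving grows with the number of events per window: a sensor reporting once per second would store one row per minute instead of sixty.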
SQL and NoSQL connectors.
HStreaming comes with a rich set of connectors for exporting data into SQL and NoSQL databases such as MySQL, HBase, and Cassandra. Real-time analytics results are immediately available for querying and for real-time dashboards built with standard web technologies. Data can also be archived to a file system such as HDFS.
Built-in visualizations and triggers.
The HStreaming visualization connector instantly converts analytics results into real-time dashboards accessible from a web browser. Visualizations can be fully customized from within the query language. Other built-in connectors enable analytics jobs to automatically trigger actions such as email notifications, alerts, and requests to web-based services.
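A rule-based trigger of this kind can be pictured with a small Python sketch. It does not reflect the HStreaming connector API or query language; `run_rule`, the threshold, and the callback are hypothetical names chosen for illustration, and the callback stands in for an action such as sending an email or calling a web service.

```python
from typing import Callable, Iterable, List


def run_rule(
    values: Iterable[float],
    threshold: float,
    on_trigger: Callable[[float], None],
) -> None:
    """Call on_trigger each time a value rises above the threshold."""
    above = False
    for value in values:
        if value > threshold and not above:
            on_trigger(value)  # fire once per crossing, not on every event
        above = value > threshold


# Assumed sample stream; the callback just records a message.
alerts: List[str] = []
run_rule(
    [10, 95, 97, 40, 99],
    threshold=90,
    on_trigger=lambda v: alerts.append(f"alert: {v}"),
)
```

Firing only on the upward crossing is one possible design choice; it avoids a flood of notifications while a value stays above the threshold.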