FwdData devotes time and energy to forging strong business relationships with our customers, serving both Government agencies and commercial organizations.

Installing, Configuring, and Deploying the Cluster

Provisioning and managing a Hadoop cluster can be a complicated task, especially when hundreds or thousands of hosts are involved. Apache Ambari provides an end-to-end management and monitoring application for Apache Hadoop. With Ambari, we can deploy and operate a complete Hadoop stack through a graphical user interface (GUI), manage configuration changes, monitor services, and create alerts for every node in the cluster from a central point. We provide the following services using Apache Ambari:

  • Provision a Hadoop Cluster
    • Ambari provides a step-by-step wizard for installing Hadoop services across any number of hosts.
    • Ambari handles configuration of Hadoop services for the cluster.
  • Manage a Hadoop Cluster
    • Ambari provides central management for starting, stopping, and reconfiguring Hadoop services across the entire cluster.
  • Monitor a Hadoop Cluster
    • Ambari provides a dashboard for monitoring health and status of the Hadoop cluster.
    • Ambari leverages the Ambari Metrics System for metrics collection.
    • Ambari leverages the Ambari Alert Framework for system alerting and notifies you when your attention is needed.
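
The central management Ambari exposes in its GUI is also available through its REST API, which is how these operations are typically automated. The sketch below builds the PUT request Ambari expects when starting a service; the host, cluster, and service names are placeholders, and only the request construction is shown (nothing is sent).

```python
import json

# Hypothetical Ambari server address; replace with your own.
AMBARI_HOST = "http://ambari.example.com:8080"

def build_start_request(cluster, service):
    """Return (url, headers, body) for Ambari's 'start service' PUT.

    Ambari requires the X-Requested-By header on modifying requests
    and expects the target state inside the ServiceInfo body.
    """
    url = f"{AMBARI_HOST}/api/v1/clusters/{cluster}/services/{service}"
    headers = {"X-Requested-By": "ambari", "Content-Type": "application/json"}
    body = json.dumps({
        "RequestInfo": {"context": f"Start {service} via REST"},
        "Body": {"ServiceInfo": {"state": "STARTED"}},
    })
    return url, headers, body

url, headers, body = build_start_request("MyCluster", "HDFS")
```

In practice the request would be sent with an HTTP client under the cluster admin's credentials; stopping a service is the same call with the state set to INSTALLED.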

Big Data Storage

Organizations today need to manage, process, and store huge amounts of complex data. Big data storage must handle very large volumes of data, keep scaling to keep up with growth, and deliver data to the various applications that consume it. An ideal big data storage architecture should:

  • Be highly scalable
  • Ensure content is highly available
  • Ensure content is widely accessible
  • Support both analytical and content applications
  • Integrate with legacy applications
  • Enable integration with public, private and hybrid cloud ecosystems

An HDFS storage solution provides a number of benefits. First, storage and computation are distributed across many servers, so each can be scaled independently rather than being bound by the fixed capacity of a single node. Second, HDFS can be configured for high availability. The biggest cost savings in HDFS come from the use of commodity hardware and open source software: storage in a traditional enterprise data warehouse (EDW) can cost anywhere from $10,000 to $50,000 per TB, while HDFS storage costs roughly $100 to $300 per TB. The difference is disruptive.
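
The cost gap is easy to quantify. A back-of-the-envelope comparison for a hypothetical 1 PB (1,024 TB) footprint, using the per-TB figures quoted above:

```python
# Storage cost comparison using the per-TB ranges quoted above
# (EDW: $10,000-$50,000/TB, HDFS: $100-$300/TB).
CAPACITY_TB = 1024  # 1 PB, an illustrative footprint

def cost_range(capacity_tb, low_per_tb, high_per_tb):
    """Total cost range (low, high) for a given capacity."""
    return capacity_tb * low_per_tb, capacity_tb * high_per_tb

edw_low, edw_high = cost_range(CAPACITY_TB, 10_000, 50_000)
hdfs_low, hdfs_high = cost_range(CAPACITY_TB, 100, 300)

print(f"EDW:  ${edw_low:,} - ${edw_high:,}")    # EDW:  $10,240,000 - $51,200,000
print(f"HDFS: ${hdfs_low:,} - ${hdfs_high:,}")  # HDFS: $102,400 - $307,200
```

Even comparing the cheapest EDW figure against the most expensive HDFS figure, HDFS is more than thirty times cheaper at this scale.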

At FwdData, we provide storage solutions built on the open source Hadoop Distributed File System (HDFS). This storage environment scales out to meet growing capacity or compute requirements, using parallel file systems distributed across many storage nodes that can handle billions of files without the performance degradation ordinary file systems suffer as they grow.
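
That scale-out behavior comes from how HDFS stores files: each file is split into fixed-size blocks (128 MB by default), and each block is replicated (three copies by default) across different datanodes. The sketch below illustrates that placement idea; the round-robin assignment is a simplification of HDFS's actual rack-aware placement policy.

```python
import math
from itertools import cycle

BLOCK_SIZE_MB = 128   # HDFS default block size
REPLICATION = 3       # HDFS default replication factor

def place_blocks(file_size_mb, datanodes):
    """Split a file into blocks and assign replicas to datanodes.

    Simplified round-robin placement; real HDFS uses a rack-aware
    policy, but the scaling idea is the same: adding datanodes adds
    both capacity and parallelism.
    """
    n_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    nodes = cycle(datanodes)
    return {block_id: [next(nodes) for _ in range(REPLICATION)]
            for block_id in range(n_blocks)}

# A 1 GB file on a 5-node cluster: 8 blocks, 3 replicas each.
layout = place_blocks(1024, ["dn1", "dn2", "dn3", "dn4", "dn5"])
```

Because no single node holds the whole file, reads and writes proceed in parallel across the cluster, and losing one datanode never loses data.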

Big Data Compute

Big data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times. Starting with MapReduce, tools such as Pig, Hive, and Mahout have all played a significant role, but nothing has disrupted compute as much as Spark. Apache Spark is a lightning-fast cluster computing framework that includes the following components:

  • Spark Core – contains the basic functionality of Spark, including components for task scheduling, memory management, fault recovery, interacting with storage systems, and more.
  • Spark SQL – provides support for interacting with Spark via SQL as well as the Apache Hive variant of SQL, called the Hive Query Language (HiveQL).
  • Spark Streaming – enables processing of live streams of data.
  • MLlib – provides multiple types of machine learning algorithms, including binary classification, regression, clustering, and collaborative filtering, as well as supporting functionality such as model evaluation and data import.
  • GraphX – a library added in Spark 0.9 that provides an API for manipulating graphs (e.g., a social network’s friend graph) and performing graph-parallel computations.

Most compute can be done in spark-shell itself, and Java can be used for advanced programming. At FwdData, we provide Apache Spark solutions for your business needs. Apache Spark is the only compute technology you need.
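
To make Spark Core's compute model concrete, the sketch below walks through a classic word count in plain Python. The comments show the equivalent PySpark RDD calls (flatMap, map, reduceByKey); they are kept as comments so the example stays runnable without a cluster, since locally each stage is just an ordinary transformation that Spark would distribute.

```python
from collections import defaultdict

# Word count, the "hello world" of Spark. In PySpark this would be:
#   counts = (sc.textFile("data.txt")
#               .flatMap(lambda line: line.split())
#               .map(lambda word: (word, 1))
#               .reduceByKey(lambda a, b: a + b))
# Below, the same three stages in plain Python for illustration.
lines = ["spark makes compute fast", "spark scales compute"]

# flatMap: one line -> many words
words = [w for line in lines for w in line.split()]

# map: word -> (word, 1)
pairs = [(w, 1) for w in words]

# reduceByKey: sum counts per word (Spark does this per partition,
# then shuffles and merges partial sums across the cluster)
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n
```

On a cluster, Spark runs the flatMap and map stages on each partition in parallel and only moves data over the network for the final shuffle, which is why the same pipeline scales from a laptop to thousands of nodes.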