Tag: HDFS

Introduction to Hadoop Distributed File System (HDFS)

As data velocity grows, data volumes easily outgrow the storage capacity of a single machine. One solution is to store the data across a network of machines; such filesystems are called distributed filesystems. Because the data is stored across a network, all the complications of networking come into play. This is where Hadoop comes …
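To make the idea concrete, here is a minimal Python sketch of how a distributed filesystem such as HDFS splits a file into fixed-size blocks and stores several copies of each block on different machines. It assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). The round-robin placement and the helper names (`split_into_blocks`, `place_replicas`, `surviving_replicas`) are hypothetical simplifications, not HDFS's actual rack-aware placement policy.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed: 128 MB, the Hadoop 2.x default dfs.blocksize
REPLICATION = 3                 # assumed: the default dfs.replication


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes (in bytes) of the blocks a file is split into."""
    full, rest = divmod(file_size, block_size)
    sizes = [block_size] * full
    if rest:  # the last block only holds the remaining bytes
        sizes.append(rest)
    return sizes


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Hypothetical round-robin placement: each block goes to `replication`
    distinct nodes. Real HDFS uses a rack-aware policy instead."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def surviving_replicas(placement, failed_node):
    """Replicas still reachable after one DataNode fails."""
    return {b: [n for n in nodes if n != failed_node] for b, nodes in placement.items()}


if __name__ == "__main__":
    MB = 1024 * 1024
    sizes = split_into_blocks(300 * MB)
    print([s // MB for s in sizes])  # [128, 128, 44]
    layout = place_replicas(len(sizes), ["dn1", "dn2", "dn3", "dn4"])
    print(layout[0])                 # ['dn1', 'dn2', 'dn3']
    print(surviving_replicas(layout, "dn2"))
```

Losing one DataNode (here `dn2`) still leaves at least two copies of every block, which is why replication lets the filesystem tolerate the machine failures that come with storing data across a network.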

Hadoop – Features of Hadoop Which Make It Popular

Today, many companies are adopting Hadoop Big Data tools to address their Big Data queries and customer market segmentation. Many other tools are also available on the market, such as HPCC (developed by LexisNexis Risk Solutions), Storm, Qubole, Cassandra, Statwing, CouchDB, Pentaho, OpenRefine, Flink, etc. So why is Hadoop so popular among …

Getting Started with Big Data Integration using HDFS and DMX-h

Data researchers no longer depend only on interviews, surveys, and observational studies to collect data. Instead, they have switched to faster methods of data collection, which include leveraging the internet, cameras, smartphones, drones, bots, and more. The collected data is later used by organizations and governments to make business decisions. But before that, …

Hadoop Architecture – YARN, HDFS and MapReduce

In this post, we discuss the Apache Hadoop 2.x architecture and how its components work in detail. Apache Hadoop 2.x and later versions use the following high-level architecture; the detailed low-level architecture is discussed in later sections. Hadoop Common …

The Hadoop Module & High-level Architecture

The Apache Hadoop modules: Hadoop Common includes the common utilities that support the other Hadoop modules. HDFS, the Hadoop Distributed File System, provides high-throughput access to application data. Hadoop YARN handles job scheduling and efficient management of cluster resources. MapReduce is a highly efficient methodology for parallel processing of huge …
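The MapReduce model mentioned above can be illustrated with the classic word-count example. The sketch below is a single-process Python simulation that assumes hypothetical function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`). It is not Hadoop's Java MapReduce API: in a real cluster, the map and reduce tasks would run in parallel on different nodes, with YARN scheduling them, whereas here they run one after another.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    # Each line would be an independent map task on a cluster; here they run sequentially.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    # Sort keys so the output order is deterministic.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["HDFS stores data", "YARN schedules jobs", "HDFS replicates data"]
    print(word_count(docs))
    # {'data': 2, 'hdfs': 2, 'jobs': 1, 'replicates': 1, 'schedules': 1, 'stores': 1, 'yarn': 1}
```

Because each mapper only sees its own line and each reducer only sees the values for one key, both phases can be spread across machines independently; the shuffle step is the only point where data has to move between them.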