The Problem(s): Any technology is only useful if it solves a problem (or problems). So what problem(s) does Big Data solve? As we all know, there is data, lots of it: historical data, sure, but also new data generated from social media apps, clickstream data from web applications, IoT sensor data, and on and … Continue reading What is big data? More than volume, velocity and variety…
Month: December 2017
New in Cloudera Data Science Workbench 1.2: Usage Monitoring for Administrators
Cloudera Data Science Workbench (CDSW) provides data science teams with a self-service platform for quickly developing machine learning workloads in their preferred language, with secure access to enterprise data and simple provisioning of compute. Individuals can request schedulable resources (e.g. compute, memory, GPUs) on a shared cluster that is managed centrally. While self-service provisioning of … Continue reading New in Cloudera Data Science Workbench 1.2: Usage Monitoring for Administrators
Hadoop Servers
Client Server: The client is neither a master nor a slave. Its role is to store data on the Hadoop cluster and submit MapReduce (MR) jobs with instructions on how the data is to be processed. The client can also retrieve and view the results after a job completes. Data Storage: Client accesses the file system on behalf of … Continue reading Hadoop Servers
Deep Learning with Intel’s BigDL and Apache Spark
how to use Deeplearning4J (DL4J) along with Apache Hadoop and Apache Spark to get state-of-the-art results on an image recognition task. Continuing on a similar stream of work, in this post we discuss a viable alternative that is specifically designed to be used with Spark, and data available in Spark and Hadoop clusters via a … Continue reading Deep Learning with Intel’s BigDL and Apache Spark
Introducing S3Guard: S3 Consistency for Apache Hadoop
Synopsis: This article introduces a new Apache Hadoop feature called S3Guard. S3Guard addresses one of the major challenges with running Hadoop on Amazon’s Simple Storage Service (S3): eventual consistency. We outline the problem of S3’s eventual consistency, describe how it affects Hadoop workloads, and explain how S3Guard works. Problem: Although Apache Hadoop has support for using … Continue reading Introducing S3Guard: S3 Consistency for Apache Hadoop
Big Data Architecture Workshop
Since the birth of big data, Cloudera University has been teaching developers, administrators, analysts, and data scientists how to use big data technologies. We have taught over 50,000 folks all of the details of using technologies from Apache such as HDFS, MapReduce, Hive, Impala, Sqoop, Flume, Kafka, Core Spark, Spark SQL, Spark Streaming, and Spark … Continue reading Big Data Architecture Workshop
Informatica Big Data Management on Cloudera Altus
Today, we’re really excited to announce the latest innovation from Cloudera and Informatica’s partnership. Companies are increasingly moving their data operations into the cloud. With both companies focusing on helping customers derive business insights out of vast amounts of data, our new joint offering will dramatically simplify leveraging cloud-native infrastructures for big data analytics. Cloudera … Continue reading Informatica Big Data Management on Cloudera Altus