Providing transactional data to your Hadoop and Kafka data lake
The data lake may be all about Apache Hadoop, but integrating operational data can be a challenge. A Hadoop software platform provides a proven, cost-effective, highly scalable and reliable means of storing vast data sets on commodity hardware. By its nature, it does not deal well with changing data, having no concept of "update," nor …
Month: January 2018
IBM launches new Integrated Analytics System with Machine Learning
Information analytics has never been a “one size fits all” proposition. That applies to the hardware and software technologies organizations employ, the information being parsed and the goals of specific projects. So it’s worth examining how individual vendors approach analytics and the way they evolve their solutions and services to reflect changes in commercial markets. …
Analyze clickstream data with IBM Db2 EventStore for customer insights
In this blog, we will look at analyzing clickstream data with IBM Db2 EventStore to derive timely insights into the interests of retail customers. Typically, ingesting streaming event data, persisting it with low latency, and analyzing it along with historical event data requires integrating multiple analytic systems. IBM Db2 EventStore is purpose-built to simplify the …
Taking the hard work out of Apache Hadoop
Why did IBM decide to create its own Hadoop and Spark distribution, and why does it need a reference architecture? The ability to collect, manage and analyze big data is one of the key tenets of the IBM cognitive business strategy, as well as being central to the Internet of Things. We see a lot …
Propelling the future of big data and data science
Data is a potent business resource and the key to gaining and maintaining competitive advantage. Last month, IBM and Hortonworks announced a partnership to bring data science to the world on an open platform, offering Hortonworks Data Platform (HDP) along with IBM Data Science Experience (DSX) and IBM Big SQL to help everyone from data …
10 expert tips to boost agility with Hadoop as a service
Recently, a group of Apache Hadoop and Apache Spark subject matter experts from IBM Analytics hosted a public CrowdChat discussion about using cloud-based Hadoop and Spark services as a lever for business agility. Here is a top-ten list of hot topics and themes that emerged from that discussion. Despite years of effort centralizing information in …
Performance comparison of different file formats and storage engines in the Apache Hadoop ecosystem
This post presents a performance comparison of a few popular data formats and storage engines available in the Apache Hadoop ecosystem: Apache Avro, Apache Parquet, Apache HBase and Apache Kudu, in terms of space efficiency, ingestion performance, analytic scans and random data lookup. This should help in understanding how (and when) each of them …