Apr 15

Big Data Debate: Will HBase Dominate NoSQL? – InformationWeek


I am evaluating various NoSQL technologies as part of my new role at Zulily. Someone on our team forwarded this article, and it is a good read. 🙂

HBase offers both scalability and the economy of sharing the same infrastructure as Hadoop, but will its flaws hold it back? NoSQL experts square off.

HBase is modeled after Google BigTable and is part of the world’s most popular big data processing platform, Apache Hadoop. But will this pedigree guarantee HBase a dominant role in the competitive and fast-growing NoSQL database market?

Michael Hausenblas of MapR argues that Hadoop’s popularity and HBase’s scalability and consistency ensure success. The growing HBase community will surpass other open-source movements and will overcome a few technical wrinkles that have yet to be worked out.

Jonathan Ellis of DataStax, the support provider behind open-source Cassandra, argues that HBase’s flaws are too numerous and too intrinsic to Hadoop’s HDFS architecture to overcome. These flaws will forever limit HBase’s applicability to high-velocity workloads, he says.

Read what our two NoSQL experts have to say, and then weigh in with your opinion in the comments section below.

Big Data Debate: Will HBase Dominate NoSQL? – InformationWeek

Apr 13

Great Article on Choosing an MPP database


In my new role at Zulily, I am responsible for our Big Data Platform. I have been investigating the options available in this space, especially the best MPP database product on the market that we could leverage. I came across this awesome article by Marcos Ortiz. It is a great read. 🙂

Like the title says, choosing an enterprise-level Massively Parallel Processing (MPP) database is a big headache for every Data Science Manager, basically because there are so many very good choices in the tech world.

Still, I will give my top reasons for choosing a good platform of this kind.

Fast Query processing

I don’t think I have to explain much here, because you should know that this feature is critical for every data-driven business: it lets you answer bigger questions and take action more quickly. If you have a platform where you can query huge data sets in a matter of seconds or minutes, that is a huge advantage over your competitors. So, thinking like a Product Manager focused on Big Data Analytics, this is critical for my company.

Integration with Apache Hadoop

Apache Hadoop has become the de facto analytics platform for Big Data processing. So, for a new business interested in Big Data, you have to build an integrated platform where Hadoop can play a critical role. If you have a database that can communicate easily with the yellow elephant, you will be able to adapt more quickly to future changes, at least in terms of Business Analytics.

Choosing a MPP database is incredibly hard | Diary of a Data-Driven Product Manager