I am evaluating various NoSQL technologies as part of my new role at Zulily. This article was forwarded by someone on our team, and it is a good read. :)
HBase offers both scalability and the economy of sharing the same infrastructure as Hadoop, but will its flaws hold it back? NoSQL experts square off.
HBase is modeled after Google BigTable and is part of the world’s most popular big data processing platform, Apache Hadoop. But will this pedigree guarantee HBase a dominant role in the competitive and fast-growing NoSQL database market?
Michael Hausenblas of MapR argues that Hadoop’s popularity and HBase’s scalability and consistency ensure success. The growing HBase community will surpass other open-source movements and will overcome a few technical wrinkles that have yet to be worked out.
Jonathan Ellis of DataStax, the support provider behind open-source Cassandra, argues that HBase flaws are too numerous and intrinsic to Hadoop’s HDFS architecture to overcome. These flaws will forever limit HBase’s applicability to high-velocity workloads, he says.
Read what our two NoSQL experts have to say, and then weigh in with your opinion in the comments section below.
Big Data Debate: Will HBase Dominate NoSQL? – InformationWeek
In my new role at Zulily I am responsible for our Big Data Platform. I have been investigating the options available in this space, especially the best MPP database product on the market that we could leverage. I came across this awesome article by Marcos Ortiz. It is a great read…
Like the title says, choosing an enterprise-level Massively Parallel Processing (MPP) database is a big headache for every Data Science Manager, basically because there are very good choices around the tech world.
But I will give my top reasons for choosing a good platform of this kind.
Fast Query processing
I don’t think I have to explain very much here, because you should know that this feature is critical for every data-driven business that wants to answer bigger questions and take action more quickly. If you have a platform where you can query huge data sets in a matter of seconds or minutes, this is a huge advantage over your competitors. So, thinking like a Product Manager focused on Big Data Analytics, this is critical for my company.
Integration with Apache Hadoop
Apache Hadoop has become the de facto analytics platform for Big Data processing. So, for a new business interested in Big Data, you have to build an integrated platform where Hadoop can play a critical role. If you have a database that can communicate easily with the yellow elephant, you will be able to adapt to future changes more quickly, at least in terms of Business Analytics.
Choosing a MPP database is incredibly hard | Diary of a Data-Driven Product Manager
I am investigating various Hadoop offerings. This is a very good article by Sriram Krishnan and Eva Tse from Netflix. Awesome.
Hadoop has become the de facto standard for managing and processing hundreds of terabytes to petabytes of data. At Netflix, our Hadoop-based data warehouse is petabyte-scale, and growing rapidly. However, with the big data explosion in recent times, even this is not very novel anymore. Our architecture, however, is unique as it enables us to build a data warehouse of practically infinite scale in the cloud (both in terms of data and computational power).
In this article, we discuss our cloud-based data warehouse, how it is different from a traditional data center-based Hadoop infrastructure, and how we leverage the elasticity of the cloud to build a system that is dynamically scalable. We also introduce Genie, which is our in-house Hadoop Platform as a Service (PaaS) that provides REST-ful APIs for job execution and resource management.
In a traditional data center-based Hadoop data warehouse, the data is hosted on the Hadoop Distributed File System (HDFS). HDFS can be run on commodity hardware, and provides fault-tolerance and high throughput access to large datasets. The most typical way to build a Hadoop data warehouse in the cloud would be to follow this model, and store your data on HDFS on your cloud-based Hadoop clusters. However, as we describe in the next section, we have chosen to store all of our data on Amazon’s Storage Service (S3), which is the core principle on which our architecture is based. A high-level overview of our architecture is shown below, followed by the details.
The Netflix Tech Blog: Hadoop Platform as a Service in the Cloud
Interesting article, but I think the 3 key companies to look at are Hortonworks, Cloudera and MapR. AWS with EMR and Azure with HDInsight will be interesting to watch too. I am planning to play around with the Hortonworks offering this week…
Network World – If you’ve got a lot of data, then Hadoop either is, or should be on your radar.
Once reserved for the Internet empires like Google and Yahoo, the most popular and well-known big data management system is now creeping into the enterprise. There are two big reasons for that: 1) Businesses have a lot more data to manage, and Hadoop is a great platform, especially for combining legacy data with new, unstructured data. 2) A lot of vendors are jumping into the game of offering support and services around Hadoop, making it more palatable for enterprises.
Most firms estimate that they are only analyzing 12% of the data that they already have, leaving 88% of it on the cutting-room floor.
— According to Forrester’s Software Survey Q4, 2013
“Hadoop is unstoppable as its open source roots grow wildly and deeply into enterprise data management architectures,” Forrester analysts Mike Gualtieri and Noel Yuhanna wrote recently in the company’s Wave Report on the Hadoop marketplace. “Forrester believes that Hadoop is a must-have data platform for large enterprises, forming the cornerstone of any flexible future data management platform. If you have lots of structured, unstructured, and/or binary data, there is a sweet spot for Hadoop in your organization.”
So where do you start? Forrester says there are a variety of places to go, and it evaluated nine vendors offering Hadoop services to find the pros and cons of each. Forrester concluded that there is no clear market leader at this point, with relatively young companies in this market offering compelling services alongside the tech titans.
Nine Hadoop companies you should know – Network World
Growth in the Big Data market is going to be staggering. If the numbers are accurate, this is phenomenal. I am not sure there is any other trend that has grown this fast in recent times…
The global Hadoop market is expected to grow at a compound annual growth rate of 58 percent between 2013 and 2020, according to a new report by Allied Market Research.
The market revenue was estimated to be $2 billion in 2013 and is expected to grow to $50.2 billion by 2020. A huge increase in raw structured and unstructured data and increasing demand for big data analytics are the major driving factors for the global Hadoop market, the report says. Hadoop provides cost-effective and faster data processing of big data analytics over conventional data analysis tools such as relational database management systems.
Distributed computing and Hadoop platform security issues are currently hindering the growth of the market, Allied Market Research says. But with continuous technological growth these issues can be addressed, it says.
Hadoop Market Forecasted to Reach $50.2 Billion by 2020 – Information Management Online Article
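As a quick sanity check on the growth figures quoted above, here is a minimal Python sketch that compares the stated 58 percent compound annual growth rate with the rate implied by the $2 billion and $50.2 billion revenue figures. It assumes seven compounding periods (2013 to 2020); the report excerpt does not state how the periods were counted, and the function names `cagr` and `project` are only illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate, returned as a fraction (0.5 means 50%)."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value reached after compounding start_value at `rate` for `years` periods."""
    return start_value * (1 + rate) ** years


# Figures from the excerpt above, in billions of US dollars.
START_2013 = 2.0
END_2020 = 50.2
STATED_RATE = 0.58
YEARS = 7  # assumption: 2013 -> 2020 counted as seven annual periods

implied_rate = cagr(START_2013, END_2020, YEARS)
projected_2020 = project(START_2013, STATED_RATE, YEARS)

print(f"Implied CAGR 2013-2020: {implied_rate:.1%}")
print(f"$2B compounded at 58% for {YEARS} years: ${projected_2020:.1f}B")
```

Under that assumption the implied rate comes out a little above 58 percent and the projection a little below $50.2 billion, so the two quoted numbers look consistent with each other up to rounding.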
Everything you need to know about the March Xbox update.
As a gamer, I couldn’t be more excited for the upcoming launch of “Titanfall.” I’ve played the game and it’s awesome – I can’t wait to see you online! This launch is particularly special because our team has been working to make Xbox Live on Xbox One the best place to play.
We have received lots of feedback since the launch of Xbox One in November and have been listening closely to you. I’m excited to share that our second, and most significant, system update for Xbox One is starting to roll out today and includes improved matchmaking, party chat and friends features that will make gaming on Xbox One an experience like no other. To me that means playing games like “Titanfall” on the best multiplayer service on the planet, using a new headset or the one you already own, while live broadcasting your games on Twitch.
Here’s a breakdown of more of the features in the March update, addressing some of the biggest feedback you’ve shared:
Prepare for Titanfall: Marc Whitten Provides Details on Xbox One March Update
Great news for customers interested in using Oracle on Azure.
IDG News Service – Oracle’s database, WebLogic application server and Java programming language will soon be generally available on Microsoft’s Windows Azure cloud service, marking a major milestone in the high-profile partnership the vendors announced last June.
A general availability date of March 12 has been set for Windows-based software images of the products, which have been in preview, Microsoft said on the Azure website.
While there have been no additional fees for the preview versions, this will change starting March 12. Customers who want to avoid Oracle-related charges for these "license included" virtual images must shut them down before that date, Microsoft said. Windows Server VMs the images run in are charged for separately.
Purely pay-as-you-go customers won’t see much discounting for higher usage of Oracle’s software on Azure.
Oracle software on Microsoft Azure gets a general availability date – Computerworld
Power BI for Office 365 is now available. It is pretty cool; check it out at www.powerbi.com. Power BI allows you to:
- Quickly create collaborative BI sites – enable anyone to quickly create a collaborative BI site to share workbooks containing data and insights.
- Keep reports up to date with scheduled data refresh – reports that have been saved to the cloud can now connect back to on-premises data sources to refresh the data and stay up to date.
- Manage data queries for the team – share not only workbooks but also the data queries created in Power Query for Excel. Team members can now build and manage data queries for others to use when creating their own reports.
- Maintain a Data Catalog of searchable data – IT departments can now use the Data Catalog feature to make it easier for everyone to find and connect to corporate data by searching for it from within Excel.
- Ask questions of your data in natural language – with the Q&A features people can type questions they have of the data in natural language and the system will interpret the question and present answers in the form of interactive visualizations.
- Stay connected with mobile access to your reports – stay connected from anywhere with new HTML5 support and the Power BI Windows app.
Learn More at: Microsoft Blog Post or Power BI Team Blog.
Xbox One was the #1 console in December. It is great to see the success in the market. It was a great opportunity to work with really smart people across the Xbox org to ship this.
The NPD report for December is in. It was a good holiday season for all involved, and a very good one for Microsoft. The Xbox One and Xbox 360 were the best-selling consoles in each of their generations, with 908,000 sales for Xbox One and 643,000 for Xbox 360, making up 46% of the total US hardware market. Xbox One games also held six of the top ten spots for best-selling software.
This is the status quo according to the last generation: Xbox holds down the US, PlayStation does better in Europe and dominates Japan (Microsoft has been trying to make Europe a battleground this generation, but has largely ceded Japan). Still waiting on PS4 numbers, and will update with those. My guess is they aren’t too far behind. Sony is mounting an impressive attack in home territory, significantly helped by better on-paper specs and a $100 price difference. Globally, Sony’s machine is still ahead.
Xbox One #1 Console In US For December, Outselling PS4 – Forbes