Mar 15

Book Review: The Power of Less

Book Reviews

Author: Leo Babauta

Rating: ★☆☆☆☆

I had high hopes for this book. It has 4 stars on Amazon, and I wanted to read a book that would genuinely help me manage with less. :) This book was a huge disappointment for me.

When the author offers morning routine ideas like "Have coffee or tea," "Shower," "Take a bath," and "Eat breakfast," among others, you really start to wonder whether the author is running out of things to write. Anyway, the book is mostly common-sense stuff, nothing that you do not know already…


The only good part was that I skimmed through this book REALLY FAST!!! :)

Mar 14

Using Hadoop with Hortonworks Data Platform @ zulily

Big Data, Technology

Here is a good paper on how we leverage Hortonworks Data Platform for our big data processing and the benefits we get from it. You should also check out our joint video with Google on leveraging HDP + Google BigQuery.

zulily is a publicly traded, Seattle-based online retailer. The company launched in 2010 with a mission of bringing its customers (primarily moms) fresh and interesting product every day. The company has over four and a half million active customers and expects to do over $1B in sales for 2014.

Unlike search-based e-commerce sites whose users come looking for specific items, the zulily customer visits the company’s web site and mobile app on a daily basis to browse and discover products. The company focuses on crafting engaging, unique and compelling experiences for its customer.

The zulily experience promises “something special every day,” and creating that experience is no small feat. To do so, the company’s 500 merchandising professionals create over 100 daily “events” that launch more than 9,000 unique products each day.


To turn this raw content into an engaging experience for customers, zulily has invested heavily in personalization. Showing the right sales event and images to the right member at the right time is critical for zulily. A mother with a 6-year old daughter should have a completely different experience from that of an expecting mother.

Accomplishing this level of personalization required that zulily build systems to understand members coming to its web site and then instantly determine what to show them. To do this, zulily’s systems must capture, integrate and analyze many different inputs from a wide variety of sources.

The company was founded with a data platform built on a relational database, but two years after launch the number of events, SKUs, customers and interactions were growing too rapidly for that system to keep up. To continue delivering relevant, customized content to its rapidly growing customer base, zulily knew that it needed to modernize its platform.

For personalization at scale, zulily built a Hadoop-based system for collecting clickstream data from across the company’s web, mobile and email engagement channels. This system allowed the company to turn clickstream data into engines that produce personalized, targeted and relevant product recommendations.

The zulily platform helped it achieve a new level of precision and maturity in its ability to personalize its customers’ experiences, but one challenge still remained. It still had a silo of structured data in one place (including transactions, customer records and product details), which was separate from its clickstream data in Hadoop.

“We really struggled to integrate and analyze the data across the two different silos,” said Sudhir Hasbe, director of software engineering for data services at zulily. “We often found ourselves making decisions based exclusively on a single type of data, or needing to get developers involved to produce new reports or analyses.”
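To make the silo problem concrete, here is a minimal sketch of a question that needs both kinds of data at once: which products did a customer view but never buy? The order and view records, and the `viewed_not_bought` function, are hypothetical stand-ins for the structured and clickstream silos, not zulily's schema.

```python
from collections import defaultdict

# Hypothetical structured silo: completed orders as (customer_id, product).
orders = [
    (1, "stroller"),
    (2, "car_seat"),
]

# Hypothetical clickstream silo: product views as (customer_id, product).
views = [
    (1, "stroller"),
    (1, "diaper_bag"),
    (2, "car_seat"),
    (2, "stroller"),
    (3, "toy_blocks"),
]


def viewed_not_bought(views, orders):
    """For each customer, list the products they viewed but never ordered."""
    bought = defaultdict(set)
    for customer_id, product in orders:
        bought[customer_id].add(product)

    result = defaultdict(list)
    for customer_id, product in views:
        already_bought = product in bought.get(customer_id, set())
        if not already_bought and product not in result[customer_id]:
            result[customer_id].append(product)
    # Drop customers whose every viewed product was also purchased.
    return {cid: items for cid, items in result.items() if items}


print(viewed_not_bought(views, orders))
```

When the two inputs live in separate systems, even a simple join like this means exporting data or involving a developer, which is the friction the quote describes.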

The need to constantly involve the company’s development team was expensive and time consuming, and it distracted the developers from focusing on their own priorities. Due to the complexity of its siloed data platform, the company found itself limited in its ability to agilely respond to changes in the marketplace or company strategy.


After zulily thoroughly examined its analytical priorities and the challenges posed by its current infrastructure, the company concluded that it needed to move beyond its legacy relational database.

“We knew we couldn’t build what we needed on traditional relational database technology,” said Hasbe. “We would have to innovate and take advantage of the latest technologies available in the industry for managing both structured and unstructured data.”

As Hasbe and his team further defined the company’s requirements, they formed a vision for what the company now calls the zulily Data Platform (ZDP). They planned to make ZDP a primary, central repository for all of the business’ data. It would:

  • Support the company’s efforts to enhance the customer experience through better personalization and targeting,
  • Give the company’s business analysts easy access to all of the company’s information,
  • Allow the team to make smarter business decisions without needing IT support, and
  • Scale to support the company’s growth over the long term.

To meet these goals, Hasbe and his team created a modern data architecture that combined the strengths of both Apache Hadoop and cloud computing to deliver a highly scalable unified data platform for structured and unstructured data.

ZDP is based on:

  • Hortonworks Data Platform (HDP). With YARN as its architectural center, HDP provides a data platform for multi-workload data processing across an array of processing methods – from batch through interactive to real-time, supported by key capabilities required of an enterprise data platform – spanning governance, security and operations.
  • Google Cloud Platform (GCP), Google’s public cloud infrastructure-as-a-service (IaaS) offering.
  • Google BigQuery, a cloud-based tool for super-fast data query across massive data sets.
  • Tableau, a visualization and reporting tool suite.

After identifying its path forward, the zulily team was able to move from its vision to an in-production data platform in a mere four months. It will migrate all existing data processing to the new platform by the end of 2014.

“Our new platform enables analytics scenarios that were difficult to achieve with our former technology stack,” said Hasbe. “And we now have the ability to scale both storage and analytics on demand. To finally be ahead of the company’s growth curve is exciting for us.”


For Luke Friang, zulily’s chief information officer, the ZDP platform creates important new opportunities for the company.

“Data is everything to us, yet we really struggled with how to properly consume and harvest a mass of data to provide our customers with a great experience,” said Friang. “Our new platform empowers us to use data all over the business. It drives the content of the email that our customers receive in the morning. It drives how and when we ship customers the products they order. It drives what customer sees in the mobile app versus what customer sees in a browser on their computer. It’s allowing us to make sure that we’re tailoring customer experiences appropriately, throughout the entire lifecycle as a zulily customer.”

From Friang’s perspective, it all comes down to supporting the business’ ability to derive new insights and make quick decisions.

“This platform is allowing us to take questions that we couldn’t answer to the point where we can answer them,” he said. “It is allowing us to accelerate decision making processes from weeks, days and hours to minutes, seconds and milliseconds. From an off-line analytics activity to a real-time decision-making processes embedded within a piece of software. That’s the value.”

“Hortonworks’ depth of knowledge was invaluable to us in this process,” added Friang. “The responsiveness of their team, and their ability to get things done and get issues fixed, were key to our ability to get ZDP off the ground.”

zulily does Hadoop. With Hortonworks Data Platform

Mar 14

zulily turns big data into big advantage using Google Cloud Platform

Big Data, Technology

Feb 17

How to Create a Great Presentation in Just 15 Minutes | @DanMartell

General Management, Marketing

Stumbled upon this blog post from Dan Martell about giving presentations, which is one of my passions. :) Loved it.

The most important takeaway for me was how to come up with a catchy title, which is one of my big weak points. :) I am surely going to try this out.

Creating a catchy title can feel overwhelming, but there’s a simple trick based on decades of research and it’s super scientific. Just use magazine covers. Search online for a magazine in your industry and put the words, “Magazine Cover” after it. (ex: Forbes Magazine Cover). You’ll see 100’s of examples of article headlines designed to capture someones attention. Use them for inspiration and tweak for your own needs.

How to Create a Great Presentation in Just 15 Minutes | @DanMartell

Jan 11

Why Planes Can Still Vanish: Airlines Slow to Upgrade Tracking Tech – Businessweek

Technology

This is unacceptable in this day and age. :(

An Indonesian navy service member looks out before departing for a search operation to find AirAsia flight QZ8501 at the Indonesian Naval Aviation Base Juanda on Jan. 3, 2015, in Surabaya.

Photographer: Robertus Pudyanto/Getty Images

By Dec. 30, when search teams began to recover debris and bodies from the apparent crash site of AirAsia flight QZ8501, the airline industry had begun to hear renewed calls from flyers and regulators for more precise, consistent tracking of commercial aircraft. During inclement weather two days earlier, the Airbus (AIR:FP) A320, while carrying 162 people from Surabaya in Indonesia to Singapore, had dropped off radar and couldn’t be found.

More than three-quarters of the earth’s surface, including large parts of Africa, Asia, and South America as well as most of the oceans, lacks reliable radar coverage. In March, when a Malaysia Airlines Boeing (BA) 777 bound for China disappeared, airlines and regulators began to grapple with the fact that a plane without a working transponder can be virtually invisible. Most new aircraft include technologies that bridge the radar gap, but many older planes still don’t have them. The International Air Transport Association, a trade group that represents 250 airlines, emphasizes that almost all the 100,000 flights per day travel without incident.

Why Planes Can Still Vanish: Airlines Slow to Upgrade Tracking Tech – Businessweek

Jan 11

Toyota gives away patents to build ‘game-changing’ car of the future – GeekWire

Innovation, Technology

This is awesome… :) Maybe I should wait for a hydrogen electric vehicle. :) This will surely drive innovation in this space.

LAS VEGAS — Toyota made it clear today at the big Consumer Electronics Show that its fuel-cell technology is the future of automotive transportation.

The car-maker on Monday held a press conference detailing the progress of the new Toyota Mirai, a hydrogen fuel cell vehicle that is powered by combining oxygen and hydrogen and will be commercially available later this year.

“We believe that hydrogen electric will be the primary fuel for the next 100 years,” said Bob Carter, a senior VP with Toyota’s US office.

Dr. Michio Kaku talks about Toyota’s fuel cell technology.

Perhaps the most newsworthy announcement came when Toyota said it would make all of its 5,680 patents related to fuel cell technology available, royalty-free, to other companies manufacturing and selling both fuel-cell vehicles and hydrogen refueling stations. The idea is to drive more innovation in this somewhat nascent sector of the automobile industry.

Toyota gives away patents to build ‘game-changing’ car of the future – GeekWire

Apr 26

Apache Ambari 1.5.1 is Released!

Big Data

This week Ambari version 1.5.1 was released. Need to try it out. :) Check out the post from Hortonworks.

Yesterday the Apache Ambari community proudly released version 1.5.1. This is the result of constant, concerted collaboration among the Ambari project’s many members. This release represents the work of over 30 individuals over 5 months and, combined with the Ambari 1.5.0 release, resolves more than 1,000 JIRAs.


This version of Ambari makes huge strides in simplifying the deployment, management and monitoring of large Hadoop clusters, including those running Hortonworks Data Platform 2.1.

Ambari 1.5.1 contains many new features – let’s take a look at those.

Apache Ambari 1.5.1 is Released! | Hortonworks

Apr 15

Big Data Debate: Will HBase Dominate NoSQL? – InformationWeek

Big Data

I am evaluating various NoSQL technologies as part of my new role at Zulily. This article was forwarded by someone on our team. It is a good read. :)

HBase offers both scalability and the economy of sharing the same infrastructure as Hadoop, but will its flaws hold it back? NoSQL experts square off.

HBase is modeled after Google BigTable and is part of the world’s most popular big data processing platform, Apache Hadoop. But will this pedigree guarantee HBase a dominant role in the competitive and fast-growing NoSQL database market?

Michael Hausenblas of MapR argues that Hadoop’s popularity and HBase’s scalability and consistency ensure success. The growing HBase community will surpass other open-source movements and will overcome a few technical wrinkles that have yet to be worked out.

Jonathan Ellis of DataStax, the support provider behind open-source Cassandra, argues that HBase flaws are too numerous and intrinsic to Hadoop’s HDFS architecture to overcome. These flaws will forever limit HBase’s applicability to high-velocity workloads, he says.

Read what our two NoSQL experts have to say, and then weigh in with your opinion in the comments section below.

Big Data Debate: Will HBase Dominate NoSQL? – InformationWeek

Apr 13

Great Article on Choosing an MPP Database

Big Data

In my new role at Zulily, I am responsible for our Big Data Platform. I have been investigating the different options available in this space, especially the best MPP database product on the market that we could leverage… I came across this awesome article by Marcos Ortiz. It is a great read… :)

Like the title says, to choose an enterprise-level Massive Parallel Processing (MPP) database is actually a big headache for every Data Science Manager; basically because there are very good choices around the tech world.

But, I will give my top reasons to choose a good platform of this kind.

Fast Query processing

I think that I don’t have to explain very much here, because you should know that this feature is critical for every Data-Driven business to answer bigger questions to be able to take action more quickly. If you have a platform where you can query huge data sets in matters of seconds or minutes, this is a huge advantage over your competitors. So, I think like a Product Manager, focused in Big Data Analytics, this is critical for my company.

Integration with Apache Hadoop

Apache Hadoop has become in the de-facto Analytics platform for Big Data processing, so, for a new business interested in Big Data, you have to build an integrated platform where Hadoop could play a critical role, and if you have a database which can communicate easily with the yellow elephant; you will be able to adapt to changes in the future more quickly, of course in terms of Business Analytics.

Choosing a MPP database is incredibly hard | Diary of a Data-Driven Product Manager

Mar 27

I am investigating various Hadoop offerings. This is a very good article by Sriram Krishnan and Eva Tse from Netflix. :) Awesome. :)

Hadoop has become the de facto standard for managing and processing hundreds of terabytes to petabytes of data. At Netflix, our Hadoop-based data warehouse is petabyte-scale, and growing rapidly. However, with the big data explosion in recent times, even this is not very novel anymore. Our architecture, however, is unique as it enables us to build a data warehouse of practically infinite scale in the cloud (both in terms of data and computational power).

In this article, we discuss our cloud-based data warehouse, how it is different from a traditional data center-based Hadoop infrastructure, and how we leverage the elasticity of the cloud to build a system that is dynamically scalable. We also introduce Genie, which is our in-house Hadoop Platform as a Service (PaaS) that provides REST-ful APIs for job execution and resource management.

Architectural Overview

In a traditional data center-based Hadoop data warehouse, the data is hosted on the Hadoop Distributed File System (HDFS). HDFS can be run on commodity hardware, and provides fault-tolerance and high throughput access to large datasets. The most typical way to build a Hadoop data warehouse in the cloud would be to follow this model, and store your data on HDFS on your cloud-based Hadoop clusters. However, as we describe in the next section, we have chosen to store all of our data on Amazon’s Storage Service (S3), which is the core principle on which our architecture is based. A high-level overview of our architecture is shown below, followed by the details.

The Netflix Tech Blog: Hadoop Platform as a Service in the Cloud