Cloudera and Hortonworks Cease Fire

Like brothers on opposite sides of a conflict, Cloudera and Hortonworks have decided in favor of peace and reuniting the family.

The merger of Cloudera and Hortonworks brings financial and product line synergies. More than 70% of their portfolios are the same Hadoop open source software. That means duplicated effort in developing, packaging, and selling almost identical, non-differentiated free software. Imagine selling Coca-Cola in Pepsi bottles.

Beyond the savings, the merger will simplify a complex Hadoop stack. Hadoop distributions have rough edges. Hortonworks and Cloudera Hadoop releases do not offer the latest Hadoop subproject release levels (see Hadoop tracker). That gets rectified when two engineering points of view become one later in 2019. The merger is also excellent news for other software vendors. Their integration and support costs will shrink because they will support only one Hadoop release. The ODPi vision of a single Hadoop release will finally arrive.

But overlapping proprietary products will face the chopping block. Which software will survive?

  1. Impala versus Hive LLAP

  2. Apache Sentry versus Ranger

  3. Spot versus Metron

  4. Cloudera Navigator versus Atlas

  5. Cloudera Manager versus Ambari

Hey, can we vote on this? Can’t we keep them all? Nope. Some customer skills investments will be forfeit.

Both companies should be cash positive in the next 2-3 quarters. Combined, they will have $720 million in annual revenue and more than $500 million in cash. Furthermore, the Hadoop market can’t sustain two companies at 40% year-over-year growth much longer. Job cuts will be large for Hadoop developers, quality assurance, and packaging teams. Consolidation and cost cutting are why the stock pundits are elated. A leaner, stronger combined vendor will emerge.

What’s not to like? Well, so far the story is all balance sheet benefits. Customers won’t get anything new except the loss of some skills investments.

CLOUDERAWORKS – THE REAL REASON FOR THE MERGER

The new company already knows its two halves wasted time competing with each other while Amazon Web Services raced past them. AWS gobbled up “big data” customers with cheap storage and easy-to-use software. This merger is a fight for survival against the AWS, Microsoft Azure, and Google public clouds. All three public cloud behemoths offer Hadoop lineage software (Amazon EMR, Azure HDInsight, and Google Cloud Dataproc). It’s too easy to get hardware and software to run Hadoop-like software in the cloud. This established the cloud as the de facto platform for data lakes, which steals business from the Hadoop distro providers.

Multiple cloud strategies redefined the big data landscape. First, all cloud vendors offer extremely low cost cold storage for huge data sets. For Amazon this is Simple Storage Service (S3); for Azure it’s Blob storage. Both are reasonable substitutes for HDFS, the foundation of Hadoop. Hadoop aficionados will argue they can put HDFS into these cold storage structures. Yes, they can. But extra layers of software on top of S3 aren’t the first choice.

The second trend to watch is new Apache Hadoop releases that make it easy to use S3 or Blob storage to replace HDFS. From a jaundiced point of view, Hadoop is cutting off its own legs.

Next comes the Spark debate. Spark showed up on the Apache Hadoop roster a couple of years ago. It is much easier to use than Hadoop’s clunky MapReduce and its sister Hive. Programmers are gravitating to Spark, further undercutting the Hadoop data lake predominance. You guessed it. Apache Spark is already in the cloud vendor portfolios. There’s competitive advantage in being way too easy to do business with.

It’s a fight for survival, but give ClouderaWorks credit. Stodgy companies would stew in denial for five more years before taking action. We sometimes forget Cloudera and Hortonworks are full of Silicon Valley innovators. And their senior managers have been forward thinking for a decade already. Mike Olson predicted Hadoop would disappear at the 2014 Strata conference (at 13:55). Four years later, Gartner removed Hadoop distributions from their 2018 Data Management Hype Cycle. That is why we don’t see Hadoop posters and keynotes at the Strata and DataWorks conferences.

Cloudera began repositioning itself as an AI/analytics company about 18 months ago. Hortonworks no longer talks much about Hadoop. Instead, it seeks to be the Internet of Things data management company. These two brothers aimed at the two hottest growth markets in Information Technology. The children of Yahoo! have chosen the right course, albeit with stiff headwinds. Will the real Cloud Era please stand up?

Dan Graham