In my last post, I suggested that as of mid-2010, there were exactly five “Big Data” DBMS vendors: Aster Data, Greenplum, Netezza, Teradata, and Vertica. But what exactly does it mean to be a “Big Data” vendor? I believe that such a platform must have at least two characteristics:
- It must be able to store and manage data sets over 100 TB
- It must enable robust analysis of that data in a performant manner
The first point is clearly based on today’s data volumes (“Big Data” in five years will obviously be…bigger). The second point is meant to differentiate between traditional data warehouses and newer, more powerful data analytics platforms. The Big Data vendors are those whose platforms can not only store extremely large volumes of data, but also enable users to perform complex analysis of that data – going beyond the basic retrieval and simple reporting workloads prevalent in traditional data warehouses (and beyond the limitations imposed by SQL).
So if we start with the above definition for Big Data vendors, there are at least three fundamental questions that can be used to compare their platforms:
- What can you do with the data?
- How mature is the platform?
- How fast is the platform?
For the purposes of evaluating the recent spate of acquisitions, I’ll focus on the first two: how do the analytics capabilities of these platforms compare and how do the platforms compare in terms of stability, ecosystem support, ease-of-use, and general “enterprise-readiness”?
Comparing the Big Data Vendors Pre-Acquisition
Let’s once again go back to July 2010 (prior to Greenplum’s acquisition by EMC). Like any self-respecting pundit, I’ll summarize my thoughts in a handy two-dimensional grid, with platform maturity on the horizontal axis and analytics capabilities on the vertical axis. (For those familiar with Gartner’s MQ, note the key difference: the MQ compares vendor organizations, whereas I’m focusing solely on their respective platforms.)
In terms of analytics capabilities, I would rank Aster Data as the strongest, with its robust implementation of SQL/MR and comprehensive libraries of advanced analytical functions. Next comes Greenplum, another vendor that was early to introduce support for MapReduce. Netezza ranks third, having made a significant push to improve its analytics offerings in early 2010. Teradata comes fourth, and Vertica a distant fifth. (As recently as 2008, Vertica’s Mike Stonebraker characterized MapReduce as a “major step backwards”, although his company eventually reversed its stance and limped onto the bandwagon with a Hadoop connector. Even today, Vertica’s support for analytics is primarily limited to SQL-99 functions, though support for time series analysis and sessionization was finally introduced in version 4.0.)
On the platform maturity/enterprise-readiness side, Teradata is the clear leader, followed by fellow appliance vendor Netezza. The three software-only vendors are fairly similar in terms of product maturity, which makes sense given their relative youth as companies. I would place Greenplum third, based on a more advanced ecosystem at the time, and give Vertica and Aster Data a tie for fourth.
Evaluating the Acquisitions
Now let’s take a look at the potential impact of each acquisition on analytics and product maturity, from the viewpoint of the Big Data vendor.
EMC Acquires Greenplum
This acquisition is all about product maturity. EMC immediately gives Greenplum serious credibility with enterprise clients, and we’re already seeing the impact in the market. October saw the inevitable release of an EMC-Greenplum appliance. The extra resources provided by EMC will enable Greenplum to advance its analytics, but make no mistake about it: on the grid, this is movement to the right, toward greater maturity.
IBM Acquires Netezza
To me, this one has some intriguing potential. IBM now has a complete data analytics stack: Netezza for Big Data, Cognos for BI, and SPSS for advanced data mining and statistical analysis. Not to mention that Netezza now gets more access to the hardware side of IBM than it would have had as an OEM customer. On the other hand, this has the highest integration risk of all the acquisitions, and IBM still needs to figure out (and tell the rest of us) where the dividing line between DB2 and Netezza will be. If IBM can manage the integration well, this platform will be a major contender going forward.
HP Acquires Vertica
It’s really hard to see this as anything more than a knee-jerk reaction to Neoview’s failure. Vertica’s offering pales in comparison to the other Big Data vendors in terms of analytics capabilities, and its columnar-only database is clearly a niche product (as Gartner has reiterated year after year). HP will provide some additional credibility in the enterprise space, but not nearly the same impact that Greenplum will get from EMC. And unlike IBM, HP doesn’t own any adjacent products. Combine that with all the defections after Neoview was gutted, and it’s unclear what, if anything, HP can contribute to improving the platform (aside from resources).
Teradata Acquires Aster Data
Obviously, I’m somewhat biased given that I spent the last 5 years at Aster Data, but I really believe that on paper this has the most potential out of all four acquisitions. The obvious difference between the Aster Data acquisition and the other three is that it’s the only case where the acquirer was another Big Data vendor. The synergies are clear: Aster Data has the most advanced analytics and Teradata is the most mature platform. Of course, once the rose-colored glasses come off there’s lots of work to be done. Like IBM’s challenge with Netezza and DB2, Teradata will need to figure out how these two products coexist. Beyond that, the question is how effective they will be at cross-pollinating technology: can they succeed in taking the best of both worlds to create a robust, mature analytics platform? From Aster Data’s perspective, this is a big movement to the right, toward maturity. For Teradata, it’s straight up, toward stronger analytics.
After the Dust Settles
Once everything is said and done, I believe that there are three clear leaders in the Big Data DBMS space: EMC/Greenplum, IBM/Netezza, and Teradata/Aster Data. Post-acquisition, each of these players has the building blocks for a platform that can handle complex analytics at scale, with the level of maturity needed to expand beyond early adopters and into the wider enterprise market. On the other hand, given its history with Neoview and a spotty record at capitalizing on software acquisitions, it’s hard for me to envision HP/Vertica as a serious contender going forward. That said, with a new CEO pushing a software-driven strategy, there’s always the potential for surprises. Another acquisition, particularly one that complements Vertica’s current capabilities, could change things altogether.
Of course, one can’t talk about serious contenders without at least mentioning Oracle. Though its Exadata platform has, thus far, been used primarily for traditional data warehouse use cases, there’s a definite push to extend its capabilities and move into the broader analytics market. Netezza, for one, has definitely taken notice, and other vendors are keeping a watchful eye.
The next 6-12 months will reveal a lot about each vendor’s integration strategy, how they plan to position themselves against each other, and whether or not Oracle can succeed in becoming a serious contender. Let the games begin!