I threw in a note about MapR yesterday almost as an afterthought, because Om Malik had written about it.
MapR is building what it calls a proprietary version of Hadoop, one it says will be three times better. It was set up mainly by ex-Microsoft and ex-EMC executives, along with an ex-Googler from India and a group of his fellow programmers. Paid support for the open source Hadoop, meanwhile, is already offered by Cloudera.
This is going to end badly, and it's important to understand why I can write that with confidence.
It's one thing for a proprietary company to take on an established innovator, as Google did in challenging Apple's iPhone with Android. With a big market and a head start, the proprietary company has a chance, especially if it can expand its leadership into new niches, as Apple has with the iPad. Vertical integration from a strong center can be a winning strategy in consumer markets, and no matter how big Android's market share is, Apple is making more money.
Hadoop, on the other hand, is innovation built on an open source base. The Apache project gets contributions from many different companies to improve Hadoop. That was why Yahoo made it open source in the first place – it needed that help to keep the code base current.
Now, to compete, MapR not only needs a better mousetrap; it has to keep that mousetrap better, even while scaling marketing and support so it can convince companies it's better without telling them what's inside the black box – that's the way closed source works.
How long can it do that? How long can one team beat 10 teams, while it's charging top dollar and the other 10 guys are giving their stuff away?
The Apache License lets MapR do a proprietary fork. Lots of companies offer “enterprise” versions of nominally open source products, with additional features (often around usability). But they stay inside the open source system. They continue contributing to the code base, and they take advantage of the open source project's innovations.
MapR thinks it's going to take from Hadoop, give nothing back, build on that, and get people to pay it big bucks for the resulting code? Really? Really.
That may work for a while, but not long enough to matter. One beats ten only if it has a big head start, if the ten get to bickering, and if the one does everything right. That's not the way to bet.
If you do want to build and run your own Hadoop distribution, note that DistCp runs a MapReduce job to transfer your data from one cluster to another.
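As a minimal sketch of how that looks in practice – the NameNode hostnames, port, and path below are placeholders, not anything from a real deployment – DistCp is invoked from the command line and launches a MapReduce job whose map tasks copy the files in parallel:

```shell
# Copy /data/logs from one cluster's HDFS to another's.
# "source-nn", "dest-nn", and the path are hypothetical examples.
hadoop distcp hdfs://source-nn:8020/data/logs hdfs://dest-nn:8020/data/logs

# On repeated runs, -update skips files that already match on the
# target, so the copy becomes incremental rather than a full re-copy.
hadoop distcp -update hdfs://source-nn:8020/data/logs hdfs://dest-nn:8020/data/logs
```

Because the copy is a MapReduce job, it scales with the cluster rather than with a single machine's bandwidth.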