What the merger of Cloudera and Hortonworks means for data analysts

A few months ago, big data companies Cloudera and Hortonworks announced that they were going to merge. Now that the merger has been approved and has the green light, it is time for an interview with Wim Stoop, senior product marketing manager at Cloudera. He explains the vision behind the merger and what it offers to companies that work with the two vendors' products and, of course, to data analysts.

Stoop calls the merger more or less the perfect marriage. Both companies build big data products on Hadoop and have specialized in this area in recent years. Hortonworks, for example, is very strong in Hortonworks DataFlow (HDF), which handles streaming data that needs to be added quickly to the Hadoop platform. Deploying in the cloud or on-premises is another Hortonworks strength.

Cloudera Data Science Workbench

With its Data Science Workbench, Cloudera has a good solution for data analysts. It lets them combine and analyze data quickly and easily, without an immediate need for extreme processing power. They can experiment and test to see what kind of results an approach delivers before applying it on a large scale. The main advantage is that the workbench supports a wide range of programming languages, so data analysts can work in their own favorite language. The workbench also keeps track of the exact steps taken to achieve a result. The outcome is important, but the algorithm and methods that lead to it are just as important.

The route to a single solution

Taking a broader view, there are of course many smaller things at which either Hortonworks or Cloudera excels, and areas where one company's technology is just that little bit better or more efficient than the other's. That will force the two companies to make hard choices, but according to Stoop, it will all work out well. The need for a good data platform is enormous, and choices are inevitable.

Ultimately, the company is responding to criticism that has been levelled at Hadoop. Hadoop itself forms the foundation, but so many different modules for reading, writing, or processing data can be added on top of it that it is easy to lose the overview. That there are so many solutions stems from Hadoop's open-source character and from the support of companies like Cloudera and Hortonworks, which are the biggest contributors to many of these projects.

That is going to change as well. This year a new platform will arrive, called Cloudera Data Platform, which merges the best parts of Hortonworks and Cloudera. It also means that where projects or modules overlap, one will be kept and the other will be dropped. For example, the two companies currently use different solutions for processing metadata; in the Cloudera Data Platform, we will see only one of them. As a result, there will be somewhat fewer modules and a clearer overview, which we can only applaud.

Cloudera Data Platform

Something we have not yet addressed is the name of the combined company. The companies have opted for a merger, but eventually the name Hortonworks will simply disappear. The company continues as Cloudera, hence the name Cloudera Data Platform.

The intention is that the Cloudera Data Platform will be available this year, so that customers can start testing with it. As soon as the platform is stable and mature enough, customers will be advised to migrate to this new platform.

All existing Cloudera and Hortonworks products will eventually disappear, but the companies will continue to support these products fully until the end of 2022. After that, however, everyone will have to switch to the Cloudera Data Platform.

Cloudera has already built a migration path into the most recent versions of its current products. Hortonworks will now do the same. The company will take steps to ensure that existing products and the new Data Platform can work together during the migration to the new platform.

Shared Data Experience

Another innovation that, according to Stoop, will become increasingly important is the Shared Data Experience. When customers use Cloudera products, their Hadoop environments can easily be linked together, so that resources (CPU, GPU, memory) can be pooled when analyzing data. Suppose a company runs Cloudera environments for data analysis in its own data centers and on cloud platforms, and then suddenly has to carry out a very large analysis. In that case, it could combine all these environments and deploy them together. It is also possible to combine data from local offices or branches, for example.

More innovation possible by merging

According to Stoop, a huge advantage of this merger is the development capacity it frees up for new, innovative solutions. Until now, the companies were often working separately on the same kinds of projects. To stick with the metadata example, each company contributed to a different project for handling metadata in Hadoop. In effect, one of the two was reinventing the wheel. Given the current labor market, developers who also have a passion for and knowledge of data analysis are extremely hard to find. With this merger, the combined company can work much more efficiently, and many more teams can be deployed to develop new solutions.

This week the Hortonworks Datasummit will take place in Barcelona. Techzine will be present at this event, and there will undoubtedly be more information about the merger, the products and the status of the new Cloudera Data Platform.