Chances are you’ve never heard of Dataiku. The company, which focuses on collaboration in data science projects, was founded in 2013 and has grown considerably since. This is mainly because its Data Science Studio (DSS) platform can exist alongside an organization’s existing data science initiatives and tools. All kinds of employees, from data analysts and data scientists to engineers and business analysts, can use DSS to collaborate on analytics and artificial intelligence (AI) projects. Most professionals will agree that a bit of simplification is more than welcome.
Dataiku’s focus is deliberately broad. For the real data scientist, DSS offers capabilities for advanced professional work, while less experienced analysts and other employees also find suitable functionality on the platform. Because of the explosive growth in data, more and more employees are becoming involved in analytics or AI projects. The data explosion also means that a range of issues has to be taken into account: different programming languages, building data pipelines, model governance, putting initiatives into production, and so on. Everything related to analytics and AI has grown in scale in recent years, and the average organization is committed to expanding its AI initiatives further.
That is reason enough to pursue a more uniform way of working in the data science maze, something collaboration software can help achieve. Such an approach has already proven itself in several IT disciplines. Atlassian, for example, developed various collaboration tools that significantly simplify the work of software developers. With DSS, Dataiku is attempting a similar trick for the data science world.
Four pillars form the basis for DSS
The platform supports just about everything that has to do with data science. Hadoop, NoSQL, TensorFlow: a great deal has been covered. However, this breadth does not make DSS hard to understand. The platform rests mainly on four pillars. The first is the ‘Automated Machine Learning’ engine, which facilitates and accelerates the construction of machine learning models. In principle, the visual interface alone is enough to build a model fairly quickly. A user can, for example, specify the model’s task with one click, after which datasets are automatically analyzed to determine which algorithm is likely to work best. Ultimately, the specific use case determines how quickly a model can be built, because some projects need more advanced components that may have to be written in R. The resulting model can then be used in a variety of ways.
Another pillar of DSS is called ‘Collaborative AI’ and reflects the platform’s collaborative nature. It covers most of the DSS components with a purely collaborative focus. One example is data scientists writing shared model components that non-coders can then reuse. Data scientists themselves also benefit from several Collaborative AI components, including a timeline of past activities and the automation of repetitive Python tasks.
DSS also includes the ‘Model Deployment’ pillar, which focuses on the management and governance of data and models. To this end, DSS gives administrators various options. They can set up rules on the platform that determine which data may be used for modelling, and they can see which data is used where. This helps ensure that everything runs according to the guidelines, in line with company regulations and the General Data Protection Regulation (GDPR).
A focus on both SMEs and enterprises
Finally, there is the pillar that Dataiku calls ‘Enterprise Scaling’. In principle, Dataiku wants to offer a platform that any organization can use, regardless of its size. It can already point to an impressive list of big names using the platform: Mercedes-Benz, Rabobank, Palo Alto Networks, Ubisoft and Unilever are examples of large organizations that already use DSS. Dataiku initially focused on the top end of the market, and that is still visible in its customer base.
According to Dataiku, however, the platform contains enough functionality to support SMEs as well. On more than one occasion, organizations with a few hundred employees have put the platform into use within a few months and then expanded their use of it over time. That growth includes some technical expansion, which Dataiku supports by, for example, making it relatively simple to add new nodes.
In addition, Dataiku has launched two versions of its platform specifically intended for startups and mid-sized companies. Dataiku recognizes that it is currently mainly large enterprises that get real value from data science. The free version therefore offers a number of DSS’s basic functionalities, while the ‘lite’ version is meant to be a suitable starting point for smaller companies.
Good basis for further growth
The four core components of the Dataiku platform provide a solid basis for simplifying analytics and AI initiatives. DSS can complement the data science tools that companies already use. For example, SAS can serve as a data source for DSS, and the platform also supports integration with the AWS SageMaker engine.
Ultimately, DSS offers enough possibilities to support Dataiku’s further growth. The platform clearly solves a problem that many companies face, but Dataiku has not yet reached its full market potential and could expand its customer base considerably. We are therefore curious to see how both the product and the business develop.