Several machine learning groups are joining forces in a Python consortium that will define the Python Data API standards. The machine learning projects Apache MXNet and Open Neural Network Exchange were the ones to start this initiative.
The idea is to improve collaboration between machine learning specialists and data scientists, regardless of which framework or other tools from the Python ecosystem they use. The consortium will set standards for data frames and tensors. The aim is to reduce the fragmentation that is causing growing problems within the Python ecosystem.
Python is the language used by pandas, PySpark and Apache Arrow, as well as TensorFlow and PyTorch. The latter stands out: PyTorch, a widely used machine learning framework, is not joining the consortium. A Facebook spokesperson states that all data frame libraries currently have similar APIs, but with enough differences that it is not really possible to use them interchangeably.
The consortium is planning to grow: "We would like to grow to cross-project and cross-ecosystem collaboration in terms of APIs, data transfer mechanisms and so on." However, this requires coordination and communication, even more than technical innovation. That is why the consortium is being set up. The group's first task is to set up the working group. It then has to lay the foundation for the initial standard.
Next, it will ask maintainers of data frame libraries for feedback and make further adjustments. That feedback session is scheduled for next month. There is plenty of work to be done, because only after all of these steps will the standard become a reality. In addition, tools need to be created. These tools will allow people to compare array and tensor libraries and keep track of what each library supports.
The Open Neural Network Exchange is itself a consortium of sorts: it was started by Facebook and Microsoft, and now 40 organizations are involved, including IBM, Baidu, Intel and Qualcomm.