LinkedIn tool prepares data for TensorFlow

LinkedIn has open sourced an internally built tool, Avro2TF. It is a Spark-based conversion tool that transforms data in Apache Avro format into a format that TensorFlow can readily use for machine learning.

The new tool allows data scientists and other users to convert data sets in Apache Avro format into a format that TensorFlow can readily use, Silicon Angle reports. LinkedIn's engineers make heavy use of the Avro format. The advantage of the tool is that engineers and developers no longer have to spend their time preparing data and can concentrate on their machine learning models instead.

LinkedIn's engineers say they developed Avro2TF as a solution for “scalable data conversion”. The tool is meant to support any data format that Spark can read. LinkedIn believes many organizations can benefit from Avro2TF, because it is not the only company that has struggled to convert data for machine learning purposes.

“Many companies have large amounts of machine learning data in similar sparse vector formats, and the Tensor format is still relatively new to many companies,” said engineers Xuhong Zhang, Chenya Zhang and Yiming Ma. “Avro2TF closes this gap by providing scalable Spark-based transformation and extension mechanisms to efficiently convert the data into TF records that can be used directly by TensorFlow.”
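
To make the gap the engineers describe more concrete, here is a minimal Python sketch of expanding records stored as sparse (index, value) vectors into fixed-length dense rows, the kind of layout tensor-based frameworks typically expect. It is not Avro2TF's code or API. The record layout, the field names `indices` and `values`, and the helper `sparse_to_dense` are all assumptions made for illustration.

```python
from typing import Dict, List


def sparse_to_dense(record: Dict[str, list], dim: int) -> List[float]:
    """Expand one sparse record into a dense vector of length `dim`.

    `record` is a hypothetical layout: parallel lists under the keys
    "indices" and "values". Positions not listed stay at 0.0.
    """
    dense = [0.0] * dim
    for idx, val in zip(record["indices"], record["values"]):
        if not 0 <= idx < dim:
            raise ValueError(f"index {idx} outside vector of size {dim}")
        dense[idx] = float(val)
    return dense


# Hypothetical records, for example as decoded from a data file.
records = [
    {"indices": [0, 3], "values": [1.5, 2.0]},
    {"indices": [2], "values": [4.0]},
]

# Every row now has the same length, so the batch has a regular shape.
batch = [sparse_to_dense(r, dim=4) for r in records]
print(batch)
```

A tool in this space would additionally have to do this at scale and write the result in a TensorFlow-readable file format, which this sketch leaves out.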

Open source

Avro2TF is the latest in a series of machine learning tools that LinkedIn has open sourced. In doing so, it pursues its mission to “democratize machine learning”. “One of the most important lessons we learned on this journey is the importance of providing good deep learning platforms that help our modeling engineers become more efficient and productive,” say the engineers.

For example, the company open sourced TonY in September last year. TonY makes it possible to connect the machine learning framework TensorFlow to data stored in Apache Hadoop.

This news article was automatically translated from Dutch to give Techzine.eu a head start. All news articles after September 1, 2019 are written in native English and NOT translated. All our background stories are written in native English as well. For more information read our launch article.