Earlier this week, LinkedIn announced it was open-sourcing AvroTensorDataset, which is a “TensorFlow dataset for reading, parsing, and processing Avro data.” Apache Avro is the primary storage format that LinkedIn uses for its training data.
According to LinkedIn, it was experiencing bottlenecks in its machine learning workloads that were caused by the need to read multiple terabytes of input data. AvroTensorDataset can speed up data preprocessing by multiple orders of magnitude, according to the company.
The tool was built internally at LinkedIn, and the company wanted to open-source the project so that others could see the same large performance boosts in their training workloads. It has already been in production at LinkedIn for over a year.
LinkedIn says that with this tool it has been able to improve processing speed by 162x compared to existing solutions and has reduced overall training time by 66%.
“ATDSDataset is LinkedIn’s solution to efficiently read Avro data into TensorFlow. Through multiple performance improvements, we were able to speed up I/O throughput by orders of magnitude over existing Avro reader solutions. Our team at LinkedIn worked closely with the TensorFlow I/O community to open-source this feature, and we hope that by open-sourcing it, the TensorFlow community can also benefit from these performance improvements,” Jonathan Hung, staff software engineer at LinkedIn, wrote in a blog post.