
TensorSocket: Shared Data Loading for Deep Learning Training

Research output: Journal article · Research · peer-review

Abstract

Training deep learning models is a repetitive and resource-intensive process. Data scientists often train many models before landing on the set of parameters (e.g., via hyper-parameter tuning) and the model architecture (e.g., via neural architecture search) that yield the highest accuracy. The computational efficiency of these training tasks depends heavily on how well the training data is supplied to the training process. The repetitive nature of these tasks means the same data processing pipelines run over and over, exacerbating the need for, and the cost of, computational resources.

In this paper, we present TensorSocket, which reduces the computational needs of deep learning training by enabling simultaneous training processes to share the same data loader. TensorSocket mitigates CPU-side bottlenecks in cases where collocated training workloads achieve high throughput on the GPU but are held back by lower data-loading throughput on the CPU. It does so by eliminating redundant computation and data duplication across collocated training processes and by leveraging modern GPU-GPU interconnects. In addition, TensorSocket can train and balance differently sized models, serve multiple batch sizes simultaneously, and remains hardware- and pipeline-agnostic.

Our evaluation shows that TensorSocket enables scenarios that are infeasible without data sharing, increases training throughput by up to 100%, and, on cloud instances, achieves cost savings of 50% by reducing hardware resource needs on the CPU side. Furthermore, TensorSocket outperforms state-of-the-art solutions for shared data loading such as CoorDL and Joader: it is easier to deploy and maintain, and it matches or exceeds their throughput while requiring fewer CPU resources.
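The core idea the abstract describes — one loader process feeding several collocated training processes the same batches, so the data pipeline runs once instead of once per trainer — can be sketched with standard Python multiprocessing. This is an illustrative sketch only, not TensorSocket's actual API or implementation: the function names (`shared_loader`, `trainer`, `run_shared_training`), the per-consumer queues, and the toy "batches" are all assumptions made for the example.

```python
import multiprocessing as mp

def shared_loader(queues, num_batches):
    # One loader process produces each batch once and broadcasts it
    # to every collocated trainer, instead of each trainer running
    # its own redundant data pipeline.
    for i in range(num_batches):
        batch = [i * 10 + j for j in range(4)]  # stand-in for a decoded batch
        for q in queues:
            q.put(batch)
    for q in queues:
        q.put(None)  # sentinel: no more batches

def trainer(q, results):
    # Each trainer consumes the shared stream; here a "training step"
    # is just summing the batch so the example stays self-contained.
    seen = []
    while True:
        batch = q.get()
        if batch is None:
            break
        seen.append(sum(batch))
    results.put(seen)

def run_shared_training(num_trainers=2, num_batches=3):
    queues = [mp.Queue() for _ in range(num_trainers)]
    results = mp.Queue()
    loader = mp.Process(target=shared_loader, args=(queues, num_batches))
    workers = [mp.Process(target=trainer, args=(q, results)) for q in queues]
    loader.start()
    for w in workers:
        w.start()
    loader.join()
    outs = [results.get() for _ in workers]
    for w in workers:
        w.join()
    return outs

if __name__ == "__main__":
    print(run_shared_training())
```

Because every trainer receives the identical batch stream, each sees the same data; the real system additionally handles differently sized models, multiple batch sizes, and GPU-GPU transfers, none of which this sketch attempts.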
Original language: English
Article number: 267
Journal: Proceedings of the ACM on Management of Data
Volume: 3
Issue number: 4
Pages (from-to): 1-26
Number of pages: 27
Publication status: Published - 22 Sept 2025
Event: The ACM Symposium on Principles of Database Systems - Bengaluru, India
Duration: 31 May 2026 - 5 Jun 2026
https://2026.sigmod.org/


Keywords

  • Data Loading
  • Data Sharing
  • Work Sharing
  • Deep Learning Training
  • Systems for ML
  • Hardware Underutilization
  • Workload Collocation
