Extremely Parallel and Incredibly Diverse Data Processing on Many Heterogeneous Cores

Projects: Project, Research

Project details

Description

The amount of data that we generate and collect keeps increasing. However, transforming this sheer amount of complex data into discoveries that influence our society requires data-intensive systems to utilize the full processing power offered by the server hardware, and the server hardware to keep offering more processing power to data-intensive systems. The status quo is not an option.

Server hardware has evolved from faster and more complex single-core processors to multicores with general-purpose cores whose speed and complexity have stayed stable over the years. Traditional data-intensive systems had to go through fundamental changes to exploit this hardware evolution well. Today, the traditional multicore design faces a challenge. Adding more and more cores to a processor is not sustainable if we cannot power up all of those cores simultaneously. The focus needs to shift toward minimizing energy per instruction. Simply using low-power cores is unsuitable for latency-critical tasks, since such cores take longer to complete a task. This era therefore requires hardware specialization, especially for frequently executed tasks, to optimize overall power consumption rather than executing every task on power-inefficient general-purpose processors. The emerging server hardware landscape will therefore likely be composed of a diverse set of processing units, each specialized to execute a specific task very well, with opportunities for extreme levels of parallelism.

On such hardware, software systems must pick which cores to power up at a given time based on the active tasks, while shutting down the cores that are not required. Once this is done effectively, data-intensive systems can go a step further and also influence hardware specialization decisions. The XPD project targets these strategic challenges.

With this goal in mind, XPD will target modern machine learning platforms such as TensorFlow, SystemML, PyTorch, and RAPIDS. Some of these platforms already target hardware heterogeneity in the form of CPU-GPU co-processors. However, because these platforms are relatively new, it is not clear how well they utilize modern hardware resources or how they would behave on emerging, more heterogeneous and parallel server hardware. The research questions XPD will focus on are:
(1) How well do the existing platforms utilize modern heterogeneous hardware?
(2) How can one improve/automate the scheduling decisions on such hardware?
(3) How can these data processing platforms influence hardware specialization decisions?

Layman's description

It is the year 2025. Sabina is a data scientist starting a new project in the NLP group at ITU on speech recognition for the Danish language, to be used in virtual assistants. The lab has built a new, larger-scale server infrastructure with state-of-the-art hardware so that it can process more data and achieve more accurate language models for this project. However, she realizes that the new infrastructure, which is very expensive and consumes more energy, does not improve her results compared to the old infrastructure. In addition, she has to compete with other researchers in the NLP group for time on this infrastructure. They are discussing buying more hardware, which means more money and more energy consumption. Sabina is upset about the unsustainable infrastructure they have built and is unsure how to utilize it better. Then she remembers the Resource-Aware Data Science (RAD) project that one of her colleagues, Pinar, has been working on, and goes to her for help. By adopting the tools and methodology developed in the RAD project, Sabina manages to double the efficiency of her data science processing pipeline on the new server infrastructure. In addition, the NLP group starts to share this server infrastructure more efficiently across researchers, obviating the need to buy more hardware.

Short title: RAD
Acronym: RAD
Status: Ongoing
Effective start/end date: 01/04/2021 to 31/03/2025

Funding

  • Danmarks Frie Forskningsfond: DKK 6,190,775.00

Keywords

  • Resource-Aware ML

Fingerprint

Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.
  • RAD+: Resource-Aware Data Science

    Tözün, P. (PI), Rosero, P. (CoI), Nielsen, N. K. (CoI), Tøttrup, J. V. (CoI), Bayer, R. (CoI), Duane, A. (CoI), Hvass Jørgensen, J. (CoI), Osterhammel, J. M. (CoI) & Sørensen, P. K. (Admin)

    Danmarks Frie Forskningsfond

    01/12/2021 to 31/03/2025

    Projects: Project, Research