Teraflux

EC Project Teraflux

  • Start date: 01.01.2010
  • End date: 31.03.2014
  • Funded by: EC (European Community)
  • Local head of project: Prof. Dr. Theo Ungerer
  • Local scientists: Sebastian Weis
  • External scientists / cooperations:

Roberto Giorgi (Project Leader), Sandro Bartolini (University of Siena)

Mateo Valero, Nacho Navarro, Yoav Etsion (Barcelona Supercomputing Center, Spain)
Francois Bodin (CAPS entreprise, France)
Paolo Faraboschi, Eduardo Argollo (Hewlett-Packard Labs Barcelona, Spain)
Albert Cohen (INRIA, France)
Avi Mendelson (Microsoft R&D Israel)
Eric Lenormand, Philippe Bonnot, Teodora Petrisor (THALES)
Paraskevas Evripidou, Pedro Trancoso (University of Cyprus)

 

 

Abstract

In the future, parallel systems will be widely available as multi-/many-core building blocks with hundreds or thousands of cores on a chip.

 

The first challenge is programmability. To address it for such many-cores, we combine an underlying dataflow-based thread execution with advanced programming models such as transactional memory.

 

The second challenge addressed by this project is the definition of an appropriate architecture that matches the proposed execution model and meets the reliability challenges. The architectural explorations will cover the concepts of data-driven and decoupled thread execution, provide architectural support for the parallel programming model, introduce specific hardware scheduling units able to manage different levels of thread granularity, take care of code or data migration based on information passed by the virtual layer, and consider power, thermal, and fault information. The underlying architectural elements will essentially form a heterogeneous architecture that reuses existing "off-the-shelf" or well-known components where possible.

 

The third challenge, which is the particular objective of the University of Augsburg, concerns reliability: such a large number of cores, together with the high density of the components integrated into the chip, will inevitably result in systems that suffer from failures during runtime. These failures may be transient or permanent. The system must provide mechanisms to detect such failures and resume execution with reconfigured core, link, and memory assignments so that execution completes successfully.
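
As a rough illustration of the detect-and-reconfigure idea, the following Python sketch marks a core as suspect once its heartbeat is overdue and moves its tasks to the remaining cores. It is a toy model under stated assumptions, not the TERAFLUX mechanism: the names HeartbeatMonitor and reassign, the timeout value, and the round-robin reassignment policy are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy detector (hypothetical): a core is suspect once its heartbeat is overdue."""
    timeout: float                                  # assumed maximum silence, in seconds
    last_seen: dict = field(default_factory=dict)   # core id -> time of last heartbeat

    def heartbeat(self, core: int, now: float) -> None:
        # Record that `core` reported in at time `now`.
        self.last_seen[core] = now

    def failed_cores(self, now: float) -> set:
        # A core counts as failed if it has been silent for longer than the timeout.
        return {c for c, t in self.last_seen.items() if now - t > self.timeout}


def reassign(assignment: dict, failed: set, healthy: list) -> dict:
    """Move tasks off failed cores, round-robin over the healthy ones (assumed policy)."""
    if not healthy:
        raise RuntimeError("no healthy cores left")
    new_assignment = {}
    next_slot = 0
    for task, core in assignment.items():
        if core in failed:
            new_assignment[task] = healthy[next_slot % len(healthy)]
            next_slot += 1
        else:
            new_assignment[task] = core
    return new_assignment


# Usage: four cores report at t=0; core 2 then stays silent.
monitor = HeartbeatMonitor(timeout=1.0)
for core in range(4):
    monitor.heartbeat(core, now=0.0)
for core in (0, 1, 3):
    monitor.heartbeat(core, now=2.0)

failed = monitor.failed_cores(now=2.5)
healthy = sorted(set(monitor.last_seen) - failed)
tasks = {"t0": 0, "t1": 2, "t2": 2, "t3": 3}
recovered = reassign(tasks, failed, healthy)
```

In this sketch only the task-to-core mapping is reconfigured; link and memory reassignment, and the distinction between transient and permanent faults, are left out for brevity.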

 

Our approach for evaluating the research proposals is based on a many-core simulator model provided by the COTSon simulation framework of TERAFLUX partner HP Labs.

Partners in the TERAFLUX project are the University of Siena, the Barcelona Supercomputing Center, CAPS entreprise, Hewlett-Packard, INRIA, Microsoft, THALES, the University of Cyprus, the University of Manchester, and the Chair of Systems and Networking of the University of Augsburg.

 

 

Publications

2014

 

  • TERAFLUX: Harnessing Dataflow in Next Generation Teradevices
    Roberto Giorgi, Rosa M. Badia, François Bodin, Albert Cohen, Paraskevas Evripidou, Paolo Faraboschi, Bernhard Fechner, Guang R. Gao, Arne Garbade, Rahul Gayatri, Sylvain Girbal, Daniel Goodman, Behram Khan, Souad Koliai, Joshua Landwehr, Nhat Minh Lê, Feng Li, Mikel Luján, Avi Mendelson, Laurent Morin, Nacho Navarro, Tomasz Patejko, Antoniu Pop, Pedro Trancoso, Theo Ungerer, Ian Watson, Sebastian Weis, Stéphane Zuckerman, Mateo Valero
    Journal of Microprocessors and Microsystems: Embedded Hardware Design (MICPRO), April 2014

 

2013

 

  • Fault Localization in NoCs Exploiting Periodic Heartbeat Messages in a Many-Core Environment
    Arne Garbade, Sebastian Weis, Bernhard Fechner, Theo Ungerer
    Proceedings of the 27th International Symposium on Parallel & Distributed Processing Workshops and PhD Forum (CASS 2013), Boston, USA, pages 791-795

 

  • Impact of Message-Based Fault Detectors on a Network on Chip
    Arne Garbade, Sebastian Weis, Sebastian Schlingmann, Bernhard Fechner, Theo Ungerer
    Proceedings of the 21st International Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP 2013), pages 470-477

 

2012

 

  • Fault Coverage of a Timing and Control Flow Checker for Hard Real-Time Systems
    Julian Wolf, Bernhard Fechner, and Theo Ungerer
    Proceedings of the 18th IEEE International On-Line Testing Symposium (IOLTS '12), Sitges, Spain, pages 161-163

 

  • Fine-Grained Timing and Control Flow Error Checking for Hard Real-Time Task Execution
    Julian Wolf, Bernhard Fechner, Sascha Uhrig, and Theo Ungerer
    Proceedings of the 7th IEEE International Symposium on Industrial Embedded Systems (SIES '12), Karlsruhe, Germany, pages 257-266

 

  • Simulating the Future kilo-x86-64 core Processors and their Infrastructure
    Antoni Portero, Alberto Scionti, Zhibin Yu, Paolo Faraboschi, Caroline Concatto, Luigi Carro, Arne Garbade, Sebastian Weis, Theo Ungerer, Roberto Giorgi
    2012 Spring Simulation Multiconference (SpringSim 2012)

 

  • Fault localization in NoCs by Timed Heartbeats
    Bernhard Fechner, Arne Garbade, Sebastian Weis, Theo Ungerer
    Proceedings of the 8th Workshop on Dependability and Fault Tolerance (ARCS / VERFE 2012), LNI 200, pages 191-200

 

2011

 

  • A Fault Detection and Recovery Architecture for a Teradevice Dataflow System
    Sebastian Weis, Arne Garbade, Julian Wolf, Bernhard Fechner, Avi Mendelson, Roberto Giorgi, Theo Ungerer
    Proceedings of the First Workshop on Data-Flow Execution Models for Extreme Scale Computing (DFM 2011), pages 38-44

 

  • OC Techniques Applied to Solve Reliability Problems in Future 1000-core Processors
    Arne Garbade, Sebastian Weis, Sebastian Schlingmann, Theo Ungerer
    Organic Computing — A Paradigm Shift for Complex Systems, pages 575 - 577

 

  • Towards Fault Detection Units as an Autonomous Fault Detection Approach for Future Many-Cores
    Sebastian Weis, Arne Garbade, Sebastian Schlingmann, Theo Ungerer
    Proceedings of the 1st Workshop on Software-Controlled, Adaptive Fault-Tolerance in Microprocessors (SCAFT 2011) at the 24th International Conference on Architecture of Computing Systems (ARCS 2011), pages 20-23

 

  • Connectivity-sensitive Algorithm for Task Placement on a Many-core Considering Faulty Regions
    Sebastian Schlingmann, Arne Garbade, Sebastian Weis, Theo Ungerer
    Proceedings of the 19th International Euromicro Conference on Parallel, Distributed and Network-based Processing (PDP 2011)

 

2010

 

  • Fault detection and reliability techniques for future many-cores
    Sebastian Weis, Arne Garbade, Faruk Bagci, Theo Ungerer
    Poster Abstracts of the 6th international summer school on advanced computer architecture and compilation for high-performance and embedded systems (ACACES 2010), pages 175-178

 

 
