Theo Ungerer presented "Fault Detection and Recovery in a Teradevice Dataflow Multicore System" at the First Workshop on Manufacturable and Dependable Multicore Architectures at Nanoscale (MEDIAN'12) in Annecy, France, on June 1st, 2012.
Here are the slides of the presentation: Fault Detection and Recovery in a Teradevice Dataflow Multicore System
Abstract: The EC Project TERAFLUX addresses architecture, programmability, and reliability issues of future many-core systems with 1000+ cores. The overall TERAFLUX approach is based on threaded dataflow execution schemes combined with transactional memory and reliability techniques.
It is expected that in 5-10 years, process technology will allow us to host 1000 or more cores on a die, but the overall reliability of the die will be reduced. The system will be exposed to many types of faults (e.g., permanent, transient, and intermittent) caused by overheating, aging, cosmic radiation, etc. Future processors may also suffer from process variation, which will impose heterogeneous behavior even on a homogeneous system. This talk proposes techniques to build a reliable multi-/many-core system out of unreliable components and to manage those components efficiently.