Temperature Management For Time-Critical Applications Running on Multi-core Platforms

 

Project description

The use of multi-core chips to meet the non-functional requirements of Cyber-Physical Systems (CPSes) has been drawing significant interest, mainly because of their potential for high performance and reliability at low cost. However, as CMOS process technology continues to scale down into the nanometer range, the power density on a chip continues to rise, generating more heat and thermal hotspots on the chip. High on-chip temperature affects the functional and timing correctness of the chip’s operation, and will eventually trigger an irrecoverable failure. Hence, proper temperature management on a multi-core chip is crucial both to reducing the risk of failure resulting from thermal hotspots and to satisfying the non-functional requirements of CPSes. The conventional approach of using a hardware cooling system on a chip has proven not to be cost-effective. To control on-chip temperature effectively, a built-in subsystem should be able to predict the thermal dynamics of a multi-core chip in real time and trigger proper power/temperature management before the chip overheats. However, because runtime parameters in a CPS change dynamically, it is very difficult to estimate the chip temperature on the fly.

In this project, we have been developing an efficient way of estimating multi-core chip thermal models that adapt to the system dynamics in real time. Based on these prediction models, we design a proactive peak temperature manager (PTM) that periodically estimates the occurrence of peak temperature on cores during the next PTM cycle and triggers proper dynamic temperature management (DTM) schemes on the estimated-to-be-overheated cores to cool them without violating critical applications’ timing constraints. To predict the future thermal behavior of a core at the time of PTM invocation, we propose a simple but effective runtime method for estimating the thermal model of each core on a multi-core chip. For a given time window, each core is categorized as “overheated” or “not-overheated” using a temperature threshold, which is a design parameter (e.g., the recommended operating temperature of an electronic chip is below 85°C). For this purpose, the PTM needs access to the thermal profiles of the given task sets, which are obtained off-line, and to the core temperatures measured via on-chip temperature sensors at runtime. Depending on the hardware support, the PTM can apply DTMs (e.g., DVFS and power-gating) either locally or globally. Such well-known DTM schemes can be incorporated easily into the proposed PTM, without modifying those algorithms, to avoid both overheating and violation of timing constraints.
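The classification step of one PTM cycle can be sketched as follows. This is only an illustrative sketch, not the project's actual estimator: it assumes a first-order (lumped RC) thermal model per core with constant power over the cycle, and the names CoreState, predict_peak, and ptm_cycle, as well as the ambient temperature, thermal resistance, and time constant values, are hypothetical. The timing-constraint check that the PTM performs before applying a DTM is omitted.

```python
import math
from dataclasses import dataclass

# Design parameter; 85 degrees C is the example threshold mentioned in the text.
THRESHOLD_C = 85.0


@dataclass
class CoreState:
    temp_c: float   # latest on-chip sensor reading (degrees C)
    power_w: float  # power expected next cycle, e.g. from an off-line task profile (W)


def predict_peak(core, period_s, ambient_c=45.0, r_th=0.8, tau_s=0.05):
    """Predict the peak temperature of one core over the next PTM cycle.

    Assumption (not from the source): a first-order RC model,
        T(t) = T_ss + (T_0 - T_ss) * exp(-t / tau),  T_ss = T_amb + R_th * P,
    with made-up values for ambient_c, r_th (K/W), and tau_s (s).
    Under constant power this response is monotonic, so the peak over
    the cycle is either the current reading or the end-of-cycle value.
    """
    steady = ambient_c + r_th * core.power_w
    end = steady + (core.temp_c - steady) * math.exp(-period_s / tau_s)
    return max(core.temp_c, end)


def ptm_cycle(cores, period_s):
    """Return the indices of cores classified as 'overheated' for the next cycle."""
    return [i for i, core in enumerate(cores)
            if predict_peak(core, period_s) > THRESHOLD_C]


if __name__ == "__main__":
    cores = [CoreState(temp_c=80.0, power_w=60.0),
             CoreState(temp_c=60.0, power_w=20.0)]
    # Cores returned here would be the candidates for a DTM such as DVFS
    # or power-gating.
    print(ptm_cycle(cores, period_s=0.05))
```

In this sketch, a core that is currently below the threshold is still flagged if its predicted end-of-cycle temperature exceeds it, which is what makes the manager proactive rather than reactive.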


People

Faculty

  • Kang G. Shin, Professor/Principal Investigator. Email: kgshin at eecs.umich.edu
  • C. M. Krishna, Professor. Email: krishna at ecs.umass.edu
  • Israel Koren, Professor. Email: koren at ecs.umass.edu

Post-PhD Researchers

Students

  • Buyoung Yun, Grad. Student.
  • Sinan Farmarka, Grad. Student.
  • Eugene Kim, Grad. Student.

Reports

  • C.M. Krishna. Ameliorating thermally accelerated aging with state-based application of fault-tolerance in cyber-physical computers. IEEE Transactions on Reliability.
  • Eugene Kim, Jinkyu Lee, and Kang G. Shin. Real-time battery thermal management for electric vehicles. ACM/IEEE 5th International Conference on Cyber-Physical Systems, pp. 72–83, Berlin, Germany, April 14-17, 2014.
  • Jinkyu Lee, Buyoung Yun, and Kang G. Shin. Reducing peak power consumption in multi-core systems without violating real-time constraints. IEEE Trans. on Parallel and Distributed Systems, vol. 25, no. 4, pp. 1024–1033, April 2014.
  • Jinkyu Lee and Kang G. Shin. Schedulability analysis for a mode transition in real-time multi-core systems. IEEE Real-Time Systems Symposium, Vancouver, Canada, December 3-6, 2013.
  • Jinkyu Lee, Eugene Kim, and Kang G. Shin. Design and management of satellite power systems. IEEE Real-Time Systems Symposium, Vancouver, Canada, December 3-6, 2013.
  • Buyoung Yun, Kang G. Shin, and Taejoon Park. Co-design of Real-Time Control and Scheduling with Peak Temperature Minimization on Microprocessors. Submitted to the IEEE Real-Time Systems Symposium, 2013.
  • Buyoung Yun, Kang G. Shin, and Shige Wang. Predicting Thermal Behavior for Temperature Management in Time-Critical Multicore Systems. 19th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS’13), April 8-11, 2013, Philadelphia, PA, USA.
  • Buyoung Yun, Shige Wang, and Kang G. Shin. Thermal-Aware Scheduling of Critical Applications Using Job Migration and Power-Gating on Multi-core Chips. The 8th IEEE International Conference on Embedded Software and Systems, 2011.
  • I. Koren and C.M. Krishna. Temperature-aware computing. Sustainable Computing: Informatics and Systems, Vol. 1, pp. 46-56, March 2011.