research-article

Schedtask: a hardware-assisted task scheduler

Authors: Prathmesh Kallurkar, Smruti R. Sarangi

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture

Pages 612 - 624

https://doi.org/10.1145/3123939.3123984

Published: 14 October 2017

Abstract

The execution of workloads such as web servers and database servers typically switches back and forth between tasks such as user applications, system call handlers, and interrupt handlers. The combined instruction footprint of these tasks typically exceeds the capacity of the i-cache (16-32 KB). This causes frequent i-cache misses and thereby reduces application performance. We therefore propose SchedTask, a hardware-assisted task scheduler that improves the performance of such workloads by executing tasks with similar instruction footprints on the same core. We first decompose the combined execution of the OS and the applications into sequences of instructions called SuperFunctions. We then propose a scheme that uses Bloom filters to determine the amount of overlap between the instruction footprints of different SuperFunctions, and a hierarchical scheduler that executes SuperFunctions with similar instruction footprints on the same core. For a suite of 8 popular OS-intensive workloads, we report an improvement in application performance of up to 29 percentage points (mean: 11.4 percentage points) over state-of-the-art scheduling techniques.
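The abstract states that footprint overlap is measured with Bloom filters but does not describe the encoding. The Python sketch below assumes one plausible encoding: each SuperFunction's footprint is a Bloom filter over the i-cache line addresses it executes, and overlap is the number of bits set in both filters (bitwise AND) divided by the set bits of the smaller filter. `FootprintFilter`, `overlap`, `pick_core`, `footprint`, and every parameter value are hypothetical. `pick_core` is a flat greedy placement, much simpler than the hierarchical scheduler the paper proposes, and is shown only to connect "similar footprint" to "same core".

```python
import hashlib

# Hypothetical parameters: illustrative choices, not values from the SchedTask paper.
NUM_BITS = 1024   # width of each Bloom filter
NUM_HASHES = 3    # hash functions per inserted line
LINE_SIZE = 64    # bytes per i-cache line


class FootprintFilter:
    """Bloom filter summarizing which i-cache lines a SuperFunction touched."""

    def __init__(self):
        self.bits = 0  # arbitrary-precision int used as a NUM_BITS-wide bit vector

    @staticmethod
    def _positions(line_addr):
        # Derive NUM_HASHES bit positions from a deterministic hash of the line address.
        for i in range(NUM_HASHES):
            digest = hashlib.blake2b(f"{i}:{line_addr}".encode(), digest_size=8).digest()
            yield int.from_bytes(digest, "little") % NUM_BITS

    def add_pc(self, pc):
        # Record the cache line that contains this instruction address.
        for pos in self._positions(pc // LINE_SIZE):
            self.bits |= 1 << pos

    def popcount(self):
        return bin(self.bits).count("1")


def overlap(a, b):
    """Estimated footprint overlap in [0, 1]: bits set in both filters,
    normalized by the number of set bits in the smaller filter."""
    common = bin(a.bits & b.bits).count("1")
    smaller = min(a.popcount(), b.popcount())
    return common / smaller if smaller else 0.0


def pick_core(task, core_filters):
    """Flat greedy placement: the core whose recent footprint overlaps most."""
    return max(range(len(core_filters)), key=lambda c: overlap(task, core_filters[c]))


def footprint(pcs):
    """Build a filter from a sequence of instruction addresses."""
    f = FootprintFilter()
    for pc in pcs:
        f.add_pc(pc)
    return f


# Usage with hypothetical instruction-address ranges (4-byte instructions).
read_large = footprint(range(0x1000, 0x2000, 4))   # e.g. a system call handler
read_small = footprint(range(0x1000, 0x1800, 4))   # subset of the same code
irq_handler = footprint(range(0x9000, 0x9800, 4))  # unrelated code region
cores = [irq_handler, read_large]
print(pick_core(read_small, cores))  # 1
```

Normalizing by the smaller filter makes a task whose footprint is contained in a core's recent footprint score 1.0, so it is steered to the core whose i-cache is already warm with its code. Bloom-filter collisions can inflate the shared-bit count, so this estimate may overstate overlap for unrelated code; a larger `NUM_BITS` reduces that effect at the cost of more storage.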

Index Terms

1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
    2. Parallel architectures
      1. Multicore architectures
2. Software and its engineering
  1. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        1. Memory management
          1. Virtual memory
        2. Process management
          1. Scheduling

Information

Published In

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture

October 2017

850 pages

ISBN:9781450349529

DOI:10.1145/3123939

General Chairs: Hillery Hunter (IBM Research), Jaime Moreno (IBM Research)

Program Chairs: Joel Emer (NVIDIA and MIT), Daniel Sanchez (MIT)
Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS\DATC: IEEE Computer Society

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

MICRO-50: The 50th Annual IEEE/ACM International Symposium on Microarchitecture

October 14-18, 2017

Cambridge, Massachusetts

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Bibliometrics & Citations

Article Metrics

Total Citations: 5
Total Downloads: 518
Downloads (last 12 months): 8
Downloads (last 6 weeks): 2

Reflects downloads up to 10 Dec 2024

Cited By

Boroujerdian B, Jing Y, Tripathy D, Kumar A, Subramanian L, Yen L, Lee V, Venkatesan V, Jindal A, Shearer R, Reddi V. (2023). FARSI: An Early-stage Design Space Exploration Framework to Tame the Domain-specific System-on-chip Complexity. ACM Transactions on Embedded Computing Systems 22:2, 1-35. DOI: 10.1145/3544016. Online publication date: 24-Jan-2023.
https://dl.acm.org/doi/10.1145/3544016

Shafi O, Abidi I. (2021). CuckoOnsai: An Efficient Memory Authentication Using Amalgam of Cuckoo Filters and Integrity Trees. 2021 58th ACM/IEEE Design Automation Conference (DAC), 1273-1278. DOI: 10.1109/DAC18074.2021.9586205. Online publication date: 5-Dec-2021.
https://doi.org/10.1109/DAC18074.2021.9586205

Shafi O, Bashir J, Sarkar V, Kim H. (2020). SecSched. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, 229-240. DOI: 10.1145/3410463.3414631. Online publication date: 30-Sep-2020.
https://dl.acm.org/doi/10.1145/3410463.3414631

Singh S, Sarangi S. (2020). SoftMon. Proceedings of the 17th International Conference on Mining Software Repositories, 397-408. DOI: 10.1145/3379597.3387444. Online publication date: 29-Jun-2020.
https://dl.acm.org/doi/10.1145/3379597.3387444

Moolchandani D, Kumar A, Martinez J, Sarangi S. (2020). VisSched: An Auction based Scheduler for Vision Workloads on Heterogeneous Processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1-1. DOI: 10.1109/TCAD.2020.3013076. Online publication date: 2020.
https://doi.org/10.1109/TCAD.2020.3013076
