DOI: 10.1145/3123939.3123984
research-article

SchedTask: a hardware-assisted task scheduler

Published: 14 October 2017

Abstract

Workloads such as web servers and database servers typically switch back and forth between different tasks, such as user applications, system call handlers, and interrupt handlers. The combined instruction footprint of these tasks typically exceeds the capacity of the i-cache (16-32 KB). This causes frequent i-cache misses, which reduce the application's performance. We therefore propose SchedTask, a hardware-assisted task scheduler that improves the performance of such workloads by executing tasks with similar instruction footprints on the same core. We start by decomposing the combined execution of the OS and the applications into sequences of instructions called SuperFunctions. We propose a scheme that uses Bloom filters to determine the amount of overlap between the instruction footprints of different SuperFunctions. We then use a hierarchical scheduler to execute SuperFunctions with similar instruction footprints on the same core. For a suite of 8 popular OS-intensive workloads, we report an increase in the application's performance of up to 29 percentage points (mean: 11.4 percentage points) over state-of-the-art scheduling techniques.
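The Bloom-filter overlap idea can be illustrated with a small software sketch. This is not the paper's hardware mechanism, which the abstract does not detail. The filter size, hash function, cache-line size, the Jaccard-style score, and the example address ranges are all assumptions chosen for illustration, and every name below is hypothetical.

```python
import hashlib

FILTER_BITS = 1024  # assumed filter size in bits (hypothetical)
NUM_HASHES = 3      # assumed number of hash functions (hypothetical)
LINE_BYTES = 64     # assumed i-cache line size in bytes


def _bit_positions(line_addr):
    """Derive NUM_HASHES bit positions for one cache-line address."""
    positions = []
    for seed in range(NUM_HASHES):
        digest = hashlib.blake2b(
            line_addr.to_bytes(8, "little"), digest_size=8, salt=bytes([seed])
        ).digest()
        positions.append(int.from_bytes(digest, "little") % FILTER_BITS)
    return positions


def footprint_filter(instr_addrs):
    """Build a Bloom filter (stored as an int bitmask) over the touched cache lines."""
    bits = 0
    for addr in instr_addrs:
        for pos in _bit_positions(addr // LINE_BYTES):
            bits |= 1 << pos
    return bits


def overlap_score(filter_a, filter_b):
    """Assumed similarity metric: shared set bits divided by bits set in either filter."""
    union = bin(filter_a | filter_b).count("1")
    if union == 0:
        return 0.0
    return bin(filter_a & filter_b).count("1") / union


# Hypothetical instruction address ranges for three SuperFunctions.
read_f = footprint_filter(range(0x1000, 0x1800, 4))   # overlaps write_f
write_f = footprint_filter(range(0x1400, 0x1C00, 4))
timer_f = footprint_filter(range(0x9000, 0x9400, 4))  # disjoint from both

print("read vs write:", overlap_score(read_f, write_f))
print("read vs timer:", overlap_score(read_f, timer_f))
```

Under these assumptions, a scheduler could prefer to place the pair with the higher score on the same core. Because Bloom filters admit false positives, the score can only overestimate the true overlap in cache lines, never miss a shared line.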

Published In

MICRO-50 '17: Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture
October 2017
850 pages
ISBN: 9781450349529
DOI: 10.1145/3123939

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. architectural support for operating system
  2. cache pollution
  3. scheduling

Qualifiers

  • Research-article

Conference

MICRO-50

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Article Metrics

  • Downloads (Last 12 months): 8
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 10 Dec 2024

Cited By

  • (2023) FARSI: An Early-stage Design Space Exploration Framework to Tame the Domain-specific System-on-chip Complexity. ACM Transactions on Embedded Computing Systems 22:2 (1-35). DOI: 10.1145/3544016. Online publication date: 24-Jan-2023
  • (2021) CuckoOnsai: An Efficient Memory Authentication Using Amalgam of Cuckoo Filters and Integrity Trees. 2021 58th ACM/IEEE Design Automation Conference (DAC) (1273-1278). DOI: 10.1109/DAC18074.2021.9586205. Online publication date: 5-Dec-2021
  • (2020) SecSched. Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques (229-240). DOI: 10.1145/3410463.3414631. Online publication date: 30-Sep-2020
  • (2020) SoftMon. Proceedings of the 17th International Conference on Mining Software Repositories (397-408). DOI: 10.1145/3379597.3387444. Online publication date: 29-Jun-2020
  • (2020) VisSched: An Auction based Scheduler for Vision Workloads on Heterogeneous Processors. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (1-1). DOI: 10.1109/TCAD.2020.3013076. Online publication date: 2020
