Colmena: Scalable machine-learning-based steering of ensemble simulations for high performance computing
Ward et al., 2021
- Document ID
- 11594227040764658545
- Author
- Ward L
- Sivaraman G
- Pauloski J
- Babuji Y
- Chard R
- Dandu N
- Redfern P
- Assary R
- Chard K
- Curtiss L
- Thakur R
- Foster I
- Publication year
- 2021
- Publication venue
- 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC)
Snippet
Scientific applications that involve simulation ensembles can be accelerated greatly by using experiment design methods to select the best simulations to perform. Methods that use machine learning (ML) to create proxy models of simulations show particular promise for …
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/485—Task life-cycle, e.g. stopping, restarting, resuming execution
- G06F9/4856—Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5066—Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/44—Arrangements for executing specific programmes
- G06F9/455—Emulation; Software simulation, i.e. virtualisation or emulation of application or operating system execution engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a programme unit and a register, e.g. for a simultaneous processing of several programmes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformations of program code
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management, e.g. organising, planning, scheduling or allocating time, human or machine resources; Enterprise planning; Organisational models
- G06Q10/063—Operations research or analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computer systems utilising knowledge based models
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Ward et al. | Colmena: Scalable machine-learning-based steering of ensemble simulations for high performance computing | |
Wang et al. | Distributed machine learning with a serverless architecture | |
Gu et al. | Liquid: Intelligent resource estimation and network-efficient scheduling for deep learning jobs on distributed GPU clusters | |
Chen et al. | Deep learning research and development platform: Characterizing and scheduling with QoS guarantees on GPU clusters | |
Teng et al. | SimMapReduce: A simulator for modeling MapReduce framework | |
Marozzo et al. | JS4Cloud: script‐based workflow programming for scalable data analysis on cloud platforms | |
CN109617939B (en) | WebIDE cloud server resource allocation method based on task pre-scheduling | |
Abdullah et al. | Diminishing returns and deep learning for adaptive CPU resource allocation of containers | |
Georgiou et al. | Topology-aware job mapping | |
Teijeiro et al. | Towards cloud-based parallel metaheuristics: a case study in computational biology with differential evolution and spark | |
Harichane et al. | KubeSC‐RTP: Smart scheduler for Kubernetes platform on CPU‐GPU heterogeneous systems | |
LaSalle et al. | MPI for big data: New tricks for an old dog | |
Tchernykh et al. | Mitigating uncertainty in developing and applying scientific applications in an integrated computing environment | |
Ward et al. | Employing artificial intelligence to steer exascale workflows with Colmena | |
Dongarra et al. | Parallel processing and applied mathematics | |
Legashev et al. | An effective scheduling method in the cloud system of collective access, for virtual working environments | |
Baheri | MARS: Multi-scalable actor-critic reinforcement learning scheduler | |
Do et al. | Co-scheduling ensembles of in situ workflows | |
Shih et al. | Performance study of parallel programming on cloud computing environments using MapReduce | |
Balis et al. | Improving prediction of computational job execution times with machine learning | |
Pinel et al. | Savant: Automatic generation of a parallel scheduling heuristic for map-reduce | |
Funika et al. | Automatic management of cloud applications with use of proximal policy optimization | |
Allaqband et al. | An efficient machine learning based CPU scheduler for heterogeneous multicore processors | |
Karimian-Aliabadi et al. | Scalable performance modeling and evaluation of MapReduce applications | |
Nascimento et al. | An incremental reinforcement learning scheduling strategy for data‐intensive scientific workflows in the cloud |