Li et al., 2018 - Google Patents
Near-optimal straggler mitigation for distributed gradient methodsLi et al., 2018
View PDF- Document ID
- 12520932557542914524
- Author
- Li S
- Kalan S
- Avestimehr A
- Soltanolkotabi M
- Publication year
- Publication venue
- 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
External Links
Snippet
Modern learning algorithms use gradient descent updates to train inferential models that best explain data. Scaling these approaches to massive data sizes requires proper distributed gradient descent schemes where distributed worker nodes compute partial …
- 230000000116 mitigating 0 title abstract description 10
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Programme initiating; Programme switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for programme control, e.g. control unit
- G06F9/06—Arrangements for programme control, e.g. control unit using stored programme, i.e. using internal store of processing equipment to receive and retain programme
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30286—Information retrieval; Database structures therefor; File system structures therefor in structured data stores
- G06F17/30386—Retrieval requests
- G06F17/30424—Query processing
- G06F17/30533—Other types of queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/30—Information retrieval; Database structures therefor; File system structures therefor
- G06F17/30943—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type
- G06F17/30946—Information retrieval; Database structures therefor; File system structures therefor details of database functions independent of the retrieved data type indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management, e.g. organising, planning, scheduling or allocating time, human or machine resources; Enterprise planning; Organisational models
- G06Q10/063—Operations research or analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N99/00—Subject matter not provided for in other groups of this subclass
- G06N99/005—Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation, e.g. linear programming, "travelling salesman problem" or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | Near-optimal straggler mitigation for distributed gradient methods | |
Reisizadeh et al. | Coded computation over heterogeneous clusters | |
Bitar et al. | Stochastic gradient coding for straggler mitigation in distributed learning | |
Li et al. | Coding for distributed fog computing | |
Li et al. | Polynomially coded regression: Optimal straggler mitigation via data encoding | |
Lee et al. | Coded computation for multicore setups | |
Li et al. | A predictive scheduling framework for fast and distributed stream data processing | |
Zhang et al. | LAGC: Lazily aggregated gradient coding for straggler-tolerant and communication-efficient distributed learning | |
Sun et al. | Heterogeneous coded computation across heterogeneous workers | |
Peng et al. | Diversity/parallelism trade-off in distributed systems with redundancy | |
Hanna et al. | Adaptive distributed stochastic gradient descent for minimizing delay in the presence of stragglers | |
CN103942108A (en) | Resource parameter optimization method under Hadoop homogenous cluster | |
Mallick et al. | Rateless codes for near-perfect load balancing in distributed matrix-vector multiplication | |
Kalyaev et al. | Method of multiagent scheduling of resources in cloud computing environments | |
Zhang et al. | Coded computation over heterogeneous workers with random task arrivals | |
Woolsey et al. | A practical algorithm design and evaluation for heterogeneous elastic computing with stragglers | |
Konstantinidis et al. | Camr: Coded aggregated mapreduce | |
Ho et al. | Kylin: An efficient and scalable graph data processing system | |
Gu et al. | Maximizing workflow throughput for streaming applications in distributed environments | |
Rehab et al. | Scalable massively parallel learning of multiple linear regression algorithm with MapReduce | |
CN104933110A (en) | MapReduce-based data pre-fetching method | |
Ji et al. | A new design framework for heterogeneous uncoded storage elastic computing | |
Barika et al. | Adaptive scheduling for efficient execution of dynamic stream workflows | |
Yuan et al. | Fairness-aware scheduling algorithm for multiple DAGs based on task replication | |
Saraswathy et al. | A Hybrid Strategy Integrating Ant Colony Optimization and Locust-Inspired Algorithm (HACO-LA) to Boost Efficiency and Performance in cloud Resource Management |