Author: Li, Xiaoye S : Search

18 Results for: Author: Li, Xiaoye S

Searched The ACM Guide to Computing Literature (3,833,026 records)

Showing 1 - 18 of 18 Results

research-article
January 2025
A graphics processing unit accelerated sparse direct solver and preconditioner with block low rank compression
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 39, Issue 1, Pages 18–31, https://doi.org/10.1177/10943420241288567

We present the GPU implementation efforts and challenges of the sparse solver package STRUMPACK. The code is made publicly available on github with a permissive BSD license. STRUMPACK implements an approximate multifrontal solver, a sparse LU ...
Total Citations: 0
research-article
November 2024
Non-smooth Bayesian optimization in tuning scientific applications
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 38, Issue 6, Pages 633–657, https://doi.org/10.1177/10943420241278981

Tuning algorithmic parameters to optimize the performance of large, complicated computational codes is an important problem involving finding the optima and identifying regimes defined by non-smooth boundaries in black-box functions. Within the Bayesian ...
Total Citations: 0
research-article
November 2024
Batched sparse direct solver design and evaluation in SuperLU_DIST
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 38, Issue 6, Pages 585–598, https://doi.org/10.1177/10943420241268200

Over the course of interactions with various application teams, the need for batched sparse linear algebra functions has emerged in order to make more efficient use of the GPUs for many small and sparse linear algebra problems. In this paper, we present ...
Total Citations: 0
research-article
Open Access
April 2024
Then and Now: Improving Software Portability, Productivity, and 100× Performance
Computing in Science and Engineering (IEEECS_CISE-NEW), Volume 26, Issue 1, Pages 61–70, https://doi.org/10.1109/MCSE.2024.3387302
The U.S. Exascale Computing Project (ECP) has succeeded in preparing applications to run efficiently on the first reported exascale supercomputers in the world. To achieve this, it modernized the whole leadership software stack, from libraries to ...
Total Citations: 0
research-article
March 2022
Resiliency in numerical algorithm design for extreme scale simulations
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 36, Issue 2, Pages 251–285, https://doi.org/10.1177/10943420211055188

This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, that was attended by all the authors. Advanced supercomputing is characterized by very high ...
Total Citations: 0
research-article
Public Access
November 2021
Artifacts Available / v1.1
Artifacts Evaluated & Functional / v1.1
Dr. Top-k: delegate-centric Top-k on GPUs
SC '21: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 39, Pages 1–14, https://doi.org/10.1145/3458817.3476141

Recent top-k computation efforts explore the possibility of revising various sorting algorithms to answer top-k queries on GPUs. These endeavors, unfortunately, perform significantly more work than needed. This paper introduces Dr. Top-k, a D...
Total Citations: 0
Total Downloads: 597 (Last 12 Months: 222, Last 6 weeks: 36)
Supplementary Material: Dr. Top-k_ Delegate-Centric Top-k Computation on GPUs.mp4
research-article
July 2021
A survey of numerical linear algebra methods utilizing mixed-precision arithmetic
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 35, Issue 4, Pages 344–369, https://doi.org/10.1177/10943420211003313

The efficient utilization of mixed-precision numerical linear algebra algorithms can offer attractive acceleration to scientific computing applications. Especially with the hardware integration of low-precision special-function units designed for machine ...
Total Citations: 25
research-article
November 2020
C-SAW: a framework for graph sampling and random walk on GPUs
SC '20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Article No.: 56, Pages 1–15

Many applications require to learn, mine, analyze and visualize large-scale graphs. These graphs are often too large to be addressed efficiently using conventional graph processing technologies. Fortunately, recent research efforts find out graph ...
Total Citations: 1
Total Downloads: 162 (Last 12 Months: 15, Last 6 weeks: 2)
research-article
July 2020
A parallel hierarchical blocked adaptive cross approximation algorithm
International Journal of High Performance Computing Applications (SAGE-HPCA), Volume 34, Issue 4, Pages 394–408, https://doi.org/10.1177/1094342020918305

This article presents a low-rank decomposition algorithm based on subsampling of matrix entries. The proposed algorithm first computes rank-revealing decompositions of submatrices with a blocked adaptive cross approximation (BACA) algorithm, and then ...
Total Citations: 1
research-article
January 2020
A Distributed-Memory Algorithm for Computing a Heavy-Weight Perfect Matching on Bipartite Graphs
SIAM Journal on Scientific Computing (SISC), Volume 42, Issue 4, Pages C143–C168, https://doi.org/10.1137/18M1189348

We design and implement an efficient parallel algorithm for finding a perfect matching in a weighted bipartite graph such that weights on the edges of the matching are large. This problem differs from the maximum weight matching problem, for which scalable ...
Total Citations: 10
research-article
September 2019
A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems
Journal of Parallel and Distributed Computing (JPDC), Volume 131, Issue C, Pages 218–234, https://doi.org/10.1016/j.jpdc.2019.03.004
Abstract
We propose a new algorithm to improve the strong scalability of right-looking sparse LU factorization on distributed memory systems. Our 3D algorithm for sparse LU uses a three-dimensional MPI process grid, exploits elimination tree ...
Highlights

The 3D sparse LU factorization algorithm reduces the asymptotic communication complexity by ( log n ) for planar sparse matrices of dimension n and by a ...
Total Citations: 6
research-article
December 2018
Consensus Ensemble System for Traffic Flow Prediction
IEEE Transactions on Intelligent Transportation Systems (ITS-TRANSACTIONS), Volume 19, Issue 12, Pages 3903–3914, https://doi.org/10.1109/TITS.2018.2791505
Traffic flow prediction is a key component of an intelligent transportation system. Accurate traffic flow prediction provides a foundation for other tasks, such as signal coordination and travel time forecasting. There are many known methods in literature ...
Total Citations: 8
research-article
November 2018
A Unified Software Framework to Enable Solution of Traffic Assignment Problems at Extreme Scale
2018 21st International Conference on Intelligent Transportation Systems (ITSC), Pages 3917–3922, https://doi.org/10.1109/ITSC.2018.8569991
We describe a modular software framework for solving user equilibrium traffic assignment problems. The design is based on the formulation of the problem as a variational inequality. Unlike most existing traffic assignment software which focus on specific ...
Total Citations: 0
research-article
November 2018
Efficient Online Hyperparameter Learning for Traffic Flow Prediction
2018 21st International Conference on Intelligent Transportation Systems (ITSC), Pages 164–169, https://doi.org/10.1109/ITSC.2018.8569972
Compute efficiency is an important consideration for traffic flow prediction models. Machine learning algorithms adjust model parameters automatically based on the data, but often require users to set additional parameters, known as hyperparameters. ...
Total Citations: 0
Article
September 2014
High-Performance Inverse Modeling with Reverse Monte Carlo Simulations
BRACIS '14: Proceedings of the 2014 Brazilian Conference on Intelligent Systems, Pages 201–210, https://doi.org/10.1109/ICPP.2014.29

In the field of nanoparticle material science, X-ray scattering techniques are widely used for characterization of macromolecules and particle systems (ordered, partially-ordered or custom) based on their structural properties at the micro- and nano-...
Total Citations: 0
Article
October 2014
Tuning HipGISAXS on Multi and Many Core Supercomputers
High Performance Computing Systems. Performance Modeling, Benchmarking and Simulation, Pages 217–238, https://doi.org/10.1007/978-3-319-10214-6_11
Abstract
With the continual development of multi and many-core architectures, there is a constant need for architecture-specific tuning of application-codes in order to realize high computational performance and energy efficiency, closer to the theoretical ...
Total Citations: 0
Article
April 2002
Memory-Intensive Benchmarks: IRAM vs. Cache-Based Machines
IPDPS '02: Proceedings of the 16th International Parallel and Distributed Processing Symposium, Page 203

The increasing gap between processor and memory performance has led to new architectural models for memory-intensive applications. In this paper, we use a set of memory-intensive benchmarks to evaluate a mixed logic and DRAM processor called VIRAM as a ...
Total Citations: 9
Article
May 2000
Ordering Unstructured Meshes for Sparse Matrix Computations on Leading Parallel Systems
IPDPS '00: Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing, Pages 497–503

Computer simulations of realistic applications usually require solving a set of non-linear partial differential equations (PDEs) over a finite region. The process of obtaining numerical solutions to the governing PDEs involves solving large sparse ...
Total Citations: 3
