Hello, I am testing the high-level APIs on a V100 GPU (Summit) with a very simple benchmark. The input data are random numbers drawn uniformly from (0, 1). I have a few questions, and it would be very helpful if you could shed some light on them.
- I get ~0.1 GB/s for both compression and decompression throughput. I am not sure what a typical throughput for MGARD would be, but does this seem low?
- The API takes a host/managed pointer. I guess the host-device copies (assuming compression/decompression happens on the GPU) might lower the throughput. Is there a way to pass a device pointer directly and do all the work on the GPU?
- With the ABS error bound, if I set the tolerance below 1.0e-4, the data are not compressed but inflated, i.e., the compression ratio is < 1.0. What causes this? Is there a lower bound on the tolerance?
Below is the test I am using. Thank you so much!
#include <algorithm>
#include <cmath>
#include <iostream>
#include <limits>
#include <random>
#include <vector>
#include "mgard/compress_x.hpp"

// Small offset that keeps the relative error finite when a value is near zero.
const double eps = std::numeric_limits<double>::epsilon();

int main()
{
    // 128 x 128 x 16 doubles (2 MiB in total).
    mgard_x::SIZE ni = 128;
    mgard_x::SIZE nj = 128;
    mgard_x::SIZE nk = 16;
    mgard_x::SIZE nCell = ni * nj * nk;
    std::vector<mgard_x::SIZE> shape({ni, nj, nk});

    // Fill the host array with uniform random values in [0, 1).
    std::random_device rd;
    std::default_random_engine eng(rd());
    std::uniform_real_distribution<double> gen(0.0, 1.0);
    double *arr_h = new double[nCell];
    for (mgard_x::SIZE i = 0; i < nCell; ++i) arr_h[i] = gen(eng);

    mgard_x::Config config;
    config.dev_type = mgard_x::device_type::CUDA;
    config.lossless = mgard_x::lossless_type::Huffman;
    config.uniform_coord_mode = 1;
    config.timing = true;

    // Compress with an absolute error bound of 1.0e-6.
    void *compArr = nullptr;
    size_t compSz = 0;
    mgard_x::compress(3, mgard_x::data_type::Double, shape, 1.0e-6, 0.0,
                      mgard_x::error_bound_type::ABS, arr_h, compArr,
                      compSz, config, false);
    double ratio = (double)(nCell * sizeof(double)) / compSz;
    std::cout << "ratio " << ratio << "\n";

    void *decompArr = nullptr;
    mgard_x::decompress(compArr, compSz, decompArr, config, false);

    // Compare the decompressed data against the original.
    double maxabs = 0.0, avgabs = 0.0;
    double maxrel = 0.0, avgrel = 0.0;
    const double *output = static_cast<const double *>(decompArr);
    for (mgard_x::SIZE i = 0; i < nCell; ++i) {
        double err = std::fabs(arr_h[i] - output[i]);
        double rel = err / (std::fabs(arr_h[i]) + eps);
        maxabs = std::max(err, maxabs);
        avgabs += err;
        maxrel = std::max(rel, maxrel);
        avgrel += rel;
    }
    avgabs /= nCell;
    avgrel /= nCell;
    std::cout << "max abs err " << maxabs << " avg abs err " << avgabs << "\n";
    std::cout << "max rel err " << maxrel << " avg rel err " << avgrel << "\n";

    delete[] arr_h;
    return 0;
}
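
On the throughput question: the input here is only 128 × 128 × 16 doubles, i.e. 2 MiB, so at ~0.1 GB/s a single call takes roughly 21 ms. With so little data, one-time costs (for example CUDA context creation on the first GPU call) may account for a large share of that time. One way to check is to time several repeated calls after a warm-up call. The helper below is only a minimal sketch of that idea: `throughput_gbps` is a hypothetical name, not part of MGARD, and `work` stands in for whatever call is being measured (for instance a lambda wrapping the `mgard_x::compress` call from the test above).

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical helper (not part of MGARD): runs `work` once as a warm-up,
// then times `repeats` further runs and returns the mean throughput in GB/s
// (1 GB = 1e9 bytes), where `bytes` is the amount of data processed per run.
double throughput_gbps(const std::function<void()> &work, std::size_t bytes,
                       int repeats)
{
    assert(repeats > 0);
    work(); // warm-up run, excluded from the measurement

    auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r) work();
    auto t1 = std::chrono::steady_clock::now();

    double seconds = std::chrono::duration<double>(t1 - t0).count();
    return static_cast<double>(bytes) * repeats / seconds / 1.0e9;
}
```

If the number reported this way is much higher than the single-call figure, the low throughput is probably dominated by setup overhead rather than by the compression itself. A larger input would also be worth trying for the same reason.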