Releases · rapidsai/cudf

@bdice

🚨 Breaking Changes

Drop unsupported method argument from nunique and distinct_count. (#10411) @bdice
Refactor stream compaction APIs (#10370) @PointKernel
Add scan_aggregation and reduce_aggregation derived types. (#10357) @nvdbaranec
Avoid decimal type narrowing for decimal binops (#10299) @galipremsagar
Rewrites sample API (#10262) @isVoid
Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel
Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar
Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia
Change cudf::strings::find_multiple to return a lists column (#10134) @davidwendt
Remove the option to completely disable decimal128 columns in the ORC reader (#10127) @vuule
Remove deprecated code (#10124) @vyasr
Update gpu_utils.py to reflect current CUDA support. (#10113) @bdice
Optimize compaction operations (#10030) @PointKernel
Remove deprecated method Series.set_index. (#9945) @bdice
Add cudf::strings::findall_record API (#9911) @davidwendt
Upgrade arrow & pyarrow to 6.0.1 (#9686) @galipremsagar

🐛 Bug Fixes

Fix an issue with tdigest merge aggregations. (#10506) @nvdbaranec
Batch of fixes for index overflows in grid stride loops. (#10448) @nvdbaranec
Update dask_cudf imports to be compatible with latest dask (#10442) @rlratzel
Fix for integer overflow in contiguous-split (#10437) @jbrennan333
Fix has_null predicate for drop_list_duplicates on nested structs (#10436) @sperlingxx
Fix empty reduce with List output and non-List input (#10435) @sperlingxx
Fix list and struct meta generation issue in dask-cudf (#10434) @galipremsagar
Fix error in cudf.to_numeric when a bool input is passed (#10431) @galipremsagar
Support cupy array in quantile input (#10429) @galipremsagar
Fix benchmarks to work with new aggregation types (#10428) @davidwendt
Fix cudf::shift to handle offset greater than column size (#10414) @davidwendt
Fix lifespan of the temporary directory that holds cuFile configuration file (#10403) @vuule
Fix error thrown in compiled-binaryop benchmark (#10398) @davidwendt
Limiting async allocator using alignment of 512 (#10395) @rongou
Include <optional> in multibyte split. (#10385) @bdice
Fix issue with column and scalar re-assignment (#10377) @galipremsagar
Fix floating point data generation in benchmarks (#10372) @vuule
Avoid overflow in fused_concatenate_kernel output_index (#10344) @abellina
Remove is_relationally_comparable for table device views (#10342) @davidwendt
Fix debug compile error in device_span to column_view conversion (#10331) @davidwendt
Add Pascal support to JCUDF transcode (row_conversion) (#10329) @mythrocks
Fix std::bad_alloc exception due to JIT reserving a huge buffer (#10317) @ttnghia
Fixes up the overflowed fixed-point round on nullable column (#10316) @sperlingxx
Fix DataFrame slicing issues for empty cases (#10310) @brandon-b-miller
Fix documentation issues (#10307) @ajschmidt8
Allow Java bindings to use default decimal precisions when writing columns (#10276) @sperlingxx
Fix incorrect slicing of GDS read/write calls (#10274) @vuule
Fix out-of-memory error in compiled-binaryop benchmark (#10269) @davidwendt
Add tests of reflected ufuncs and fix behavior of logical reflected ufuncs (#10261) @vyasr
Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel
Fix out-of-memory error in UrlDecode benchmark (#10258) @davidwendt
Fix groupby reductions that perform operations on source type instead of target type (#10250) @ttnghia
Fix small leak in explode (#10245) @revans2
Yet another small JNI memory leak (#10238) @revans2
Fix regex octal parsing to limit to 3 characters (#10233) @davidwendt
Fix string to decimal128 conversion handling large exponents (#10231) @davidwendt
Fix JNI leak on copy to device (#10229) @revans2
Fix the data generator element size for decimal types (#10225) @vuule
Fix decimal metadata in parquet writer (#10224) @galipremsagar
Fix strings handling of hex in regex pattern (#10220) @davidwendt
Fix docs builds (#10216) @ajschmidt8
Fix a leftover _has_nulls change from Nullate (#10211) @devavret
Fix bitmask of the output for JNI of lists::drop_list_duplicates (#10210) @ttnghia
Fix compile error in binaryop/compiled/util.cpp (#10209) @ttnghia
Skip ORC and Parquet readers' benchmark cases that are not currently supported (#10194) @vuule
Fix JNI leak of a cudf::column_view native class. (#10171) @revans2
Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar
Convert Column Name to String Before Using Struct Column Factory (#10156) @isVoid
Preserve the correct ListDtype while creating an identical empty column (#10151) @galipremsagar
benchmark fixture - static object pointer fix (#10145) @karthikeyann
Fix UDF Caching (#10133) @brandon-b-miller
Raise duplicate column error in DataFrame.rename (#10120) @galipremsagar
Fix flaky memory usage test by guaranteeing array size. (#10114) @vyasr
Encode values from python callback for C++ (#10103) @jdye64
Add check for regex instructions causing an infinite-loop (#10095) @davidwendt
Remove metadata singleton from nvtext normalizer (#10090) @davidwendt
Column equality testing fixes (#10011) @brandon-b-miller
Pin libcudf runtime dependency for cudf / libcudf-kafka nightlies (#9847) @charlesbluca

📖 Documentation

Fix documentation for DataFrame.corr and Series.corr. (#10493) @bdice
Add cut to API docs (#10479) @shwina
Remove documentation for methods removed in #10124. (#10366) @bdice
Fix documentation issues (#10306) @ajschmidt8
Fix fixed_point binary operation documentation (#10198) @codereport
Remove cleaned up methods from docs (#10189) @galipremsagar
Update developer guide to recommend no default stream parameter. (#10136) @bdice
Update benchmarking guide to use NVBench. (#10093) @bdice

🚀 New Features

Add StringIO support to read_text (#10465) @cwharris
Add support for tdigest and merge_tdigest aggregations through cudf::reduce (#10433) @nvdbaranec
JNI support for Collect Ops in Reduction (#10427) @sperlingxx
Enable read_text with dask_cudf using byte_range (#10407) @ChrisJar
Add cudf::stable_sort_by_key (#10387) @PointKernel
Implement maps_column_view abstraction over LIST<STRUCT<K,V>> (#10380) @mythrocks
Support Java bindings for Avro reader (#10373) @HaoYang670
Refactor stream compaction APIs (#10370) @PointKernel
Support collect aggregations in reduction (#10353) @sperlingxx
Refactor array_ufunc for Index and unify across all classes (#10346) @vyasr
Add JNI for extract_list_element with index column (#10341) @firestarman
Support min and max operations for structs in rolling window (#10332) @ttnghia
Add device create_sequence_table for benchmarks (#10300) @karthikeyann
Enable numpy ufuncs for DataFrame (#10287) @vyasr
move input generation for json benchmark to device (#10281) @karthikeyann
move input generation for type dispatcher benchmark to device (#10280) @karthikeyann
move input generation for copy benchmark to device (#10279) @karthikeyann
generate url decode benchmark input in device (#10278) @karthikeyann
device input generation in join bench (#10277) @karthikeyann
Add nvtext::byte_pair_encoding API (#10270) @davidwendt
Prevent internal usage of expensive APIs (#10263) @vyasr
Column to JCUDF row for tables with strings (#10235) @hyperbolic2346
Support percent_rank() aggregation (#10227) @mythrocks
Refactor Series.array_ufunc (#10217) @vyasr
Reduce pytest runtime (#10203) @brandon-b-miller
Add regex flags parameter to python cudf strings split (#10185) @davidwendt
Support for MOD, PMOD and PYMOD for decimal32/64/128 (#10179) @codereport
Adding string row size iterator for row to column and column to row conversion (#10157) @hyperbolic2346
Add file size counter to cuIO benchmarks (#10154) @vuule
byte_range support for multibyte_split/read_text (#10150) @cwharris
Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia
Add maxSplit parameter to Java binding for strings:split (#10137) @ttnghia
Add libcudf strings split API that accepts regex pattern (#10128) @davidwendt
generate benchmark input in device (#10109) @karthikeyann
Avoid nan_as_null op if nan_count is 0 (#10082) @galipremsagar
Add Dataframe and Index nunique (#10077) @martinfalisse
Support nanosecond timestamps in parquet (#10063) @PointKernel
Java bindings for mixed semi and anti joins (#10040) @jlowe
Implement mixed equality/conditional semi/anti joins (#10037) @vyasr
Optimize compaction operations (#10030) @PointKernel
Support args= in Series.apply (#9982) @brandon-b-miller
Add cudf::strings::findall_record API (#9911) @davidwendt
Add covariance for sort groupby (python) (#9889) @mayankanand007
Implement DataFrame diff() (#9817) @skirui-source
Implement DataFrame pct_change (#9805) @skirui-source
Support segmented reductions and null mask reductions (#9621) @isVoid
Add 'spearman' correlation method for dataframe.corr and series.corr (#7141) @dominicshanshan

🛠️ Improvements

Add scipy skip for a test (#10502) @galipremsagar
Temporarily disable new ops-bot functionality (#10496) @ajschmidt8
Include <cstddef> to fix compilation of parquet reader on GCC 11. (#10483) @bdice
Pin dask and distributed (#10481) @galipremsagar
MD5 refactoring. (#10445) @bdice
Remove or split up Frame methods that use the index (#10439) @vyasr
Centralization of tdigest aggregation code. (#10422) @nvdbaranec
Simplify column binary operations (#10421) @vyasr
Add .github/ops-bot.yaml config file (#10420) @ajschmidt8
Use list of columns for methods in Groupby.pyx (#10419) @isVoid
Remov...

@mythrocks

🚨 Breaking Changes

ORC writer API changes for granular statistics (#10058) @mythrocks
decimal128 Support for to/from_arrow (#9986) @codereport
Remove deprecated method one_hot_encoding (#9977) < 10000 a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/users/isVoid/hovercard" data-octo-click="hovercard-link-click" data-octo-dimensions="link_type:self" href="https://github.com/isVoid">@isVoid
Remove str.subword_tokenize (#9968) @VibhuJawa
Remove deprecated method parameter from merge and join. (#9944) @bdice
Remove deprecated method DataFrame.hash_columns. (#9943) @bdice
Remove deprecated method Series.hash_encode. (#9942) @bdice
Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007
Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar
Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt
Break tie for top categorical columns in Series.describe (#9867) @isVoid
Add partitioning support in parquet writer (#9810) @devavret
Move drop_duplicates, drop_na, _gather, take to IndexFrame and create their _base_index counterparts (#9807) @isVoid
Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar
Change default dtype of all nulls column from float to object (#9803) @galipremsagar
Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller
Pick smallest decimal type with required precision in ORC reader (#9775) @vuule
Add decimal128 support to Parquet reader and writer (#9765) @vuule
Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe
Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule
Match pandas scalar result types in reductions (#9717) @brandon-b-miller
Add parameters to control row group size in Parquet writer (#9677) @vuule
Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice
Add support for decimal128 in cudf python (#9533) @galipremsagar
Implement lists::index_of() to find positions in list rows (#9510) @mythrocks
Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

🐛 Bug Fixes

Add check for negative stripe index in ORC reader (#10074) @vuule
Update Java tests to expect DECIMAL128 from Arrow (#10073) @jlowe
Avoid index materialization when DataFrame is created with un-named Series objects (#10071) @galipremsagar
fix gcc 11 compilation errors (#10067) @rongou
Fix columns ordering issue in parquet reader (#10066) @galipremsagar
Fix dataframe setitem with ndarray types (#10056) @galipremsagar
Remove implicit copy due to conversion from cudf::size_type and size_t (#10045) @robertmaynard
Include <optional> in headers that use std::optional (#10044) @robertmaynard
Fix repr and concat of StructColumn (#10042) @galipremsagar
Include row group level stats when writing ORC files (#10041) @vuule
build.sh respects the --build_metrics and --incl_cache_stats flags (#10035) @robertmaynard
Fix memory leaks in JNI native code. (#10029) @mythrocks
Update JNI to use new arena mr constructor (#10027) @rongou
Fix null check when comparing structs in arg_min operation of reduction/groupby (#10026) @ttnghia
Wrap CI script shell variables in quotes to fix local testing. (#10018) @bdice
cudftestutil no longer propagates compiler flags to external users (#10017) @robertmaynard
Remove CUDA_DEVICE_CALLABLE macro usage (#10015) @hyperbolic2346
Add missing list filling header in meta.yaml (#10007) @devavret
Fix conda recipes for custreamz & cudf_kafka (#10003) @ajschmidt8
Fix matching regex word-boundary (\b) in strings replace (#9997) @davidwendt
Fix null check when comparing structs in min and max reduction/groupby operations (#9994) @ttnghia
Fix octal pattern matching in regex string (#9993) @davidwendt
decimal128 Support for to/from_arrow (#9986) @codereport
Fix groupby shift/diff/fill after selecting from a GroupBy (#9984) @shwina
Fix the overflow problem of decimal rescale (#9966) @sperlingxx
Use default value for decimal precision in parquet writer when not specified (#9963) @devavret
Fix cudf java build error. (#9958) @firestarman
Use gpuci_mamba_retry to install local artifacts. (#9951) @bdice
Fix regression HostColumnVectorCore requiring native libs (#9948) @jlowe
Rename aggregate_metadata in writer to fix name collision (#9938) @devavret
Fixed issue with percentile_approx where output tdigests could have uninitialized data at the end. (#9931) @nvdbaranec
Resolve racecheck errors in ORC kernels (#9916) @vuule
Fix the java build after parquet partitioning support (#9908) @revans2
Fix compilation of benchmark for parquet writer. (#9905) @bdice
Fix a memcheck error in ORC writer (#9896) @vuule
Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar
Fix fallback to sort aggregation for grouping only hash aggregate (#9891) @abellina
Add zlib to cudfjni link when using static libcudf library dependency (#9890) @jlowe
TimedeltaIndex constructor raises an AttributeError. (#9884) @skirui-source
Fix cudf.Scalar string datetime construction (#9875) @brandon-b-miller
Load libcufile.so with RTLD_NODELETE flag (#9872) @vuule
Break tie for top categorical columns in Series.describe (#9867) @isVoid
Fix null handling for structs min and arg_min in groupby, groupby scan, reduction, and inclusive_scan (#9864) @ttnghia
Add one-level list encoding support in parquet reader (#9848) @PointKernel
Fix an out-of-bounds read in validity copying in contiguous_split. (#9842) @nvdbaranec
Fix join of MultiIndex to Index with one column and overlapping name. (#9830) @vyasr
Fix caching in Series.applymap (#9821) @brandon-b-miller
Enforce boolean ascending for dask-cudf sort_values (#9814) @charlesbluca
Fix ORC writer crash with empty input columns (#9808) @vuule
Change default dtype of all nulls column from float to object (#9803) @galipremsagar
Load native dependencies when Java ColumnView is loaded (#9800) @jlowe
Fix dtype-argument bug in dask_cudf read_csv (#9796) @rjzamora
Fix overflow for min calculation in strings::from_timestamps (#9793) @revans2
Fix memory error due to lambda return type deduction limitation (#9778) @karthikeyann
Revert regex $/EOL end-of-string new-line special case handling (#9774) @davidwendt
Fix missing streams (#9767) @karthikeyann
Fix make_empty_scalar_like on list_type (#9759) @sperlingxx
Update cmake and conda to 22.02 (#9746) @devavret
Fix out-of-bounds memory write in decimal128-to-string conversion (#9740) @davidwendt
Match pandas scalar result types in reductions (#9717) @brandon-b-miller
Fix regex non-multiline EOL/$ matching strings ending with a new-line (#9715) @davidwendt
Fixed build by adding more checks for int8, int16 (#9707) @razajafri
Fix null handling when boolean dtype is passed (#9691) @galipremsagar
Fix stream usage in segmented_gather() (#9679) @mythrocks

📖 Documentation

Update decimal dtypes related docs entries (#10072) @galipremsagar
Fix regex doc describing hexadecimal escape characters (#10009) @davidwendt
Fix cudf compilation instructions. (#9956) @esoha-nvidia
Fix see also links for IO APIs (#9895) @galipremsagar
Fix build instructions for libcudf doxygen (#9837) @davidwendt
Fix some doxygen warnings and add missing documentation (#9770) @karthikeyann
update cuda version in local build (#9736) @karthikeyann
Fix doxygen for enum types in libcudf (#9724) @davidwendt
Spell check fixes (#9682) @karthikeyann
Fix links in C++ Developer Guide. (#9675) @bdice

🚀 New Features

Remove libcudacxx patch needed for nvcc 11.4 (#10057) @robertmaynard
Allow CuPy 10 (#10048) @jakirkham
Add in support for NULL_LOGICAL_AND and NULL_LOGICAL_OR binops (#10016) @revans2
Add groupby.transform (only support for aggregations) (#10005) @shwina
Add partitioning support to Parquet chunked writer (#10000) @devavret
Add jni for sequences (#9972) @wbo4958
Java bindings for mixed left, inner, and full joins (#9941) @jlowe
Java bindings for JSON reader support (#9940) @wbo4958
Enable transpose for string columns in cudf python (#9937) @galipremsagar
Support structs for cudf::contains with column/scalar input (#9929) @ttnghia
Implement mixed equality/conditional joins (#9917) @vyasr
Add cudf::strings::extract_all API (#9909) @davidwendt
Implement JNI for cudf::scatter APIs (#9903) @ttnghia
JNI: Function to copy and set validity from bool column. (#9901) @mythrocks
Add dictionary support to cudf::copy_if_else (#9887) @davidwendt
add run_benchmarks target for running benchmarks with json output (#9879) @karthikeyann
Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt
Add_suffix and add_prefix for DataFrames and Series (#9846) @mayankanand007
Add JNI for cudf::drop_duplicates (#9841) @ttnghia
Implement per-list sequence (#9839) @ttnghia
adding series.transpose (#9835) @mayankanand007
Adding support for Series.autocorr (#9833) @mayankanand007
Support round operation on datetime64 datatypes (#9820) @mayankanand007
Add partitioning support in parquet writer (#9810) @devavret
Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar
Add decimal128 support to Parquet reader and writer (#9765) @vuule
Optimize groupby::scan (#9754) @PointKernel
Add sample JNI API (#9728) @res-life
Support min and max in inclusive scan for structs (#9725) @ttnghia
Add first and last method to IndexedFrame (#9710) @isVoid
Support min and max reduction for structs (#9697) @ttnghia
Add parameters to control row group size in Parquet writer (#9677) @vuule
Run compute-sanitizer in nightly build (#9641) @karthikeyann
Implement Series.datetime.floor (#9571) @skirui-source
ceil/floor for DatetimeIndex (#9554) @mayankanand007
Add support for decimal128 in cudf python (#9533) @galipremsagar
...

v21.12.02

v21.12.01

@PointKernel

🚨 Breaking Changes

Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
Remove sizeof and standardize on memory_usage (#9544) @vyasr
Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
Refactor sorting APIs (#9464) @vyasr
Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333
Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
JNI: Support nested types in ORC writer (#9334) @firestarman
Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel
Various internal MultiIndex improvements (#9243) @vyasr

🐛 Bug Fixes

Fix read_parquet bug for bytes input (#9669) @rjzamora
Use _gather internal for sort_* (#9668) @isVoid
Fix behavior of equals for non-DataFrame Frames and add tests. (#9653) @vyasr
Dont recompute output size if it is already available (#9649) @abellina
Fix read_parquet bug for extended dtypes from remote storage (#9638) @rjzamora
add const when getting data from a JNI data wrapper (#9637) @wjxiz1992
Fix debrotli issue on CUDA 11.5 (#9632) @vuule
Use std::size_t when computing join output size (#9626) @jlowe
Fix usecols parameter handling in dask_cudf.read_csv (#9618) @galipremsagar
Add support for string 'nan', 'inf' & '-inf' values while type-casting to float (#9613) @galipremsagar
Avoid passing NativeFileDatasource to pyarrow in read_parquet (#9608) @rjzamora
Fix test failure with cuda 11.5 in row_bit_count tests. (#9581) @nvdbaranec
Correct _LIBCUDACXX_CUDACC_VER value computation (#9579) @robertmaynard
Increase max RLE stream size estimate to avoid potential overflows (#9568) @vuule
Fix edge case in tdigest scalar generation for groups containing all nulls. (#9551) @nvdbaranec
Fix pytests failing in cuda-11.5 environment (#9547) @galipremsagar
compile libnvcomp with PTDS if requested (#9540) @jbrennan333
Fix segmented_gather() for null LIST rows (#9537) @mythrocks
Deprecate DataFrame.label_encoding, use private _label_encoding method internally. (#9535) @bdice
Fix several test and benchmark issues related to bitmask allocations. (#9521) @nvdbaranec
Fix for inserting duplicates in groupby result cache (#9508) @karthikeyann
Fix mismatched types error in clip() when using non int64 numeric types (#9498) @davidwendt
Match conda pinnings for style checks (revert part of #9412, #9433). (#9490) @bdice
Make sure all dask-cudf supported aggs are handled in _tree_node_agg (#9487) @charlesbluca
Resolve hash_columns FutureWarning in dask_cudf (#9481) @pentschev
Add fixed point to AllTypes in libcudf unit tests (#9472) @karthikeyann
Fix regex handling of embedded null characters (#9470) @davidwendt
Fix memcheck error in copy-if-else (#9467) @davidwendt
Fix bug in dask_cudf.read_parquet for index=False (#9453) @rjzamora
Preserve the decimal scale when creating a default scalar (#9449) @revans2
Push down parent nulls when flattening nested columns. (#9443) @mythrocks
Fix memcheck error in gtest SegmentedGatherTest/GatherSliced (#9442) @davidwendt
Revert "Fix quantile division / partition handling for dask-cudf sort… (#9438) @charlesbluca
Allow int-like objects for the decimals argument in round (#9428) @shwina
Fix stream compaction's drop_duplicates API to use stable sort (#9417) @ttnghia
Skip Comparing Uniform Window Results in Var/std Tests (#9416) @isVoid
Fix StructColumn.to_pandas type handling issues (#9388) @galipremsagar
Correct issues in the build dir cudf-config.cmake (#9386) @robertmaynard
Fix Java table partition test to account for non-deterministic ordering (#9385) @jlowe
Fix timestamp truncation/overflow bugs in orc/parquet (#9382) @PointKernel
Fix the crash in stats code (#9368) @devavret
Make Series.hash_encode results reproducible. (#9366) @bdice
Fix libcudf compile warnings on debug 11.4 build (#9360) @davidwendt
Fail gracefully when compiling python UDFs that attempt to access columns with unsupported dtypes (#9359) @brandon-b-miller
Set pass_filenames: false in mypy pre-commit configuration. (#9349) @bdice
Fix cudf_assert in cudf::io::orc::gpu::gpuDecodeOrcColumnData (#9348) @davidwendt
Fix memcheck error in groupby-tdigest get_scalar_minmax (#9339) @davidwendt
Optimizations for cudf.concat when axis=1 (#9333) @galipremsagar
Use f-string in join helper warning message. (#9325) @bdice
Avoid casting to list or struct dtypes in dask_cudf.read_parquet (#9314) @rjzamora
Fix null count in statistics for parquet (#9303) @devavret
Potential overflow of decimal32 when casting to int64_t (#9287) @codereport
Fix quantile division / partition handling for dask-cudf sort on null dataframes (#9259) @charlesbluca
Updating cudf version also updates rapids cmake branch (#9249) @robertmaynard
Implement one_hot_encoding in libcudf and bind to python (#9229) @isVoid
BUG FIX: CSV Writer ignores the header parameter when no metadata is provided (#8740) @skirui-source

📖 Documentation

Update Documentation to use TYPED_TEST_SUITE (#9654) @codereport
Add dedicated page for StringHandling in python docs (#9624) @galipremsagar
Update docstring of DataFrame.merge (#9572) @galipremsagar
Use raw strings to avoid SyntaxErrors in parsed docstrings. (#9526) @bdice
Add example to docstrings in rolling.apply (#9522) @isVoid
Update help message to escape quotes in ./build.sh --cmake-args. (#9494) @bdice
Improve Python docstring formatting. (#9493) @bdice
Update table of I/O supported types (#9476) @vuule
Document invalid regex patterns as undefined behavior (#9473) @davidwendt
Miscellaneous documentation fixes to cudf (#9471) @galipremsagar
Fix many documentation errors in libcudf. (#9355) @karthikeyann
Fixing SubwordTokenizer docs issue (#9354) @mayankanand007
Improved deprecation warnings. (#9347) @bdice
doc reorder mr, stream to stream, mr (#9308) @karthikeyann
Deprecate method parameters to DataFrame.join, DataFrame.merge. (#9291) @bdice
Added deprecation warning for .label_encoding() (#9289) @mayankanand007

🚀 New Features

Enable Series.divide and DataFrame.divide (#9630) @vyasr
Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
Add handling of mixed numeric types in to_dlpack (#9585) @galipremsagar
Support re.Pattern object for pat arg in str.replace (#9573) @davidwendt
Add JNI for lists::drop_list_duplicates with keys-values input column (#9553) @ttnghia
Support structs column in min, max, argmin and argmax groupby aggregate() and scan() (#9545) @ttnghia
Move libcudacxx to use rapids_cpm and use newer versions (#9539) @robertmaynard
Add scan min/max support for chrono types to libcudf reduction-scan (not groupby scan) (#9518) @davidwendt
Support args= in apply (#9514) @brandon-b-miller
Add groupby scan min/max support for strings values (#9502) @davidwendt
Add list output option to character_ngrams() function (#9499) @davidwendt
More granular column selection in ORC reader (#9496) @vuule
add min_periods, ddof to groupby covariance, & correlation aggregation (#9492) @karthikeyann
Implement Series.datetime.floor (#9488) @skirui-source
Enable linting of CMake files using pre-commit (#9484) @vyasr
Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
Augment order_by to Accept a List of null_precedence (#9455) @isVoid
Add format API for list column of strings (#9454) @davidwendt
Enable Datetime/Timedelta dtypes in Masked UDFs (#9451) @brandon-b-miller
Add cudf python groupby.diff (#9446) @karthikeyann
Implement lists::stable_sort_lists for stable sorting of elements within each row of lists column (#9425) @ttnghia
add ctest memcheck using cuda-sanitizer (#9414) @karthikeyann
Support Unary Operations in Masked UDF (#9409) @isVoid
Move Several Series Function to Frame (#9394) @isVoid
MD5 Python hash API (#9390) @bdice
Add cudf strings is_title API (#9380) @davidwendt
Enable casting to int64, uint64, and double in AST code. (#9379) @vyasr
Add support for writing ORC with map columns (#9369) @vuule
extract_list_elements() with column_view indices (#9367) @mythrocks
Reimplement lists::drop_list_duplicates for keys-values lists columns (#9345) @ttnghia
Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
JNI: Support nested types in ORC writer (#9334) @firestarman
Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
Add shallow hash function and shallow equality comparison for column_view (#9312) @karthikeyann
Add CudaMemoryBuffer for cudaMalloc memory using RMM cuda_memory_resource (#9311) @rongou
Add parameters to control row index stride and stripe size in ORC writer (#9310) @vuule
Add na_position param to dask-cudf sort_values (#9264) @charlesbluca
Add ascending parameter for dask-cudf sort_values (#9250) @charlesbluca
New array conversion methods (#9236) @vyasr
Series apply method backed by masked UDFs (#9217) @brandon-b-miller
Grouping by frequency and resampling (#9178) @shwina
Pure-python masked UDFs (#9174) @brandon-b-miller
Add Covariance, Pearson correlation for sort groupby (libcudf) (#9154) @karthikeyann
Add calendrical_month_sequence in c++ and date_range in python (#8886) @shwina

🛠️ Improvements

Followup to PR 9088 comments (#9659) @cwharris
Update cuCollections to version that supports installed libcudacxx (#9633) @robertmaynard
Add 11.5 dev.yml to cudf (#9617) @galipremsagar
Add xfail for parquet reader 11.5 issue (#9612) @galipremsagar
remove deprecated Rmm.initialize method (#9607) @rongou
Use HostColumnVectorCore for ch...

v21.10.01

@vyasr

🚨 Breaking Changes

Remove Cython APIs for table view generation (#9199) @vyasr
Upgrade pandas version in cudf (#9147) @galipremsagar
Make AST operators nullable (#9096) @vyasr
Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
Update JNI java CSV APIs to not use deprecated API (#9066) @revans2
Support additional format specifiers in from_timestamps (#9047) @davidwendt
Expose expression base class publicly and simplify public AST API (#9045) @vyasr
Add support for struct type in ORC writer (#9025) @vuule
Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr
Java bindings for conditional join output sizes (#9002) @jlowe
Move compute_column API out of ast namespace (#8957) @vyasr
cudf.dtype function (#8949) @shwina
Refactor Frame reductions (#8944) @vyasr
Add nested column selection to parquet reader (#8933) @devavret
JNI Aggregation Type Changes (#8919) @revans2
Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec
Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
Change cudf docs theme to pydata theme (#8746) @galipremsagar
Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann
Make groupby transform-like op order match original data order (#8720) @isVoid

🐛 Bug Fixes

fixed_point cudf::groupby for mean aggregation (#9296) @codereport
Fix interleave_columns when the input string lists column having empty child column (#9292) @ttnghia
Update nvcomp to include fixes for installation of headers (#9276) @devavret
Fix Java column leak in testParquetWriteMap (#9271) @jlowe
Fix call to thrust::reduce_by_key in argmin/argmax libcudf groupby (#9263) @davidwendt
Fixing empty input to getMapValue crashing (#9262) @hyperbolic2346
Fix duplicate names issue in MultiIndex.deserialize (#9258) @galipremsagar
Dataframe.sort_index optimizations (#9238) @galipremsagar
Temporarily disabling problematic test in parquet writer (#9230) @devavret
Explicitly disable groupby on unsupported key types. (#9227) @mythrocks
Fix gather for sliced input structs column (#9218) @ttnghia
Fix JNI code for left semi and anti joins (#9207) @jlowe
Only install thrust when using a non 'system' version (#9206) @robertmaynard
Remove zlib from libcudf public CMake dependencies (#9204) @robertmaynard
Fix out-of-bounds memory read in orc gpuEncodeOrcColumnData (#9196) @davidwendt
Fix gather() for STRUCT inputs with no nulls in members. (#9194) @mythrocks
get_cucollections properly uses rapids_cpm_find (#9189) @robertmaynard
rapids-export correctly reference build code block and doc strings (#9186) @robertmaynard
Fix logic while parsing the sum statistic for numerical orc columns (#9183) @ayushdg
Add handling for nulls in dask_cudf.sorting.quantile_divisions (#9171) @charlesbluca
Approximate overflow detection in ORC statistics (#9163) @vuule
Use decimal precision metadata when reading from parquet files (#9162) @shwina
Fix variable name in Java build script (#9161) @jlowe
Import rapids-cmake modules using the correct cmake variable. (#9149) @robertmaynard
Fix conditional joins with empty left table (#9146) @vyasr
Fix joining on indexes with duplicate level names (#9137) @shwina
Fixes missing child column name in dtype while reading ORC file. (#9134) @rgsl888prabhu
Apply type metadata after column is slice-copied (#9131) @isVoid
Fix a bug: inner_join_size return zero if build table is empty (#9128) @PointKernel
Fix multi hive-partition parquet reading in dask-cudf (#9122) @rjzamora
Support null literals in expressions (#9117) @vyasr
Fix cudf::hash_join output size for struct joins (#9107) @jlowe
Import fix (#9104) @shwina
Fix cudf::strings::is_fixed_point checking of overflow for decimal32 (#9093) @davidwendt
Fix branch_stack calculation in row_bit_count() (#9076) @mythrocks
Fetch rapids-cmake to work around cuCollection cmake issue (#9075) @jlowe
Fix compilation errors in groupby benchmarks. (#9072) @nvdbaranec
Preserve float16 upscaling (#9069) @galipremsagar
Fix memcheck read error in libcudf contiguous_split (#9067) @davidwendt
Add support for reading ORC file with no row group index (#9060) @rgsl888prabhu
Various multiindex related fixes (#9036) @shwina
Avoid rebuilding cython in build.sh (#9034) @brandon-b-miller
Add support for percentile dispatch in dask_cudf (#9031) @galipremsagar
cudf resolve nvcc 11.0 compiler crashes during codegen (#9028) @robertmaynard
Fetch correct grouping keys agg of dask groupby (#9022) @galipremsagar
Allow where() to work with a Series and other=cudf.NA (#9019) @sarahyurick
Use correct index when returning Series from GroupBy.apply() (#9016) @charlesbluca
Fix Dataframe indexer setitem when array is passed (#9006) @galipremsagar
Fix ORC reading of files with struct columns that have null values (#9005) @vuule
Ensure JNI native libraries load when CompiledExpression loads (#8997) @jlowe
Fix memory read error in get_dremel_data in page_enc.cu (#8995) @davidwendt
Fix memory write error in get_list_child_to_list_row_mapping utility (#8994) @davidwendt
Fix debug compile error for csv_test.cpp (#8981) @davidwendt
Fix memory read/write error in concatenate_lists_ignore_null (#8978) @davidwendt
Fix concatenation of cudf.RangeIndex (#8970) @galipremsagar
Java conditional joins should not require matching column counts (#8955) @jlowe
Fix concatenate empty structs (#8947) @sperlingxx
Fix cuda-memcheck errors for some libcudf functions (#8941) @davidwendt
Apply series name to result of SeriesGroupby.apply() (#8939) @charlesbluca
cdef packed_columns as cppclass instead of struct (#8936) @charlesbluca
Inserting a cudf.NA into a DataFrame (#8923) @sarahyurick
Support casting with Pandas dtype aliases (#8920) @sarahyurick
Allow sort_values to accept same kind values as Pandas (#8912) @sarahyurick
Enable casting to pandas nullable dtypes (#8889) @brandon-b-miller
Fix libcudf memory errors (#8884) @karthikeyann
Throw KeyError when accessing field from struct with nonexistent key (#8880) @NV-jpt
replace auto with auto& ref for cast<&> (#8866) @karthikeyann
Add missing include<optional> in binops (#8864) @karthikeyann
Fix select_dtypes to work when non-class dtypes present in dataframe (#8849) @sarahyurick
Re-enable JSON tests (#8843) @vuule
Support header with embedded delimiter in csv writer (#8798) @davidwendt

📖 Documentation

Add IO docs page in cudf documentation (#9145) @galipremsagar
use correct namespace in cuio code examples (#9037) @cwharris
Restructuring Contributing doc (#9026) @iskode
Update stable version in readme (#9008) @galipremsagar
Add spans and more include guidelines to libcudf developer guide (#8931) @harrism
Update Java build instructions to mention Arrow S3 and Docker (#8867) @jlowe
List GDS-enabled formats in the docs (#8805) @vuule
Change cudf docs theme to pydata theme (#8746) @galipremsagar

🚀 New Features

Revert "Add shallow hash function and shallow equality comparison for column_view (#9185)" (#9283) @karthikeyann
Align DataFrame.apply signature with pandas (#9275) @brandon-b-miller
Add struct type support for drop_list_duplicates (#9202) @ttnghia
support CUDA async memory resource in JNI (#9201) @rongou
Add shallow hash function and shallow equality comparison for column_view (#9185) @karthikeyann
Superimpose null masks for STRUCT columns. (#9144) @mythrocks
Implemented bindings for ceil timestamp operation (#9141) @shaneding
Adding MAP type support for ORC Reader (#9132) @rgsl888prabhu
Implement interleave_columns for lists with arbitrary nested type (#9130) @ttnghia
Add python bindings to fixed-size window and groupby rolling.var, rolling.std (#9097) @isVoid
Make AST operators nullable (#9096) @vyasr
Java bindings for approx_percentile (#9094) @andygrove
Add dseries.struct.explode (#9086) @isVoid
Add support for BaseIndexer in Rolling APIs (#9085) @galipremsagar
Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
Add handling for nested dicts in dask-cudf groupby (#9054) @charlesbluca
Added Series.dt.is_quarter_start and Series.dt.is_quarter_end (#9046) @TravisHester
Support nested types for nth_element reduction (#9043) @sperlingxx
Update sort groupby to use non-atomic operation (#9035) @karthikeyann
Add support for struct type in ORC writer (#9025) @vuule
Implement interleave_columns for structs columns (#9012) @ttnghia
Add groupby first and last aggregations (#9004) @shwina
Add DecimalBaseColumn and move as_decimal_column (#9001) @isVoid
Python/Cython bindings for multibyte_split (#8998) @jdye64
Support scalar months in add_calendrical_months, extends API to INT32 support (#8991) @isVoid
Added Series.dt.is_month_end (#8989) @TravisHester
Support for using tdigests to compute approximate percentiles. (#8983) @nvdbaranec
Support "unflatten" of columns flattened via flatten_nested_columns(): (#8956) @mythrocks
Implement timestamp ceil (#8942) @shaneding
Add nested column selection to parquet reader (#8933) @devavret
Expose conditional join size calculation (#8928) @vyasr
Support Nulls in Timeseries Generator (#8925) @isVoid
Avoid index equality check in _CPackedColumns.from_py_table() (#8917) @charlesbluca
Add dot product binary op (#8909) @charlesbluca
Expose days_in_month function in libcudf and add python bindings (#8892) @isVoid
Series string repeat (#8882) @sarahyurick
Python binding for quarters (#8862) @shaneding
Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
Add Java bindings for AST ...

v21.08.03

v21.08.02

v21.08.01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

Contributors

Uh oh!

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

Contributors

Uh oh!

Uh oh!

Uh oh!

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

Contributors

Uh oh!

Uh oh!

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

Contributors

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Releases: rapidsai/cudf

v22.04.00

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

Contributors

Uh oh!

v22.02.00

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

Contributors

Uh oh!

v21.12.02

Uh oh!

v21.12.01

Uh oh!

v21.12.00

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

Contributors

Uh oh!

v21.10.01

Uh oh!

v21.10.00

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

Contributors

Uh oh!

v21.08.03

Uh oh!

v21.08.02

Uh oh!

v21.08.01

Uh oh!