Tags: modin-project/modin
Modin 0.33.2

This patch release includes some bug fixes.

Key Features and Updates Since 0.33.1
-------------------------------------
* Stability and Bugfixes
  * FIX-#5961: Preserve dtypes when inserting column to empty frame. (#7601)
  * FIX-#7551: Fix name ambiguity for `value_counts()` on Pandas backend (#7585)
  * FIX-#7595: Log backend switching information with the modin logger. (#7597)
* Update testing suite
  * TEST-#7598: Allow xgboost to log to root. (#7599)
  * TEST-#7602: Fix test_pickle by correctly using fixtures. (#7603)
* Uncategorized improvements

Contributors
------------
@sfc-gh-vrpatel
@sfc-gh-mvashishtha

Modin 0.33.1

This patch release fixes a regression introduced in Modin 0.33.0.

Key Features and Updates Since 0.33.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7582: Add copy parameter to __array__ methods. (#7584)

Contributors
------------
@sfc-gh-mvashishtha

Modin 0.33.0

This release introduces a set of features for switching Modin execution between multiple backends (e.g. Ray and local Pandas) manually or automatically. It also includes several bug fixes.

Key Features and Updates Since 0.32.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7327: Use sort parameter of DataFrame.stack (#7396)
  * FIX-#7346: Handle execution on Dask workers to avoid creating conflicting clients (#7347)
  * FIX-#7375: Fix Series.duplicated dropping name (#7395)
  * FIX-#7381: Fix Series binary operators ignoring fill_value (#7394)
  * FIX-#7383: Avoid broadcast issue in partition manager with custom NPartitions (#7399)
  * FIX-#7404: Implement interchange protocol for datetime columns (#7434)
  * FIX-#7405: Internally sort indices for loc/iloc set (#7440)
  * FIX-#7413: Always use positional index before computing argmin/argmax (#7463)
  * FIX-#7461: Set backend correctly with environment variables. (#7462)
  * FIX-#7465: Properly implement Series.rename_axis (#7466)
  * FIX-#7486: Add support for `.astype(pandas.CategoricalDtype(…))` (#7487)
  * FIX-#7490: Exclude move_to and _update_inplace from casting. (#7491)
  * FIX-#7495: Separate extensions for aliases. (#7496)
  * FIX-#7521: Fix wrong extension being used when backend is pinned (#7546)
  * FIX-#7528: Dispatch module-level extensions to the correct backend (#7529)
  * FIX-#7532: Display choices in error message of environment vars (#7533)
  * FIX-#7536: setuptools / ray version conflict in pkg_resources._vendor (#7537)
  * FIX-#7538: set_backend should exit early if there is nothing to do (#7539)
  * FIX-#7547: native qc move_to_me_cost does not work with non-subclasses (#7548)
  * FIX-#7553: Fix groupby when AutoSwitchBackend is disabled. (#7554)
  * FIX-#7555: Get the correct extension when AutoSwitchBackend is False. (#7556)
  * FIX-#7559: Create the dummy query compiler just once per backend. (#7560)
  * FIX-#7562: Raise AttributeError for missing extension properties. (#7563)
  * FIX-#7569: Fix handling of pyarrow dtype and empty dataframes (#7570)
  * FIX-#7576: Fix ambiguous AttributeError message (#7577)
  * FIX-#7578: Change groupby extension allow list and fix cached_property extensions (#7579)
* Performance enhancements
  * PERF-#7397: Avoid materializing index/columns in shape checks (#7398)
* Refactor Codebase
  * REFACTOR-#7315: Refactor axis checks in squeeze (#7400)
  * REFACTOR-#7418: Rename internal interchange protocol methods. (#7422)
  * REFACTOR-#7427: Require query compilers to expose engine and storage format. (#7430)
  * REFACTOR-#7470: Combine backend casting and extension code at the API layer. (#7485)
  * REFACTOR-#7493: Improve the clarity of the costing functions (#7494)
  * REFACTOR-#7527: Add more costing logic to the base query compiler. (#7530)
  * REFACTOR-#7534: Provide internal, overridable method for max_shape (#7535)
  * REFACTOR-#7564: Fix docstrings for transfer thresholds. (#7565)
* Update testing suite
  * TEST-#7419: Fix a few errors in CI (#7420)
  * TEST-#7421: Fix unidist with APT-installed MPI (#7423)
  * TEST-#7431: Fix formatting for isort 6 and black 25 (#7432)
  * TEST-#7437: Check execution-filter outputs correctly in CI. (#7438)
  * TEST-#7441: Correctly skip sanity tests if we don't need them. (#7442)
  * TEST-#7457: Fix SSL certificate error in notebooks by using http. (#7458)
  * TEST-#7497: Skip tests requiring lxml on windows. (#7500)
  * TEST-#7571: xfail test_read_csv_s3_issue4658 due to missing s3 bucket (#7572)
* Documentation improvements
  * DOCS-#7566: Add pandas on snowflake + backend pinning to documentation page (#7567)
* New Features
  * FEAT-#7433: Replace NativeDataFrameMode with a complete "native" execution. (#7436)
  * FEAT-#7445: Add metrics interface so third-parties can collect metrics from the modin frontend (#7444)
  * FEAT-#7448: Allow QueryCompilerCaster to apply cost-optimization on automatic casting (#7464)
  * FEAT-#7455: Add Backend config variable as an alias for execution. (#7456)
  * FEAT-#7459: Add methods to get and set backend. (#7460)
  * FEAT-#7468: Add progress bar for engine switch (#7469)
  * FEAT-#7472: Add an option register dataframe and series accessors with a particular backend. (#7473)
  * FEAT-#7474: Register general functions with a particular backend. (#7489)
  * FEAT-#7475: Choose the correct __init__ method from extensions and apply casting to __init__. (#7488)
  * FEAT-#7477: Move the query compiler calculator so it can be used in more places (#7478)
  * FEAT-#7480: Implement max_cost interface (#7481)
  * FEAT-#7482: Add "from_qc" API to QueryCompiler and BackendCostCalculator to handle asymmetric information scenarios (#7483)
  * FEAT-#7492: Allow I/O function accessors. (#7502)
  * FEAT-#7505: Support post-operation automatic backend switch. (#7506)
  * FEAT-#7507: Support pre-operation automatic backend switch. (#7512)
  * FEAT-#7509: Add AutoSwitchBackend configuration variable (#7510)
  * FEAT-#7511: Support pre-operation switch for init by passing arguments to cost functions. (#7531)
  * FEAT-#7521: Support pinning objects to a backend (#7522)
  * FEAT-#7523: Improve formal definition of the automatic switching algorithm (#7524)
  * FEAT-#7540: Ability to configure NativeQueryCompiler AutoSwitch Settings (#7561)
  * FEAT-#7542: Support post-operation backend switch for groupby. (#7545)
  * FEAT-#7543: Let plugins register groupby accessors. (#7575)
  * FEAT-#7549: Emit metrics on auto-switch and casting behavior (#7550)
  * FEAT-#7557: Add operation and size information to backend switch progress (#7558)
  * FEAT-#7573: Dispatch __array_ufunc__ to query compilers (#7574)

Contributors
------------
@CRiddler
@YarShev
@anmyachev
@data-makerman
@devin-petersohn
@emmanuel-ferdman
@mpeleshenko
@noloerino
@sfc-gh-dpetersohn
@sfc-gh-jkew
@sfc-gh-joshi
@sfc-gh-mvashishtha

Modin 0.32.0

This release introduces support for the Polars API, a new query compiler for small data, more functions that can use dynamic partitioning, and several bug fixes.

Key Features and Updates Since 0.31.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#0000: Fix type hint (#7343)
  * FIX-#7113: Fix docstring overrides for subclasses. (#7354)
  * FIX-#7134: Use a separate docstring class for BasePandasDataset. (#7353)
  * FIX-#7329: Do not sort columns on df.update (#7330)
  * FIX-#7351: Add ipython method calls to non-lookup list (#7352)
  * FIX-#7355: Cpu count would be set incorrectly on a cluster (#7356)
  * FIX-#7357: Fix `NoAttributeError` on `DataFrame.copy` (#7358)
  * FIX-#7371: Fix inserting datelike values into a DataFrame (#7372)
  * FIX-#7373: Try a previous version of `motoserver/moto` service, pin to 5.0.13 (#7374)
  * FIX-#7379: Fix __imul__ performing addition instead of multiplication (#7380)
  * FIX-#7387: Limit the number of pytest workers for tests with Ray engine on Windows (#7388)
  * FIX-#7389: Fix uploading artifacts (#7390)
* Refactor Codebase
  * REFACTOR-#0000: Update copyright date (#7333)
* Documentation improvements
  * DOCS-#0000: Update RunLLM Ask AI widget script path (#7345)
  * DOCS-#7335: Fix broken links in Modin Usage Examples page (#7336)
  * DOCS-#7382: Add documentation on how to use Modin Native query compiler (#7386)
* New Features
  * FEAT-#4605: Add native query compiler (#7259)
  * FEAT-#7308: Interoperability between query compilers (#7376)
  * FEAT-#7331: Initial Polars API (#7332)
  * FEAT-#7337: Using dynamic partitioning in `broadcast_apply` (#7338)
  * FEAT-#7340: Add more granular lazy flags to query compiler (#7348)
  * FEAT-#7368: Add a new environment variable for using dynamic partitioning (#7369)

Contributors
------------
@MortalHappiness
@Retribution98
@YarShev
@ZhipengXue97
@anmyachev
@arunjose696
@devin-petersohn
@likawind
@sfc-gh-joshi
@sfc-gh-mvashishtha

Modin 0.31.0

First release compatible with NumPy 2.0.

Key Features and Updates Since 0.30.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7138: Stop reloading modules for custom docstrings. (#7307)
  * FIX-#7263: Empty docstrings should not be inherited (#7264)
  * FIX-#7272: Remove HDK engine (#7275)
  * FIX-#7277: Remove Cudf storage format as unmaintained (#7290)
  * FIX-#7278: Make sure `enable_logging` decorator preserve type hints (#7279)
  * FIX-#7292: Prepare Modin code to NumPy 2.0 (#7293)
  * FIX-#7295: Unpin numexpr to allow versions >= 2.8.4 to match pandas (#7296)
  * FIX-#7309: Update versioneer with `versioneer install --vendor` (#7311)
  * FIX-#7320: Bump the github-actions group with 3 updates (#7319)
  * FIX-#7321: Using 'C' engine instead of 'pyarrow' for getting metadata in 'read_csv' (#7322)
* Performance enhancements
  * PERF-#7299: Avoid using `synchronize_labels` for `combine` function (#7300)
* Refactor Codebase
  * REFACTOR-#7271: Remove `instance_type` attribute of axis partitions (#7268)
  * REFACTOR-#7273: Remove deprecated functions from utils.py, accessor.py and io.py (#7274)
  * REFACTOR-#7285: Remove deprecated configs (#7286)
  * REFACTOR-#7294: Reduce access of methods `_modin_frame` methods from `_query_compiler` (#7297)
  * REFACTOR-#7313: Add similar methods as in #7294 for operating on columns (#7314)
* Update testing suite
  * TEST-#0000: Add a Dependabot config to auto-update GitHub action versions (#7318)
  * TEST-#7316: Run a subset of CI tests with python 3.10 and 3.11 on a scheduled basis (#7289)
* Documentation improvements
  * DOCS-#0000: Adds RunLLM widget to docs (#7326)
  * DOCS-#7287: Update Modin on Dask documentation (#7288)
* New Features
  * FEAT-#6574: UserWarning no longer displayed when Series/DataFrames are small (#7323)
  * FEAT-#7249: Add `reload_modin` feature (#7280)
  * FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)
  * FEAT-#7283: Introduce MinRowPartitionSize and MinColumnPartitionSize (#7284)
  * FEAT-#7310: NumPy 2.0 support (#7312)

Contributors
------------
@Jayson729
@Retribution98
@YarShev
@anmyachev
@arunjose696
@kurtmckee
@sfc-gh-dpetersohn
@vsreekanti

Modin 0.29.1

This release pins numpy<2.

Key Features and Updates Since 0.29.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#7302: Pin numpy<2 (072453b)
* New Features
  * FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)

Contributors
------------
@anmyachev
@sfc-gh-dpetersohn

Modin 0.28.3

This release pins numpy<2.

Key Features and Updates Since 0.28.2
-------------------------------------
* Stability and Bugfixes
  * FIX-#7302: Pin numpy<2 (072453b)
* New Features
  * FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)

Contributors
------------
@anmyachev
@sfc-gh-dpetersohn

Modin 0.27.1

This release pins numpy<2.

Key Features and Updates Since 0.27.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#6968: Align API with pandas (#6969)
  * FIX-#7302: Pin numpy<2 (072453b)
* New Features
  * FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)

Contributors
------------
@anmyachev
@dchigarev
@sfc-gh-dpetersohn

Modin 0.30.0

This release introduces support for the DataFrame API standard, a distributed implementation of right merge/join, more efficient internal operators (which give a performance boost to almost all distributed Modin functions), improved compatibility with pandas using the pyarrow backend, and type hints for the pandas API to improve UX.

Key Features and Updates Since 0.29.0
-------------------------------------
* Stability and Bugfixes
  * FIX-#0000: Fix badge in README.md (#7213)
  * FIX-#0000: Make merge tests more stable by sorting results (#7266)
  * FIX-#6967: Remove read_pickle_distributed/to_pickle_distributed functions as deprecated (#7258)
  * FIX-#7093: Make sure 'idxmax' and 'idxmin' can work with string columns (#7193)
  * FIX-#7102: Remove `enable_api_only` mode in modin logging (#7194)
  * FIX-#7103: Move lower-level functionality logging to debug (#7184)
  * FIX-#7143: Constructing a DataFrame from a Modin Series with tuple name should produce MultiIndex columns (#7214)
  * FIX-#7185: Add extra check for some config classes (#7189)
  * FIX-#7201: Update docs on how to enable Modin logs for high-level API and low-level API (#7209)
  * FIX-#7206: Make sure df.melt handle duplicate value_vars correctly (#7208)
  * FIX-#7219: Pin dataframe-api-compat>=0.2.7 (#7220)
  * FIX-#7221: Don't use 'use_legacy_dataset=False' for 'ParquetDataset' (#7222)
  * FIX-#7224: Importing modin.pandas.api.extensions overwrites re-export of pandas.api submodules (#7225)
  * FIX-#7233: Display property name in default_to_pandas error messages (#7269)
  * FIX-#7234: Deprecate HDK engine (#7235)
  * FIX-#7238: Fix docstring inheritance for `cached_property` and use it (#7239)
  * FIX-#7240: Allow `doc_checker.py` works with `functools.cached_property` (#7241)
  * FIX-#7246: Pin pyarrow>=10.0.1 as pandas 2.2.* does (#7247)
  * FIX-#7248: Make sure '_validate_dtypes_sum_prod_mean' works correctly with datetime types (#7237)
  * FIX-#7250: Revert "PERF-#6666: Avoid internal reset_index for left merge" (#7251)
* Performance enhancements
  * PERF-#7227: Call 'modin_frame.combine()' for merge and join only when necessary (#7228)
  * PERF-#7230: Don't preserve bad partition for 'merge' (#7229)
* Refactor Codebase
  * REFACTOR-#7242: Add type hints for `modin/core/dataframe/algebra/` (#7243)
  * REFACTOR-#7260: Use `extract_dtype` internal function in more places (#7261)
* Update testing suite
  * TEST-#7049: Add some sanity tests with pyarrow-backed pandas dataframes (#7199)
  * TEST-#7191: Fix ASV after changing default branch (#7190)
* Documentation improvements
  * DOCS-#0000: Fix a typo with MODIN_CPUS number (#7198)
  * DOCS-#0000: Supplement Optimization Notes with a link to configs (#7197)
  * DOCS-#7217: Update docs as to when Modin operators work best (#7218)
  * DOCS-#7255: Update docs as to from_* functions (#7256)
* New Features
  * FEAT-#5394: Reduce amount of remote calls for Map operator (#7136)
  * FEAT-#5394: Reduce amount of remote calls for TreeReduce and GroupByReduce operators (#7245)
  * FEAT-#6492: Add `from_map` feature to create dataframe (#7215)
  * FEAT-#6498: Make Fold operator more flexible (#7257)
  * FEAT-#6808: Implement '__arrow_array__' for Series (#7200)
  * FEAT-#6890: Modin implementation of DataFrame API standard (#7216)
  * FEAT-#7139: Use ray-core instead of ray-default (#6955)
  * FEAT-#7187: Change "master" branch to "main" (#7188)
  * FEAT-#7202: Use custom resources for Ray (#7205)
  * FEAT-#7203: Make sure Modin works correctly with pandas, which uses pyarrow as a backend (#7204)
  * FEAT-#7207: Add the ability to assign a df to a columns selection without d2p (#7210)
  * FEAT-#7252: Add type hints for `base.py` (#7253)
  * FEAT-#7254: Support right merge/join (#7226)

Contributors
------------
@Retribution98
@YarShev
@anmyachev
@arunjose696
@noloerino
@sfc-gh-jkew

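
The entries in these release notes appear to follow a `KIND-#issue: title (reference)` pattern, where the trailing reference is either a pull request number such as `(#7584)` or a short commit hash such as `(072453b)`. The following is a minimal sketch for splitting one entry into its parts, assuming that pattern holds. The names `Entry`, `ENTRY_RE`, and `parse_entry` are hypothetical helpers written for this illustration and are not part of Modin or its tooling.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One changelog entry (hypothetical structure for illustration)."""
    kind: str            # e.g. "FIX", "FEAT", "PERF", "TEST", "DOCS", "REFACTOR"
    issue: int           # number after "#"; "#0000" parses to 0
    title: str           # free-form title text
    ref: Optional[str]   # trailing "#NNNN" PR number or short commit hash, if any


# Assumed entry shape: KIND-#ISSUE: title (optional trailing reference)
ENTRY_RE = re.compile(
    r"^(?P<kind>[A-Z]+)-#(?P<issue>\d+):\s*"
    r"(?P<title>.*?)\s*"
    r"(?:\((?P<ref>#\d+|[0-9a-f]{7,40})\))?$"
)


def parse_entry(line: str) -> Optional[Entry]:
    """Parse a single entry line; return None if it does not match the pattern."""
    # Drop surrounding whitespace and a leading "*" bullet marker, if present.
    text = line.strip().lstrip("*").strip()
    match = ENTRY_RE.match(text)
    if match is None:
        return None
    return Entry(
        kind=match.group("kind"),
        issue=int(match.group("issue")),
        title=match.group("title"),
        ref=match.group("ref"),
    )


if __name__ == "__main__":
    print(parse_entry("* FIX-#7582: Add copy parameter to __array__ methods. (#7584)"))
    print(parse_entry("* FIX-#7302: Pin numpy<2 (072453b)"))
```

Lines that are not entries, such as category names or the contributor list, return `None`, so the function can be applied to every line of a release body and the results filtered.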