v0.4.0

Important: This update includes a major change that may alter the reproducibility of some old pipelines, especially if split_by() was used on columns of type double. When re-running old code with LexOPS, take care to use a version prior to this release.

Update to split_by():

Simplified numeric splits in split_by(). The function no longer uses cut(), and now applies the same method to double and integer types. The old method could produce unexpected behaviour when splitting by columns stored as double if the levels overlapped. See issue #6 for more details.

Another change with this new method is that, while splits can still be specified out of order (e.g., 4:5 ~ 1:3), the specified order is now preserved, whereas before an attempt was made to sort them. This means that A1 will now be 4:5, and A2 will be 1:3, whereas previous versions would have forced A1 to be the lower level of 1:3, and A2 to be the higher level of 4:5.
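The ordering behaviour described above can be illustrated with a minimal base R sketch. This is not the LexOPS implementation: label_by_range() is a hypothetical helper, and it assumes each level is treated as an inclusive lower:upper range. It only shows what "specified order is preserved" means for the resulting labels.

```r
# Hypothetical helper (not part of LexOPS): label each value with the level
# whose inclusive range contains it. Labels are numbered in the order the
# levels were specified, with no attempt to sort them.
label_by_range <- function(x, ranges, prefix = "A") {
  out <- rep(NA_character_, length(x))
  for (i in seq_along(ranges)) {
    lo <- min(ranges[[i]])
    hi <- max(ranges[[i]])
    out[x >= lo & x <= hi] <- paste0(prefix, i)
  }
  out
}

x <- c(1.5, 2, 3, 4, 4.5, 6)

# Levels given out of order, as in 4:5 ~ 1:3
groups <- label_by_range(x, list(c(4, 5), c(1, 3)))
print(groups)
# "A2" "A2" "A2" "A1" "A1" NA
```

Here the first level specified (4 to 5) becomes A1 and the second (1 to 3) becomes A2; a value matching neither range is left as NA.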

Other Major Updates:

Related to the change above, numeric splits can no longer be overlapping at all (e.g., 1:2 ~ 2:3 used to be acceptable, but will now produce an error, as it is unclear to which group a value of 2 would belong).
Added equal_size argument to split_random(). Setting equal_size=TRUE will ensure that the split has equally (or as close to equal as possible) sized groups. This option will typically enable more candidate matches. This option was added in response to issue #4.
The generate() function checks that the id_col uniquely identifies items, and gives an error if this is not the case. This avoids duplicate IDs causing incorrect matching. Addresses issue #5.
run_shiny() now checks for the stringdist package and will generate code to install it if it is missing.
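The three checks described in this list can be sketched in base R as follows. These are illustrative assumptions about the logic only, not the code LexOPS uses: ranges_overlap(), equal_size_split(), and check_unique_ids() are hypothetical names, levels are assumed to be inclusive ranges, and the column name "string" is just an example.

```r
# Hypothetical helper: TRUE if any two inclusive ranges share a value,
# e.g. 1:2 and 2:3 both contain 2.
ranges_overlap <- function(ranges) {
  n <- length(ranges)
  if (n < 2) return(FALSE)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      a <- range(ranges[[i]])
      b <- range(ranges[[j]])
      if (a[1] <= b[2] && b[1] <= a[2]) return(TRUE)
    }
  }
  FALSE
}

# Hypothetical helper: randomly assign n_items to nlevels groups whose
# sizes differ by at most one.
equal_size_split <- function(n_items, nlevels, seed = 1) {
  set.seed(seed)
  # Build a balanced vector of group numbers, then shuffle it
  sample(rep_len(seq_len(nlevels), n_items))
}

# Hypothetical helper: stop with an error if id_col does not uniquely
# identify the rows of df.
check_unique_ids <- function(df, id_col) {
  ids <- df[[id_col]]
  dups <- unique(ids[duplicated(ids)])
  if (length(dups) > 0) {
    stop("Duplicate values in '", id_col, "': ", paste(dups, collapse = ", "))
  }
  invisible(TRUE)
}

ranges_overlap(list(1:2, 2:3))   # TRUE: a value of 2 is ambiguous
ranges_overlap(list(4:5, 1:3))   # FALSE: order does not matter here

table(equal_size_split(10, 3))   # group sizes of 4, 3 and 3

check_unique_ids(data.frame(string = c("cat", "dog")), "string")
```

Shuffling a pre-balanced vector guarantees near-equal group sizes, whereas drawing each item's group independently would only make the sizes equal on average.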

Minor Updates:

Updated to base R pipe, |>, in examples.
Unnecessary dependencies (vwr, plyr) have been removed from the shiny app.
All S3 methods now exported (previously only print.LexOPS_pipeline was exported).

Updates to Tests:

Removed deprecated testthat argument.
Now tests the equal_size argument of split_random().
Now tests that duplicates in id_col give an error.
Ensured that variables that undergo scale() in tests are stored explicitly as numeric vectors. This addressed a deprecation warning from dplyr::filter() about 1-column matrices that was produced from one test.
Removed overlapping levels from all tests.
Added tests for split_by() errors.
