Merge130 by carlopi · Pull Request #17833 · duckdb/duckdb · GitHub

Merge130 #17833


Merged
merged 27 commits into from
Jun 6, 2025

@carlopi carlopi commented Jun 6, 2025

More merging in main, with the twist that I did not see the proper merge conflict raised at #17806 (comment).

(@evertlammerts)

This also includes #17831

pdet and others added 27 commits June 3, 2025 16:16
This PR adds support for file rotation (setting `file_size_bytes` or
`row_groups_per_file`) together with the `write_empty_file false` flag.

This PR fixes duckdb#17759. The original parquet file is too large, and I was
unable to reproduce this issue when I reduced the file size, so I did
not add the test file.

* Just do signed compares to see if this fixes the Win32 test failure

A further step towards making GitHub caching functional again, which should
translate to faster evaluation cycles on PRs.

The problem is that, for the 3 sets of Linux extensions, which account for
the bulk of CI time, cache items are currently added on every PR. As a
result, cache items from base branches get evicted, which makes caching
less effective.

The basics are as follows:
* A PR can access cache items from any predecessor. Cache items produced
by a PR can only be reused by that exact same PR.
* Base branches (say `v1.3-ossivalis` or `main`) can access cache items only
from other base branches, but their cache items can be used by anyone.
* When the total cache size grows past 10 GB, GitHub will evict older items
(which are likely to be the ones from base branches).

A situation that currently happens somewhat frequently is that PRs pollute
the cache and keep invalidating it, eventually evicting the only valuable
items in there. This PR aims to produce fewer items in the global
cache.
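
The access and eviction rules listed above can be sketched as a small Python model. This is only an illustration of the behaviour as described in this commit message, not GitHub's actual implementation: the names `can_restore`, `evict_oldest`, `BASE_BRANCHES` and the `pr-N` labels are hypothetical, and eviction is simplified to "drop the oldest item first".

```python
# Hypothetical model of the cache rules described above (not GitHub's code).
BASE_BRANCHES = {"main", "v1.3-ossivalis"}  # examples taken from the text
CACHE_LIMIT_GB = 10


def can_restore(consumer: str, producer: str) -> bool:
    """Return True if a run on `consumer` may reuse an item saved by `producer`."""
    if producer in BASE_BRANCHES:
        # Items produced by base branches can be used by anyone.
        return True
    # Items produced by a PR can only be reused by that exact same PR.
    return consumer == producer


def evict_oldest(items, limit_gb=CACHE_LIMIT_GB):
    """Drop the oldest items until the total size fits within the limit.

    `items` is a list of (producer, size_gb) tuples, oldest first.
    """
    kept = list(items)
    while kept and sum(size for _, size in kept) > limit_gb:
        kept.pop(0)  # simplification: the oldest item goes first
    return kept


# A base-branch item followed by two PR items pushes the total past the limit,
# so the base-branch item (the only one every PR could reuse) is evicted.
remaining = evict_oldest([("main", 4), ("pr-1", 4), ("pr-2", 4)])
print(remaining)
```

In this model, every item that a PR adds to the cache brings the shared base-branch items closer to eviction, which is the pollution effect described above.
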
Backport the changes in duckdb#17776 so
we build with GCC 12 explicitly in the next bug fix release, instead of
only in 1.4.
…rsion. (duckdb#17791)

This PR introduces a new option, `arrow_output_version`, which
allows us to specify the format version in which we should output our
data. We default to V1.0.

This new variable kind of makes the `produce_arrow_string_view` and
`arrow_output_list_view` options useless, but I guess that removing them
might be slightly controversial, since it might break people's scripts.
Hence I went with inter-op.


https://arrow.apache.org/docs/format/Versioning.html#post-1-0-0-format-versions

Fix: duckdblabs/duckdb-internal#5058
* Just do signed compares to fix Win32 test failure
…b#17808)

Similar to duckdb#17807, this adds
locale-independent handling for isspace.

I have to say I don't remember exactly whether I managed to reproduce a
problem with this or whether it just looked right to have.

Feedback on the implementation welcome.
…ead of requiring it to be set globally (duckdb#17817)

This allows the column mapping mode to be modified on a per-file basis
by the MultiFileReader.
This reverts commit 3a25808.
@Mytherin Mytherin merged commit 4d7cb70 into duckdb:main Jun 6, 2025
56 checks passed