Adding WITH ORDINALITY to DuckDB #16581
base: main
Conversation
…arting to lose my mind over here)
Thanks for the PR! I think there are two distinct steps that need to happen to have
Ideally this is generated during the bind phase already - i.e. when binding a scanner table function that has ordinality specified, the
…ess with, i discarded these changes)
I'd like to note that when using any version higher than 24.1.0 of the Python module black, the following files get changed when running "make format-fix":
Just so you know, it seems like this package has changed its formatting rules: it now inserts blank lines after certain comments.
…al get is not informed anymore & correct the tests (typos)
Hey @Mytherin,
Regarding my previous comment: now it is really ready for review 😅 Please let me know your thoughts 🙏
Thanks for the PR! Great that this is being picked up again. Looks good - some comments below.
It would be great to get some more tests going as well (a sketch of one of these follows the list):
- WITH ORDINALITY on larger tables, e.g. you could use the TPC-H data generator and then create tests with those
- WITH ORDINALITY used in a view definition
- WITH ORDINALITY on a view (should this even work?)
- WITH ORDINALITY on a CTE
- WITH ORDINALITY on a recursive CTE
- WITH ORDINALITY used within a recursive CTE
- WITH ORDINALITY where the table already contains a column called "ordinality"
- Filters on the ORDINALITY column
- Projecting out all columns except for the ORDINALITY column
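A sketch of what one of these could look like in the sqllogictest format used elsewhere in this PR, here for the "filters on the ORDINALITY column" case (the table name nums, the default column name ordinality, and the expected output are assumptions, since the feature is still in progress):

# hypothetical test: filter on the generated ORDINALITY column
statement ok
CREATE TABLE nums AS SELECT range AS x FROM range(10);

query II
SELECT x, ordinality FROM nums WITH ORDINALITY WHERE ordinality <= 2 ORDER BY ordinality;
----
0	1
1	2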
namespace duckdb {

enum class Ordinality_request_t : uint8_t { NOT_REQUESTED = 0, REQUESTED = 1 };
Can we use consistent naming with the rest of the code base, i.e. call this OrdinalityType? And instead of NOT_REQUESTED, maybe use WITH_ORDINALITY and WITHOUT_ORDINALITY, i.e.: OrdinalityType::WITH_ORDINALITY
enum class Ordinality_request_t : uint8_t { NOT_REQUESTED = 0, REQUESTED = 1 };

struct ordinality_data_t {
OrdinalityData
@@ -87,6 +96,8 @@ OperatorResultType PhysicalTableInOutFunction::Execute(ExecutionContext &context
	state.input_chunk.SetCardinality(1);
	state.row_index++;
	state.new_row = false;
	lock_guard<mutex> guard(gstate.ordinality_lock);
	gstate.current_ordinality_idx = 1;
Given that the other scan state is all located in the local state, I suspect the global state is the wrong place for current_ordinality_idx. Do we have a test with a larger UNNEST or similar table function that triggers multi-threading?
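A sketch of the kind of larger-scale test that might exercise this, building on the LATERAL range() test already in this PR (the table name big_test and the expected values are assumptions, and it assumes the ordinality counter restarts at 1 for every lateral invocation):

# hypothetical larger in-out function test that may run multi-threaded
statement ok
CREATE TABLE big_test AS SELECT range + 1 AS a FROM range(1000);

query II
SELECT count(*), max(ordinality)
FROM big_test AS t, LATERAL range(t.a) WITH ORDINALITY AS _(r, ordinality);
----
500500	1000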
	auto result = function.in_out_function(context, data, input, chunk);
	if (function.ordinality_data.ordinality_request == Ordinality_request_t::REQUESTED) {
		const idx_t ordinality = chunk.size();
		function.ordinality_data.SetOrdinality(chunk, gstate.current_ordinality_idx, ordinality);
We are reading gstate.current_ordinality_idx without a lock, whereas we are modifying it below with a lock - one of these is wrong. I suspect we need to move this to the thread-local state anyway, in which case we don't need a lock at all.
@@ -120,6 +123,12 @@ SourceResultType PhysicalTableScan::GetData(ExecutionContext &context, DataChunk
		break;
	}

	if (function.ordinality_data.ordinality_request == Ordinality_request_t::REQUESTED) {
		idx_t ordinality = chunk.size();
		function.ordinality_data.SetOrdinality(chunk, g_state.ordinality_idx, ordinality);
For PhysicalTableScan - I think we can instead perform this using a window function. This will currently be incorrect when multi-threading.
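One way to probe the multi-threading concern in a test would be to check that the generated column is still a permutation of 1..N on a table large enough to be scanned by multiple threads (the table name big, the row count, and the default column name ordinality are assumptions):

# hypothetical check that ordinality covers exactly 1..N even with a parallel scan
statement ok
CREATE TABLE big AS SELECT range AS i FROM range(1000000);

query III
SELECT count(DISTINCT ordinality), min(ordinality), max(ordinality) FROM big WITH ORDINALITY;
----
1000000	1	1000000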
We should not need to re-generate the serialized plans here. If this is necessary, something is going wrong with serialization, which can impact forward and/or backward compatibility. Can you revert this change and instead fix what caused this to be necessary?
Same here - we should not be re-generating the storage version
PRAGMA enable_verification

query II
SELECT o,range FROM range(1) WITH ORDINALITY AS _(range,o);
Can we unify some of these test files? We are adding a lot of test files that contain almost no tests in them. Perhaps we can have two files (one for table in-out functions, one for "standard" table functions like read_csv, read_parquet and table scans).
query III
SELECT * FROM read_csv('test/sql/ordinality/a1.csv') WITH ORDINALITY ORDER BY Numbers,Chars,ordinality;
----
7800 values hashing to d86e8a5e8df33cec28e0ed7b6c386419
In general we prefer to avoid using the hash comparison feature - we can verify that two results are the same using labeled results instead, e.g.:
query III nosort read_csv_result
SELECT * FROM read_csv('test/sql/ordinality/a1.csv') WITH ORDINALITY ORDER BY Numbers,Chars,ordinality;
query III nosort read_csv_result
SELECT *, row_number() OVER () AS ordinality FROM read_csv('test/sql/ordinality/a1.csv') ORDER BY Numbers,Chars,ordinality;
query III
SELECT * FROM test AS t, LATERAL range(t.a) WITH ORDINALITY AS _(range,ordinality) ORDER BY range;
Can we alias ordinality to something else and explicitly select it to ensure the aliasing works? e.g. SELECT my_ord FROM range(t.a) WITH ORDINALITY AS _(range, my_ord)
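A sketch of such a test, mirroring the range() syntax used elsewhere in this PR (the expected output is an assumption):

# hypothetical aliasing test: the ordinality column is renamed and selected explicitly
query I
SELECT my_ord FROM range(3) WITH ORDINALITY AS _(r, my_ord) ORDER BY my_ord;
----
1
2
3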
This is a continuation of #9014
I am resuming work on this functionality. If I understand correctly, you still want me to implement this feature by internally producing a ROW_NUMBER() window function instead of producing the column manually.
My first step is to take another look at how these operators work and then try to find a solution.
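As a sketch of that rewrite on a hypothetical table tbl with a single column col (it mirrors the row_number() comparison suggested in the review above and is not the final implementation):

-- user-facing form
SELECT * FROM tbl WITH ORDINALITY AS t(col, ord);

-- what the binder would effectively plan it as
SELECT *, row_number() OVER () AS ord FROM tbl;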