exists() method for repository #7884

cunhazera · 2020-03-16T12:27:16Z

Description
Nowadays we have the findByIdOptional and count to check if a record exists inside the database. count is much faster than selecting the whole object data and serializing it. But wouldn't it be nice to have an exists(id) method to abstract a db query that is so common?

The text was updated successfully, but these errors were encountered:

gastaldi · 2020-03-16T15:30:36Z

cc @FroMage @emmanuelbernard @loicmathieu

FroMage · 2020-03-17T11:00:32Z

Is it really that common to check for existence of a record without loading it? I've never had to do that myself.

cunhazera · 2020-03-17T11:34:18Z

@FroMage but isn't it too much to load all fields from db to check if it exists?

FroMage · 2020-03-17T13:43:28Z

Well, sure, but only if you need to check for existence without later using the entity. I've never seen such a use case. All my use-cases check if the object exist and then proceed to use the entity.

cunhazera · 2020-03-17T13:47:40Z

@FroMage yeah, it makes sense. In my case, I check the record exists. If it does, I throw a custom exception.

Postremus · 2020-03-17T13:52:17Z

@FroMage I also sometimes just check for existence in my apps, and then throw an NotFoundException.

If you want, I can try to implement this feature.

FroMage · 2020-03-17T14:20:00Z

and then throw an NotFoundException

But if it exists, what does it do?

alexander-dammeier · 2020-03-29T19:36:08Z

I use "exists" to prove if an id really refers to an existing aggregate when i only hold the id in another aggregate. It is a business rule in our system so i have to check it but i do not need the hole entity, only the already given id.

emmanuelbernard · 2020-03-30T13:03:39Z

note that count is not too far from what you have in exists. if ( !Person.exists(1L) ) { ... } // vs if ( Person.count("id", 1L) != 1 ) { ... }

…

On Sun, Mar 29, 2020 at 9:36 PM Alexander Dammeier ***@***.***> wrote: I use "exists" to prove if an id really refers to an existing aggregate when i only hold the id in another aggregate. It is a business rule in our system so i have to check it but i do not need the hole entity, only the already given id. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#7884 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AACJNWA7GUI7XQLGT4FREGDRJ6PLLANCNFSM4LMFG6LA> .

alexander-dammeier · 2020-03-30T14:02:32Z

But is exists faster?
Are all databases intelligent enough not to go through all rows to count duplicates in a column with an unique constraint?

edit:
but exists can also be used for columns without unque constraint so it should be faster than count in this situations.

emmanuelbernard · 2020-03-30T14:10:43Z

What do you want for the feature? The existence of the entity? Or the existence of some column value?
I thought the former based on the issue description. In this case, I don't think my count approach is any different than whatever a custom exist implementation would do performance wise. If anything it could be made more memory efficient than a select id from entity where id = ?. The count is done after the filtering.

The one annoyance with my solution is that the user must know that the id property is named "id"

lordofthejars · 2020-07-07T09:49:38Z

👍 to implement this, I think that of course you can do a count but this exists methods is more about clean code. At the end you can always do in Java list.size() == 0 but we do list.isEmpty() this is more or less the same.

abuinitski · 2024-12-20T09:59:18Z

count is not the same as exists.

Postgres example:

SELECT EXISTS(SELECT 1 FROM contact WHERE name='John') vs
SELECT COUNT(*) FROM contact WHERE name = 'John'

Now imagine that name has no index on it (on purpose), and request results in a full-table-scan. Count will have to actually scan the whole table, while EXISTS can stop as soon as it sees first "John" guy.

FroMage · 2024-12-20T13:40:17Z

Sure, but the question remains: what do you do once you know the entity exists?

abuinitski · 2024-12-23T02:23:45Z

Sure, but the question remains: what do you do once you know the entity exists?

The question is not what "to do", the question is what "not to do".

For example:

Do not create a duplicate.
Do not bother with custom native query and "ON CONFLICT" clause in case a duplicate already exists
Do not bother parsing native database exception code, and checking failing constraint name, to catch duplicate error and respond to client with a reasonable error message instead of an "unknown error something is wrong with my database idk"

Of course, race condition is still possible unless you raise transaction isolation level, but then it would be rare enough so I just can refer to low salary, unreasonable deadlines, and bad management in general to justify why my code has sporadic issues.

FroMage · 2025-01-08T16:19:09Z

My argument was that in most cases, once we know the entity exists, we typically load it.

For example, if you want to avoid creating a duplicate, you check that it exists, but then… what do you do with the object you wanted to create? Don't you turn the insert into an update? For that you'll need the entity.

I'm only trying to get a better idea of the usefulness of this. I can believe it's more efficient than count, for sure. That's one argument in its favour already.

I didn't understand yet how useful it can be to check for something's existence and if it exists, not need it. If it doesn't exist, there's no problem, of course.

alexander-dammeier · 2025-01-10T11:40:18Z

In our case, we often check for the existence of an entry because the existence alone tells us if a step in our process was executed successful. Otherwise we schedule a retry, so for this case we use it more like an event store and the entries are often immutable.

I think it is very dangerous to use an exist function to avoid duplicates as another process could create such an entry after you just checked it. In my opinion, unique constraints (ID conflicts) are the only viable option to truly avoid this and therefore we always check if the transaction failed because of a conflict. We then do not load the entry because knowing that it exists is enough to continue or there is simply nothing left to do.

markus-lehr-dedalus · 2025-05-30T06:51:56Z

Our company has a microservice architecture which also uses Kafka for communication.
Since Kafka leaves the fitering of messages to the clients, our app needs to decide if the received event is relevant for us.
For this I want to check if a used identifier in a Kafka message exists in our db (which contains millions of records, only a small subset of the billions of possible entities of our whole system).
Having to use a count instead is such a dealbreaker for us that I'll have to write a non-panache query instead.

FroMage · 2025-06-11T15:14:41Z

@gavinking is there an equivalent .exists(Object id) method in the latest ORM these days? I could not find anything in the Session API.

gavinking · 2025-06-11T15:25:22Z

No, and what would the semantics of such an operation be?

Is it enough to check the second-level cache, or does it have to always go to the database?
Does it acquire some sort of exclusive lock on that id so that no other transaction can insert or delete the row until the present transaction commits?

gavinking · 2025-06-11T15:26:14Z

But the main reason we don't have it is it's typically such a crappy and inefficient way to interact with the database.

gavinking · 2025-06-11T15:27:10Z

Probably the people who want this should be using upsert() or something else which doesn't suck.

gavinking · 2025-06-11T15:57:48Z

So, to be clear (I was posting from my phone). The people posting above are giving what looks to me like terrible advice on how to implement an upsert on databases which don't support the SQL merge statement.

The approach Hibernate uses to emulate merge on databases which don't have it, is the following:

attempt an UPDATE on the row, and check the returned rowcount (if it is 1, stop)
if it is 0, attempt an INSERT.

An alternative approach is to attempt the INSERT first, and have fancy code for checking if a SQLException represents a primary key violation.

The following is not a good solution:

attempt to SELECT the row
if it exists, attempt an UPDATE, or otherwise,
if not, attempt an INSERT and 🙏🙏that it wasn't inserted since the SELECT, since we don't have any sort of lock on the nonexistent row.

Anyway, people don't need to be implementing this stuff themselves, since it's already implemented by StatelessSession.upsert(), which will automatically use a MERGE on platforms which support it.

gavinking · 2025-06-11T16:05:59Z

@alexander-dammeier

I use "exists" to prove if an id really refers to an existing aggregate when i only hold the id in another aggregate. It is a business rule in our system so i have to check it but i do not need the hole entity, only the already given id.

And this is terrible.

First of all: "aggregates" are not the right way to use JPA. You can read what we have to say about aggregates here: https://docs.jboss.org/hibernate/orm/7.0/introduction/html_single/Hibernate_Introduction.html#associations
Second of all: referential integrity is the responsibility of the database! You should have foreign key constraints, not rubbish select queries to validate referential integrity.

I'm on my knees begging everyone to put down the DDD Jonestown Kool-aid and just try using JPA and your relational database in the normal way that these things were intended to be used.

markus-lehr-dedalus · 2025-06-12T06:01:11Z

Again, in our distributed system (hospital), we receive a lot of Kafka messages - only a small part of which is relevant for our app.
Whether or not a message is relevant is only decided if the identifiers match some others in our DB.
If the identifiers don't match, we don't want to store an entity but ignore the message.
This is the expected case and there we don't want to upsert or merge or persist, but need to efficiently and quickly check if this DB entry exists. The events which we actually need to store are rare so an extra round trip is not the end of the world.

@gavinking - I see your point and I also read that others just need an upsert, but I'd like to believe that we actually have a legitimate usecase for this exists check.

FroMage · 2025-06-12T08:27:15Z

Yeah, agreed. I was sceptical because I assumed exists would always be followed by a load. @gavinking assumes it's for upsert. But @markus-lehr-dedalus was the first to raise a use-case where exists is not followed by either load or update but for other actions (perhaps not even on the DB: on other systems?).

I am not sure if it's a common use-case, though. But at least it's new to me :)

gavinking · 2025-06-12T08:59:22Z

@markus-lehr-dedalus In your use case, what are the transactional semantics of that?

gavinking · 2025-06-12T09:03:27Z

Another comment: in the case where exists() takes an id, I'm not clear on why a count query (for example, getResultCount() in Hibernate) is no good. I would have expected the query planner to be able to come up with something very efficient when it's the PK.

(Unclear to me why exists around a subquery would be better.)

markus-lehr-dedalus · 2025-06-12T09:25:21Z

@gavinking I'm not 100% certain what you mean by transactional semantics, but I can give you a workflow:

One of the specific workflows (simplified a bit) for us is that we get the Kafka message for any medical case update (@Incoming annotation on method, not @Channel on mutiny stuff).

After a bit of validation, we enter a method marked as @Transactional for processing the medical case.
This medical case has a number (the thing I called identifier, but not actually the DB id, not even unique constraint) and we check if our app has any entities (can be multiple as well, e.g. blood samples) which reference this medical case via this number (ideally with this exists method).
Since we don't store whole the medical case itself (huge, complex object and another app's responsibility), this number does not have any foreign key constraint in our app. I therefore don't think any optimization in getResultCount can be performed.

If we find any entity, then we know that we have to process the message's content further and create or update another whole different entity (physical location relevant for nurses) than the one we executed the exists check for.
After that, we send SSE updates to connected clients and then the transactional method ends.

If we don't find any entity (e.g. patient came in for something unrelated to blood samples), then we don't create any entity and the transactional method ends.

gavinking · 2025-06-12T10:49:40Z

And then you retry later?

markus-lehr-dedalus · 2025-06-13T05:30:33Z

No we don't retry later - the producers of the Kafka messages write to general purpose topics and consumers can subscribe to those changes. Imagine in the hospital someone shouting "Someone stubbed their toe!", but we might not be interested in this event. Whether or not we are interested - again - is written in our DB and can change at runtime.

You probably mean: If such an entity (which indicates we might have been interested in a medical case update) is created at a later point, then we must have missed those updates? Then we fetch the current information for the nurses via REST in a separate mechanism. See my attached impeccable ms paint diagram for clarity.

gavinking · 2025-06-13T06:26:08Z

So, look, since we already opened the door by adding SelectionQuery.getResultCount(), we could in principle add SelectionQuery.hasResults() which you could use as follows:

boolean exists = 
        session.createSelectionQuery("where key = ?1", Thing.class)
                .setParameter(1, key)
                .hasResults();

And the resulting SQL would be:

select 1 from things where key = ? limit 1

But honestly I'm just not clear on how generally-useful this would be. I mean, you can already write:

boolean exists = 
        sesssion.createQuery("select 1 from Thing where key = ?1 limit 1", int.class)
                .getSingleResultOrNull() != null;

which is really not much worse.

@FroMage ?

markus-lehr-dedalus · 2025-06-13T06:31:36Z

@gavinking I like your proposed solution because of the enhanced expressiveness - however, I totally understand that not a lot of people will use this.
Of course, I am not blocked by this in any way, because I came up with quite a similar sql statement myself.

FroMage · 2025-06-13T12:41:37Z

For Panache, this would be:

// assuming we're talking about a Panache query and this is not a primary key
boolean exists = Thing.exists("key", key);

Or for a primary key:

boolean exists = Thing.existsById(key);

I can trivially implement the first one (query) by using the first example .hasResults(). I am not sure if select 1 from Thing where key = ?1 limit 1 is the best implementation for all databases/queries. Is it the most optimal query? If yes, then I can also trivially implement the first one (query) with this (unless this falls into the same traps as counting which requires parsing the HQL and having multiple translations).

As for the second case (by id) the translation is not trivial, I think.

gavinking · 2025-06-14T09:21:50Z

Is it the most optimal query?

Generalizing over all SQL databases is definitely above my pay grade.

I suppose the main alternative would be:

select exists(select 1 from Thing where key = ?1)

I just asked ChatGPT, and it claims that they are equivalent on Postgres, and recommends use of limit or equivalent on MySQL, Db2, and SQL Server. For Oracle it thinks you should use rownum. I'd take all that with a grain of salt, since ChatGPT can be very convincingly wrong on this sort of stuff.

gavinking · 2025-06-16T07:11:36Z

See https://hibernate.atlassian.net/browse/HHH-19541 and https://github.com/hibernate/hibernate-orm/pull/10334/files.

FroMage · 2025-06-16T08:47:01Z

I suppose it means I have to reopen this to support this new method :)

cunhazera added the kind/enhancement New feature or request label Mar 16, 2020

gastaldi added the area/panache label Mar 16, 2020

loicmathieu mentioned this issue Jul 7, 2020

Implement exists in PanacheEntity #10513

Closed

cunhazera closed this as completed Jan 14, 2022

FroMage reopened this Jun 16, 2025

exists() method for repository #7884

exists() method for repository #7884

Comments

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!