8000 exists() method for repository · Issue #7884 · quarkusio/quarkus · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

exists() method for repository #7884

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
cunhazera opened this issue Mar 16, 2020 · 37 comments
Open

exists() method for repository #7884

cunhazera opened this issue Mar 16, 2020 · 37 comments
Labels
area/panache kind/enhancement New feature or request

Comments

@cunhazera
Copy link
cunhazera commented Mar 16, 2020

Description
Nowadays we have the findByIdOptional and count to check if a record exists inside the database. count is much faster than selecting the whole object data and serializing it. But wouldn't it be nice to have an exists(id) method to abstract a db query that is so common?

@cunhazera cunhazera added the kind/enhancement New feature or request label Mar 16, 2020
@gastaldi
Copy link
Contributor
gastaldi commented Mar 16, 2020

cc @FroMage @emmanuelbernard @loicmathieu

@FroMage
Copy link
Member
FroMage commented Mar 17, 2020

Is it really that common to check for existence of a record without loading it? I've never had to do that myself.

@cunhazera
Copy link
Author

@FroMage but isn't it too much to load all fields from db to check if it exists?

@FroMage
Copy link
Member
FroMage commented Mar 17, 2020

Well, sure, but only if you need to check for existence without later using the entity. I've never seen such a use case. All my use-cases check if the object exist and then proceed to use the entity.

@cunhazera
Copy link
Author

@FroMage yeah, it makes sense. In my case, I check the record exists. If it does, I throw a custom exception.

@Postremus
Copy link
Member
Postremus commented Mar 17, 2020

@FroMage I also sometimes just check for existence in my apps, and then throw an NotFoundException.

If you want, I can try to implement this feature.

@FroMage
Copy link
Member
FroMage commented Mar 17, 2020

and then throw an NotFoundException

But if it exists, what does it do?

@alexander-dammeier
Copy link

I use "exists" to prove if an id really refers to an existing aggregate when i only hold the id in another aggregate. It is a business rule in our system so i have to check it but i do not need the hole entity, only the already given id.

@emmanuelbernard
Copy link
Member
emmanuelbernard commented Mar 30, 2020 via email

@alexander-dammeier
Copy link
alexander-dammeier commented Mar 30, 2020

But is exists faster?
Are all databases intelligent enough not to go through all rows to count duplicates in a column with an unique constraint?

edit:
but exists can also be used for columns without unque constraint so it should be faster than count in this situations.

@emmanuelbernard
Copy link
Member

What do you want for the feature? The existence of the entity? Or the existence of some column value?
I thought the former based on the issue description. In this case, I don't think my count approach is any different than whatever a custom exist implementation would do performance wise. If anything it could be made more memory efficient than a select id from entity where id = ?. The count is done after the filtering.

The one annoyance with my solution is that the user must know that the id property is named "id"

@lordofthejars
Copy link
Contributor

👍 to implement this, I think that of course you can do a count but this exists methods is more about clean code. At the end you can always do in Java list.size() == 0 but we do list.isEmpty() this is more or less the same.

@abuinitski
Copy link
abuinitski commented Dec 20, 2024

count is not the same as exists.

Postgres example:

  • SELECT EXISTS(SELECT 1 FROM contact WHERE name='John') vs
  • SELECT COUNT(*) FROM contact WHERE name = 'John'

Now imagine that name has no index on it (on purpose), and request results in a full-table-scan. Count will have to actually scan the whole table, while EXISTS can stop as soon as it sees first "John" guy.

@FroMage
Copy link
Member
FroMage commented Dec 20, 2024

Sure, but the question remains: what do you do once you know the entity exists?

@abuinitski
Copy link
abuinitski commented Dec 23, 2024

Sure, but the question remains: what do you do once you know the entity exists?

The question is not what "to do", the question is what "not to do".

For example:

  1. Do not create a duplicate.
  2. Do not bother with custom native query and "ON CONFLICT" clause in case a duplicate already exists
  3. Do not bother parsing native database exception code, and checking failing constraint name, to catch duplicate error and respond to client with a reasonable error message instead of an "unknown error something is wrong with my database idk"

Of course, race condition is still possible unless you raise transaction isolation level, but then it would be rare enough so I just can refer to low salary, unreasonable deadlines, and bad management in general to justify why my code has sporadic issues.

@FroMage
Copy link
Member
FroMage commented Jan 8, 2025

My argument was that in most cases, once we know the entity exists, we typically load it.

For example, if you want to avoid creating a duplicate, you check that it exists, but then… what do you do with the object you wanted to create? Don't you turn the insert into an update? For that you'll need the entity.

I'm only trying to get a better idea of the usefulness of this. I can believe it's more efficient than count, for sure. That's one argument in its favour already.

I didn't understand yet how useful it can be to check for something's existence and if it exists, not need it. If it doesn't exist, there's no problem, of course.

@alexander-dammeier
Copy link

In our case, we often check for the existence of an entry because the existence alone tells us if a step in our process was executed successful. Otherwise we schedule a retry, so for this case we use it more like an event store and the entries are often immutable.

I think it is very dangerous to use an exist function to avoid duplicates as another process could create such an entry after you just checked it. In my opinion, unique constraints (ID conflicts) are the only viable option to truly avoid this and therefore we always check if the transaction failed because of a conflict. We then do not load the entry because knowing that it exists is enough to continue or there is simply nothing left to do.

@markus-lehr-dedalus
Copy link

Our company has a microservice architecture which also uses Kafka for communication.
Since Kafka leaves the fitering of messages to the clients, our app needs to decide if the received event is relevant for us.
For this I want to check if a used identifier in a Kafka message exists in our db (which contains millions of records, only a small subset of the billions of possible entities of our whole system).
Having to use a count instead is such a dealbreaker for us that I'll have to write a non-panache query instead.

@FroMage
Copy link
Member
FroMage commented Jun 11, 2025

@gavinking is there an equivalent .exists(Object id) method in the latest ORM these days? I could not find anything in the Session API.

@gavinking
Copy link

No, and what would the semantics of such an operation be?

  1. Is it enough to check the second-level cache, or does it have to always go to the database?
  2. Does it acquire some sort of exclusive lock on that id so that no other transaction can insert or delete the row until the present transaction commits?

@gavinking
Copy link

But the main reason we don't have it is it's typically such a crappy and inefficient way to interact with the database.

@gavinking
Copy link

Probably the people who want this should be using upsert() or something else which doesn't suck.

@gavinking
Copy link
gavinking commented Jun 11, 2025

So, to be clear (I was posting from my phone). The people posting above are giving what looks to me like terrible advice on how to implement an upsert on databases which don't support the SQL merge statement.

The approach Hibernate uses to emulate merge on databases which don't have it, is the following:

  1. attempt an UPDATE on the row, and check the returned rowcount (if it is 1, stop)
  2. if it is 0, attempt an INSERT.

An alternative approach is to attempt the INSERT first, and have fancy code for checking if a SQLException represents a primary key violation.

The following is not a good solution:

  1. attempt to SELECT the row
  2. if it exists, attempt an UPDATE, or otherwise,
  3. if not, attempt an INSERT and 🙏🙏that it wasn't inserted since the SELECT, since we don't have any sort of lock on the nonexistent row.

Anyway, people don't need to be implementing this stuff themselves, since it's already implemented by StatelessSession.upsert(), which will automatically use a MERGE on platforms which support it.

@gavinking
Copy link

@alexander-dammeier

I use "exists" to prove if an id really refers to an existing aggregate when i only hold the id in another aggregate. It is a business rule in our system so i have to check it but i do not need the hole entity, only the already given id.

And this is terrible.

I'm on my knees begging everyone to put down the DDD Jonestown Kool-aid and just try using JPA and your relational database in the normal way that these things were intended to be used.

@markus-lehr-dedalus
Copy link

Again, in our distributed system (hospital), we receive a lot of Kafka messages - only a small part of which is relevant for our app.
Whether or not a message is relevant is only decided if the identifiers match some others in our DB.
If the identifiers don't match, we don't want to store an entity but ignore the message.
This is the expected case and there we don't want to upsert or merge or persist, but need to efficiently and quickly check if this DB entry exists. The events which we actually need to store are rare so an extra round trip is not the end of the world.

@gavinking - I see your point and I also read that others just need an upsert, but I'd like to believe that we actually have a legitimate usecase for this exists check.

@FroMage
Copy link
Member
FroMage commented Jun 12, 2025

Yeah, agreed. I was sceptical because I assumed exists would always be followed by a load. @gavinking assumes it's for upsert. But @markus-lehr-dedalus was the first to raise a use-case where exists is not followed by either load or update but for other actions (perhaps not even on the DB: on other systems?).

I am not sure if it's a common use-case, though. But at least it's new to me :)

@gavinking
Copy link

@markus-lehr-dedalus In your use case, what are the transactional semantics of that?

@gavinking
Copy link
gavinking commented Jun 12, 2025

Another comment: in the case where exists() takes an id, I'm not clear on why a count query (for example, getResultCount() in Hibernate) is no good. I would have expected the query planner to be able to come up with something very efficient when it's the PK.

(Unclear to me why exists around a subquery would be better.)

@markus-lehr-dedalus
Copy link

@gavinking I'm not 100% certain what you mean by transactional semantics, but I can give you a workflow:

One of the specific workflows (simplified a bit) for us is that we get the Kafka message for any medical case update (@Incoming annotation on method, not @Channel on mutiny stuff).

After a bit of validation, we enter a method marked as @Transactional for processing the medical case.
This medical case has a number (the thing I called identifier, but not actually the DB id, not even unique constraint) and we check if our app has any entities (can be multiple as well, e.g. blood samples) which reference this medical case via this number (ideally with this exists method).
Since we don't store whole the medical case itself (huge, complex object and another app's responsibility), this number does not have any foreign key constraint in our app. I therefore don't think any optimization in getResultCount can be performed.

If we find any entity, then we know that we have to process the message's content further and create or update another whole different entity (physical location relevant for nurses) than the one we executed the exists check for.
After that, we send SSE updates to connected clients and then the transactional method ends.

If we don't find any entity (e.g. patient came in for something unrelated to blood samples), then we don't create any entity and the transactional method ends.

@gavinking
Copy link

And then you retry later?

@markus-lehr-dedalus
Copy link

No we don't retry later - the producers of the Kafka messages write to general purpose topics and consumers can subscribe to those changes. Imagine in the hospital someone shouting "Someone stubbed their toe!", but we might not be interested in this event. Whether or not we are interested - again - is written in our DB and can change at runtime.

You probably mean: If such an entity (which indicates we might have been interested in a medical case update) is created at a later point, then we must have missed those updates? Then we fetch the current information for the nurses via REST in a separate mechanism. See my attached impeccable ms paint diagram for clarity.

Image

@gavinking
Copy link

So, look, since we already opened the door by adding SelectionQuery.getResultCount(), we could in principle add SelectionQuery.hasResults() which you could use as follows:

boolean exists = 
        session.createSelectionQuery("where key = ?1", Thing.class)
                .setParameter(1, key)
                .hasResults();

And the resulting SQL would be:

select 1 from things where key = ? limit 1

But honestly I'm just not clear on how generally-useful this would be. I mean, you can already write:

boolean exists = 
        sesssion.createQuery("select 1 from Thing where key = ?1 limit 1", int.class)
                .getSingleResultOrNull() != null;

which is really not much worse.

@FroMage ?

@markus-lehr-dedalus
Copy link

@gavinking I like your proposed solution because of the enhanced expressiveness - however, I totally understand that not a lot of people will use this.
Of course, I am not blocked by this in any way, because I came up with quite a similar sql statement myself.

@FroMage
Copy link
Member
FroMage commented Jun 13, 2025

For Panache, this would be:

// assuming we're talking about a Panache query and this is not a primary key
boolean exists = Thing.exists("key", key);

Or for a primary key:

boolean exists = Thing.existsById(key);

I can trivially implement the first one (query) by using the first example .hasResults(). I am not sure if select 1 from Thing where key = ?1 limit 1 is the best implementation for all databases/queries. Is it the most optimal query? If yes, then I can also trivially implement the first one (query) with this (unless this falls into the same traps as counting which requires parsing the HQL and having multiple translations).

As for the second case (by id) the translation is not trivial, I think.

@gavinking
Copy link

Is it the most optimal query?

Generalizing over all SQL databases is definitely above my pay grade.

I suppose the main alternative would be:

select exists(select 1 from Thing where key = ?1)

I just asked ChatGPT, and it claims that they are equivalent on Postgres, and recommends use of limit or equivalent on MySQL, Db2, and SQL Server. For Oracle it thinks you should use rownum. I'd take all that with a grain of salt, since ChatGPT can be very convincingly wrong on this sort of stuff.

@gavinking
Copy link

See https://hibernate.atlassian.net/browse/HHH-19541 and https://github.com/hibernate/hibernate-orm/pull/10334/files.

@FroMage
Copy link
Member
FroMage commented Jun 16, 2025

I suppose it means I have to reopen this to support this new method :)

@FroMage FroMage reopened this Jun 16, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/panache kind/enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

10 participants
0