8000 Enhancement: Add Configurable Timeout to Socket Connection Methods by mz1999 · Pull Request #153 · eclipse-ee4j/orb · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Enhancement: Add Configurable Timeout to Socket Connection Methods #153

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 4, 2025

Conversation

mz1999
Copy link
Contributor
@mz1999 mz1999 commented Jul 15, 2023

The ORB does not set a timeout when establishing a connection, and in some cases, such as when the opposite endpoint has set up firewall rules or is connecting to a non-existent IP address, the connection process will block for a long time at the native method:

"http-listener-1(3)" #42 daemon prio=5 os_prio=31 cpu=10.88ms elapsed=436.72s tid=0x00007f8218374800 nid=0xec03 runnable  [0x0000700004bd7000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.Net.connect0(java.base@17.0.7/Native Method)
    at sun.nio.ch.Net.connect(java.base@17.0.7/Net.java:579)
    at sun.nio.ch.Net.connect(java.base@17.0.7/Net.java:586)
    at sun.nio.ch.SocketChannelImpl.connect(java.base@17.0.7/SocketChannelImpl.java:853)
    at com.sun.corba.ee.impl.misc.ORBUtility.openSocketChannel(ORBUtility.java:92)
    at org.glassfish.enterprise.iiop.impl.IIOPSSLSocketFactory.createSocket(IIOPSSLSocketFactory.java:306)
    at com.sun.corba.ee.impl.transport.ConnectionImpl.<init>(ConnectionImpl.java:229)
    at com.sun.corba.ee.impl.transport.ConnectionImpl.<init>(ConnectionImpl.java:254)
    at com.sun.corba.ee.impl.transport.ContactInfoImpl.createConnection(ContactInfoImpl.java:108)
    at com.sun.corba.ee.impl.protocol.ClientRequestDispatcherImpl.beginRequest(ClientRequestDispatcherImpl.java:200)
    ...

By using tcpdump to capture packets, it was discovered that the kernel was constantly performing TCP retransmissions, with the number of retransmissions being controlled by the kernel parameter net.ipv4.tcp_syn_retries and using an exponential backoff strategy. On my machine, it takes 130 seconds for the timeout to occur.

tcp retransmissions

Therefore a configurable timeout should be set for the connection process. In fact, ORB already has a configurable property com.sun.corba.ee.transport.ORBTCPConnectTimeouts, but the connection process does not currently utilize this parameter.

This PR introduces a significant enhancement to the socket connection methods, providing more flexibility and control over socket connections.Additionally I add unit tests to ensure the correct functionality of the updated methods. These tests cover both normal operation and timeout scenarios.

@dmatej dmatej added this to the 5.0.1 milestone Feb 3, 2025
@dmatej dmatej added the enhancement New feature or request label Feb 3, 2025
@dmatej dmatej modified the milestones: 5.0.1, 5.0.0 Feb 3, 2025
@dmatej dmatej merged commit 40da3f9 into eclipse-ee4j:master Feb 4, 2025
2 checks passed
@dmatej
Copy link
Contributor
dmatej commented Feb 21, 2025

Unfortunately this PR causes failure of two GlassFish test blocks. It is reproducible locally too. I tried to revert it and the security_all block passed. If someone can do a quick look on the diff, so we would avoid reverting it and patched the code instead, I would appreciate it.

[exec] -----------------------------------------
     [exec] -     Converter Sample AppClient: FAIL      -
     [exec] -----------------------------------------
     [exec] Total PASS: 0
     [exec] Total FAIL: 1
     [exec] Total DID NOT RUN: 0
     [exec] -----------------------------------------
     [exec] in flushAll , creating new testSuiteHash
     [exec] in flushAll , creating new testSuiteHash
     [exec] Caught an unexpected exception!
     [exec] java.rmi.MarshalException: CORBA COMM_FAILURE 1330446344 Maybe; nested exception is: 
     [exec]     org.omg.CORBA.COMM_FAILURE: FINE: 00410008: Connection abort  vmcid: OMG  minor code: 8 completed: Maybe
     [exec]     at com.sun.corba.ee.impl.javax.rmi.CORBA.Util.mapSystemException(Util.java:194)
     [exec]     at com.sun.corba.ee.impl.presentation.rmi.StubInvocationHandlerImpl.privateInvoke(StubInvocationHandlerImpl.java:180)
     [exec]     at com.sun.corba.ee.impl.presentation.rmi.StubInvocationHandlerImpl.invoke(StubInvocationHandlerImpl.java:119)
     [exec]     at com.sun.corba.ee.impl.presentation.rmi.codegen.CodegenStubBase.invoke(CodegenStubBase.java:206)
     [exec]     at com.sun.s1peqe.security.ssl.converter.ejb._ConverterRemoteHome_DynamicStub.create(com/sun/s1peqe/security/ssl/converter/ejb/_ConverterRemoteHome_DynamicStub.java)
     [exec]     at com.sun.s1peqe.security.ssl.converter.client.ConverterClient.run(ConverterClient.java:128)
     [exec]     at com.sun.s1peqe.security.ssl.converter.client.ConverterClient.main(ConverterClient.java:84)
     [exec]     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
     [exec]     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
     [exec]     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     [exec]     at java.base/java.lang.reflect.Method.invoke(Method.java:569)
     [exec]     at org.glassfish.appclient.client.acc.AppClientContainer.launch(AppClientContainer.java:365)
     [exec]     at org.glassfish.appclient.client.AppClientFacade.launch(AppClientFacade.java:145)
     [exec]     at org.glassfish.appclient.client.AppClientGroupFacade.main(AppClientGroupFacade.java:39)
     [exec] Caused by: org.omg.CORBA.COMM_FAILURE: FINE: 00410008: Connection abort  vmcid: OMG  minor code: 8 completed: Maybe
     [exec]     at jdk.proxy2/jdk.proxy2.$Proxy39.connectionAbort(Unknown Source)
     [exec]     at com.sun.corba.ee.impl.transport.ConnectionImpl.doOptimizedReadStrategy(ConnectionImpl.java:1271)
     [exec]   
8000
  at com.sun.corba.ee.impl.transport.ConnectionImpl.doWork(ConnectionImpl.java:829)
     [exec]     at com.sun.corba.ee.impl.threadpool.ThreadPoolImpl$WorkerThread.performWork(ThreadPoolImpl.java:476)
     [exec]     at com.sun.corba.ee.impl.threadpool.ThreadPoolImpl$WorkerThread.run(ThreadPoolImpl.java:519)
     [exec] Caused by: org.omg.CORBA.COMM_FAILURE: FINE: 00410011: IOException received when reading from connection SocketOrChannelConnectionImpl[ java.nio.channels.SocketChannel[connected local=/127.0.0.1:35380 remote=/127.0.1.1:3920] ESTABLISHED true true]  vmcid: OMG  minor code: 11  completed: No
     [exec]     at jdk.proxy2/jdk.proxy2.$Proxy39.ioexceptionWhenReadingConnection(Unknown Source)
     [exec]     at com.sun.corba.ee.impl.transport.ConnectionImpl.nonBlockingRead(ConnectionImpl.java:1449)
     [exec]     at com.sun.corba.ee.impl.transport.ConnectionImpl.doOptimizedReadStrategy(ConnectionImpl.java:1225)
     [exec]     ... 3 more
     [exec] Caused by: java.io.IOException: End-of-stream
     [exec]     at com.sun.corba.ee.impl.transport.ConnectionImpl.nonBlockingRead(ConnectionImpl.java:1442)
     [exec]     ... 4 more

@mz1999
Copy link
Contributor Author
mz1999 commented Feb 21, 2025

Unfortunately this PR causes failure of two GlassFish test blocks. It is reproducible locally too. I tried to revert it and the security_all block passed. If someone can do a quick look on the diff, so we would avoid reverting it and patched the code instead, I would appreciate it.

Could you please provide more details on how to reproduce the test failures locally?  I'm trying to investigate and resolve this issue.

@dmatej
Copy link
Contributor
dmatej commented Feb 21, 2025

You can check out my branch in eclipse-ee4j/glassfish#25364 and then run

mvn clean install -T4C -Pfastest && ./runtests.sh security_all

4 tests will fail. I already know what is wrong, but I am still not sure how to fix it. The cause is in somehow ambiguous java API, which allows to use same methods for blocking or nonblocking connect. Original code allowed both, so you did not see it and changed it to just blocking.

However there are some methods using selectors which depend on nonblocking socket channel. And that is probably causing the issue you tried to fix.

If we would not be able to fix it until Sunday, I will revert it and we can release 5.0.1 soon after that. GF8 is blocked by ORB for a long time now. Maybe we should refactor it somehow, so it would be cleaner and more reliable. It seems that

  • nonblocking originally worked well but blocking did not set timeout
  • nonblocking now is not supported at all but blocking sets the timeout
  • I would expect that if somebody configured nonblocking AND we provided blocking sc, it would throw an exception, ie IllegalStateException, not IOException. See the ConnectionImpl.defineSocket method.

FYI, @pzygielo and @arjantijms

defineSocket(useSelectThreadToWait,
orb.getORBData().getSocketFactory().createSocket(socketType, new InetSocketAddress(hostname, port)));
orb.getORBData().getSocketFactory().createSocket(
orb.getORBData().connectionSocketType(),
Copy link
Contributor
@dmatej dmatej Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yet one question - why you replaced the socketType parameter with orb.getORBData().connectionSocketType()? It seems it did not change anything in test results, but ...

SocketChannel socketChannel = null;
Socket socket = null;

if (orb.getORBData().connectionSocketType().equals(ORBConstants.SOCKETCHANNEL)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, here it is too ... type was unused, and it took the value from the orbdata ... why?

@mz1999
Copy link
Contributor Author
mz1999 commented Feb 21, 2025
9003

I'm a bit fuzzy on the details of this change since it was a while ago.

If the issue is confirmed and the release is urgent, I'd recommend reverting it for now. I'll take some time next week to investigate the issue thoroughly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0