socket-connect for CCL causing subsequent "Bad File Descriptor during Read" on large files #103
Comments
Note that to replicate successfully, you probably need to be without an active ipv6 network connection. The resolver for e.g. google.com will return at least one ipv6 address; if you then try to open an active socket on that address without a network interface that supports ipv6, make-socket will fail, and that is when you can replicate the bug. |
@gendl I can't seem to replicate, but not sure about two points:
which for a 1 MiB file of zeros created via |
A more correct link is via
|
No. This has to be done at the link level, or is removing something from ANSI |
@easye ipv6 would have to be disabled at the link level. It needs to be present in |
To be more precise - ipv6 doesn't necessarily have to be disabled globally in your whole OS - it's enough that the particular hostname you are doing the drakma call to meets two conditions: 1. It has an active AAAA record such that a dns lookup with |
Standing down from trying to replicate. @gendl find me on IRC to chat about next steps? |
I am now replicating reliably on any Linux CCL 1.12 I try, including WSL Ubuntu and Docker. To try it on Docker you can try the prebuilt Gendl/CCL image from dockerhub. In order to run the same version of the prebuilt Gendl image please follow these steps:
This should start Swank listening on port 5200, to which you can attach from an emacs with a reasonably recent Slime loaded by doing Note that the Edit: Actually you can access the huge-boxes-sequence test data directly if you cloned the gendl repo - in the container, it will show up automatically as So the
Then load the code above (adjusting the path in the defun I cannot replicate on MacOS but maybe it's because ipv6 actually works there. I will experiment and report here. @easye I'll look for you on IRC also. |
I cannot replicate under a Linux vultr host. I'll triage the exact details, but once I do, I will chase @gendl down on IRC. |
I have established a quick way to predict whether this error will replicate on your platform. Basically the error will happen if your host or network is not configured for ipv6, yet the resolver returns ipv6 addresses if you ask explicitly for them. Here is how to determine if this is the case on your testing platform:
Assuming the above returns an ipv6 address, now do:
If that returns a socket without errors, then your setup is configured for ipv6 and has ipv6 connectivity. If it signals an error instead, your setup lacks ipv6 connectivity, and you will likely be able to replicate the error described in this issue. |
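The two-step check described above (ask the resolver for ipv6 addresses, then try to open a socket to one of them) can be sketched outside Lisp as well. The following is a minimal Python sketch, not the CCL forms from this comment; the function names `ipv6_addresses`, `can_connect_ipv6`, and `likely_to_replicate`, the default port 443, and the 3-second timeout are all assumptions chosen for illustration.

```python
import socket


def ipv6_addresses(host, port=443):
    """Return the ipv6 addresses the resolver reports for host (may be empty)."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the textual address.
    return sorted({info[4][0] for info in infos})


def can_connect_ipv6(address, port=443, timeout=3.0):
    """Try a TCP connect to an ipv6 address; True on success, False on OSError."""
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    except OSError:
        # The platform cannot even create an ipv6 socket.
        return False
    try:
        sock.settimeout(timeout)
        sock.connect((address, port))
        return True
    except OSError:
        return False
    finally:
        sock.close()


def likely_to_replicate(host, port=443):
    """True when the resolver hands out ipv6 addresses but none is reachable."""
    addresses = ipv6_addresses(host, port)
    if not addresses:
        return False
    return not any(can_connect_ipv6(address, port) for address in addresses)
```

Calling `likely_to_replicate("google.com")` on the test machine gives a rough prediction: a true result corresponds to the situation described above, where the resolver returns ipv6 addresses but the host cannot reach them. It is only a heuristic and says nothing about the CCL socket code itself.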
Note that if you have cloned the gendl repository, you can also just run the pipeline tests directly from a shell with
|
I have (finally) managed to replicate the "Bad File Descriptor during Read" issue on native Linux without reference to gendl. I am now rummaging through the SLIME inspector at the point where the condition is signalled after having read 491520 of 5259205 bytes. More information when I have it… For completeness, the "bad" patch is "bad" only because it causes the error to surface: it could well be a problem somehow in |
@easye could you describe how you managed to replicate? Maybe @binghe could have a go at replicating (I only presume he might want to try that since he assigned himself here)? It might be telling that doing e.g. a (sleep 0.5) after the http client call and before the big file read makes it so the problem doesn't show up... |
I essentially followed your instructions on a cloud-hosted Linux instance. I have a git repo with code and notes in an org file that may be of use. I'm away from a proper keyboard at the moment, so sharing the repo isn't possible right now, but I will do so if folks are interested. (In reply to the comment above, dated May 18, 2023, at 20:48.)
|
Hi,
In upgrading from 0.8.3 to 0.8.6, I started getting errors on CCL in my test suite, which happens to make several http client calls (with zaserve client or drakma) and then immediately opens and tries to read a large file.
The problem is isolated to function socket-connect from backend/clozure.lisp.
First, download and place the huge-boxes-sequence.data file in a known location (or see below for a way to run from a container launched from a script in a clone of the Gendl repository, which avoids the need to download this file manually).
Here is where you can download the huge-boxes-sequence file I've been testing with.
Now replicate as follows:
Then call
If your host does not have ipv6 connectivity, you will likely get the "Bad File Descriptor during Read" error.
I think it may be leaving a dangling openmcl socket when it tries to call openmcl-socket:make-socket and fails when trying with ipv6 (as it now does explicitly all the time).
As noted in the code above, the error only happens if a large file read is attempted immediately after the http client call. It only happens on hosts where the resolver will give an ipv6 address for e.g. "google.com" but then there is no ipv6 connectivity to that host. See below for a quick way to test whether this is the case in your setup or not.
I wonder if there is something the CCL socket code is not cleaning up when a call to openmcl-socket:make-socket is tried and fails with an error. Or does it have to do with :deadline, :nodelay, or :connect-timeout?
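To make the "dangling socket" hypothesis concrete, here is a minimal Python sketch of the cleanup pattern one would expect around a connect attempt that is tried per address family: every socket whose connect fails is closed before the next candidate is tried. This is an analogy only. It is not the code in backend/clozure.lisp or in CCL, and the function name `connect_first_reachable` and its `(family, sockaddr)` candidate format are hypothetical.

```python
import socket


def connect_first_reachable(candidates, timeout=3.0):
    """Try each (family, sockaddr) candidate in order.

    Returns (connected_socket, failed_attempts). A socket whose connect
    fails is closed immediately, so no descriptor outlives a failed attempt.
    """
    failed = []
    for family, sockaddr in candidates:
        sock = None
        try:
            sock = socket.socket(family, socket.SOCK_STREAM)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock, failed
        except OSError as exc:
            if sock is not None:
                sock.close()  # release the descriptor of the failed attempt
            failed.append((sockaddr, exc))
    raise OSError(f"no candidate was reachable: {failed}")
```

A caller would typically list the ipv6 candidate first and the ipv4 candidate second, for example `[(socket.AF_INET6, ("::1", 8080)), (socket.AF_INET, ("127.0.0.1", 8080))]`. If the failed attempt's descriptor were left open, or closed later while its number had already been reused by an unrelated file, a subsequent read on that file could plausibly fail with a bad-descriptor error; whether that is what happens here is exactly the open question.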
Anyway, I can look into this further, but it's late here now, so I thought I'd post the simple example for replicating in case anyone cares to try.
Thanks,
Dave