Releases: gmtsciencedev/scitq
v1.0rc8
This new version of SCITQ solves a lot of issues with uwsgi. Users of v1.0rc5/rc6/rc7 should upgrade as soon as possible.
The new deploy system splits the scitq service into two services, scitq-main and scitq-queue. The first is a pure Flask app and runs in a uwsgi multi-process setup. The second is a pure Python multithreaded app. This new setup is more stable and provides a huge performance improvement.
Under the hood, this involved numerous changes, including moving the UI from socket.io to the more classic jQuery, but the resulting performance improvement in the UI made it worth it. Also, the management of worker creation and destruction is a lot better, with a job queue at the bottom of the main UI screen. When a worker deploy fails (which happens notably with some specific instances, like OVH i1-180, which we like very much but which is often missing in some regions), this makes life a lot easier and removes the need to go to the provider console to find out what is going on.
For the end user, scitq remains the same, simply more performant and more user-friendly.
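To give an idea of what a multithreaded job queue of this kind can look like, here is a minimal sketch in plain Python. It is not the scitq-queue code: the function `run_jobs`, the job names and the status strings are all hypothetical, chosen only to show how failed worker deploys could be recorded with their error message instead of being lost.

```python
import queue
import threading


def run_jobs(jobs, n_threads=2):
    """Run (name, callable) jobs from a shared queue on a few threads."""
    todo = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = todo.get()
            if item is None:          # sentinel: no more work for this thread
                todo.task_done()
                return
            name, func = item
            try:
                outcome = ("succeeded", func())
            except Exception as exc:  # keep the error so a UI could display it
                outcome = ("failed", str(exc))
            with lock:
                results[name] = outcome
            todo.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for job in jobs:
        todo.put(job)
    for _ in threads:
        todo.put(None)                # one sentinel per thread
    todo.join()
    for thread in threads:
        thread.join()
    return results


def failing_deploy():
    # Hypothetical failure, standing in for a provider-side error.
    raise RuntimeError("instance type unavailable in this region")


statuses = run_jobs([
    ("deploy-node1", lambda: "ok"),
    ("deploy-node2", failing_deploy),
])
```

Keeping the error text next to the job status is what would let a UI show why a deploy failed without a trip to the provider console.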
v1.0rc7
- We have fixed a rare bug that occurred when the server was overloaded, due to the synchronicity of scitq.lib calls.
- The asynchronous behaviour of scitq.lib is now fully flexible (an asynchronous option is present for all put/post/delete calls) and sound defaults have been set for all API calls (for instance, ping is now synchronous). The execution state change from 'attributed' to 'accepted' is now synchronous (the fix for the above-mentioned bug).
- Enhanced AWS config support using one of the proposals from aws/aws-cli#1270: the AWS_ENDPOINT_URL environment variable no longer needs to be set (its value is read from the .aws/config file, as awscli does when the endpoint plugin is loaded). This let us simplify the docs and templates, as the variable is no longer required.
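As an illustration of reading the endpoint from the config file rather than from an environment variable, here is a small sketch using only the standard library. It is not the scitq implementation. It assumes the nested `s3 =` / `endpoint_url = ...` layout commonly used with the awscli endpoint plugin; the function name `read_endpoint_url`, the sample file content and the hostname are placeholders.

```python
import configparser

# Sample content standing in for ~/.aws/config (placeholder values).
SAMPLE_CONFIG = """\
[plugins]
endpoint = awscli_plugin_endpoint

[profile default]
region = gra
s3 =
    endpoint_url = https://s3.example.invalid
"""


def read_endpoint_url(config_text, profile="default", service="s3"):
    """Return the endpoint_url nested under `service` for a profile, or None."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    # Profiles may appear as "[profile name]" or simply "[name]".
    for section in (f"profile {profile}", profile):
        if not parser.has_section(section):
            continue
        # The indented lines under "s3 =" are read as one multi-line value.
        nested = parser.get(section, service, fallback="")
        for line in nested.splitlines():
            key, sep, value = line.partition("=")
            if sep and key.strip() == "endpoint_url":
                return value.strip()
    return None


endpoint = read_endpoint_url(SAMPLE_CONFIG)
```

In real use, the text would come from the user's config file instead of `SAMPLE_CONFIG`.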
PS: this was first released as rc6, and a last-minute blocking bug required this new release.
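The idea of a per-call asynchronous option with sensible defaults can be sketched as follows. This is a hypothetical client, not the scitq.lib API: the class `SketchClient`, its method signatures and the URL paths are assumptions made for the example.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class SketchClient:
    """Hypothetical client showing a per-call `asynchronous` switch."""

    def __init__(self, send):
        self._send = send  # callable that performs the actual request
        # A single background thread keeps asynchronous calls in order.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _call(self, method, path, payload, asynchronous):
        if asynchronous:
            # Returns immediately with a Future; the caller does not wait.
            return self._pool.submit(self._send, method, path, payload)
        # Blocks until the server has answered.
        return self._send(method, path, payload)

    def put(self, path, payload, asynchronous=True):
        return self._call("PUT", path, payload, asynchronous)

    def ping(self, asynchronous=False):
        return self._call("GET", "/ping", None, asynchronous)


def fake_send(method, path, payload):
    # Stand-in for an HTTP request, so the sketch runs without a server.
    return (method, path, payload)


client = SketchClient(fake_send)
pending = client.put("/executions/1", {"status": "running"})
accepted = client.put("/executions/1", {"status": "accepted"},
                      asynchronous=False)
```

A state change that other logic depends on, like the move to 'accepted', is the kind of call where waiting for the server's answer is the safer default.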
v1.0rc5
uwsgi and crash recovery added
- uwsgi can now be used to deploy, which improves performance a lot (and thus becomes the new default)
- a scitq server crash is automatically recovered (except for running ansible processes, for instance when a node was not fully deployed before the crash)
- fix deploy by source which was broken (so now I can debug without committing)
- cleaning a batch is now a lot quicker
- some glitches in doc were fixed
v1.0rc4
Solve ghost process issues in worker
v1.0rc3
Better handling of worker processes (+ some UI glitches fixed + partial handling of task relaunch when the task is supposedly already running)
v1.0rc2
More robust idle callback (permits worker undeploy when load is heavy)
v1.0rc1
scitq is now very stable for its current usage. It was tested in a large number of cases at GMT, so it should fit a large number of use cases. It is always hard to say whether a quite versatile system like scitq is completely stable, but here is what has been tested so far:
- queuing several hundred large scientific tasks on a large number of nodes (not very limiting),
- efficient s3 transfer in (was really optimized lately) and out,
- automatic cloud worker management,
- ease of upgrade
- docker images (intended for people using Kubernetes), including a docker-in-docker image for the worker, which was a little challenging to set up (but works fine now); it must be run with the --privileged flag though
v1.0b19
Display version, code cleaning & docker images
v1.0b18
Better capture of end-of-process errors and better task dispatch engine
Fix concurrency control and estimation in client
- sometimes some extra processes were launched at client creation, this is now prevented,
- adjusting concurrency by database request should occur less often.