Issue to execute function when on k8 #895
Comments
Hi @krho999 , many thanks for using Faasm! How have you deployed Faasm on Kubernetes? Have you used Also, if you are getting started with Faasm, I would recommend deploying on a |
Hey, yes, I ran the compose version and could execute your hellomp and hellompi examples, but when running a custom script I get the pool_runner error 2024-12-09 21:08:39 Error: Number of processes must be 2. This is from Docker on my Mac. The script is an MPI Cartesian implementation; I can add it here in case you want to have a look. When I use a normal HPC cluster, MPI works with no issue for me, so I am not sure what can be wrong here. I might rework it from Cartesian to normal rank-based MPI communication and see if that helps. Unfortunately, on the kubeadm cluster I built from VMs none of the scripts work, so I think that might be a hardware issue. Small update: after refactoring to common world comms it still throws a different error regarding gatherRv, but the script calculates.
I have commented out the GatherRv call, as it fails with a different error in the planner saying it could not map the MPI function. |
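Whatever ends up replacing the gather step, each rank's element count and offset in the root buffer still have to be computed. A minimal sketch of that arithmetic in plain C++ with no MPI calls, assuming an even block distribution where the first few ranks take one extra element. `BlockLayout` and `blockLayout` are hypothetical names for illustration, not part of MPI or Faasm.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper type: per-rank element counts and offsets into the
// root's receive buffer (the same information a gather-with-varying-counts
// call takes as its counts and displacements arrays).
struct BlockLayout {
    std::vector<int> counts;
    std::vector<int> displs;
};

// Split totalElems across numRanks as evenly as possible; the first
// (totalElems % numRanks) ranks receive one extra element.
BlockLayout blockLayout(int totalElems, int numRanks) {
    assert(totalElems >= 0 && numRanks > 0);

    BlockLayout layout;
    layout.counts.resize(numRanks);
    layout.displs.resize(numRanks);

    const int base = totalElems / numRanks;
    const int extra = totalElems % numRanks;

    int offset = 0;
    for (int r = 0; r < numRanks; ++r) {
        layout.counts[r] = base + (r < extra ? 1 : 0);
        layout.displs[r] = offset;
        offset += layout.counts[r];
    }
    return layout;
}
```

With these two arrays the root can place each rank's contribution using whichever communication calls are available, for example one receive per rank at offset `displs[r]`.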
Hey @krho999, very interesting to see that you are using Faasm's MPI support! We have done a lot of work to support MPI and OpenMP, particularly extending faabric to act as a messaging and state layer. You can find a lot of scripts to execute large-scale MPI/OpenMP jobs here, and the code to cross-compile them here. Regarding Regarding Cartesian communicators, they should work. IIRC we only support 2D communicators, but not 3D. In the If you let me know more about your use case I may be able to help further. Lastly, it is possible that to deploy Faasm on a self-managed K8s cluster (not on AKS) you need to do a bit more work. But it should be relatively straightforward. |
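For reference when switching between a 2D Cartesian communicator and plain ranks, the mapping can be written out by hand. A minimal sketch in plain C++ with no MPI calls, assuming a non-periodic grid and the row-major rank ordering that MPI Cartesian topologies use. `Grid2D`, `rankToCoords`, `coordsToRank`, `neighboursOf` and `kNoNeighbour` are hypothetical names for illustration, not part of MPI or Faasm.

```cpp
#include <array>
#include <cassert>

// Hypothetical stand-in for "no neighbour in that direction"
// (MPI itself uses MPI_PROC_NULL for this).
constexpr int kNoNeighbour = -1;

struct Grid2D {
    int rows;
    int cols;
};

// Row-major rank -> {row, col}.
std::array<int, 2> rankToCoords(const Grid2D& g, int rank) {
    assert(g.rows > 0 && g.cols > 0);
    assert(rank >= 0 && rank < g.rows * g.cols);
    return {rank / g.cols, rank % g.cols};
}

// {row, col} -> rank, or kNoNeighbour when outside the (non-periodic) grid.
int coordsToRank(const Grid2D& g, int row, int col) {
    if (row < 0 || row >= g.rows || col < 0 || col >= g.cols) {
        return kNoNeighbour;
    }
    return row * g.cols + col;
}

struct Neighbours {
    int up;
    int down;
    int left;
    int right;
};

// The four neighbours a 5-point stencil exchanges halos with.
Neighbours neighboursOf(const Grid2D& g, int rank) {
    const std::array<int, 2> rc = rankToCoords(g, rank);
    return {coordsToRank(g, rc[0] - 1, rc[1]),
            coordsToRank(g, rc[0] + 1, rc[1]),
            coordsToRank(g, rc[0], rc[1] - 1),
            coordsToRank(g, rc[0], rc[1] + 1)};
}
```

On a 2 x 3 grid, rank 4 sits at row 1, column 1, and rank 0 has only a lower neighbour (rank 3) and a right neighbour (rank 1).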
Thanks for all the information. Regarding the MPI, I have changed the implementation as above and the calculation now works with ranks, so that is okay. I can rework the gatherv and have that also work with some other MPI call Faasm supports. For the custom K8s cluster, I was able to deploy it, but even the hello.cpp in the demo folder fails with the above pool_runner issue after compiling via the cpp.cli, which makes it seem like the hardware I am using isn't working correctly. The CPU is older and has only limited SIMD support, so I was only able to run the WAMR worker and upload. Could this be the issue? While I am working on my project, I also have a question: when invoking, how do I pass a parameter to my executable, like the size of a matrix or the number of MPI processes I request? I cannot figure that out. |
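On the function side, a size parameter can be read defensively from the argument vector. A minimal sketch, assuming the arguments reach the function as ordinary `argc`/`argv`; how they are attached to the invocation is not shown in this thread. `RunConfig`, `parseArgs` and the default of 64 are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical run configuration for a stencil or matrix job.
struct RunConfig {
    int matrixSize = 64; // hypothetical default when no argument is given
    bool ok = true;      // false when the argument could not be parsed
};

// Parse an optional first argument as a positive matrix size.
RunConfig parseArgs(int argc, const char* const* argv) {
    assert(argv != nullptr);

    RunConfig cfg;
    if (argc < 2) {
        return cfg; // no argument: keep the default size
    }

    char* end = nullptr;
    const long n = std::strtol(argv[1], &end, 10);

    // Reject empty input, trailing garbage, and out-of-range sizes.
    if (end == argv[1] || *end != '\0' || n <= 0 || n > (1L << 20)) {
        cfg.ok = false;
        return cfg;
    }

    cfg.matrixSize = static_cast<int>(n);
    return cfg;
}
```

Calling `parseArgs(argc, argv)` at the top of `main` and returning a non-zero exit code when `ok` is false keeps a malformed size from surfacing later as an obscure runtime error.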
I have an additional question, as I am also trying to compile an extended version of the stencil with both OMP and MPI, but during compilation I run into a problem:
I used inv libfaasmp.build to reinstall the function, and this CMake file to include it, based on the omp and mpi CMake files
which is in a custom mpi_omp folder and was added to the higher-level CMake file with add_subdirectory(mpi_omp) |
Regarding your hardware compatibility: that is a very good point. The If you are running in a test CLI environment, after re-building the To pass command line arguments, both the Lastly, a limitation of Faasm is that we cannot run hybrid (MPI + OMP) programs. The main reason why it currently does not work, related to the error you are seeing, is that to use OMP we need to build the WASM toolchain and sysroot with the |
Thanks for all the help. I will have a look into MPI+OMP and try to run MPI on the wasm32-wasi-threads target. Would you be able to give me some starting guidelines for rebuilding the MPI lib, or would it be as simple as changing a variable? I am basically trying to see how this can be used in its current state and do some experiments on it as part of my Master's thesis, and I would be keen to make this work for my 5-point stencil so I can compare it to a less demanding HPC workload like MapReduce. |
Compiling You will see that changing the target is only changing the environment variables that we use in our sysroot. After compilation, you can check if the libraries are where you expect them to be by I expect more complications will arise at runtime, when you try to run the code. |
Ah, okay, I see, but this is the inv libfaasmpi.build, right? That seems to be broken in the cpp.cli, throwing
This seems to be related to how the environment variable copied faabric into itself. I manually put it into the third-party folder, but that gives a different issue which I am trying to figure out
since the CMake text files should be linked correctly, but the actual function cannot be retrieved from the main CMake file here. Am I missing something in the setup that causes this? Thank you for all the responses; this thread is helping me understand Faasm better. |
Have you cloned faasm with recursive submodules? |
Yes, as in the README. |
Never mind, I have fixed that one. One question though: I also wanted to try the Faasm Python scheduling and followed the quick steps you provided here. Unfortunately, in GCP the Kubernetes worker throws [14:42:50] [69] [E] Error executing _start: Exception: failed to call unlinked import function (env, __faasm_get_py_user) along with a lot of those warnings. I am not sure if there is a step missing; I have been looking through your documentation for what could be causing this, with no luck so far. Any idea why this would happen? The plan is to try to train a tensor CNN with the chain and then merge the models to get the final model out. |
Unfortunately, Python support is broken at this point :-( That being said, I would expect a slightly different error than the one you are seeing... |
Yes, I ran the commands provided on the documentation page (faasmctl cli.python). Unfortunately, the above is what I get; it was just an attempt to see the difference in performance. |
As I said, Python support is broken. The reason why I did not recognize the error is because, when it worked, it worked only with the WAVM WASM runtime. Now, the default WASM runtime is WAMR. Python is not even supported in WAMR (i.e. not even broken), hence the errors you are seeing. We are interested in resurrecting Python support for Faasm with WAMR, is this something you would be interested in contributing? I could provide some pointers. |
Hey,
I have an issue executing anything on the deployed k8 cluster, which I created on 4 VMs with kubeadm.
As a start, I ran faasmctl deploy.k8, which created the init file, and I tested the endpoints via curl to be sure.
I had to add this to the YAML files, as I am executing on the control plane and the pods didn't want to spin up without it
There was no change to the env vars in the YAML files. I am running the WAMR worker and upload YAML files, and I can compile and upload hello.cpp with the cpp.cli container. However, after a successful compile and upload (HTTP 200), the invoke gets stuck in an infinite loop waiting for a response, as the worker pod crashes with
This seems like something I did wrong either when building these VMs and connecting them as a cluster, or when deploying to k8 via faasmctl. Does anyone have an idea where I could be messing up?
The VMs are Ubuntu VMs with:
OS Image: Ubuntu 22.04.5 LTS
Operating System: linux
Architecture: amd64
cpu: 4
Any help is appreciated, the overall goal is trying to run different HPC workloads on this and see how this performs.
Kind regards