rstudio in kubernetes - runAsNonRoot context · Issue #888 · rocker-org/rocker-versioned2 · GitHub

rstudio in kubernetes - runAsNonRoot context #888

Open
radhupr opened this issue Dec 16, 2024 · 17 comments
Labels
pre-built images Related to pre-built images question

Comments

@radhupr
radhupr commented Dec 16, 2024

Container image name

rocker/rstudio:4.4.2

Container image digest

No response

What operating system are you seeing the problem on?

Linux

System information

Kubernetes cluster 1.30
Docker image : rocker/rstudio:4.4.2

Bug description

Hi Team,
I want to run RStudio Server (free version) on Kubernetes. If I'm taking the wrong approach here, please guide me on how to do the setup in Kubernetes.
I'm using the image rocker/rstudio:4.4.2 and trying to run it as a non-root user (the same happens with the image rocker/tidyverse:4.4.2).
The pod spec is as follows:

spec:
      securityContext:
        runAsNonRoot: true
        runAsUser: 1001
        runAsGroup: 1001
        seccompProfile:
          type: RuntimeDefault
      containers:
      - name : rstudio
        image:  rocker/tidyverse:4.4.2
        env:
          - name: USERID
            value: "1001"
          - name: GROUPID
            value: "1001"
        securityContext:
          allowPrivilegeEscalation: false
        resources:  
          requests:
            memory: "200Mi"  
            cpu: "3000m"     
          limits:
            memory: "5000Mi"

The container fails to start with the error below:
s6-overlay-preinit: fatal: unable to mkdir /var/run/s6: Permission denied

Reference to discussion forum on same issue: https://forum.posit.co/t/rstudio-server-in-kubernetes/195626/4

Can you help address this issue?

How to reproduce this bug?

Run a pod with the spec above. The container fails to start.
@radhupr radhupr added the bug Something isn't working label Dec 16, 2024
@eitsupi eitsupi added question and removed bug Something isn't working labels Dec 16, 2024
@nathanweeks
nathanweeks commented Dec 21, 2024

The approach described in the Rocker Singularity guide, which calls rserver directly, could be adapted to run RStudio Server on Kubernetes with a non-root user.

Minimal example using a Pod (though StatefulSet would probably be a better choice), disregarding Ingress (or Gateway, etc.), persistent volume for /home/rstudio, and storing the password in a Secret (assuming authentication isn't handled at the Ingress layer):

apiVersion: v1
kind: Pod
metadata:
  name: rstudio
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
  containers:
    - name: rstudio
      image: ghcr.io/rocker-org/rstudio:4.4.2
      ports:
        - containerPort: 8787
      env:
        - name: USER
          value: rstudio
        - name: PASSWORD
          value: my-password
      volumeMounts:
        - name: rstudio-home
          mountPath: /home/rstudio
        - name: run
          mountPath: /run
        - name: var-lib-rstudio-server
          mountPath: /var/lib/rstudio-server
      securityContext:
        allowPrivilegeEscalation: false
      resources:  
        requests:
          memory: "200Mi"  
          cpu: "3000m"     
        limits:
          memory: "5000Mi"
      command: ["rserver", "--auth-none=0", "--auth-pam-helper-path=pam-helper",
                           "--auth-stay-signed-in-days=30", "--auth-timeout-minutes=0",
                           "--server-user=rstudio"]
  volumes:
    - name: rstudio-home
      emptyDir: {}
    - name: run
      emptyDir: {}
    - name: var-lib-rstudio-server
      emptyDir: {}

Example: create the pod in the default namespace, and use port-forwarding to access:

kubectl apply -f rstudio.yaml
kubectl port-forward rstudio 8787
... point your web browser to http://localhost:8787, and log in with user "rstudio" and password "my-password" ...
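
The example above disregards storing the password in a Secret. A minimal sketch of that step, assuming a Secret named rstudio-auth with a key named password (both names are hypothetical), would replace the plaintext PASSWORD value with a secretKeyRef:

```yaml
# Hypothetical Secret holding the RStudio login password.
apiVersion: v1
kind: Secret
metadata:
  name: rstudio-auth        # assumed name
type: Opaque
stringData:
  password: my-password     # placeholder; replace before applying
---
# Fragment only (not a complete manifest): the env section of the
# "rstudio" container from the Pod above, with PASSWORD read from the Secret.
env:
  - name: USER
    value: rstudio
  - name: PASSWORD
    valueFrom:
      secretKeyRef:
        name: rstudio-auth  # must match the Secret's metadata.name
        key: password
```

The Secret has to exist in the same namespace as the Pod before the container starts.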

@radhupr
Author
radhupr commented Jan 19, 2025

@nathanweeks Is it not possible to have persistent volumes in this setup? When I try to mount /home with a persistent volume, it doesn't go through.

@benz0li
Contributor
benz0li commented Jan 20, 2025

@radhupr Check out Zero to JupyterHub with Kubernetes + Authentication + JupyterLab R docker stack [1].
ℹ This allows you to mount the home directory (/home/<username>) on a per-user basis.

Reference deployment using Docker Swarm + GitHub OAuth + JupyterLab docker stacks: https://demo.jupyter.b-data.ch

Footnotes

  1. Maybe the binder image of this repository works, too.

@nathanweeks

@nathanweeks Is it not possible to have persistent volumes in this setup? When I try to mount /home with a persistent volume, it doesn't go through.

Using a PV for /home should work in this case, assuming /home/rstudio exists and is writable by the rstudio user when the "rstudio" container starts. But if the goal is a multi-user setup as described in #893, then @benz0li's suggestion looks like it could be a more automated solution (though I'm not sure if/how it handles non-root containers, if that is a security requirement).

On a somewhat-related note: it's possible for users to create on-demand non-root rstudio server containers with persistent-volume-backed home directories on an OpenShift cluster (example referenced here: #747 (comment)).
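
One way to meet the "exists and is writable by the rstudio user" condition is a pod-level fsGroup, which asks Kubernetes to make the volume group-owned by that GID at mount time for volume types that support ownership management. The sketch below is untested against this image and assumes a PersistentVolumeClaim named rstudio-home (hypothetical name and size) and a default StorageClass:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rstudio-home          # assumed name
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi           # assumed size
---
# Fragment only: the parts of the earlier Pod spec that change.
# The container, its volumeMounts, and the other volumes stay as they were.
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 1000
    fsGroup: 1000             # volume becomes group-writable for GID 1000
  volumes:
    - name: rstudio-home
      persistentVolumeClaim:
        claimName: rstudio-home
```

The claim is mounted at /home/rstudio through the existing volumeMount, so it replaces only the emptyDir used for the home directory.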

@benz0li
Contributor
benz0li commented Jan 20, 2025

though I'm not sure if/how it handles non-root containers

b-data's/my JupyterLab docker stack containers – like the original Jupyter docker stacks ones – run as non-root (uid=1000(jovyan) gid=100(users)) by default.

@eitsupi eitsupi added the pre-built images Related to pre-built images label Jan 25, 2025
@perllaghu

We have RStudio running in a Kubernetes cluster, alongside other Jupyter-based images. In our case, it's our own image derived from rocker/binder.

We renamed /home/rstudio to /home/jovyan, and linked /home/rstudio to /home/jovyan.

We had some fun & games getting things to route through our various proxies..... but it was mostly a case of reading docs & not trying 3 changes at once :)

@cboettig
Member

Thanks @perllaghu ! The images in https://github.com/rocker-org/ml now take this strategy as well. This is similar to the approach in rocker/binder but inherits directly from jupyter docker stack images.

@radhupr
Author
radhupr commented Feb 23, 2025

@perllaghu Could you give some information about your setup in Kubernetes?
I'm using helm from https://github.com/jupyterhub/zero-to-jupyterhub-k8s/ and changed the user image in values file as below to get what we wanted.

singleuser:
  image:
    name: glcr.b-data.ch/jupyterlab/r/tidyverse
    tag: "4.4.2"

But I have issues while processing a big chunk of data in a user's RStudio session. The CPU usage goes very high and the session crashes. How have you been configuring or addressing file upload and big data processing in your setup?

@benz0li
Contributor
benz0li commented Feb 23, 2025

I'm using helm from https://github.com/jupyterhub/zero-to-jupyterhub-k8s/ and changed the user image in values file as below to get what we wanted.

singleuser:
  image:
    name: glcr.b-data.ch/jupyterlab/r/tidyverse
    tag: "4.4.2"

@radhupr I assume you are using

singleuser:
  cmd: start-singleuser.sh

in addition for glcr.b-data.ch/jupyterlab/r/tidyverse.

But I have issues while processing a big chunk of data in a user's RStudio session. The CPU usage goes very high and the session crashes. How have you been configuring or addressing file upload and big data processing in your setup?

Regarding file upload: There is no limitation in JupyterLab or RStudio.
ℹ If you are using ingress-nginx: Increase the proxy-body-size.
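
As a sketch of the ingress-nginx hint: assuming the chart's own Ingress is enabled and the cluster's controller is ingress-nginx, the request body limit can be raised with an annotation in the Helm values. The host name and the 512m limit are placeholders:

```yaml
ingress:
  enabled: true
  hosts:
    - jupyterhub.example.org     # placeholder host
  annotations:
    # ingress-nginx annotation; "0" would disable the size check entirely
    nginx.ingress.kubernetes.io/proxy-body-size: "512m"
```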

Regarding big data processing: Kubernetes may set some resource limits.
ℹ Check with prlimit. Sample output on a fairly unlimited system:

RESOURCE   DESCRIPTION                             SOFT      HARD UNITS
AS         address space limit                unlimited unlimited bytes
CORE       max core file size                         0         0 bytes
CPU        CPU time                           unlimited unlimited seconds
DATA       max data size                      unlimited unlimited bytes
FSIZE      max file size                      unlimited unlimited bytes
LOCKS      max number of file locks held      unlimited unlimited locks
MEMLOCK    max locked-in-memory address space   8388608   8388608 bytes
MSGQUEUE   max bytes in POSIX mqueues            819200    819200 bytes
NICE       max nice prio allowed to raise             0         0 
NOFILE     max number of open files             1048576   1048576 files
NPROC      max number of processes            unlimited unlimited processes
RSS        max resident set size              unlimited unlimited bytes
RTPRIO     max real-time priority                     0         0 
RTTIME     timeout for real-time tasks        unlimited unlimited microsecs
SIGPENDING max number of pending signals        1029510   1029510 signals
STACK      max stack size                       8388608 unlimited bytes

@radhupr
Author
radhupr commented Feb 23, 2025

I wasn't setting the cmd for singleuser. I will set it and test.
File upload is working. But it takes a lot of time to download files from any remote URL :(

Data processing:
The pod gets CPU throttled if I set resource limits. But when I removed the CPU limits, it consumed all of the CPU on the running server and the server crashed. I was wondering if there is a better solution for handling big data processing with this setup.

@cboettig
Member

@radhupr are you setting memory limits? cpu use by itself rarely crashes a server, but can be correlated with RAM use.

@benz0li
Contributor
benz0li commented Feb 23, 2025

I wasn't setting the cmd for singleuser. I will set it and test. File upload is working.

@radhupr Setting singleuser.cmd: start-singleuser.sh ensures that the startup scripts are run.
ℹ For more information, see JupyterLab R docker stack: Notes > Tweaks.

Data processing: The pod gets CPU throttled if I set resource limits. But when I removed the CPU limits, it consumed all of the CPU on the running server and the server crashed. I was wondering if there is a better solution for handling big data processing with this setup.

Use something like

singleuser:
  cpu:
    limit: 2
    guarantee: 0.1
  memory:
    limit: 8G

to limit to 2 cores and 8 GB RAM. Adapt limits to your needs.


Default values:

singleuser:
  cpu:
    limit:
    guarantee:
  memory:
    limit:
    guarantee: 1G

https://github.com/jupyterhub/zero-to-jupyterhub-k8s/blob/d31863b6cf4f8674262690134de1e17132b8fc6a/jupyterhub/values.yaml#L426-L431

This means no CPU limit and no RAM limit. And @cboettig is right: If a pod uses up all RAM, the server (host) is at risk.
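
Combining the fragments from this thread, a values file could look like the sketch below. All keys and values are taken from the comments above. Note that cmd, cpu, and memory sit directly under singleuser, at the same level as image:

```yaml
singleuser:
  image:
    name: glcr.b-data.ch/jupyterlab/r/tidyverse
    tag: "4.4.2"
  cmd: start-singleuser.sh   # ensures the image's startup scripts are run
  cpu:
    limit: 2                 # cap at 2 cores
    guarantee: 0.1
  memory:
    limit: 8G                # cap at 8 GB RAM
    guarantee: 1G            # chart default, shown for completeness
```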

@radhupr
Author
radhupr commented Mar 6, 2025

@benz0li Thanks for the response. The issue was indeed the limit setup. I had set it at the wrong indentation level. :|
BTW, is there any solution or suggestion for importing data from external sources, e.g. Azure Blob? I can use an R script to download files for now, but it would be easier to have a mount solution or another transfer method for data import.

@benz0li
Contributor
benz0li commented Mar 6, 2025

it would be easier to have a mount solution or another transfer method for data import.

@radhupr Have a look at the AzureR package family or Rclone.

@radhupr
Author
radhupr commented Mar 25, 2025

@benz0li Will you be able to help to get a quick answer to this

@radhupr
Author
radhupr commented Apr 13, 2025

@benz0li It was a temporary issue, I guess. I got it working now. Thanks for the references.
We are about to take this into production use. Should I be concerned about future maintenance and updates for this chart? Do you have any other adopters of this already?

@benz0li
Contributor
benz0li commented Apr 14, 2025

Should I be concerned about future maintenance and updates for this chart? Do you have any other adopters of this already?

@radhupr Off-topic here. Ask over at https://github.com/orgs/b-data/discussions.
