How to best clean up artifacts? #54
This looks like a practical solution to the problem. One question, though: would a new directory be created with each run request? If so, you would not be able to first create a program that writes a file with a particular structure and then a program that reads that file. I guess some kind of session management, if it is not already in place, is required.
Regards,
Arjen
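Arjen's concern could be met by keying the working directory on a session instead of on each run, so that a second program can read files the first one wrote. A minimal sketch under assumptions: the server issues each client a UUID session id (hypothetical; nothing in this thread says how sessions would work), and `session_dir` is a made-up helper name. Cleanup would then be tied to session expiry rather than to each request; how sessions expire is left open here.

```python
import os
import uuid


def session_dir(base: str, session_id: str) -> str:
    """Return the working directory for one session, creating it on first use.

    `session_id` is assumed to be a UUID issued by the server (hypothetical,
    e.g. kept in a cookie). Parsing it with uuid.UUID rejects values such as
    "../../etc" that would otherwise escape `base`.
    """
    sid = str(uuid.UUID(session_id))  # raises ValueError for malformed ids
    path = os.path.join(base, sid)
    os.makedirs(path, exist_ok=True)  # later runs in the same session reuse it
    return path
```

Two runs that pass the same session id get the same directory, so a file written by the first run is visible to the second.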
On Thu 22 Sep 2022 at 19:55, Milan Curcic <***@***.***> wrote:
When the client makes a request to the playground server, the server copies two files to the Docker container: the program source (main.f90) and input data (input.txt or similar).
Deleting these files is easy because they're trackable by the Python function that handles the request.
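Because the handler itself writes these two files, it knows their paths and can delete them in a `finally` clause. A minimal sketch, assuming a per-request working directory on the local filesystem as a stand-in for the container (the real server copies the files into Docker, which is not shown in this thread) and a hypothetical `run` callable that executes the program.

```python
import os
from typing import Callable


def handle_request(workdir: str, source: str, stdin_data: str,
                   run: Callable[[str], str]) -> str:
    """Write the two request files, run the program, then delete exactly those files."""
    written = []
    try:
        for name, text in (("main.f90", source), ("input.txt", stdin_data)):
            path = os.path.join(workdir, name)
            with open(path, "w") as f:
                f.write(text)
            written.append(path)
        return run(workdir)  # hypothetical: however the program is executed
    finally:
        # Only the files created here are known; anything the user's
        # program wrote on its own is not cleaned up.
        for path in written:
            try:
                os.remove(path)
            except FileNotFoundError:
                pass
```

Anything else the program writes, such as an output file created with open(), is not in `written` and survives the cleanup.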
However, how do we best handle the artifacts that can be created by calling execute_command_line or by writing data to a file via open() and write() statements? These could be written anywhere in the user-writable part of the container (/home/fortran).
Worse, a creative user could overwrite existing files in the container that fpm needs in order to work.
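One way to at least find such artifacts, which is not what this thread proposes, is to snapshot the user-writable tree before and after a run and compare. A rough sketch, assuming the server can walk /home/fortran (or whichever root it passes in): new paths are candidates for deletion, while changed paths can only be reported, since the snapshot keeps no old file contents. Comparing modification time and size is an approximation; an overwrite that leaves both unchanged would go unnoticed.

```python
import os


def snapshot(root: str) -> dict:
    """Map every file path under `root` to (mtime in ns, size in bytes)."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except FileNotFoundError:
                continue  # file vanished between listing and stat
            state[path] = (st.st_mtime_ns, st.st_size)
    return state


def diff(before: dict, after: dict):
    """Return (created, modified, deleted) sets of paths between two snapshots."""
    created = after.keys() - before.keys()
    deleted = before.keys() - after.keys()
    modified = {p for p in before.keys() & after.keys() if before[p] != after[p]}
    return created, modified, deleted
```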
A proposed solution that came up on GSoC calls for this project goes along these lines:
1. Create a uniquely named directory (e.g. using uuid.uuid4()) and place all needed artifacts (e.g. fpm, gfortran, shared libs) or their symlinks in that directory.
2. Run the program in the container in that unique directory under chroot and return the result. This will prevent the program from creating files outside the directory.
3. Delete the directory when done (this part can be delegated to a separate thread so that we can return the response to the user immediately).
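The three steps can be sketched in Python as follows. This is a hypothetical illustration, not the playground's code: `SANDBOX_BASE`, the helper names, and the tool list are assumptions. Two details matter when filling it in. First, files are copied rather than symlinked, because inside a chroot an absolute symlink such as /usr/bin/gfortran is resolved against the new root and would point at nothing. Second, the `chroot` command needs root privileges (CAP_SYS_CHROOT), and a working fpm/gfortran toolchain needs many more files (shared libraries, compiler internals) than the tool binaries themselves.

```python
import os
import shutil
import subprocess
import threading
import uuid

SANDBOX_BASE = "/home/fortran/sandboxes"  # hypothetical location


def prepare_sandbox(base: str, tools: list[str]) -> str:
    """Step 1: create a uniquely named directory and copy the needed tools into it."""
    workdir = os.path.join(base, str(uuid.uuid4()))
    os.makedirs(workdir)
    for tool in tools:
        # Keep each tool's absolute path under the new root, e.g. <workdir>/usr/bin/fpm.
        dest = os.path.join(workdir, tool.lstrip("/"))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copy2(tool, dest)  # a copy, not a symlink (see above)
    return workdir


def chroot_command(workdir: str, argv: list[str]) -> list[str]:
    """Step 2: build a command that runs `argv` with `workdir` as its root directory."""
    return ["chroot", workdir, *argv]


def cleanup_in_background(workdir: str) -> threading.Thread:
    """Step 3: delete the directory on a separate thread so the response is not delayed."""
    t = threading.Thread(target=shutil.rmtree, args=(workdir,),
                         kwargs={"ignore_errors": True}, daemon=True)
    t.start()
    return t


def run_request(source: str, tools: list[str], argv: list[str],
                timeout: float = 10.0) -> str:
    """Glue for the three steps; `argv` is whatever builds and runs main.f90."""
    workdir = prepare_sandbox(SANDBOX_BASE, tools)
    try:
        # chroot(1) changes into the new root, so main.f90 is in the cwd there.
        with open(os.path.join(workdir, "main.f90"), "w") as f:
            f.write(source)
        result = subprocess.run(chroot_command(workdir, argv),
                                capture_output=True, text=True, timeout=timeout)
        return result.stdout
    finally:
        cleanup_in_background(workdir)
```

`run_request` returns as soon as the program finishes, while the directory is removed on the background thread.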
What do you think?
How are multiple requests handled? Do they all access the same container?
I think there are multiple instances, but once you create files, you can find them again after a few tries in one of the instances.
Each server worker (think Python process; there are currently 3) spins up a Docker container. For each incoming request, the worker opens a new thread in which the request is handled. The threads share memory, so multiple requests coming in through the same worker use the same Docker container. In the initial implementation, @ashirrwad had each request spin up a new container. However, this added a few seconds of overhead to each request, so we pivoted to one running container per worker that is reused. Of course, this approach requires hardening because the container keeps state between requests.
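For reference, the setup described here roughly corresponds to a Gunicorn configuration like the following (Gunicorn is the server named later in this thread). Only the worker count comes from the discussion; the threaded worker class and the thread count are assumptions.

```python
# gunicorn.conf.py: hypothetical sketch; only workers = 3 comes from this thread.
workers = 3               # three worker processes, each owning one Docker container
worker_class = "gthread"  # each request is handled on a thread inside its worker
threads = 4               # assumed value: concurrent requests per worker
```

With these values, up to `workers * threads` = 12 requests could be in flight at once, and the up to four threads of a worker would all share that worker's single container.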
I wonder if we end up needing
I'm surprised that it's not one container instance per request. When I use Docker [on Linux], containers start almost instantly (under 1 s). Also, if there are three persistent containers, does that mean that no more than three requests can be handled at the same time? Besides avoiding this persistence/cleanup problem, another advantage of one container per request is that you can limit the CPU and memory usage of each container so that it cannot degrade overall system performance (a potential denial-of-service attack).
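A per-request container with the limits mentioned here could be started through the Docker CLI. The sketch below only builds the argument list; the helper name and default limits are assumptions. `--rm` discards the container, and with it every file the program wrote, when the program exits, which is what would make per-request cleanup trivial.

```python
def docker_run_argv(image: str, command: list[str], cpus: float = 1.0,
                    memory: str = "256m", pids: int = 64) -> list[str]:
    """Build a `docker run` command for one throwaway container per request."""
    return [
        "docker", "run",
        "--rm",                  # remove the container and its files on exit
        "--network", "none",     # no network access for user programs
        f"--cpus={cpus}",        # CPU quota
        f"--memory={memory}",    # hard memory limit
        f"--pids-limit={pids}",  # cap the number of processes (e.g. fork bombs)
        image,
        *command,
    ]
```

It could be run with, e.g., `subprocess.run(docker_run_argv("playground-image", ["fpm", "run"], cpus=0.5), capture_output=True, text=True)`, where `playground-image` is a hypothetical image name and copying the source into the container is not shown.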
I am surprised too, but I also suspect Docker can be slow, especially on a low-performance (free-tier) VM in AWS. Scaling this well is difficult: ensuring security, performance, and maintainability is probably a full-time job for a long time, and likely an expensive one with multiple workers and so on. But we don't need all that to get started. Yes, using directories seems fine for now.
Good question, and I don't know. I haven't made a hard measurement; the preliminary conclusion that it was slow was based on the local development environment. It would certainly be easy to try again: it's a matter of instantiating the container within the request-handling function instead of at the top-level (module) scope as it is now. As I understand it, Gunicorn (our server) can serve many more requests than it has workers because the workers offload the work to threads. But with the current approach, we may have conflicts on the Docker container if there are too many users at once. Ashirwad doesn't think this happens because of how WSGI works; he may be correct, but I don't understand it well enough to tell for sure. Brad suggested on an early GSoC call to try plain
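The measurement mentioned here is easy to script. A minimal sketch that times repeated runs of any command; to estimate container start-up cost one might pass `["docker", "run", "--rm", IMAGE, "true"]`, where IMAGE stands for the playground image (its name is not given in this thread) and the image is assumed to provide `true`.

```python
import subprocess
import time


def time_command(argv: list[str], repeats: int = 5) -> list[float]:
    """Run `argv` several times and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations
```

Taking `statistics.median` of the returned list gives a figure that is less sensitive to a slow first, cold-cache run than the mean.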