TUTORIAL: PyPy Sandboxing and Mobile Code Execution

Setting Up PyPy

Thankfully, this is much easier this year. It should only require three simple steps:

  1. apt-get install pypy
  2. apt-get install python-pypy.sandbox
  3. ln -s /usr/lib/python2.7/dist-packages/pypy/ /usr/lib/pypy/dist-packages/

On the class VMs, this is all you’ll have to do.

However, if you’re running this on your own local machine, you may need additional steps. The versions of pypy and python-pypy.sandbox on the virtual machines are both 2.2.1. On my personal virtual machine, they are 2.6. For this version I had to do the additional step of:

  1. ln -s /usr/lib/pypy-sandbox/x86_64-linux-gnu/pypy-c-sandbox /usr/lib/pypy-sandbox/pypy-c-sandbox

If you run into trouble with a personal version on a personal machine, post your question to Piazza.

To test whether everything is set up correctly, execute the following commands:

cd <playground-install>/src/playground/sandbox/pypy
pypy pypy_interact_x.py /usr/lib/pypy-sandbox/pypy-c-sandbox

You don’t have to run it from the playground/sandbox/pypy directory; doing it this way was just to make the command-line simpler to parse. Speaking of which, let’s break it down:

  • pypy – This is the pypy interpreter
  • playground/sandbox/pypy/pypy_interact_x.py – I wrote this slightly modified version of the default pypy_interact.py script for convenience
  • /usr/lib/pypy-sandbox/pypy-c-sandbox – A special interpreter that performs the sandboxing functions

When you run the sandbox in this way, you get what appears to be a normal python prompt (ignore ‘import site failed’). To take a look at your sandboxed environment, import os and run a couple of commands such as os.listdir('/'). Check out the tmp directory and you’ll notice it does not have the same contents as your real tmp directory. Right now, this tmp directory is completely virtual. It does not exist on your file system at all.
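
For instance, a small helper like the one below can be pasted at the sandbox prompt to list a directory without a traceback when the path is missing. The name list_dir is made up for this illustration; only os.listdir comes from the text above.

```python
import os

def list_dir(path):
    """Return a sorted directory listing, or None if the path cannot be listed."""
    try:
        return sorted(os.listdir(path))
    except OSError:
        return None

print(list_dir('/'))     # root of the filesystem the interpreter can see
print(list_dir('/tmp'))  # inside the sandbox, this is the virtual tmp directory
```

Running the same lines in a regular python shell and comparing the output is a quick way to see the difference between the real and the virtual filesystem.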

You can re-run this shell with a real tmp directory using the following command line:

pypy pypy_interact_x.py --tmp=/tmp /usr/lib/pypy-sandbox/pypy-c-sandbox

This tells the sandbox to use the real /tmp as the tmp directory in the virtual filesystem.

You can also execute a script in this sandbox. To do so, you first need to create the script, copy it into the tmp directory, then tell the modified interpreter to run it from there. For example:

cp script.py /tmp/script.py
pypy pypy_interact_x.py --tmp=/tmp /usr/lib/pypy-sandbox/pypy-c-sandbox /tmp/script.py

You have to copy the script into the tmp directory first because the interpreter runs it from the virtual filesystem. When you specify /tmp/script.py as an argument, that file is loaded from the virtual /tmp directory, not the real one.
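
If you find yourself repeating the copy-then-run steps, they can be wrapped in a few lines of Python. This is only a sketch: the function name stage_and_build_command is hypothetical, and it assumes that whatever directory is passed with --tmp shows up as /tmp inside the sandbox, as described above. It builds the command line but does not run it.

```python
import os
import shutil

def stage_and_build_command(script_path, tmp_dir, sandbox_binary,
                            launcher="pypy_interact_x.py"):
    """Copy script_path into tmp_dir and return the argv list to run it sandboxed."""
    shutil.copy(script_path, tmp_dir)
    # Assumption: the directory given to --tmp is visible as /tmp in the
    # virtual filesystem, so the script is addressed by its virtual path.
    virtual_path = "/tmp/" + os.path.basename(script_path)
    return ["pypy", launcher, "--tmp=" + tmp_dir, sandbox_binary, virtual_path]

# Example (not executed here):
#   cmd = stage_and_build_command("script.py", "/tmp",
#                                 "/usr/lib/pypy-sandbox/pypy-c-sandbox")
#   subprocess.call(cmd)
```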

If this all sounds very complicated, the good news is you don’t have to actually run any of this manually. It is all launched by playground for remote code execution when so specified. And that is the next topic of this tutorial.

Configuring PyPy For Mobile Code Execution

I’ve added a new mobile code handler that takes an incoming code string, writes it out to the tmp directory, and executes it using pypy. It reads results out of the standard output for transmitting back to the sender. ParallelTSP and the MobileCodeServer have already been modified to support it. Your system will work correctly out of the box.
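
The write-then-execute-then-read-stdout pattern can be sketched as follows. This is not the Playground handler itself: run_code_string is a hypothetical name, and the sketch launches the current Python interpreter as a stand-in so that it runs anywhere. The real handler would launch the pypy sandbox instead.

```python
import os
import subprocess
import sys
import tempfile

def run_code_string(code, interpreter_argv=None):
    """Write code to a temp file, run it, and return what it printed to stdout."""
    if interpreter_argv is None:
        # Stand-in so the sketch runs anywhere; a real handler would start
        # the pypy sandbox here instead of the current interpreter.
        interpreter_argv = [sys.executable]
    fd, path = tempfile.mkstemp(suffix=".py")
    try:
        with os.fdopen(fd, "w") as script:
            script.write(code)
        proc = subprocess.Popen(interpreter_argv + [path],
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE,
                                universal_newlines=True)
        out, _err = proc.communicate()
        return out
    finally:
        # Clean up the temporary script whether or not the run succeeded.
        os.remove(path)

print(run_code_string("print(6 * 7)"))
```

Because results travel back over standard output, mobile code written for this kind of handler has to print whatever it wants the sender to receive.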

So what do you need to do? Everyone should take a look at the configuration of the various mobile code handlers and decide whether they want to make changes. Here is the relevant code:

  • Class RunMobileCodeHandler in playground/network/client/ClientMessageHandlers.py
  • Class SandboxCodeRunner in playground/sandbox/pypy/SandboxCodeRunner.py
  • Class Server in apps/mobilecodeservice/Server.py

First, take a look at the constructor of RunMobileCodeHandler. You’ll notice that it takes a kwargs-style dictionary called “executionHandlers”. This dictionary is copied into a local data structure. If no handler is present for the “serialized” mechanism, the constructor assigns one (self.__serializedCodeHandler). If you look at the method __runCodeThrowExceptions, you’ll notice that it looks for a handler based on a specific “mechanism” string. If it can’t find one, and strict is disabled, it looks for a handler under the key “__default__”. If no such handler is found, it returns the internal default handler (self.__defaultCodeHandler).

In summary, RunMobileCodeHandler is configured with a dictionary of handlers based on a mechanism key. If that key is found, the specified handler is used, otherwise it falls back to a specified default handler or, if no default handler is specified, an internal default handler. If you look in the __call__ method, you’ll see that the mechanism string is specified by the incoming network packet.
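
The lookup order summarized above can be modeled in a few lines. The class below is a hypothetical stand-in, not the actual RunMobileCodeHandler: the names HandlerLookup and find are invented, strict is assumed to be a constructor flag, and raising KeyError in strict mode is an assumption, since the behavior in that case is not described here.

```python
class HandlerLookup(object):
    """Hypothetical model of the mechanism-to-handler lookup."""

    def __init__(self, strict=False, **executionHandlers):
        self._strict = strict
        self._handlers = dict(executionHandlers)  # copy the k=v pairs
        # Install a built-in handler for "serialized" if none was supplied.
        if "serialized" not in self._handlers:
            self._handlers["serialized"] = self._serialized_handler

    def _serialized_handler(self, code):
        return "serialized ran: " + code

    def _internal_default(self, code):
        return "internal default ran: " + code

    def find(self, mechanism):
        if mechanism in self._handlers:
            return self._handlers[mechanism]
        if self._strict:
            # Assumption: strict mode refuses unknown mechanisms.
            raise KeyError("no handler for mechanism %r" % (mechanism,))
        # Fall back to a caller-supplied default, then the internal one.
        return self._handlers.get("__default__", self._internal_default)

registry = HandlerLookup(sandbox=lambda code: "sandbox ran: " + code)
print(registry.find("sandbox")("x = 1"))
print(registry.find("unknown")("x = 1"))
```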

Notice that none of the handlers built into RunMobileCodeHandler were designed for the sandbox. The sandbox handler is the class SandboxCodeRunner. If you look at that code, you’ll see how it creates the script, loads it into the pypy sandbox, and executes it.

So how do we specify that we want to use SandboxCodeRunner as a handler for code?

Turn to the mobile code server (Server.py in apps/mobilecodeservice) and look at line 265. It should look like this:

runMobileCodeHandler = RunMobileCodeHandler(self, sandbox=SandboxCodeRunner())

Remember that RunMobileCodeHandler takes a kwargs-style dictionary, so you simply specify k=v pairs within the constructor call. The call above creates a dictionary with the key “sandbox” set to a SandboxCodeRunner instance. You could modify this call to specify the default handler (__default__), and so forth.
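
If the kwargs mechanism is unfamiliar, the toy function below shows how Python turns k=v arguments into a dictionary. Both collect_handlers and FakeRunner are placeholders invented for this illustration; FakeRunner merely stands in for SandboxCodeRunner.

```python
def collect_handlers(owner, **executionHandlers):
    """Return the dictionary Python builds from the k=v arguments."""
    return executionHandlers

class FakeRunner(object):
    """Placeholder standing in for SandboxCodeRunner."""

handlers = collect_handlers(None, sandbox=FakeRunner(), __default__=FakeRunner())
print(sorted(handlers.keys()))
```

Each keyword becomes a string key, so the mechanism name sent over the network only has to match the keyword used in the constructor call.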

Keep in mind that the mechanism string is specified by the code sender. So if you look in BasicMobileCodeClient, you’ll see the following lines:

request = MessageData.GetMessageBuilder(definitions.playground.base.RunMobileCode)
request["ID"].setData(state.execId)
request["pythonCode"].setData(state.mobileCodeString)
request["mechanism"].setData("sandbox")

Thus, ParallelTSP sets the mechanism to “sandbox”. When this packet arrives at your mobile code server, this mechanism will trigger the SandboxCodeRunner to execute its contents.

Extending the Sandbox

The out-of-the-box sandbox ONLY allows reading. Nothing can be written to disk, not even in the tmp directory. What if you wanted to extend it?

The class ExtensibleSandboxedProc in playground/sandbox/pypy/extensible_sandbox.py allows you to override the handling mechanism for every syscall that the system intercepts. The class is currently empty, except for calls to the super-methods. Most of these methods are implemented in the base class SimpleIOSandboxedProc, but some are in VirtualizedSandboxedProc. Both of these classes ship with pypy and can be found in /usr/lib/python2.7/dist-packages/pypy/sandbox/rpython/translator/sandbox/sandlib.py.

If you browse through this file, you’ll notice that both read and write are in SimpleIOSandboxedProc, but that open is handled by VirtualizedSandboxedProc. You’ll want to look at the open method to see how file descriptors are stored (and mapped to real files), and then at how files are read in read. Take a look at the write function (do_ll_os__ll_os_write), and notice that it allows writing to files with file descriptor 1 or 2. What do you think that means? Notice that all other writes are not allowed. If you want to enable writing, you’ll have to figure out how to handle these instructions.

Remember, you need not, and should not, go modifying the default pypy-sandbox implementation in this file. Instead, you can override the methods in the subclass provided as part of Playground.
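
To make the override pattern concrete, here is a self-contained toy model. It does not import sandlib: BaseProcModel only imitates the write behavior described above (file descriptors 1 and 2 allowed, everything else refused), and WritableProcModel, register_writable, and writable_files are hypothetical names. How you connect open to such a table in the real subclass is left for you to work out.

```python
class BaseProcModel(object):
    """Simplified model of the default write handling; NOT the real sandlib class."""

    def __init__(self):
        self.stdout_chunks = []
        self.stderr_chunks = []

    def do_ll_os__ll_os_write(self, fd, data):
        if fd == 1:
            self.stdout_chunks.append(data)
            return len(data)
        if fd == 2:
            self.stderr_chunks.append(data)
            return len(data)
        raise OSError("trying to write to fd %d" % (fd,))


class WritableProcModel(BaseProcModel):
    """Overrides write in a subclass instead of editing the base class."""

    def __init__(self):
        super(WritableProcModel, self).__init__()
        self.writable_files = {}  # hypothetical table: virtual fd -> real file

    def register_writable(self, fd, real_file):
        self.writable_files[fd] = real_file

    def do_ll_os__ll_os_write(self, fd, data):
        real_file = self.writable_files.get(fd)
        if real_file is not None:
            real_file.write(data)
            real_file.flush()
            return len(data)
        # Anything not explicitly allowed falls through to the default policy.
        return super(WritableProcModel, self).do_ll_os__ll_os_write(fd, data)
```

Falling back to the super-method keeps the default-deny behavior for every descriptor you have not deliberately opened for writing.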
