Welcome to rpyc_docker’s documentation!

Rpyc_docker uses the Python module rpyc to control docker containers. This means you can run multiple docker containers and control them externally and transparently using rpyc.

It can be used for other things, but right now it is primarily designed to run multiple headless selenium browsers on a single server or VPS. The advantage of this approach is that each browser is self-contained in its own docker container, which makes the browser easier to customize. Also, if one of the browsers crashes, it will not bring down the entire grid, because it is isolated in its own container.

Since rpyc works over network protocols, it would be possible to have one server control multiple headless browsers on other machines. This feature has not been added yet. If I need it, I might add it in the future.

Requirements

Setup

  1. Download rpyc_docker from its repository. To download, click the button marked Download Zip.
  2. Unpack the archive.
  3. From the command line, switch to the directory docker_files/rpyc_docker.
  4. Build the docker image with the command “docker build -t="rpyc_docker" .” If you wish, you can create your own custom docker image.
  5. Install the package with “pip2 install .” You could also use “python setup.py install”, but it is better to use pip.
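
Steps 4 and 5 can also be scripted. The following is a minimal sketch, not part of rpyc_docker: the function names setup_commands and run_setup are hypothetical, and it assumes both commands are run from the directory named in step 3, which the steps imply but do not state. By default it only prints the commands.

```python
import subprocess


def setup_commands(image_tag="rpyc_docker"):
    """Return the commands from steps 4 and 5 as argument lists."""
    return [
        # Step 4: build the docker image from the current directory.
        ["docker", "build", "-t", image_tag, "."],
        # Step 5: install the Python package with pip.
        ["pip2", "install", "."],
    ]


def run_setup(cwd, dry_run=True):
    """Print each command; execute it inside cwd only when dry_run is False."""
    for cmd in setup_commands():
        print(" ".join(cmd))
        if not dry_run:
            # Assumption: both commands are run from the same directory.
            subprocess.check_call(cmd, cwd=cwd)


if __name__ == "__main__":
    run_setup("docker_files/rpyc_docker")
```

Call run_setup with dry_run=False to actually execute the commands. check_call raises an error as soon as one of them fails, so a failed image build stops the script before the install step.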

Examples

All of the examples are IPython notebooks, so you can experiment with the examples and modify them.

  1. Using the Browser Object - how to set up and use the browser object standalone.
  2. BrowserRpycWorker - using a single Browser Object running inside a docker container to scrape DuckDuckGo and the Python subreddit.
  3. Example using manager to run multiple dockers at once - using a Manager to run a series of requests that query DuckDuckGo and fetch the headlines from the Python subreddit simultaneously. Each job runs in its own isolated docker container.
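
The idea behind the third example, several jobs running at once with one failure leaving the others untouched, can be sketched with Python 3's standard library alone. This is not rpyc_docker's Manager API: run_job and run_all are hypothetical names, and each job is a plain function standing in for a request sent to a browser in its own container.

```python
from concurrent.futures import ThreadPoolExecutor


def run_job(name):
    # Hypothetical stand-in for a request handled by one isolated browser.
    if name == "broken":
        raise RuntimeError("browser crashed")
    return "result for " + name


def run_all(names, max_workers=4):
    """Run every job concurrently; collect results and errors separately."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all jobs first so they run at the same time.
        futures = {name: pool.submit(run_job, name) for name in names}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as exc:
                # A failed job is recorded; the other jobs still finish.
                errors[name] = str(exc)
    return results, errors


if __name__ == "__main__":
    results, errors = run_all(["duckduckgo", "broken", "python subreddit"])
    print(results)
    print(errors)
```

Threads only imitate the isolation here. In rpyc_docker the separation comes from each job running in its own docker container.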
