Short guide for working with Ray

This section describes how to work with Ray. Specifically, we will discuss how to do distributed computing with the Ray library using its C++ API.

Note

Ray is primarily a Python library that aims to facilitate distributed computing. It also exposes an API that can be used by C++ programs; however, at the time of writing, this API is not as mature as its Python counterpart.

Install Ray

Ray is distributed as a Python package, so you can use pip to install it on your system:

pip install ray

See the official Installing Ray page for more options. In our case, however, we want Ray together with its C++ API, so install it using:

pip install -U "ray[cpp]"

Assuming the installation succeeded, let’s generate a project template. We will use this template throughout the tutorial.

ray cpp --generate-bazel-project-template-to ray-template

The command above creates a directory called ray-template with the following structure:

BUILD.bazel
example.cc
run.sh
thirdparty
WORKSPACE

Let’s now review the core components of Ray. You can find more information at Overview of Ray.

Ray core components

Ray consists of a number of libraries. These are:

  • Ray core

  • Ray AIR

  • Ray cluster

Schematically, these are shown in the image below:

_images/ray_libs.png

Stack of Ray libraries. Image from [1].

Ray core

Ray core is a rather low-level distributed computing framework. Its key abstractions are tasks, actors and objects.

With Ray we can execute arbitrary functions on separate workers. These are called remote functions, and asynchronously invoking such a remote function is called a Task.

A Ray task is just a function that is executed asynchronously. A function cannot hold any state, which can sometimes be problematic. If our application requires stateful computation, we have to use another Ray core abstraction, namely an Actor. A Ray Actor is essentially a worker that can hold state. Every time we create a new actor, a new worker is created. The methods of the actor are scheduled on that specific worker and can access and mutate the state of that worker.

See the example Using Ray core Tasks, Actors and Objects for how to use these abstractions in C++.

In Ray, tasks and actors create and compute on objects. We refer to these objects as remote objects because they can be stored anywhere in a Ray cluster, and we use object refs to refer to them. Remote objects are cached in Ray’s distributed shared-memory object store, and there is one object store per node in the cluster. In the cluster setting, a remote object can live on one or many nodes, independent of who holds the object ref(s).

References

  1. Overview of Ray