Design

Pulsar implements two layers of components on top of Python’s asyncio module:

Actors

An Actor is the atom of pulsar’s concurrent computation. Actors do not share state; they communicate via asynchronous inter-process message passing, implemented using asyncio socket utilities.

Messages are exchanged over a single bidirectional connection between any two actors and are encoded using the unmasked websocket protocol, as the actor messages tutorial highlights. A pulsar actor can be process based or thread based and can perform one or many activities.
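To make the wire format more concrete, here is a minimal sketch of unmasked websocket framing as defined in RFC 6455. It is not pulsar’s own code: the names encode_frame and decode_frame are hypothetical, and serialising the message with pickle is an assumption made only for illustration.

```python
import pickle
import struct

BINARY_OPCODE = 0x2  # RFC 6455 opcode for a binary data frame


def encode_frame(payload: bytes) -> bytes:
    """Wrap payload in a single, final, unmasked binary frame."""
    header = bytes([0x80 | BINARY_OPCODE])  # FIN bit set + opcode
    size = len(payload)
    if size < 126:
        header += bytes([size])                            # 7-bit length
    elif size < 65536:
        header += bytes([126]) + struct.pack("!H", size)   # 16-bit length
    else:
        header += bytes([127]) + struct.pack("!Q", size)   # 64-bit length
    return header + payload


def decode_frame(frame: bytes) -> bytes:
    """Return the payload of a single unmasked frame."""
    if frame[1] & 0x80:
        raise ValueError("masked frames are not handled in this sketch")
    size = frame[1] & 0x7F
    offset = 2
    if size == 126:
        (size,) = struct.unpack("!H", frame[2:4])
        offset = 4
    elif size == 127:
        (size,) = struct.unpack("!Q", frame[2:10])
        offset = 10
    return frame[offset:offset + size]


# hypothetical message layout, for illustration only
message = {"command": "ping", "args": ()}
frame = encode_frame(pickle.dumps(message))
decoded = pickle.loads(decode_frame(frame))
```

Because the mask bit is never set, the receiver can read the payload directly after the length header without the XOR step that masked frames require.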

_images/actors.svg

The theory

The actor model is the cornerstone of the Erlang programming language. Python has very few implementations, and all of them seem quite limited in scope.

The Actor model in computer science is a mathematical model of concurrent computation that treats “actors” as the universal primitives of concurrent digital computation: in response to a message that it receives, an actor can make local decisions, create more actors, send more messages, and determine how to respond to the next message received.

—Wikipedia

Actor’s properties

  • Each actor has its own process (not intended as an OS process) and actors do not share state.
  • Actors can change their own states.
  • Actors can create other actors and when they do that they receive back the new actor address.
  • Actors exchange messages in an asynchronous fashion.
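The properties above can be illustrated with a toy, single-process sketch built on asyncio queues. This is not how pulsar implements actors (pulsar uses sockets between processes or threads); MiniActor and its mailbox queue are hypothetical names, with the queue standing in for the actor address.

```python
import asyncio


class MiniActor:
    """Toy actor: private state plus a mailbox, nothing shared."""

    def __init__(self, name):
        self.name = name
        self.mailbox = asyncio.Queue()  # plays the role of the actor address
        self.count = 0                  # private state

    async def run(self):
        while True:
            sender, message = await self.mailbox.get()
            if message == "stop":
                break
            self.count += 1             # only the actor changes its own state
            if sender is not None:
                sender.put_nowait((None, "pong"))


async def main():
    actor = MiniActor("worker")                 # create an actor ...
    task = asyncio.ensure_future(actor.run())
    inbox = asyncio.Queue()                     # ... and keep its address
    actor.mailbox.put_nowait((inbox, "ping"))   # asynchronous message
    _, reply = await inbox.get()
    actor.mailbox.put_nowait((None, "stop"))
    await task
    return reply, actor.count


reply, handled = asyncio.run(main())
```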

Why would one want to use an actor-based system?

  • No shared memory and therefore locking is not required.
  • Race conditions are greatly reduced.
  • It greatly simplifies the control flow of a program, since each actor has its own process (flow of control).
  • Easy to distribute, across cores, across program boundaries, across machines.
  • It simplifies error handling code.
  • It makes it easier to build fault-tolerant systems.

Implementation

An actor can be process based (default) or thread based and controls one running event loop. To obtain the actor controlling the current thread:

from pulsar.api import get_actor

actor = get_actor()

When a new process-based actor is created, a new process is started and the actor takes control of the main thread of that new process. On the other hand, thread-based actors always exist in the master process (the same process as the arbiter) and control threads other than the main thread.

An Actor can control more than one thread if it needs to, via its executor(), as explained in the CPU-bound paragraph.

Note

Regardless of the type of concurrency, an actor always controls at least one thread, the actor io thread. In the case of process-based actors this thread is the main thread of the actor process.

An actor is an async object and therefore it has a _loop attribute, which can be used to register handlers on file descriptors. The Actor._loop is created just after forking (or after the actor’s thread starts for thread-based actors).

IO-bound

The most common usage for an Actor is to handle Input/Output events on file descriptors. An Actor._loop tells the operating system (through epoll or select) that it should be notified when a new connection is made, and then it goes to sleep. Serving the new request should occur as fast as possible so that other connections can be served simultaneously.
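As a rough illustration of this pattern, the sketch below registers a read handler on a file descriptor with a plain asyncio loop. It uses only the standard library, not an actual Actor._loop, and assumes a selector-based event loop (the default on Unix), where add_reader is available.

```python
import asyncio
import socket


async def main():
    loop = asyncio.get_running_loop()
    reader, writer = socket.socketpair()
    reader.setblocking(False)
    done = loop.create_future()

    def on_readable():
        # Called by the loop once the OS reports the descriptor readable.
        loop.remove_reader(reader.fileno())
        done.set_result(reader.recv(1024))

    # Ask the loop (and through it epoll/select) to watch the descriptor.
    loop.add_reader(reader.fileno(), on_readable)
    writer.sendall(b"ping")
    try:
        return await asyncio.wait_for(done, timeout=5)
    finally:
        reader.close()
        writer.close()


received = asyncio.run(main())
```

The handler does very little work before returning, which is the point made above: the sooner control goes back to the loop, the sooner other descriptors can be served.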

CPU-bound

Another way for an actor to function is to use its executor() to perform CPU intensive operations, such as calculations, data manipulation or whatever you need them to do.
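The idea can be sketched with the standard library alone: a blocking, CPU-heavy function is handed to an executor so that the event loop stays responsive. The ThreadPoolExecutor used here is an assumption standing in for whatever the actor’s executor() returns, and fibonacci is just a placeholder workload.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def fibonacci(n):
    """Placeholder for a CPU-intensive, blocking computation."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


async def main():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=2) as pool:
        # The loop keeps serving other events while the pool thread works.
        return await loop.run_in_executor(pool, fibonacci, 30)


result = asyncio.run(main())
```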

Periodic task

Each Actor, including the Arbiter and Monitors, performs one crucial periodic task at given intervals. The periodic task interval is controlled by the timeout setting parameter (which by default is 30 seconds). The periodic timeout is maintained between 3 and 60 seconds.

Periodic tasks are implemented by the periodic_task() method.
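A minimal sketch of such a periodic scheduler, using only asyncio, might look as follows. It assumes that "maintained between 3 and 60 seconds" means the configured timeout is clamped into that range; periodic_interval and Periodic are hypothetical names, not pulsar API.

```python
import asyncio

MIN_TIMEOUT, MAX_TIMEOUT = 3, 60


def periodic_interval(timeout=30):
    """Clamp the configured timeout into the allowed range (assumption)."""
    return max(MIN_TIMEOUT, min(MAX_TIMEOUT, timeout))


class Periodic:
    """Re-schedule callback on loop every interval seconds."""

    def __init__(self, loop, interval, callback):
        self.loop, self.interval, self.callback = loop, interval, callback
        self.handle = None

    def start(self):
        self.handle = self.loop.call_later(self.interval, self._run)

    def _run(self):
        self.callback()
        self.start()  # schedule the next run

    def cancel(self):
        if self.handle is not None:
            self.handle.cancel()


async def demo():
    loop = asyncio.get_running_loop()
    calls = []
    done = loop.create_future()

    def task():
        calls.append(loop.time())
        if len(calls) == 3 and not done.done():
            done.set_result(None)

    periodic = Periodic(loop, 0.01, task)  # short interval for the demo only
    periodic.start()
    await asyncio.wait_for(done, timeout=5)
    periodic.cancel()
    return len(calls)


calls = asyncio.run(demo())
```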

The Arbiter

When using pulsar’s actor layer, you need to run pulsar in server state, that is to say, there will be a centralised Arbiter controlling the main event loop in the main thread of the master process. The arbiter is a specialised Actor which controls the life of all actors and monitors.

To access the arbiter, from the main process, one can use the arbiter() high level function:

from pulsar.api import arbiter
arb = arbiter()
arb.is_running()  # True/False

Application Framework

To aid the development of applications running on top of pulsar’s concurrent framework, the library ships with the Application class. Applications can take any form, and the library ships with several batteries-included examples in the pulsar.apps module.

When an Application is called for the first time, a new monitor is added to the arbiter, ready to perform its duties.

Monitors

Monitors are specialised actors which share the arbiter event loop and therefore live in the main thread of the master process of your application.

It is possible to configure pulsar so that the arbiter delegates the management of some actors to monitors. The application layer is designed specifically to obtain such delegation in a straightforward way with an efficient and elegant API.

_images/monitors.svg

Internals

Spawning

Spawning a new actor is achieved via the spawn() function:

from pulsar.api import spawn

def start(actor, exc=None):
    # called once the actor is running
    ...

def task(actor, exc=None):
    # do something useful here
    ...

ap = spawn(start=start, periodic_task=task)

The value returned by spawn() is a Future, which resolves to an ActorProxy, a lightweight proxy for the remote actor, once the remote actor has started.

When spawning from an actor other than the arbiter, the workflow of the spawn() function is as follows:

  • send() a message to the arbiter to spawn a new actor.
  • The arbiter spawns the actor and waits for the actor’s handshake. Once the handshake is done, it sends the response (the ActorProxy of the spawned actor) to the original actor.

Handshake

The actor handshake is the mechanism with which an Actor registers its mailbox address with its manager. The actor manager is either a Monitor or the arbiter, depending on which one spawned the actor.

The handshake occurs when the monitor receives, for the first time, the actor notify message.

For the curious, the handshake is responsible for setting the ActorProxyMonitor.mailbox attribute.

If the handshake fails, the spawned actor will eventually stop.
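The handshake bookkeeping can be pictured with a small, self-contained sketch. ProxySketch and ManagerSketch are hypothetical stand-ins (not pulsar classes), and the address tuple is illustrative; the only behaviour taken from the text is that the first notify message sets the mailbox attribute of the proxy held by the manager.

```python
class ProxySketch:
    """Stand-in for the proxy a manager keeps for each spawned actor."""

    def __init__(self, aid):
        self.aid = aid
        self.mailbox = None  # unknown until the handshake


class ManagerSketch:
    """Stand-in for a monitor or the arbiter."""

    def __init__(self):
        self.managed = {}

    def spawn(self, aid):
        proxy = ProxySketch(aid)
        self.managed[aid] = proxy
        return proxy

    def on_notify(self, aid, address):
        """Handle a notify message; return True if it was the handshake."""
        proxy = self.managed[aid]
        first = proxy.mailbox is None
        if first:
            proxy.mailbox = address  # the handshake registers the address
        return first


manager = ManagerSketch()
proxy = manager.spawn("abcd")
before = proxy.mailbox
handshake = manager.on_notify("abcd", ("127.0.0.1", 9000))
later = manager.on_notify("abcd", ("127.0.0.1", 9000))
```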

Hooks

An Actor exposes three one-time events which can be used to customise its behaviour, and two many-times events fired when accessing actor information and when the actor spawns other actors. Hooks are passed as key-valued parameters to the spawn() function.
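The difference between the two kinds of event can be sketched in a few lines. The Event class below is hypothetical and much simpler than pulsar’s own event machinery; it only shows that a one-time event ignores every firing after the first, while a many-times event calls its handlers on each firing.

```python
class Event:
    """Hypothetical event: one-time or many-times depending on onetime."""

    def __init__(self, onetime=False):
        self.onetime = onetime
        self.fired = 0
        self.handlers = []

    def bind(self, handler):
        self.handlers.append(handler)

    def fire(self, *args, **kwargs):
        if self.onetime and self.fired:
            return False  # a one-time event never fires twice
        self.fired += 1
        for handler in self.handlers:
            handler(*args, **kwargs)
        return True


log = []
start = Event(onetime=True)
start.bind(lambda actor, exc=None: log.append(("start", actor)))

on_info = Event()
on_info.bind(lambda actor, info=None: log.append(("info", actor)))

start.fire("actor-1")
start.fire("actor-1")      # ignored, the event has already fired
on_info.fire("actor-1", info={})
on_info.fire("actor-1", info={})
```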

start

Fired just after the actor has received the handshake from its monitor. This hook can be used to set up the application and register event handlers. For example, the socket server application creates the server and registers its file descriptor with the Actor._loop.

This snippet spawns a new actor which starts an Echo server:

from pulsar.api import spawn, TcpServer

# EchoServerProtocol is defined in the echo server and client tutorial


class Echo:

    def __init__(self, address):
        self.address = address

    def __call__(self, actor, exc=None):
        # start hook: schedule the server setup on the actor event loop
        actor._loop.create_task(self.create_echo_server(actor))

    async def create_echo_server(self, actor):
        """Starts an echo server on a newly spawned actor
        """
        server = TcpServer(EchoServerProtocol)
        await server.start_serving(address=self.address)
        actor.servers['echo'] = server
        actor.extra['echo-address'] = server.address


# from within a coroutine
proxy = await spawn(start=Echo(('localhost', 9898)))

The EchoServerProtocol is introduced in the echo server and client tutorial.

stopping

Fired when the Actor starts stopping.

periodic_task

Fired at every actor periodic task. This hook is best used for internal sanity/health checks of the actor or services the actor is performing.

on_info

Fired every time the actor status information is accessed via the info command:

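from pulsar.api import spawn

def extra_info(actor, info=None):
    # info is the dictionary returned by the info command
    info['message'] = 'Hello'

# spawn() returns a Future which resolves to the actor proxy
proxy = spawn(on_info=extra_info)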

The hook must accept the actor as first parameter and the key-valued parameter info (a dictionary).

on_params

Fired every time an actor is about to spawn another actor. It can be used to add additional key-valued parameters passed to the spawn() function.

Commands

An Actor communicates with another remote Actor by sending an action to perform. This action takes the form of a command name and optional positional and key-valued parameters. It is possible to add new commands via the command decorator as explained in the API documentation.
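The general shape of a command registry can be sketched as below. The command decorator, the COMMANDS dictionary and the dispatch function defined here are local, hypothetical helpers; pulsar’s own decorator has its own signature, which is described in the API documentation.

```python
COMMANDS = {}  # command name -> handler


def command(name=None):
    """Hypothetical decorator registering a function as a command."""
    def decorator(fn):
        COMMANDS[name or fn.__name__] = fn
        return fn
    return decorator


@command()
def ping(actor):
    return "pong"


@command()
def echo(actor, message):
    return message


def dispatch(actor, name, *args, **kwargs):
    """Look up a command by name and run it on the receiving actor."""
    try:
        handler = COMMANDS[name]
    except KeyError:
        raise LookupError(f"unknown command {name!r}") from None
    return handler(actor, *args, **kwargs)


pong = dispatch("abcd", "ping")
hello = dispatch("abcd", "echo", "Hello!")
```

The receiving side only needs the command name and its parameters, which is what makes the message small enough to send over the actor connection.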

ping

Ping the remote actor abcd and receive an asynchronous pong:

send('abcd', 'ping')

echo

Receive an asynchronous echo from a remote actor abcd:

send('abcd', 'echo', 'Hello!')

info

Request information about a remote actor abcd:

send('abcd', 'info')

The asynchronous result will be called back with the dictionary returned by the Actor.info() method.

notify

This message is sent periodically by actors to notify their manager. If an actor fails to notify its manager on a regular basis, the manager will shut it down. The first notify message is sent to the manager as soon as the actor is up and running so that the handshake can occur.
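A manager could keep track of overdue actors along the following lines. This is a sketch under stated assumptions, not pulsar’s implementation: HeartbeatMonitor is a hypothetical class, and the injected clock exists only to make the example deterministic.

```python
import time


class HeartbeatMonitor:
    """Record notify timestamps and report actors that went silent."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}

    def notify(self, aid):
        self.last_seen[aid] = self.clock()

    def overdue(self):
        """Return the ids of actors silent for longer than timeout."""
        now = self.clock()
        return sorted(aid for aid, seen in self.last_seen.items()
                      if now - seen > self.timeout)


now = [0.0]                      # fake clock, in seconds
monitor = HeartbeatMonitor(timeout=30, clock=lambda: now[0])
monitor.notify("a")              # "a" notifies at t=0
now[0] = 20.0
monitor.notify("b")              # "b" notifies at t=20
now[0] = 40.0
overdue = monitor.overdue()      # at t=40 only "a" has been silent > 30s
```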

run

Run a function on a remote actor. The function must accept actor as its initial parameter:

def dosomething(actor, *args, **kwargs):
    ...

send('monitor', 'run', dosomething, *args, **kwargs)

stop

Tell the remote actor abc to shut down gracefully:

send('abc', 'stop')

Exceptions

There are two categories of exceptions in Python: those that derive from the Exception class and those that derive from BaseException. Exceptions deriving from Exception will generally be caught and handled appropriately; for example, they will be passed through by a Future, and they will be logged and ignored when they occur in a callback.

However, exceptions deriving only from BaseException are never caught, and will usually cause the program to terminate with a traceback. (Examples of this category include KeyboardInterrupt and SystemExit; it is usually unwise to treat these the same as most other exceptions.)
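The distinction can be checked with plain asyncio and no pulsar code. In this sketch a ValueError raised inside a task is passed through when the task is awaited, while an "except Exception" clause does not intercept SystemExit; the helper handled_as_exception is a hypothetical name used only for the demonstration.

```python
import asyncio


async def failing():
    raise ValueError("boom")


async def main():
    task = asyncio.ensure_future(failing())
    try:
        await task              # the exception is passed through the future
    except Exception as exc:
        return exc


caught = asyncio.run(main())


def handled_as_exception(exc):
    """Return True if exc is caught by an 'except Exception' clause."""
    try:
        raise exc
    except Exception:
        return True
    except BaseException:
        # reached only by exceptions not deriving from Exception
        return False


ordinary = handled_as_exception(ValueError("bad value"))
fatal = handled_as_exception(SystemExit(1))
```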

Async Objects

An asynchronous object is any instance which exposes the _loop attribute. This attribute is the event loop where the instance performs its asynchronous operations, whatever they may be.

For example this is a class for valid async objects:

from asyncio import get_event_loop, new_event_loop


class SimpleAsyncObject:

    def __init__(self, loop=None):
        self._loop = loop or get_event_loop() or new_event_loop()

Properties:

  • Several classes in pulsar are async objects, for example: Actor, Connection, ProtocolConsumer, Store and so forth
  • A Future is an async object
  • However an async object is not necessarily a Future
  • When their methods use the task() decorator, the _loop attribute is used to run the method
  • Pulsar provides the AsyncObject signature class, however it is not a requirement to derive from it

Note

An async object can also run its asynchronous methods in a synchronous fashion. To do that, one should pass a brand new event loop during initialisation. Check synchronous components for further details.
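A rough sketch of the idea, using only asyncio: an object that owns a fresh, non-running loop can drive its own coroutines to completion and return plain values. The Greeter class and its greet_sync method are hypothetical; pulsar’s synchronous components may select the mode differently.

```python
import asyncio


class Greeter:
    """Hypothetical async object: exposes the _loop attribute."""

    def __init__(self, loop):
        self._loop = loop

    async def greet(self, name):
        await asyncio.sleep(0)  # stands in for real asynchronous work
        return f"Hello {name}!"

    def greet_sync(self, name):
        # Drive the coroutine to completion on the object's own loop.
        return self._loop.run_until_complete(self.greet(name))


loop = asyncio.new_event_loop()  # a brand new loop, not yet running
try:
    greeter = Greeter(loop)
    greeting = greeter.greet_sync("pulsar")
finally:
    loop.close()
```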