Parallel Execution

PX makes distributed computing feel like writing standard UNIX-style programs. You write code that reads from stdin or files and writes to stdout or files. PX handles the rest: spinning up cloud nodes, partitioning inputs, and running your code in parallel.

How PX Partitions Work

By default, PX assumes your input data is partitionable and automatically partitions it using a simple round-robin algorithm based on either input lines or input files.
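The round-robin scheme can be sketched in a few lines of Python. This is an illustration of the idea, not PX's actual internals; the file names and partition count are made up:

```python
def round_robin_partition(items, n_partitions):
    """Distribute items across n_partitions by cycling through them."""
    partitions = [[] for _ in range(n_partitions)]
    for i, item in enumerate(items):
        partitions[i % n_partitions].append(item)
    return partitions

# Example: 10 input files dealt out to 4 worker processes
files = [f"img_{i}.png" for i in range(10)]
parts = round_robin_partition(files, 4)
# parts[0] -> ['img_0.png', 'img_4.png', 'img_8.png'], and so on
```

The same cycling logic applies whether the unit of work is a line of stdin or a whole file.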

Example: Processing 60,000 Images

Let's say you have:

  • 60,000 image files to process
  • 2 cloud nodes
  • 4 CPUs per node
  • Total: 8 available cores

PX will:

  1. Spin up 8 copies of your Python process (one per core)
  2. Distribute 4 processes per node
  3. Feed 7,500 files to each process

This is similar in principle to how GNU parallel works locally, but scaled across cloud infrastructure.
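The arithmetic behind that distribution is straightforward; a quick check using the numbers from the example above:

```python
total_files = 60_000
nodes = 2
cpus_per_node = 4

processes = nodes * cpus_per_node          # one process per core
files_per_process = total_files // processes

print(processes, files_per_process)        # prints: 8 7500
```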

Process Parallelism

By default, PX runs one process per core (1:1 process parallelism), but you can override this with the -p parameter:

bash
px run -p 16 'python process.py'

For I/O-bound workloads, it often makes sense to double or triple the number of processes beyond the core count to take advantage of I/O interleaving.
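The payoff from oversubscribing I/O-bound work can be demonstrated at thread level with plain Python, where a sleep stands in for a network or disk wait. The numbers here are illustrative; this mirrors at thread scale what raising -p does at process scale:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io_task(_):
    time.sleep(0.05)  # stand-in for waiting on a network call or disk read

def run(workers, tasks=16):
    """Run the tasks on a pool of the given size; return elapsed seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_io_task, range(tasks)))
    return time.monotonic() - start

# 16 mostly-waiting tasks: 16 workers interleave the waits,
# while 4 workers must run them in 4 sequential rounds.
t4, t16 = run(4), run(16)
```

Because the tasks spend nearly all their time waiting, the oversubscribed pool finishes in roughly a quarter of the time.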

Job Scheduling

PX uses a simple but effective job queue:

  • The "head node" receives job submissions from clients
  • If the cluster is idle, a submitted job is scheduled immediately
  • If another job is already running, new jobs enter a FIFO queue
  • Jobs push input data partitions directly to running processes over the cluster LAN
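That scheduling policy can be modeled in a few lines of Python. This is a toy sketch of the FIFO behavior described above, not PX's actual scheduler:

```python
from collections import deque

class JobQueue:
    """Minimal model of the head node's FIFO scheduling."""
    def __init__(self):
        self.pending = deque()
        self.running = None

    def submit(self, job):
        if self.running is None:
            self.running = job          # cluster idle: schedule immediately
        else:
            self.pending.append(job)    # otherwise wait in FIFO order

    def complete(self):
        """Current job finished; promote the oldest pending job, if any."""
        self.running = self.pending.popleft() if self.pending else None

q = JobQueue()
q.submit("job-a")
q.submit("job-b")
q.submit("job-c")
# job-a runs immediately; job-b and job-c wait behind it in order
```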

Pleasingly Parallel Workloads

PX works best with "pleasingly parallel" (sometimes called "embarrassingly parallel") workloads:

  • Processes that read stdin and produce stdout without network calls
  • Processes that read files as inputs and produce different files as outputs
  • Tasks that don't require coordination between parallel instances
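A minimal worker of the first kind, a stdin-to-stdout filter, might look like this. The per-line transform is a placeholder for real processing:

```python
import sys

def transform(line: str) -> str:
    """Placeholder per-line work; replace with real processing."""
    return line.strip().upper()

def main(stdin=sys.stdin, stdout=sys.stdout):
    # Each parallel copy of this worker sees only its partition of the input.
    for line in stdin:
        stdout.write(transform(line) + "\n")

if __name__ == "__main__":
    main()
```

Because the worker touches nothing but its own stdin and stdout, PX can run any number of copies side by side without coordination.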

What Makes Code Pleasingly Parallel?

A Linux process is naturally parallel and idempotent when it:

  • Processes independent inputs (files, lines, records)
  • Produces independent outputs
  • Doesn't talk to external systems (databases, KV stores, APIs)
  • Doesn't maintain shared state
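A file-in, file-out worker that satisfies these properties might look like the following sketch, where the paths and the per-file work are illustrative:

```python
from pathlib import Path

def process_file(src: Path, out_dir: Path) -> Path:
    """Map one input file to one independent output file.

    Each input yields exactly one output named after it, so parallel
    copies of this worker never write to the same path.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / (src.stem + ".out")
    dst.write_text(src.read_text().upper())   # placeholder per-file work
    return dst
```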

Understanding Idempotency

When running distributed workloads, idempotency matters. Your code should produce the same result whether it runs once or multiple times on the same input.
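One common way to keep file output idempotent is to write deterministically to a temporary file and atomically rename it into place, so a retried or duplicated run simply overwrites the same result. A sketch, with the file name and contents made up:

```python
import os
from pathlib import Path

def write_output(path: Path, data: str) -> None:
    """Idempotent, crash-safe write: the same input always produces the
    same file, and a rerun after a failure just replaces it."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(data)
    os.replace(tmp, path)   # atomic on POSIX: readers never see a partial file

out = Path("result.txt")
write_output(out, "checksum: 42")
write_output(out, "checksum: 42")   # running twice leaves the same result
```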

Issues arise when programs:

  • Talk to databases or KV stores
  • Call external APIs
  • Maintain shared state across instances
  • Use non-deterministic operations

PX is designed for workloads where each parallel instance operates independently on its partition of the data.