Parallel Execution

PX makes distributed computing feel like writing standard UNIX-style programs. You write code that reads from stdin or files and writes to stdout or files. PX handles the rest: spinning up cloud nodes, partitioning inputs, and running your code in parallel.

How PX Partitions Work

By default, PX assumes your input data is partitionable and automatically partitions it using a simple round-robin algorithm based on either input lines or input files.
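The round-robin scheme can be sketched in a few lines of Python. This is an illustration of the idea, not PX's actual internals; the file names and partition count are made up:

```python
def round_robin_partition(items, n_partitions):
    """Distribute items across n_partitions by cycling through them."""
    partitions = [[] for _ in range(n_partitions)]
    for i, item in enumerate(items):
        partitions[i % n_partitions].append(item)
    return partitions

# Example: 10 input files dealt out to 4 worker processes
files = [f"img_{i}.png" for i in range(10)]
parts = round_robin_partition(files, 4)
# parts[0] -> ['img_0.png', 'img_4.png', 'img_8.png'], and so on
```

The same cycling logic applies whether the unit of work is a line of stdin or a whole file.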

Example: Processing 60,000 Images

Let's say you have:

  • 60,000 image files to process
  • 2 cloud nodes
  • 4 CPUs per node
  • Total: 8 available cores

PX will:

  1. Spin up 8 copies of your Python process (one per core)
  2. Distribute 4 processes per node
  3. Feed 7,500 files to each process

This is similar in principle to how GNU parallel works locally, but scaled across cloud infrastructure.
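The arithmetic behind that distribution is straightforward; a quick check using the numbers from the example above:

```python
total_files = 60_000
nodes = 2
cpus_per_node = 4

processes = nodes * cpus_per_node          # one process per core
files_per_process = total_files // processes

print(processes, files_per_process)        # prints: 8 7500
```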

Process Parallelism

By default, PX runs one process per core (1:1 process parallelism), but you can override this with the -p parameter:

bash
px run -p 16 'python process.py'

For I/O-bound workloads, it often makes sense to double or triple the number of processes beyond the core count to take advantage of I/O interleaving.
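The payoff from oversubscribing I/O-bound work can be demonstrated at thread level with plain Python, where a sleep stands in for a network or disk wait. The numbers here are illustrative; this mirrors at thread scale what raising -p does at process scale:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io_task(_):
    time.sleep(0.05)  # stand-in for waiting on a network call or disk read

def run(workers, tasks=16):
    """Run the tasks on a pool of the given size; return elapsed seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fake_io_task, range(tasks)))
    return time.monotonic() - start

# 16 mostly-waiting tasks: 16 workers interleave the waits,
# while 4 workers must run them in 4 sequential rounds.
t4, t16 = run(4), run(16)
```

Because the tasks spend nearly all their time waiting, the oversubscribed pool finishes in roughly a quarter of the time.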

Job Scheduling

PX uses a simple but effective job queue:

  • The "head node" receives job submissions from clients
  • If the cluster is idle, a submitted job is scheduled immediately
  • If another job is already running, new jobs enter a FIFO queue
  • Jobs push input data partitions directly to running processes over the cluster LAN
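That scheduling policy can be modeled in a few lines of Python. This is a toy sketch of the FIFO behavior described above, not PX's actual scheduler:

```python
from collections import deque

class JobQueue:
    """Minimal model of the head node's FIFO scheduling."""
    def __init__(self):
        self.pending = deque()
        self.running = None

    def submit(self, job):
        if self.running is None:
            self.running = job          # cluster idle: schedule immediately
        else:
            self.pending.append(job)    # otherwise wait in FIFO order

    def complete(self):
        """Current job finished; promote the oldest pending job, if any."""
        self.running = self.pending.popleft() if self.pending else None

q = JobQueue()
q.submit("job-a")
q.submit("job-b")
q.submit("job-c")
# job-a runs immediately; job-b and job-c wait behind it in order
```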

Pleasingly Parallel Workloads

PX works best with "pleasingly parallel" (sometimes called "embarrassingly parallel") workloads:

  • Processes that read stdin and produce stdout without network calls
  • Processes that read files as inputs and produce different files as outputs
  • Tasks that don't require coordination between parallel instances
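A minimal worker of the first kind, a stdin-to-stdout filter, might look like this. The per-line transform is a placeholder for real processing:

```python
import sys

def transform(line: str) -> str:
    """Placeholder per-line work; replace with real processing."""
    return line.strip().upper()

def main(stdin=sys.stdin, stdout=sys.stdout):
    # Each parallel copy of this worker sees only its partition of the input.
    for line in stdin:
        stdout.write(transform(line) + "\n")

if __name__ == "__main__":
    main()
```

Because the worker touches nothing but its own stdin and stdout, PX can run any number of copies side by side without coordination.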

What Makes Code Pleasingly Parallel?

A Linux process is naturally parallel and idempotent when it:

  • Processes independent inputs (files, lines, records)
  • Produces independent outputs
  • Doesn't talk to external systems (databases, KV stores, APIs)
  • Doesn't maintain shared state
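A file-in, file-out worker that satisfies these properties might look like the following sketch, where the paths and the per-file work are illustrative:

```python
from pathlib import Path

def process_file(src: Path, out_dir: Path) -> Path:
    """Map one input file to one independent output file.

    Each input yields exactly one output named after it, so parallel
    copies of this worker never write to the same path.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    dst = out_dir / (src.stem + ".out")
    dst.write_text(src.read_text().upper())   # placeholder per-file work
    return dst
```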

Understanding Idempotency

When running distributed workloads, idempotency matters. Your code should produce the same result whether it runs once or multiple times on the same input.
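One common way to keep file output idempotent is to write deterministically to a temporary file and atomically rename it into place, so a retried or duplicated run simply overwrites the same result. A sketch, with the file name and contents made up:

```python
import os
from pathlib import Path

def write_output(path: Path, data: str) -> None:
    """Idempotent, crash-safe write: the same input always produces the
    same file, and a rerun after a failure just replaces it."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(data)
    os.replace(tmp, path)   # atomic on POSIX: readers never see a partial file

out = Path("result.txt")
write_output(out, "checksum: 42")
write_output(out, "checksum: 42")   # running twice leaves the same result
```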

Issues arise when programs:

  • Talk to databases or KV stores
  • Call external APIs
  • Maintain shared state across instances
  • Use non-deterministic operations

PX is designed for workloads where each parallel instance operates independently on its partition of the data.