Parallel Execution
PX makes distributed computing feel like writing standard UNIX-style programs. You write code that reads from stdin or files and writes to stdout or files. PX handles the rest: spinning up cloud nodes, partitioning inputs, and running your code in parallel.
How PX Partitions Work
By default, PX assumes your input data is partitionable and automatically partitions it using a simple round-robin algorithm based on either input lines or input files.
Example: Processing 60,000 Images
Let's say you have:
- 60,000 image files to process
- 2 cloud nodes
- 4 CPUs per node
- Total: 8 available cores
PX will:
- Spin up 8 copies of your Python process (one per core)
- Distribute 4 processes per node
- Feed 7,500 files to each process
This is similar in principle to how GNU parallel works locally, but scaled across cloud infrastructure.
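The arithmetic above can be sketched in a few lines of Python. This is an illustrative round-robin partitioner, not PX's actual implementation; the function name and file names are made up for the example.

```python
# Round-robin assignment of 60,000 image files to 8 worker processes
# (hypothetical sketch of the partitioning described above).
def round_robin(items, n_workers):
    partitions = [[] for _ in range(n_workers)]
    for i, item in enumerate(items):
        # Item i goes to worker i mod n_workers.
        partitions[i % n_workers].append(item)
    return partitions

files = [f"img_{i:05d}.jpg" for i in range(60_000)]
parts = round_robin(files, 8)
# Each of the 8 processes receives 7,500 files.
```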
Process Parallelism
The default is 1:1 process parallelism per core. However, you can control job parallelism with the -p parameter:
```bash
px run -p 16 'python process.py'
```

For I/O-bound workloads, it often makes sense to double or triple the number of processes beyond the core count to take advantage of I/O interleaving.
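One way to think about choosing `-p` is as a multiple of the core count. The helper below is a hypothetical heuristic, not part of PX; it just encodes the rule of thumb above (1x cores for CPU-bound work, 2-3x for I/O-bound work).

```python
import os

def suggested_parallelism(io_bound: bool, factor: int = 2) -> int:
    """Hypothetical heuristic for picking the -p value.

    CPU-bound: one process per core.
    I/O-bound: oversubscribe by `factor` (2-3x) so that while some
    processes wait on I/O, others keep the cores busy.
    """
    cores = os.cpu_count() or 1
    return cores * factor if io_bound else cores
```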
Job Scheduling
PX uses a simple but effective job queue:
- The "head node" receives job submissions from clients
- A new job is scheduled to the cluster immediately if no other job is running
- Otherwise, new jobs wait in a FIFO queue until the current job completes
- Jobs push input data partitions directly to running processes over the cluster LAN
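The scheduling model above can be sketched as a small state machine. The class and method names here are hypothetical; PX's head node is not implemented this way, but the run-one/queue-the-rest FIFO behavior is the same.

```python
from collections import deque

class HeadNode:
    """Minimal sketch of the FIFO scheduling described above."""

    def __init__(self):
        self.queue = deque()   # jobs waiting, in submission order
        self.running = None    # the single job currently on the cluster

    def submit(self, job):
        # Idle cluster: schedule immediately. Busy: enqueue FIFO.
        if self.running is None:
            self.running = job
        else:
            self.queue.append(job)

    def finish_current(self):
        # When the running job completes, pull the oldest waiting job.
        self.running = self.queue.popleft() if self.queue else None
```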
Pleasingly Parallel Workloads
PX works best with "pleasingly parallel" (sometimes called "embarrassingly parallel") workloads:
- Processes that read stdin and produce stdout without network calls
- Processes that read files as inputs and produce different files as outputs
- Tasks that don't require coordination between parallel instances
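A minimal stdin-to-stdout filter of the kind described above might look like this. The `transform` body is a placeholder; the point is the shape: one record in, one record out, no network calls, no shared state.

```python
"""process.py -- sketch of a pleasingly parallel stdin -> stdout filter."""
import sys

def transform(line: str) -> str:
    # Placeholder per-record work: deterministic, no external calls.
    return line.strip().upper()

def run(stream=sys.stdin, out=sys.stdout) -> None:
    # One line in, one line out. Because each record is independent,
    # PX can run many copies of this process on disjoint partitions.
    for line in stream:
        out.write(transform(line) + "\n")
```

Such a script could then be launched across the cluster with something like `px run 'python process.py'`.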
What Makes Code Pleasingly Parallel?
A Linux process is naturally parallelizable and idempotent when it:
- Processes independent inputs (files, lines, records)
- Produces independent outputs
- Doesn't talk to external systems (databases, KV stores, APIs)
- Doesn't maintain shared state
Understanding Idempotency
When running distributed workloads, idempotency matters. Your code should produce the same result whether it runs once or multiple times on the same input.
Issues arise when programs:
- Talk to databases or KV stores
- Call external APIs
- Maintain shared state across instances
- Use non-deterministic operations
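One common way to make a file-based task idempotent is the write-to-temp-then-rename pattern sketched below. The helper name is hypothetical and the `.upper()` step is placeholder work; the point is that re-running the task on the same input is safe.

```python
import os
import tempfile

def process_idempotently(src: str, dst: str) -> None:
    """Sketch of an idempotent file task: running it once or many
    times on the same input yields the same output, so retries are safe."""
    if os.path.exists(dst):
        # A retry finds the finished output and does nothing.
        return
    with open(src) as f:
        result = f.read().upper()   # placeholder deterministic work
    # Write to a temp file, then atomically rename, so a crash mid-write
    # never leaves a partial output that a retry would mistake for done.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dst) or ".")
    with os.fdopen(fd, "w") as f:
        f.write(result)
    os.replace(tmp, dst)
```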
PX is designed for workloads where each parallel instance operates independently on its partition of the data.