Quick Start
Get up and running with PX in under 5 minutes. This guide will walk you through installing PX, setting up your cloud credentials, and running your first PX job.
Prerequisites
- Python 3.10 or higher (installable via uv)
- A cloud account (GCP supported during private beta, with AWS, Azure, and DigitalOcean coming soon)
- Basic familiarity with the command line
Step 1: Install PX
Install PX using `uv tool`:

```bash
uv tool install --from https://px.app/releases/nightly/c3b43af93c7083fbb6acedb0e8b40c152f429705/px.tar.gz px
```

**Why use uv?**
During our private beta period, we're managing PX CLI releases manually and leveraging `uv tool` to pull in the third-party dependencies that the px tool needs. For more information on uv, see the official uv documentation.
Verify the installation:
```bash
px --version
```

Step 2: Authenticate with Your Cloud
PX needs access to your cloud provider to create and manage compute resources.
```bash
px cloud login
```

Your browser will open automatically to authenticate with GCP:

```
Authenticating with all cloud providers...
Authenticating with gcp...
Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?...
```

Follow the prompts to authenticate. PX will securely store your credentials for future use.
**Where are credentials stored?**
Your Google Cloud credentials are stored in ~/.config/gcloud/ using the standard Google Cloud CLI configuration. PX uses these credentials to provision and manage cloud resources on your behalf.
Step 3: Create a Simple Test Script
Create a file called `filesize.py`:
```python
import fileinput
import os


def main():
    # Read filenames from arguments or stdin
    for line in fileinput.input():
        filename = line.strip()
        # Get file size in bytes
        file_size = os.stat(filename).st_size
        # Print filename and size
        print(f"{filename}: {file_size} bytes")


if __name__ == "__main__":
    main()
```

**Just for illustration**
This Python program is just for illustration purposes. You don't actually need a cluster to calculate file sizes, of course, but this simple example helps you get started with PX's workflow. PX supports any programming language for user code, including Python, Ruby, JavaScript, TypeScript, Rust, Zig, C/C++, Go, Java, and more.
Step 4: Run Your First Parallel Job
Now run your script in parallel across multiple cores:
```bash
ls images/ | px run -p 4 'python filesize.py'
```

This command will:
- Run your `filesize.py` script 4 times in parallel
- Run each instance on a separate core
- Show output from all 4 processes
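Conceptually, the fan-out performed by `px run -p 4` resembles the following plain-Python sketch. This is a local illustration only, not PX's implementation: `run_task` is a stand-in for one invocation of your command on one input line.

```python
from multiprocessing import Pool


def run_task(filename: str) -> str:
    # Stand-in for one invocation of 'python filesize.py' on one input line;
    # PX would launch the real command in a separate process instead
    return f"processed {filename}"


def fan_out(lines, parallelism=4):
    # Split the input lines across a pool of worker processes,
    # mirroring how 'px run -p 4' fans stdin out to 4 parallel tasks
    with Pool(processes=parallelism) as pool:
        return pool.map(run_task, list(lines))
```

For example, `fan_out(["a.jpg", "b.jpg"], parallelism=2)` runs the two tasks in separate processes and collects their results in input order.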
Step 5: Prepare Your Data
First, sync your local image files to cloud storage:
```bash
rclone sync images/ gs://px-gcs-my-bucket/images/
```

**About rclone**
rclone is a command-line tool for managing files on cloud storage. It supports syncing, copying, and mounting cloud storage as a local filesystem. For Google Cloud Storage setup and usage, see the rclone GCS documentation.
Then generate a list of files to process:
```bash
find images/ -type f | sed 's|^images/|/px-gcs-my-bucket/images/|' > images.txt
```

This creates `images.txt` with filesystem paths like:

```
/px-gcs-my-bucket/images/photo1.jpg
/px-gcs-my-bucket/images/photo2.jpg
/px-gcs-my-bucket/images/photo3.jpg
```

This works because px.yaml, which you'll create in the next step, includes a simple configuration that mounts Google Cloud Storage at that filesystem location on every cluster node.
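The `sed` rewrite shown above can also be expressed in Python, which makes the path mapping explicit (the bucket name and prefixes match this guide's example and are otherwise arbitrary):

```python
def to_mount_path(local_path: str) -> str:
    # Rewrite a local path such as 'images/photo1.jpg' to its location
    # under the GCS mount, exactly as the sed expression does
    local_prefix = "images/"
    mount_prefix = "/px-gcs-my-bucket/images/"
    if local_path.startswith(local_prefix):
        return mount_prefix + local_path[len(local_prefix):]
    return local_path
```

Paths outside `images/` pass through unchanged, matching `sed`'s behavior of only rewriting lines where the pattern anchors.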
Step 6: Bring Up a Cluster
Create a px.yaml file to define your cluster configuration, then bring up your cluster:
```bash
px cluster up files
```

**Cloud econometric model**
PX uses a cloud econometric model to automatically select the most cost-effective resources for your workload. It consults a real-time pricing database to find the cheapest Spot Instances on Google Cloud Platform that meet your requirements, helping you minimize costs while maintaining performance.
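As a rough mental model, the selection step filters a pricing table down to the instances that satisfy the request and takes the cheapest. The rows, field layout, and function below are invented for illustration; PX's actual model and pricing database are not shown here.

```python
from typing import Optional, Tuple

# Hypothetical pricing rows: (instance type, vCPUs, memory GB, $/hour)
PRICES = [
    ("n4-standard-2[Spot]", 2, 8, 0.09),
    ("n4-standard-4[Spot]", 4, 16, 0.18),
    ("n4-highmem-2[Spot]", 2, 16, 0.14),
]


def cheapest_instance(min_vcpus: int, min_mem_gb: int) -> Optional[Tuple]:
    # Keep only instances that meet the resource requirements,
    # then pick the one with the lowest hourly price
    candidates = [row for row in PRICES
                  if row[1] >= min_vcpus and row[2] >= min_mem_gb]
    return min(candidates, key=lambda row: row[3]) if candidates else None
```

With these example prices, asking for 2 vCPUs and 8 GB selects `n4-standard-2[Spot]`, the cheapest row that qualifies.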
You'll see output like this as PX provisions your cluster:
```
❯ px cluster up files
Provisioning cluster 'files' with spec file 'px.yaml'...
Generating cluster metadata...
Getting cluster status: cluster_name='files'
✓ Cluster metadata saved to ~/.px/clusters/files/metadata.json
Running on cluster: files
Considered resources (3 nodes):
------------------------------------------------------------------------------------------
 INFRA                INSTANCE              vCPUs   Mem(GB)   GPUS   COST ($)   CHOSEN
------------------------------------------------------------------------------------------
 GCP (us-central1-a)  n4-standard-2[Spot]   2       8         -      0.09       ✔
------------------------------------------------------------------------------------------
⚙︎ Launching on GCP us-central1 (us-central1-a).
└── Instances are up.
✓ Cluster launched: files.
⚙︎ Syncing files.
Syncing workdir (to 3 nodes): ~/repos/my-project -> ~/workdir
✓ Synced workdir.
Mounting (to 3 nodes): gs://px-gcs-my-bucket -> /px-gcs-my-bucket
✓ Storage mounted.
✓ Setup detached.
⚙︎ Job submitted, ID: 2
Setting up PX...
✓ PX setup completed
```

Step 7: Scale to Multiple Nodes
To run across multiple cloud nodes, simply increase the parallel count:
```bash
px run --cluster files -p 16 -a images.txt 'python filesize.py'
```

PX will automatically:
- Provision the necessary cloud instances
- Distribute your work across them
- Handle all the networking and coordination
- Clean up resources when done
**How parallel execution works**
PX handles parallel execution automatically based on the -p or --parallelism parameter. It partitions your input data, spawns the specified number of parallel processes, and manages task distribution across your cluster. For more information, see the Parallel Execution documentation.
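The partitioning described above can be sketched as round-robin assignment of input lines to workers. This is a simplification for intuition; PX's actual task distribution is managed dynamically across the cluster.

```python
def partition(lines: list[str], parallelism: int) -> list[list[str]]:
    # Round-robin assignment: line i goes to worker i % parallelism,
    # so input is spread evenly across the requested parallel tasks
    chunks: list[list[str]] = [[] for _ in range(parallelism)]
    for i, line in enumerate(lines):
        chunks[i % parallelism].append(line)
    return chunks
```

For example, five input paths split across `-p 2` yield one worker with three lines and one with two.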
Next Steps
Congratulations! You've successfully run your first PX job. Here's the full CLI reference:
```
Usage: px [OPTIONS] COMMAND [ARGS]...

  PX - Parallel made simple: easy mode for multi-core & multi-node
```
Options
| Option | Description |
|---|---|
| -V, --version | Show version and exit |
| -h, --help | Show this message and exit |
Commands
| Command | Description |
|---|---|
| cluster | Manage cloud clusters for distributed job execution |
| doctor | Run diagnostic tests on specified subsystems to verify proper configuration |
| login | Authenticate with cloud provider |
| logs | Stream logs for a job running on a cluster |
| run | Run a command in parallel across local or cluster resources |
| superx | Manage PX Supervisor (superx) operations |
Cluster Commands
```
Usage: px cluster [OPTIONS] COMMAND [ARGS]...

  Manage cloud clusters for distributed job execution
```
| Option | Description |
|---|---|
| -h, --help | Show this message and exit |
| Command | Description |
|---|---|
| down | Terminate a cluster and all associated cloud resources |
| report | Show cluster usage report with costs and resource details |
| up | Create and provision a cluster on the cloud |
Run Commands
```
Usage: px run [OPTIONS] COMMAND [ARGS]...

  Run a command in parallel across local or cluster resources

  Set --cluster CLUSTER_NAME or set PX_CLUSTER_NAME environment variable to
  run on a remote cluster instead
```
| Option | Description |
|---|---|
| -a, --args-file FILE | Read input arguments from a file instead of stdin or command line |
| -p, --parallelism INTEGER | Number of parallel tasks to run (default: system capacity) |
| -c, --cluster TEXT | Run on specified cluster instead of locally |
| -d, --detach | Submit job and return without streaming logs (cluster mode only) |
| -v, --verbose | Enable verbose output |
| -h, --help | Show this message and exit |