Quick Start
Get up and running with PX in under 5 minutes. This guide will walk you through installing PX, setting up your cloud credentials, and running your first PX job.
Prerequisites
- Python 3.10 or higher (installable via uv)
- A cloud account (GCP supported during private beta, with AWS, Azure, and DigitalOcean coming soon)
- Basic familiarity with the command line
Step 1: Install PX
Install PX using `uv tool`:

```bash
uv tool install --from https://px.app/releases/nightly/c3b43af93c7083fbb6acedb0e8b40c152f429705/px.tar.gz px
```

**Why use uv?**
During our private beta period, we're managing PX CLI releases manually and leveraging `uv tool` to pull in the third-party dependencies that the px tool needs. For more information on uv, see the official uv documentation.
Verify the installation:
```bash
px --version
```

Step 2: Authenticate with Your Cloud
PX needs access to your cloud provider to create and manage compute resources.
```bash
px cloud login
```

Your browser will open automatically to authenticate with GCP:

```
Authenticating with all cloud providers...
Authenticating with gcp...
Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?...
```

Follow the prompts to authenticate. PX will securely store your credentials for future use.
**Where are credentials stored?**
Your Google Cloud credentials are stored in ~/.config/gcloud/ using the standard Google Cloud CLI configuration. PX uses these credentials to provision and manage cloud resources on your behalf.
Step 3: Create a Simple Test Script
Create a file called `filesize.py`:
```python
import fileinput
import os


def main():
    # Read filenames from arguments or stdin
    for line in fileinput.input():
        filename = line.strip()
        # Get file size in bytes
        file_size = os.stat(filename).st_size
        # Print filename and size
        print(f"{filename}: {file_size} bytes")


if __name__ == "__main__":
    main()
```

**Just for illustration**
This Python program is just for illustration purposes. You don't actually need a cluster to calculate file sizes, of course, but this simple example helps you get started with PX's workflow. PX supports any programming language for user code, including Python, Ruby, JavaScript, TypeScript, Rust, Zig, C/C++, Go, Java, and more.
Step 4: Run Your First Parallel Job
Now run your script in parallel across multiple cores:
```bash
ls images/ | px run -p 4 'python filesize.py'
```

This command will:
- Run your `filesize.py` script 4 times in parallel
- Run each instance on a separate core
- Show output from all 4 processes
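Conceptually, the fan-out performed by `px run -p 4` resembles the following plain-Python sketch. This is a local illustration only, not PX's implementation: `run_task` is a stand-in for one invocation of your command on one input line.

```python
from multiprocessing import Pool


def run_task(filename: str) -> str:
    # Stand-in for one invocation of 'python filesize.py' on one input line;
    # PX would launch the real command in a separate process instead
    return f"processed {filename}"


def fan_out(lines, parallelism=4):
    # Split the input lines across a pool of worker processes,
    # mirroring how 'px run -p 4' fans stdin out to 4 parallel tasks
    with Pool(processes=parallelism) as pool:
        return pool.map(run_task, list(lines))
```

For example, `fan_out(["a.jpg", "b.jpg"], parallelism=2)` runs the two tasks in separate processes and collects their results in input order.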
Step 5: Prepare Your Data
First, sync your local image files to cloud storage:
```bash
rclone sync images/ gs://px-gcs-my-bucket/images/
```

**About rclone**
rclone is a command-line tool for managing files on cloud storage. It supports syncing, copying, and mounting cloud storage as a local filesystem. For Google Cloud Storage setup and usage, see the rclone GCS documentation.
Then generate a list of files to process:
```bash
find images/ -type f | sed 's|^images/|/px-gcs-my-bucket/images/|' > images.txt
```

This creates `images.txt` with filesystem paths like:

```
/px-gcs-my-bucket/images/photo1.jpg
/px-gcs-my-bucket/images/photo2.jpg
/px-gcs-my-bucket/images/photo3.jpg
```

This works because px.yaml, which you'll create in the next step, includes a simple configuration that mounts Google Cloud Storage at that filesystem location on every cluster node.
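The `sed` rewrite shown above can also be expressed in Python, which makes the path mapping explicit (the bucket name and prefixes match this guide's example and are otherwise arbitrary):

```python
def to_mount_path(local_path: str) -> str:
    # Rewrite a local path such as 'images/photo1.jpg' to its location
    # under the GCS mount, exactly as the sed expression does
    local_prefix = "images/"
    mount_prefix = "/px-gcs-my-bucket/images/"
    if local_path.startswith(local_prefix):
        return mount_prefix + local_path[len(local_prefix):]
    return local_path
```

Paths outside `images/` pass through unchanged, matching `sed`'s behavior of only rewriting lines where the pattern anchors.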
Step 6: Bring Up a Cluster
Create a px.yaml file to define your cluster configuration, then bring up your cluster:
```bash
px cluster up files
```

**Cloud econometric model**
PX uses a cloud econometric model to automatically select the most cost-effective resources for your workload. It consults a real-time pricing database to find the cheapest Spot Instances on Google Cloud Platform that meet your requirements, helping you minimize costs while maintaining performance.
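As a rough mental model, the selection step filters a pricing table down to the instances that satisfy the request and takes the cheapest. The rows, field layout, and function below are invented for illustration; PX's actual model and pricing database are not shown here.

```python
from typing import Optional, Tuple

# Hypothetical pricing rows: (instance type, vCPUs, memory GB, $/hour)
PRICES = [
    ("n4-standard-2[Spot]", 2, 8, 0.09),
    ("n4-standard-4[Spot]", 4, 16, 0.18),
    ("n4-highmem-2[Spot]", 2, 16, 0.14),
]


def cheapest_instance(min_vcpus: int, min_mem_gb: int) -> Optional[Tuple]:
    # Keep only instances that meet the resource requirements,
    # then pick the one with the lowest hourly price
    candidates = [row for row in PRICES
                  if row[1] >= min_vcpus and row[2] >= min_mem_gb]
    return min(candidates, key=lambda row: row[3]) if candidates else None
```

With these example prices, asking for 2 vCPUs and 8 GB selects `n4-standard-2[Spot]`, the cheapest row that qualifies.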
You'll see output like this as PX provisions your cluster:
```
❯ px cluster up files
Provisioning cluster 'files' with spec file 'px.yaml'...
Generating cluster metadata...
Getting cluster status: cluster_name='files'
✓ Cluster metadata saved to ~/.px/clusters/files/metadata.json
Running on cluster: files
Considered resources (3 nodes):
------------------------------------------------------------------------------------------
 INFRA                INSTANCE              vCPUs   Mem(GB)   GPUS   COST ($)   CHOSEN
------------------------------------------------------------------------------------------
 GCP (us-central1-a)  n4-standard-2[Spot]   2       8         -      0.09       ✔
------------------------------------------------------------------------------------------
⚙︎ Launching on GCP us-central1 (us-central1-a).
└── Instances are up.
✓ Cluster launched: files.
⚙︎ Syncing files.
Syncing workdir (to 3 nodes): ~/repos/my-project -> ~/workdir
✓ Synced workdir.
Mounting (to 3 nodes): gs://px-gcs-my-bucket -> /px-gcs-my-bucket
✓ Storage mounted.
✓ Setup detached.
⚙︎ Job submitted, ID: 2
Setting up PX...
✓ PX setup completed
```

Step 7: Scale to Multiple Nodes
To run across multiple cloud nodes, simply increase the parallel count:
```bash
px run --cluster files -p 16 -a images.txt 'python filesize.py'
```

PX will automatically:
- Provision the necessary cloud instances
- Distribute your work across them
- Handle all the networking and coordination
- Clean up resources when done
**How parallel execution works**
PX handles parallel execution automatically based on the -p or --parallelism parameter. It partitions your input data, spawns the specified number of parallel processes, and manages task distribution across your cluster. For more information, see the Parallel Execution documentation.
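The partitioning described above can be sketched as round-robin assignment of input lines to workers. This is a simplification for intuition; PX's actual task distribution is managed dynamically across the cluster.

```python
def partition(lines: list[str], parallelism: int) -> list[list[str]]:
    # Round-robin assignment: line i goes to worker i % parallelism,
    # so input is spread evenly across the requested parallel tasks
    chunks: list[list[str]] = [[] for _ in range(parallelism)]
    for i, line in enumerate(lines):
        chunks[i % parallelism].append(line)
    return chunks
```

For example, five input paths split across `-p 2` yield one worker with three lines and one with two.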
Next Steps
Congratulations! You've successfully run your first PX job. Here's the full CLI reference:
```
Usage: px [OPTIONS] COMMAND [ARGS]...

  PX - Parallel made simple: easy mode for multi-core & multi-node
```
Options
| Option | Description |
|---|---|
| -V, --version | Show version and exit |
| -h, --help | Show this message and exit |
Commands
| Command | Description |
|---|---|
| cluster | Manage cloud clusters for distributed job execution |
| doctor | Run diagnostic tests on specified subsystems to verify proper configuration |
| login | Authenticate with cloud provider |
| logs | Stream logs for a job running on a cluster |
| run | Run a command in parallel across local or cluster resources |
| superx | Manage PX Supervisor (superx) operations |
Cluster Commands
```
Usage: px cluster [OPTIONS] COMMAND [ARGS]...

  Manage cloud clusters for distributed job execution
```
| Option | Description |
|---|---|
| -h, --help | Show this message and exit |
| Command | Description |
|---|---|
| down | Terminate a cluster and all associated cloud resources |
| report | Show cluster usage report with costs and resource details |
| up | Create and provision a cluster on the cloud |
Run Commands
```
Usage: px run [OPTIONS] COMMAND [ARGS]...

  Run a command in parallel across local or cluster resources

  Set --cluster CLUSTER_NAME or set PX_CLUSTER_NAME environment variable to
  run on a remote cluster instead
```
| Option | Description |
|---|---|
| -a, --args-file FILE | Read input arguments from a file instead of stdin or command line |
| -p, --parallelism INTEGER | Number of parallel tasks to run (default: system capacity) |
| -c, --cluster TEXT | Run on specified cluster instead of locally |
| -d, --detach | Submit job and return without streaming logs (cluster mode only) |
| -v, --verbose | Enable verbose output |
| -h, --help | Show this message and exit |