Quick Start

Get up and running with PX in under 5 minutes. This guide will walk you through installing PX, setting up your cloud credentials, and running your first PX job.

Prerequisites

  • Python 3.10 or higher (installable via uv)
  • A cloud account (GCP supported during private beta, with AWS, Azure, and DigitalOcean coming soon)
  • Basic familiarity with the command line

Step 1: Install PX

Install PX using uv tool:

bash
uv tool install --from https://px.app/releases/nightly/c3b43af93c7083fbb6acedb0e8b40c152f429705/px.tar.gz px

Why use uv?

During our private beta, we manage PX CLI releases manually and use uv tool to pull in the third-party dependencies that the px tool needs. For more information on uv, see the official uv documentation.

Verify the installation:

bash
px --version

Step 2: Authenticate with Your Cloud

PX needs access to your cloud provider to create and manage compute resources.

bash
px cloud login

Your browser will open automatically to authenticate with GCP:

Authenticating with all cloud providers...
Authenticating with gcp...
Your browser has been opened to visit:

    https://accounts.google.com/o/oauth2/auth?...

Follow the prompts to authenticate. PX will securely store your credentials for future use.

Where are credentials stored?

Your Google Cloud credentials are stored in ~/.config/gcloud/ using the standard Google Cloud CLI configuration. PX uses these credentials to provision and manage cloud resources on your behalf.

Step 3: Create a Simple Test Script

Create a file called filesize.py:

python
import fileinput
import os

def main():
    # Read filenames from arguments or stdin
    for line in fileinput.input():
        filename = line.strip()

        # Get file size in bytes
        file_size = os.stat(filename).st_size

        # Print filename and size
        print(f"{filename}: {file_size} bytes")

if __name__ == "__main__":
    main()

Just for illustration

This Python program is just for illustration purposes. You don't actually need a cluster to calculate file sizes, of course, but this simple example helps you get started with PX's workflow. PX supports any programming language for user code, including Python, Ruby, JavaScript, TypeScript, Rust, Zig, C/C++, Go, Java, and more.

Step 4: Run Your First Parallel Job

Now run your script in parallel across multiple cores:

bash
ls images/ | px run -p 4 'python filesize.py'

This command will:

  • Run your filesize.py script 4 times in parallel
  • Place each instance on a separate core
  • Stream the output from all 4 processes to your terminal
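The fan-out can be pictured as dealing the piped-in filenames round-robin into one batch per worker. This is a minimal sketch of the idea under the assumption of a simple static split, not PX's actual scheduler:

```python
def split(lines, parallelism):
    """Deal input lines round-robin into one batch per worker, the way
    `-p 4` spreads the piped-in filenames across 4 processes."""
    return [lines[i::parallelism] for i in range(parallelism)]

# 5 filenames, 4 workers: one worker gets 2 files, the rest get 1 each.
batches = split(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], 4)
print(batches)  # [['a.jpg', 'e.jpg'], ['b.jpg'], ['c.jpg'], ['d.jpg']]
```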

Step 5: Prepare Your Data

First, sync your local image files to cloud storage:

bash
rclone sync images/ gs://px-gcs-my-bucket/images/

About rclone

rclone is a command-line tool for managing files on cloud storage. It supports syncing, copying, and mounting cloud storage as a local filesystem. For Google Cloud Storage setup and usage, see the rclone GCS documentation.

Then generate a list of files to process:

bash
find images/ -type f | sed 's|^images/|/px-gcs-my-bucket/images/|' > images.txt

This creates images.txt with filesystem paths like:

/px-gcs-my-bucket/images/photo1.jpg
/px-gcs-my-bucket/images/photo2.jpg
/px-gcs-my-bucket/images/photo3.jpg

This works because px.yaml includes a simple configuration that mounts Google Cloud Storage at that filesystem location on every cluster node.
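If you prefer Python to sed, the same prefix rewrite looks like this (the bucket name matches the example above):

```python
def to_mount_path(local_path, mount_root="/px-gcs-my-bucket/images/"):
    """Rewrite a local path under images/ to the path where the
    bucket is mounted on every cluster node."""
    prefix = "images/"
    if local_path.startswith(prefix):
        return mount_root + local_path[len(prefix):]
    return local_path

print(to_mount_path("images/photo1.jpg"))  # /px-gcs-my-bucket/images/photo1.jpg
```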

Step 6: Bring Up a Cluster

Create a px.yaml file to define your cluster configuration, then bring up your cluster:

bash
px cluster up files
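A minimal px.yaml might look like the sketch below. The field names here are illustrative assumptions, not the real schema; only the node count and the storage mount are taken from the provisioning output shown later in this guide. Consult the PX configuration reference for the actual format.

```yaml
# Illustrative only -- check the px.yaml reference for the real schema.
nodes: 3
storage:
  mounts:
    - source: gs://px-gcs-my-bucket
      path: /px-gcs-my-bucket
```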

Cloud econometric model

PX uses a cloud econometric model to automatically select the most cost-effective resources for your workload. It consults a real-time pricing database to find the cheapest Spot Instances on Google Cloud Platform that meet your requirements, helping you minimize costs while maintaining performance.
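Conceptually, the selection step is a filter-then-minimize over a pricing table. PX consults a real-time database; the catalog and prices below are invented for illustration:

```python
# Invented prices for illustration; PX uses a live pricing database.
catalog = [
    {"name": "n4-standard-2", "vcpus": 2, "mem_gb": 8,  "spot_price": 0.09},
    {"name": "n4-standard-4", "vcpus": 4, "mem_gb": 16, "spot_price": 0.18},
    {"name": "n4-highmem-2",  "vcpus": 2, "mem_gb": 16, "spot_price": 0.13},
]

def cheapest(catalog, min_vcpus, min_mem_gb):
    """Keep instances that meet the resource request, then take the
    lowest spot price among them."""
    eligible = [i for i in catalog
                if i["vcpus"] >= min_vcpus and i["mem_gb"] >= min_mem_gb]
    return min(eligible, key=lambda i: i["spot_price"])

print(cheapest(catalog, min_vcpus=2, min_mem_gb=8)["name"])  # n4-standard-2
```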

You'll see output like this as PX provisions your cluster:

❯ px cluster up files
Provisioning cluster 'files' with spec file 'px.yaml'...
Generating cluster metadata...
Getting cluster status: cluster_name='files'
✓ Cluster metadata saved to ~/.px/clusters/files/metadata.json
Running on cluster: files
Considered resources (3 nodes):
------------------------------------------------------------------------------------------
 INFRA                 INSTANCE              vCPUs   Mem(GB)   GPUS   COST ($)   CHOSEN
------------------------------------------------------------------------------------------
 GCP (us-central1-a)   n4-standard-2[Spot]   2       8         -      0.09          ✔
------------------------------------------------------------------------------------------
⚙︎ Launching on GCP us-central1 (us-central1-a).
└── Instances are up.
✓ Cluster launched: files.
⚙︎ Syncing files.
  Syncing workdir (to 3 nodes): ~/repos/my-project -> ~/workdir
✓ Synced workdir.
  Mounting (to 3 nodes): gs://px-gcs-my-bucket -> /px-gcs-my-bucket
✓ Storage mounted.
✓ Setup detached.
⚙︎ Job submitted, ID: 2
Setting up PX...
✓ PX setup completed

Step 7: Scale to Multiple Nodes

To run across multiple cloud nodes, point px run at your cluster, pass the args file, and increase the parallel count:

bash
px run --cluster files -p 16 -a images.txt 'python filesize.py'

PX will automatically:

  • Provision the necessary cloud instances
  • Distribute your work across them
  • Handle all the networking and coordination
  • Clean up resources when done

How parallel execution works

PX handles parallel execution automatically based on the -p or --parallelism parameter. It partitions your input data, spawns the specified number of parallel processes, and manages task distribution across your cluster. For more information, see the Parallel Execution documentation.
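Assuming static partitioning (PX's scheduling may be more dynamic), splitting an args file into 16 near-equal task slots looks roughly like:

```python
def partition(args, parallelism):
    """Split args into `parallelism` contiguous chunks of near-equal
    size, one chunk per task slot across the cluster."""
    n, rem = divmod(len(args), parallelism)
    chunks, start = [], 0
    for i in range(parallelism):
        size = n + (1 if i < rem else 0)  # spread the remainder evenly
        chunks.append(args[start:start + size])
        start += size
    return chunks

paths = [f"/px-gcs-my-bucket/images/photo{i}.jpg" for i in range(1, 35)]
print([len(c) for c in partition(paths, 16)])  # two chunks of 3, fourteen of 2
```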

Next Steps

Congratulations! You've successfully run your first PX job. Here's the full CLI reference:

Usage: px [OPTIONS] COMMAND [ARGS]...

PX - Parallel made simple: easy mode for multi-core & multi-node

Options

Option           Description
-V, --version    Show version and exit
-h, --help       Show this message and exit

Commands

Command    Description
cluster    Manage cloud clusters for distributed job execution
doctor     Run diagnostic tests on specified subsystems to verify proper configuration
login      Authenticate with cloud provider
logs       Stream logs for a job running on a cluster
run        Run a command in parallel across local or cluster resources
superx     Manage PX Supervisor (superx) operations

Cluster Commands

Usage: px cluster [OPTIONS] COMMAND [ARGS]...

Manage cloud clusters for distributed job execution

Option        Description
-h, --help    Show this message and exit

Command    Description
down       Terminate a cluster and all associated cloud resources
report     Show cluster usage report with costs and resource details
up         Create and provision a cluster on the cloud

Run Commands

Usage: px run [OPTIONS] COMMAND [ARGS]...

Run a command in parallel across local or cluster resources

Pass --cluster CLUSTER_NAME or set the PX_CLUSTER_NAME environment variable to run on a remote cluster instead of locally

Option                       Description
-a, --args-file FILE         Read input arguments from a file instead of stdin or command line
-p, --parallelism INTEGER    Number of parallel tasks to run (default: system capacity)
-c, --cluster TEXT           Run on specified cluster instead of locally
-d, --detach                 Submit job and return without streaming logs (cluster mode only)
-v, --verbose                Enable verbose output
-h, --help                   Show this message and exit