Debugging

Debugging distributed systems is notoriously difficult. PX makes it easier by encouraging patterns that are naturally debuggable: single-process, single-threaded programs that operate on partitioned data.

Start Local, Scale Remote

The best way to debug PX jobs is to test locally first:

bash
# Test your script locally first
python process.py < sample_input.txt

# Run locally in parallel
px run -p 4 'python process.py'

# Then scale to the cloud
px run --cluster files -p 16 -a images.txt 'python process.py'

If your code works locally on a single input, it should work remotely on partitioned inputs — assuming your code is idempotent and doesn't rely on shared state.
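
In practice this means your script can be an ordinary filter: it reads one input per line from stdin and handles each one independently. Below is a minimal sketch, assuming PX feeds each process its partition of inputs as file paths on stdin; the process_file helper is a placeholder for your own logic, not part of PX:

python
import sys
from pathlib import Path

def process_file(path):
    """Placeholder: replace with your own per-file logic."""
    return Path(path).stat().st_size

def main():
    # Each parallel process receives only its partition of the input list.
    for line in sys.stdin:
        path = line.strip()
        if not path:
            continue
        print(f"{path}\t{process_file(path)}")

if __name__ == "__main__":
    main()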

Common Debugging Patterns

1. Test with Small Data First

Don't start by processing 60,000 files. Start with 10:

bash
# Create a small test dataset
mkdir test_data
cp sample_files/*.jpg test_data/

# Test locally
px run -p 2 'python process.py'

2. Check Your Input/Output Assumptions

Make sure your code handles the following (a defensive sketch is given after the list):

  • Empty inputs gracefully
  • File permissions correctly
  • Output directories that may not exist
  • Partial or malformed data
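
A defensive sketch covering these cases; the helper name process_one and the assumption that inputs are JSON files are illustrative, not PX requirements:

python
import json
import sys
from pathlib import Path

def process_one(input_path, output_dir):
    """Handle missing, unreadable, empty, or malformed inputs without crashing the run."""
    src = Path(input_path)
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # output directory may not exist yet

    if not src.exists():
        print(f"skipping missing file: {src}", file=sys.stderr)
        return
    try:
        text = src.read_text()
    except PermissionError:
        print(f"skipping unreadable file: {src}", file=sys.stderr)
        return
    if not text.strip():
        print(f"skipping empty file: {src}", file=sys.stderr)
        return
    try:
        record = json.loads(text)  # illustrative: inputs assumed to be JSON
    except json.JSONDecodeError:
        print(f"skipping malformed file: {src}", file=sys.stderr)
        return

    (out_dir / (src.stem + ".out.json")).write_text(json.dumps(record))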

3. Use Verbose Logging

Add logging to understand what each process is doing:

python
import logging
import os

logging.basicConfig(
    level=logging.INFO,
    format=f'[{os.getpid()}] %(message)s'
)

logger = logging.getLogger(__name__)
logger.info(f"Processing file: {filename}")

When running in parallel, the PID helps you distinguish which process is doing what.

Idempotency Checklist

Your code should be idempotent. Ask yourself:

  • ✅ Does my code produce the same output given the same input?
  • ✅ Can I re-run this code on the same data without problems?
  • ✅ Does my code avoid external state (databases, APIs, global counters)?
  • ✅ Are my file writes atomic or to unique output files? (see the sketch below)
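
One common pattern that satisfies the last two points is to write each result to a temporary file in the destination directory and then rename it into place; on a single filesystem the rename is atomic, so a re-run can never leave a half-written output behind. This is a sketch, and write_atomically is a name chosen here for illustration:

python
import os
import tempfile
from pathlib import Path

def write_atomically(output_path, data: str):
    """Write to a temp file next to the target, then atomically replace the target."""
    output_path = Path(output_path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the destination directory so os.replace stays on one filesystem.
    with tempfile.NamedTemporaryFile("w", dir=output_path.parent, delete=False) as tmp:
        tmp.write(data)
        tmp_name = tmp.name
    os.replace(tmp_name, output_path)  # atomic on POSIX; safe to re-run

# Usage: give each input its own output file so re-runs never corrupt results.
# write_atomically(Path("out") / "image_0001.json", '{"status": "ok"}')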

Understanding Process Isolation

Each parallel instance of your code runs in its own Linux process. These processes:

  • Have separate memory spaces
  • Don't share variables or state
  • Process a partition of the input data
  • Write to their own output locations

Think of each process as a completely independent script execution.
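
You can see this isolation with nothing but the standard library: a module-level counter incremented inside worker processes never changes in the parent, because every process has its own copy of memory. The multiprocessing module is used here only to demonstrate the point; PX runs each parallel instance as its own Linux process, so the same rule applies:

python
import multiprocessing as mp

counter = 0  # module-level "shared" state: it is not actually shared

def work(n):
    global counter
    counter += n      # only modifies this worker process's copy
    return counter

if __name__ == "__main__":
    with mp.Pool(4) as pool:
        print(pool.map(work, [1, 1, 1, 1]))  # each worker sees only its own count
    print(counter)  # still 0 in the parent: nothing was shared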

When Things Go Wrong

Job Failures

If a job fails, check the following (see the error-handling sketch after this list):

  1. Exit codes: Did your script exit with a non-zero status?
  2. Logs: What was printed to stderr?
  3. Input partitioning: Did some partitions have malformed data?
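
To make those three checks useful, have your script report each failure on stderr and exit non-zero if anything went wrong. The sketch below extends the stdin loop shown earlier; process_file is again a placeholder for your own logic:

python
import sys
import traceback

def process_file(path):
    """Placeholder for your per-file logic."""
    with open(path) as f:
        return len(f.read())

def main():
    failures = 0
    for line in sys.stdin:
        path = line.strip()
        if not path:
            continue
        try:
            print(f"{path}\t{process_file(path)}")
        except Exception:
            failures += 1
            # stderr is the first place to look when a job fails
            print(f"ERROR processing {path}", file=sys.stderr)
            traceback.print_exc()
    # A non-zero exit code signals that this partition did not fully succeed.
    sys.exit(1 if failures else 0)

if __name__ == "__main__":
    main()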

Partial Results

If you get partial results:

  1. Check idempotency: Can you safely re-run on all inputs?
  2. Check output handling: Are you overwriting vs. appending?
  3. Check file locking: Are concurrent writes causing issues?

Performance Issues

If jobs are slow:

  1. Profile locally first: Use standard profiling tools (see the cProfile sketch below)
  2. Check I/O patterns: Are you reading the same file repeatedly?
  3. Adjust parallelism: Try -p with different values
  4. Monitor resource usage: Are you CPU-bound or I/O-bound?
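
Since each PX instance is an ordinary single-threaded program, the standard library profiler is enough for step 1. A sketch of profiling locally; main stands in for your script's entry point:

python
import cProfile
import pstats

def main():
    """Stand-in for your script's entry point."""
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total

if __name__ == "__main__":
    profiler = cProfile.Profile()
    profiler.enable()
    main()
    profiler.disable()
    # Print the 10 most expensive call sites by cumulative time.
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)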

Best Practices

Write Debuggable Code

python
# Good: Clear, single-purpose, debuggable
def process_image(input_path, output_path):
    """Process a single image file."""
    img = load_image(input_path)
    processed = apply_filter(img)
    save_image(processed, output_path)
    return output_path

# Avoid: Complex, stateful, hard to debug
class ImageProcessor:
    def __init__(self):
        self.cache = {}
        self.counter = 0
        self.db_conn = connect_to_db()
    # ... distributed state is hard to debug

Keep It Simple

The simpler your code, the easier it is to debug at scale:

  • Prefer pure functions over stateful classes
  • Avoid global variables and shared state
  • Use explicit inputs and outputs
  • Log important operations

Test Edge Cases

Test your code with:

  • Empty inputs
  • Very large inputs
  • Malformed data
  • Missing files
  • Permission issues

If your code handles these locally, it will handle them at scale.
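
One way to lock these cases in is a small test that drives your script the way the earlier sketches assume PX does, one process with inputs on stdin, and verifies that it fails gracefully. The script name process.py and the expectation that missing files are skipped rather than fatal are assumptions carried over from the sketches above:

python
import subprocess
import sys

def run_script(stdin_text):
    """Run process.py as a single process with inputs supplied on stdin."""
    return subprocess.run(
        [sys.executable, "process.py"],
        input=stdin_text,
        capture_output=True,
        text=True,
    )

def test_empty_input():
    result = run_script("")          # no inputs at all
    assert result.returncode == 0

def test_missing_file():
    result = run_script("does_not_exist.jpg\n")
    # Policy choice in the earlier sketch: skip and report on stderr, don't crash.
    assert result.returncode == 0

if __name__ == "__main__":
    test_empty_input()
    test_missing_file()
    print("edge-case checks passed")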