These guides were replaced by a new, updated process on Monday, October 21, 2019. If you're starting a new project, we recommend using the updated process instead of the guides below. See this page for the new guides.
To best understand the below information, users should already have an understanding of:
- Using the command line to: navigate within directories, create/copy/move/delete files and directories, and run their intended programs (aka 'executables').
Many CHTC users have Python programs requiring Python versions that are not installed on CHTC's high throughput system, which includes the CHTC Pool, the UW Grid (flocking) and the Open Science Grid (GlideIn). Instead, you get to choose the version of Python you want, and bring it along with your jobs.
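This guide details the steps needed to build a portable Python installation, write a script that unpacks that installation and runs your Python code, and submit jobs that use it.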
To run Python jobs, you will first need to build a Python installation for your jobs to use.
A. Get the Python Version You Need
Before starting, locate the version of Python that you want to use from python.org. Transfer or download the appropriate source .tgz file to the submit server.
Using Distributions
Instead of installing Python from source, it is also possible to create a Python installation using a Python distribution. Examples include Anaconda and miniconda (from Continuum Analytics) and Enthought Canopy (from Enthought). The only change to the instructions below will be the source file (the distribution's install script, instead of source code) and the exact commands required to create a local installation (Step 3 below). Otherwise, the process is nearly identical - install the Python distribution locally and create a tarball of the installed directory.
One major drawback of using a distribution is the size of the installation: the full Anaconda distribution is over 300 MB, whereas a Python installation from source with a few packages is less than 40 MB.

B. Create a Python Installation in an Interactive Job
Because building a Python installation can be computationally intensive, it should not be performed on the submit server. Instead, you can create your installation on a dedicated build server by using an interactive job. An interactive job is essentially a job without an executable; you run the commands yourself (in this case, to install Python). As with a regular HTCondor job, once you finish the installation on the build server, the output files (here, the Python installation) are transferred back to the submit server so that you can use them to submit your jobs.

Submit an Interactive Build Job
Instructions for submitting an interactive build job are here: http://chtc.cs.wisc.edu/inter-submit
Note that you should replace source_code.tar.gz with the name of the Python source tarball that you downloaded. If you downloaded additional source code for modules in part A, you should list those in the transfer_input_files line as well. Submit the interactive job and wait for it to start.
Prepare the Installation Directory
Once the interactive job starts, create a directory for the installation, which can be done with the mkdir command:
Next, untar the source code that you transferred over. In the command below, replace python_source.tgz with the name of your Python tarball.
Install Python
To install Python, we will run a configuration script that includes an option to set the installation location. We will set the location to the directory we created above, and then complete the installation by running make.
Move into the untarred Python source directory (it should be named something like 'Python-#.#').
From that directory, type the following commands to compile and install Python to the directory you created in step 2:
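The directory setup, untar, and build steps described above can be sketched as a single shell function. This is an illustration rather than the guide's original commands: the function name build_python is hypothetical, and it assumes the usual configure, make, make install sequence of a Python source release, with the installation directory named python in the job's working directory.

```shell
# Hypothetical helper: build Python from source into ./python.
# Usage: build_python <source tarball> <directory the tarball unpacks to>
build_python() {
    local tarball=$1            # Python source tarball you transferred
    local src_dir=$2            # directory it unpacks to, named like Python-#.#
    local prefix=$PWD/python    # installation directory

    # Create the installation directory and unpack the source code.
    mkdir -p "$prefix" || return 1
    tar -xzf "$tarball" || return 1

    # Configure with the installation location, then compile and install.
    # The subshell keeps the caller in the main working directory.
    (
        cd "$src_dir" || exit 1
        ./configure --prefix="$prefix" && make && make install
    )
}
```

Call it from the interactive job's working directory with the name of your tarball and the name of the directory it unpacks to. Because the build runs in a subshell, you are back in the main working directory when it finishes.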
Check the Installation
Once these commands have finished executing, move back into the main working directory. Then, check the contents of your python directory. It should look like this:
Finally, make sure you have a python executable. Run:
You should see something like this:
The number of items may vary, depending on which version of Python you used. If you do not see the plain python executable (as above), do the following:
Replace 'python3' with 'python2', if that's the version you've installed. Similarly, if you see pip3 but not just pip, do:
Install Modules
If you are installing any additional modules, do so now:
- Set your PATH variable to include your Python installation:
- Make sure pip is installed. If you saw it listed in python/bin in the previous step, you can move on to the next step. If you don't see a version of pip, go to the pip documentation page and follow the directions under 'Installing with get-pip.py'. You can download the get-pip.py script by copying the link to the script and then typing:
- For each module needed by your code, run:
The pip command should download all dependent packages and install them. Certain modules may take longer than others.
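The executable check and the module steps above can be sketched together as one shell function. The function name finish_python_install is hypothetical. The sketch assumes that you are in the directory containing python/, that you built a Python 3 release (use python2 in place of python3 otherwise), and that pip or pip3 is already present in python/bin, so it skips the get-pip.py download.

```shell
# Hypothetical helper: run from the directory that contains python/.
# Usage: finish_python_install <module> [<module> ...]
finish_python_install() {
    local bin_dir=$PWD/python/bin

    # Add the plain "python" name if only the versioned one exists.
    # The link is relative, so it still works after the directory is moved.
    [ -e "$bin_dir/python" ] || ln -s python3 "$bin_dir/python"

    # Add the plain "pip" name if only pip3 exists.
    if [ ! -e "$bin_dir/pip" ] && [ -e "$bin_dir/pip3" ]; then
        ln -s pip3 "$bin_dir/pip"
    fi

    # Put this installation first on PATH for the rest of the session.
    export PATH=$bin_dir:$PATH

    # Install each module named on the command line.
    local module
    for module in "$@"; do
        pip install "$module" || return 1
    done
}
```

For example, `finish_python_install numpy` would install numpy into the local installation; the module name is only an illustration.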
Exit the Interactive Job
Right now, if we exit the interactive job, nothing will be transferred back because we haven't created any new files in the working directory, just sub-directories. In order to transfer back our installation, we will need to compress it into a tarball file - not only will HTCondor then transfer back the file, it is generally easier to transfer a single, compressed tarball file than an uncompressed set of directories.
Run the following command to create your own tarball of the installation:
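One way to create the tarball is sketched below. The function name pack_python is hypothetical. The archive name python.tar.gz matches the name used later in this guide, and the size report is an addition meant to help with the 100 MB limit discussed at the end of the guide.

```shell
# Hypothetical helper: run from the directory that contains python/.
pack_python() {
    # Compress the whole installation directory into a single tarball.
    tar -czf python.tar.gz python/ || return 1

    # Report the approximate size in megabytes.
    local size_mb
    size_mb=$(du -m python.tar.gz | cut -f1)
    echo "python.tar.gz is about ${size_mb} MB"

    # Warn when the tarball is too large for transfer_input_files.
    if [ "$size_mb" -gt 100 ]; then
        echo "larger than 100 MB: do not send it with transfer_input_files" >&2
    fi
}
```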
The installation is complete! You can now exit the interactive job and the tarball of your Python installation will return to the submit server with you.
We now have a python.tar.gz file that contains our entire Python installation. In order to use this installation in our HTCondor jobs, we will need to write a script that unpacks our Python installation and then runs our Python code. We will use this script as the executable of our HTCondor submit file.
A sample script appears below. After the first line, the lines starting with hash marks are comments. You should replace 'myscript.py' with the name of the script you would like to run.
If you have additional commands you would like to be run within the job, you can add them to this base script. Once your script does what you would like, give it executable permissions with chmod +x.
A sample submit file can be found in our hello world example page. You should make the following changes in order to run Python jobs:
- Your executable should be the script that you wrote above.
- Change transfer_input_files to include your Python installation tarball (python.tar.gz), your Python scripts, and any input files your job needs.
- Modify the CPU/memory request lines. Test a few jobs for disk space/memory usage in order to make sure your requests for a large batch are accurate! Disk space and memory usage can be found in the log file after the job completes.
How big is your installation tarball?
If your installation tarball is larger than 100 MB, you should NOT transfer the tarball using transfer_input_files. Instead, you should use CHTC's web proxy, squid. In order to request space on squid, email the research computing facilitators at chtc@cs.wisc.edu.
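For reference, the wrapper script described earlier (the one used as the job's executable) could be sketched as follows. The file name run_python.sh is an assumption, and myscript.py is the placeholder from the text. The sketch also assumes that python.tar.gz is transferred with the job and unpacks to a python/ directory. The block writes the script with a here-document and then marks it executable.

```shell
# Write the wrapper script; run_python.sh is an assumed file name.
cat > run_python.sh <<'EOF'
#!/bin/bash
# Unpack the Python installation that was transferred with the job.
tar -xzf python.tar.gz
# Put the unpacked installation first on PATH.
export PATH=$(pwd)/python/bin:$PATH
# Run your own script; replace myscript.py with its name.
python myscript.py
EOF

# Give the script executable permissions.
chmod +x run_python.sh
```

Unpacking inside the job, rather than relying on anything installed on the execute machine, is what lets the job bring its own Python version along.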