HECBioSim makes it easy for any UK-based bio-simulation scientist to get significant amounts of time on ARCHER, the UK National High Performance Computing Service. Used appropriately, this can make an enormous impact on the quantity and quality of the research that can be done. But learning how to use HPC resources, and analysing the huge amounts of data they can produce, can be a significant challenge.
In this course you will be introduced to Longbow, a Python tool created by the HECBioSim consortium that lets you use the primary molecular dynamics packages (AMBER, GROMACS, LAMMPS, NAMD) with ease from the comfort of your own desktop. Longbow eliminates the day-to-day tasks of setting up job submission files and dealing with Linux terminals or FTP transfer programs to upload and download your files. Longbow is designed to be as simple as possible to install and get going, so that researchers can spend more time doing research.
Some useful links for future reference:
Longbow software page here
Longbow documentation here
Longbow support here
Setup Exercise 1 - Retrieve your ARCHER username and password from SAFE
Before we can start setting up Longbow, we first need to retrieve our ARCHER account details and test that we can SSH into the ARCHER system. Follow the steps below to retrieve your machine account details for ARCHER.
Step 1 - First go to the ARCHER SAFE login webpage here, and log in with your SAFE account credentials (you will have done this as asked in your registration email).
Step 2 - Once logged in, go to the "login accounts" option on the navigation menu at the top of the page. A drop-down menu should appear listing various options, including a list of the user accounts you have set up. As part of the preparation for this workshop you were asked to request an account under e280, for which you will have created a username. Select this username from the menu.
Step 3 - You will be presented with a page listing various system resource metrics. Towards the bottom right there are some buttons; we need the one labelled "View Login Account Password". When you click this button, a box will appear below asking for your SAFE password. Enter your SAFE password and click the button labelled "view". You will be shown your system-generated password for gaining entry to the ARCHER login nodes (make a note of this for the next step).
That's it, we can now move on to the next step and set up password-less SSH.
Setup Exercise 2 - Setting up password-less SSH
Before proceeding to install Longbow, it is important to ensure that you can access your ARCHER account, and then to set up password-less entry via SSH. This is because Longbow makes many calls to your remote machine in each Longbow session, and without this you would have to keep entering your password. This would be cumbersome and annoying! To get started, please work through the following steps and make sure everything is working as it should before moving on.
Step 1 - Before setting up password-less SSH, we should verify that SSH works, accept the key signature and change the default password. Have the username and password from the previous exercise to hand, open a terminal and type the following (replace "username" with your username from SAFE; when prompted for a password, enter the one you looked up previously):
$ ssh username@login.archer.ac.uk
You may also be asked to accept the key signature; you should answer yes to this. You will then be asked to change your password. Do this and make a note of the new password for the next step. You may then log out and return to your local machine terminal (type exit and press enter).
Step 2 - We will now create an SSH key pair by typing the following into a terminal (this needs to be created locally, so if you are still logged in to ARCHER then log out first):
$ ssh-keygen -t rsa
You will see something like the following. You will be prompted for a file name and a passphrase; you can just press enter 3 times to keep the default file and leave the passphrase blank, since we want password-less SSH:
Generating public/private rsa key pair.
Enter file in which to save the key (/home/juan/.ssh/id_rsa):
Created directory '/home/juan/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/juan/.ssh/id_rsa.
Your public key has been saved in /home/juan/.ssh/id_rsa.pub.
The key fingerprint is:
1a:13:f5:77:7c:32:ea:00:42:4c:99:a8:59:1d:6e:17 juan@trique-ponee
The key's randomart image is:
+--[ RSA 2048]----+
| =++E |
| oo=. o . |
| + = o . . = .|
| o . + . . o + |
| o S . . |
| + o |
| . . |
| |
| |
+-----------------+
Step 3 - We then need to copy the public part of the key pair to ARCHER. There is a utility that will do this automatically for you (replace "username" with the ARCHER username you retrieved in exercise 1; when asked for a password, use the one that you changed in step 1):
$ ssh-copy-id username@login.archer.ac.uk
Step 4 - Test that the SSH connection can be established without asking for a password (replace "username" with your ARCHER username):
$ ssh username@login.archer.ac.uk
You should now be logged in to ARCHER without being asked for a password. If this is not the case, seek support from the tutors or see this guide for more information.
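The same check can be run non-interactively, as sketched below. It relies on OpenSSH's BatchMode option, which makes ssh fail rather than prompt for a password. The function name check_passwordless is hypothetical and is not part of Longbow.

```shell
# Hypothetical helper: report whether key-based login works for user@host.
# BatchMode=yes tells OpenSSH never to prompt, so a missing key is an error.
check_passwordless() {
    user="$1"
    host="$2"
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "${user}@${host}" true; then
        echo "password-less SSH is working for ${user}@${host}"
    else
        echo "password-less SSH is NOT working for ${user}@${host}"
        return 1
    fi
}
```

After defining the function in your terminal, you could call it as: check_passwordless username login.archer.ac.uk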
Setup Exercise 3 - Installation of Longbow
To install Longbow on the cluster machines here at Bristol we will use pip to perform a local, per-user install. Other installation methods, including installing from the setup script, are covered in the documentation. Follow the instructions below to get Longbow installed and tested.
Step 1 - Install Longbow using pip.
$ pip install longbow --user
In this case, we need to use the --user flag to do a local install of Longbow, as you will not have root permissions to do a system-wide install.
You should see some output relating to the installation; this is normal. Once it completes, we should run some tests to make sure everything works.
Step 2 - Testing. Before we go on we need to make sure everything is working. The following check should be performed:
First check that Longbow can be launched. During testing, the paths on the cluster machines were not set up correctly (this may have been resolved):
$ longbow --version
If the above command works, you will see the Longbow version number printed to the terminal and you can skip to the next exercise. If instead you get a message along the lines of "executable not found", there is likely a problem with paths: the local bin path was not added to the system path by the system administrators. We can resolve this as follows.
Open up your local bashrc file with your favourite text editor
$ nano ~/.bashrc
Somewhere near the bottom of the file put the following line:
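Assuming pip placed the longbow launcher in ~/.local/bin (the usual location for --user installs on Linux, but worth checking on your machine), a line along these lines should work:

```shell
# Assumption: "pip install --user" put the longbow launcher in ~/.local/bin.
# Prepend that directory so the shell can find the launcher.
export PATH="$HOME/.local/bin:$PATH"
```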
Then save and exit (in nano, press ctrl+x, answer 'y' and press 'enter' to save). You will then need to re-source your bashrc:
$ source ~/.bashrc
You should then try to do the Longbow version command again and make sure you can get the version number output.
Setup Exercise 4 - Editing the hosts configuration file
Now that Longbow is set up and working, we need to tell it about the computing resource and the user account that we are going to use. To do this Longbow uses a configuration file called "hosts.conf". This file can store details of many different compute resources, or different configurations for the same compute resource (for example, the same machine with different accounts). There is a hierarchy to the parameters used in files and on the command-line, and this can be used to create complex setups, but that is beyond the scope of this workshop and is covered in detail in the documentation.
For this workshop we will create just one machine profile. To do this, open up hosts.conf in your favourite editor; the file is created during installation in the ~/.Longbow directory:
$ nano ~/.Longbow/hosts.conf
You will see that this file already contains some information; this is simply a template to show new users what the format of the file looks like. Delete the existing contents and replace them with the following, then replace all instances of "username" with the username that you created in SAFE prior to attending the workshop (note that Longbow defaults to the compute resource that appears first in this file if one is not specified at runtime):
[Archer]
host = login.archer.ac.uk
user = username
remoteworkdir = /work/e280/e280/username/Longbow
account = e280-workshop
queue = R3704872
frequency = 120
maxtime = 01:00
That's it, you have created your first machine profile in Longbow!
You will notice that this file has a very specific structure: the ini format. The name in square brackets is the name of the compute resource (you can use it in Longbow to select which resource to run on; otherwise Longbow defaults to the first one in the file), and the lines beneath it are parameters that describe that resource. This is quite a powerful way to describe compute resources. You can set up multiple entries for the same compute resource with different settings (such as different accounts or other parameters), or entries for compute resources at different facilities if you hold accounts in several places. The file can also hold machine-specific job defaults, which can then be overridden by job files or command-line parameters (this is beyond the scope of this workshop but is covered in the documentation).
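As a sketch of the multiple-entry idea, the snippet below writes a demonstration file with two entries for the same machine that differ only in the account charged. The section name ArcherAlt, the account code e280-other and the file name hosts-demo.conf are made up for illustration; the parameter names are the ones used in the profile above.

```shell
# Write a demonstration hosts file under a hypothetical name, so that the
# real ~/.Longbow/hosts.conf is left untouched.
cat > hosts-demo.conf <<'EOF'
[Archer]
host = login.archer.ac.uk
user = username
remoteworkdir = /work/e280/e280/username/Longbow
account = e280-workshop

[ArcherAlt]
host = login.archer.ac.uk
user = username
remoteworkdir = /work/e280/e280/username/Longbow
account = e280-other
EOF

# List the profile names; the first one listed is the default resource.
grep '^\[' hosts-demo.conf
```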
Simulation Exercise 1: Running a Single Job
For this example we are going to launch a simple job straight from the command-line, so that you can see just how simple running simulations via Longbow can be. We will reuse the 'Zanamivir Bound to H7N9 Neuraminidase' (part 2) example from the previous session, only this time we will run it on ARCHER. Launching Longbow from the command-line always takes the following form:
$ longbow [longbow arguments] executable [executable arguments]
So launching Longbow in this way can almost be as simple as writing "longbow" in front of your normal command-line that you would use on your desktop machine. To get started with this example:
1. First copy the input files that you used for the "part 2" example in the previous session (they were in the directory called "complex") into a new directory so you have a clean working space. These are the files called "h7n9_zan.prmtop", "h7n9_zan.rst" and "mdconfig". If you deleted these files then you can re-download them from here, but you will need to re-modify them as guided here.
2. Once you have done this then in a terminal change to this new directory path (using cd) and then we can launch Longbow:
$ longbow --log exercise1.log --verbose namd2 mdconfig
You will see Longbow go through lots of configuration and setup and eventually pause shortly after submitting your job, this is normal and is designed to be light on logging (simulations that take months would make massive log files otherwise). Longbow will periodically check upon the status of your job and stage back your results so far, this is an excellent way to keep an eye on your simulation without having to log in to your compute machine and download a file to check progress.
When we launch Longbow like this the --log argument allows us to rename the log file that Longbow logs to (some programs default to a file called log and so does Longbow, so it is best to rename it with something meaningful). The --verbose flag simply instructs Longbow to give basic log output to the terminal you launch from, Longbow is silent by default so that it can be launched from small scale institutional cluster machines in a "headless" fashion. Another point to note is in real world simulations with Longbow, parameters given to Longbow on the command-line override those in any configuration file.
Launching Longbow like this without any configuration outside of the hosts.conf will pick up and use parameters either from the hosts.conf or use internal defaults (for example we didn't set the number of cores in hosts.conf so Longbow defaults to 24, already a lot bigger than the 4 you used earlier). You will notice that the ++ppn has been left out, and there is a good reason for this!
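The way a value such as the core count is settled can be pictured with the sketch below. It only illustrates the precedence described in this tutorial (command-line, then job file, then hosts.conf, then internal default) and is not Longbow's actual code; the variable names are invented for the example.

```shell
# Illustrative values: an empty string means "not set at that level".
cli_cores=""        # nothing given on the command line
job_cores=""        # no job file is used in this exercise
host_cores=""       # cores is not set in hosts.conf
default_cores=24    # internal default mentioned above

# Take the first non-empty value, highest precedence first.
cores="${cli_cores:-${job_cores:-${host_cores:-$default_cores}}}"
echo "cores used: $cores"
```

Setting job_cores=48 and repeating the last two lines would report 48, since a job file value outranks both hosts.conf and the default.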
All being well, this simulation should complete within 4-5 minutes or so, and you will be notified upon completion. File transfer could take extra time if the computer cluster you are working on uses network file systems, but this will not happen on your own machine.
Simulation Exercise 2: Let's make it bigger
OK, so you have seen how to quickly fire off a job to ARCHER using Longbow. Now we would like to make this simulation much longer so that you can get a lot more information from your simulations, this would have taken a prohibitively long time to run on your 4 core processor in the previous session. In this example we will modify the simulation input files and also use a job.conf file with Longbow to override the number of cores we will use for this simulation. To get going with this example:
1. Make a new copy of the input files as you did in example 1 in a new directory so you have a nice clean work space.
2. To make the simulation run for a much longer time we need to edit (using nano) the mdconfig file just as you did in the previous workshop, change numsteps to 125000 and change the frequency of the output file generation (DCDfreq) to 1000 so you should now get 250 ps of simulation with 125 frames of output.
3. Now we need to create a job configuration file so that we can make some changes without modifying our default settings in hosts.conf. So with your favourite text editor create a job.conf in the same directory as your input files from step 1:
$ nano job.conf
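Now enter the following structure. As in hosts.conf, the parameters belong under a section header: a job name of your choosing written in square brackets on the first line.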
executable = namd2
resource = Archer
cores = 48
modules = namd
executableargs = mdconfig
In the above file, you will notice that we have provided the executable and its arguments, so we will not need to give them on the command-line later. We have also explicitly specified ARCHER as the machine to run on. In this workshop we have only configured one machine, so we could easily leave this out, but you will likely have access to many machines. We are also specifying how many cores we wish to run on: we are going to use 48 processor cores, so that we can do much more simulation in the time available. The modules parameter is useful for loading Linux modules on the compute machine. System administrators often install many different configurations or versions of software and provide modules as a way to load them; this is the way to use them in Longbow.
4. Now all we need to do is run it!
$ longbow --log exercise2.log --verbose --job job.conf
You will again see Longbow do lots of configuration and setup followed by submission to ARCHER. Longbow will then enter a monitoring stage where it will wait a few minutes between performing checks upon your simulation and staging back the results so far, it will appear to be frozen between these checks, this is normal. You will be notified when your job is complete.
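The figures quoted in step 2 can be checked with a little shell arithmetic. The 2 fs timestep is an assumption inferred from the numbers given (250 ps over 125000 steps); confirm it against the timestep set in your own mdconfig.

```shell
numsteps=125000     # total MD steps set in mdconfig
dcdfreq=1000        # steps between trajectory frames
timestep_fs=2       # assumed timestep in femtoseconds

# Frames written = steps / output frequency; 1000 fs = 1 ps.
frames=$((numsteps / dcdfreq))
length_ps=$((numsteps * timestep_fs / 1000))
echo "$frames frames covering $length_ps ps"
```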
Before moving on:
Up next are some examples that showcase some of the more powerful use cases for Longbow. The jobs used in this section are simply there to highlight the functionality of Longbow. Please find the files for the next two examples here.
Once downloaded extract the zip file to a place of your choice.
Simulation Exercise 3: Running Replica Jobs
Running replicates is very simple using Longbow. Replicates are useful when you have lots of very similar jobs that could be launched from an identical command-line, provided all input files are named in a way that allows this. Let's have a look at how such a job would be configured and then run.
1. Change into the 'replicate' directory in the directory you just extracted. You will notice we have a number of files plus 5 subdirectories, rep1 – rep5. The three files in the top level directory ending in .prm, .pdb and .psf are common to all replicate jobs; to avoid wasting disk space, Longbow can detect files used in this fashion and treat them accordingly. Now look at one of the subdirectories:
$ ls ./rep1
You will see that in here we have more input files. For this example we should pretend that the files in each directory have different starting coordinates; in reality they don't, but in real world simulations this is a common setup.
2. Submit the job:
$ longbow --log exercise3.log --verbose --replicates 5 namd2 example.in
For each replica, Longbow will first look in the current rep* subdirectory for an input file and, if it isn't found there, will then look for it in the parent directory of the subdirectory. Because of this, all of the input files will be found and used in the simulations in the correct manner. You can see this by inspecting the generated submit.pbs in another terminal.
Note that it is also possible to submit simulations with all input files in the parent folder and have Longbow generate the repx directories; this is particularly useful when doing seeded simulations.
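The lookup order described in this exercise can be mimicked with a small shell function. This is a sketch of the rule as stated here, not Longbow's own implementation; the function name find_input and the demo file shared.prm are hypothetical.

```shell
# Hypothetical helper: print the path that a "subdirectory first, parent
# second" lookup would pick for a given input file.
find_input() {
    repdir="$1"
    name="$2"
    if [ -f "$repdir/$name" ]; then
        echo "$repdir/$name"
    elif [ -f "$repdir/../$name" ]; then
        echo "$repdir/../$name"
    else
        echo "not found: $name" >&2
        return 1
    fi
}

# Demo layout in a scratch directory: one shared file, one per-replica file.
demo=$(mktemp -d)
mkdir "$demo/rep1"
touch "$demo/shared.prm" "$demo/rep1/example.in"

find_input "$demo/rep1" example.in   # resolved inside rep1
find_input "$demo/rep1" shared.prm   # resolved in the parent directory
```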
Simulation Exercise 4: Running Multijobs
One of the most powerful features of Longbow is its ability to rapidly and simultaneously submit multiple jobs and monitor them to completion. To run multiple jobs under one Longbow instance, the jobs should be specified in a job configuration file. Let's try this ourselves:
1. Change into the directory 'multi' in the examples directory you previously downloaded. You will see there are three subdirectories in this directory, called "gromacs", "lammps" and "namd". In this example we are going to submit three completely different jobs to ARCHER using three different MD codes.
2. Create a job configuration file and submit the job.
$ nano job.conf
Job files take the form of ini files, like the hosts.conf you edited earlier, so we need to set up three different sections with different parameters in each one:
executable = mdrun_mpi
resource = Archer
cores = 48
modules = gromacs
executableargs = -deffnm example
executable = namd2
resource = Archer
cores = 48
modules = namd
executableargs = example.in > example.out
executable = lmp_xc30
resource = Archer
cores = 48
modules = lammps/lammps-9Dec14
executableargs = -i example.in -sf opt
In this file we are specifying some information for each job separately. This idea could be used, for example, to run different jobs with different numbers of cores or on different machines (resource). Here, however, we are using each job to run a different program (executable), and thus different command-line parameters and formats (executableargs).
Parameters specified under a job in a job configuration file take precedence over the same parameters declared in the hosts.conf file, so this is a good way to provide job-specific overrides for one-offs.
3. Now launch the multijob and watch the fun happen!
$ longbow --log exercise4.log --verbose --job job.conf
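For reference, a complete ini-style job file has a section header above each block of parameters. The sketch below writes such a file and lists its sections. The section names are an assumption (chosen here to match the three subdirectory names), and the file name job-demo.conf is hypothetical; the parameters are copied from step 2.

```shell
# Write a demonstration multi-job file; section names are assumed to match
# the gromacs, namd and lammps subdirectories of the example.
cat > job-demo.conf <<'EOF'
[gromacs]
executable = mdrun_mpi
resource = Archer
cores = 48
modules = gromacs
executableargs = -deffnm example

[namd]
executable = namd2
resource = Archer
cores = 48
modules = namd
executableargs = example.in > example.out

[lammps]
executable = lmp_xc30
resource = Archer
cores = 48
modules = lammps/lammps-9Dec14
executableargs = -i example.in -sf opt
EOF

# Each header line corresponds to one job that Longbow would submit.
grep '^\[' job-demo.conf
```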