The most difficult part of adapting to Fireworks is probably not installing it, but adapting to a database-centric workflow style. What does that mean exactly? Well, the evolution of performing quantum chemistry calculations usually begins with running a large number of calculations in a single directory and writing the results to differently named files within that directory. With a sufficiently large number of calculations, this quickly becomes a chaotic mess.

The next logical choice for retaining some order is usually to distribute these calculations between directories labeled in a way that the user can find them later. Individuals who are particularly interested in reproducibility are likely to divide calculations into individual runs as well, so that each file represents its own run instance. This helps to avoid overwriting information, which can make following the progression of a calculation difficult or impossible. This tends to be much easier to manage, but there are still some hassles, such as needing to know the layout of the directory structure to get specific information.

Using a database to store information about where calculations are located has all the advantages of both systems. Calculations are kept well organized in their own directory, while the database stores keywords which allow that information to be accessed easily. These keywords can be as simple as storing the names of the directories you would have used previously. The trick is getting used to this way of collecting information. I'll try and go over some basic methodologies which could be generally useful for getting accustomed to this style.
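To make the idea concrete, here is a minimal sketch of a keyword lookup, using a plain Python list of dictionaries as a stand-in for the database. The field names (`path`, `surface`, `adsorbate`, `calculation`), the paths, and the `find` helper are all hypothetical and only meant to illustrate the pattern; Fireworks itself keeps its records in MongoDB.

```python
# Each record plays the role of one database document: a few searchable
# keywords plus the directory where the calculation actually lives.
# All field names and paths here are made up for illustration.
records = [
    {"path": "/scratch/calcs/0001", "surface": "Pt111",
     "adsorbate": "CO", "calculation": "relax"},
    {"path": "/scratch/calcs/0002", "surface": "Pt111",
     "adsorbate": "OH", "calculation": "relax"},
    {"path": "/scratch/calcs/0003", "surface": "Cu111",
     "adsorbate": "CO", "calculation": "relax"},
]


def find(records, **query):
    """Return every record whose fields match all key/value pairs in query."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in query.items())
    ]


# Look up calculations by keyword instead of remembering directory names.
matches = find(records, surface="Pt111", adsorbate="CO")
paths = [record["path"] for record in matches]
print(paths)
```

The directory names no longer need to carry any meaning of their own, since the keywords do the work of finding the calculation again.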

1 Running Quantum Espresso

Before we try running Fireworks at all, we should make sure we can run Quantum Espresso first. In a previous post, I went over installing Fireworks on Sherlock for SUNCAT users. This post also goes over getting the libraries needed for Quantum Espresso working in any environment. If you've followed this approach, you should be able to run the following example script with little trouble.

First, we construct a script file to run. Copy the following into a file named script.py.

Now we can run the script and see if everything works properly. By submitting to the development queue, we should be able to run this test quickly.

sbatch -p dev script.py

2 Creating modular Fireworks functions

Fireworks is designed to run scripts in a modular fashion, so that individual tasks which are common to many unit cells can be called upon easily and assembled into different types of calculations. For my purposes, I find it very useful to collect the trajectory of a relaxation after it is completed. For this, ase-espresso has a built-in function called pwlog2trajectory.

In the example, I'll be calling all of the functions I use from the following python scripts. It's a long one because it draws from many modules I incorporate in my workflow. I hope to be more diligent about posting in the future to help break each of these concepts up into individual posts.

2.1 Saving a Quantum Espresso trajectory

Line 129 of this code didn't seem to operate correctly for my purposes when I was originally attempting to store the trajectory information, so I modified it into my own version which works slightly differently.

In this function, the atoms object information from an input.traj file is used, so I'm always relying on that file being in the calculation directory before running this script. Another important caveat is that this script is designed for incorporating the entire relaxation trajectory. It will not necessarily work correctly if you re-run your Quantum Espresso calculations inside of the same directory, as this will append images on to the end of the log file. Generally speaking, re-running in the same directory is poor practice for reproducibility reasons anyway.
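One simple way to avoid re-running inside the same directory is to give every run its own freshly numbered folder. The sketch below is only an illustration of that habit, using the standard library; the `new_run_dir` helper and the `run_000` naming scheme are hypothetical and not part of ase-espresso or Fireworks.

```python
import tempfile
from pathlib import Path


def new_run_dir(base, prefix="run"):
    """Create and return the next unused numbered directory under base."""
    base = Path(base)
    base.mkdir(parents=True, exist_ok=True)
    # Collect the numbers of run directories that already exist.
    used = []
    for child in base.iterdir():
        name, _, number = child.name.partition("_")
        if child.is_dir() and name == prefix and number.isdigit():
            used.append(int(number))
    run_dir = base / "{}_{:03d}".format(prefix, max(used, default=-1) + 1)
    run_dir.mkdir()
    return run_dir


# Demonstration in a throwaway directory: two runs never share a folder,
# so neither one can append to the other's log file.
with tempfile.TemporaryDirectory() as scratch:
    first = new_run_dir(scratch)
    second = new_run_dir(scratch)
    names = [first.name, second.name]
print(names)
```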

Now that we have a functional ability to use Quantum Espresso, we can break our calculation into nicely modular functions which we can use in different situations as needed. These will need to incorporate certain aspects in order to be easily searchable in the database later down the road.

2.2 Writing a database-friendly input file

The first function I use quite frequently is one which converts an atoms object into a string format. This allows me to use an atoms object as the input to my calculation from any computer. That way, I can stage them from my personal computer and simply have Fireworks unpack and run them once they are on the cluster.

Now that I have this function, I can store it in my path and call it to turn an atoms object into a JSON string which is safe to add into the database. Let's revisit the original Quantum Espresso example and see how this works.

This provides a lovely bundled up atoms object which is complete with the ase-espresso keywords we expect to be run with this atoms object. Not only is this exactly what we need to run the calculation, it's also perfect documentation for calling on later to identify what this calculation is.
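As a rough illustration of the idea, here is a minimal sketch of bundling a structure together with its calculator keywords into a JSON string. It is not the atoms_to_encode function itself: `SimpleAtoms` is a hypothetical stand-in for an ASE atoms object so that the example runs without ASE installed, `encode_atoms` is a made-up name, and the keyword values are examples only.

```python
import json


class SimpleAtoms:
    """Hypothetical stand-in for an ASE atoms object (illustration only)."""

    def __init__(self, symbols, positions, cell, pbc, info=None):
        self.symbols = list(symbols)
        self.positions = [list(p) for p in positions]
        self.cell = [list(v) for v in cell]
        self.pbc = list(pbc)
        # Calculator keywords travel with the structure, as with atoms.info.
        self.info = dict(info or {})


def encode_atoms(atoms):
    """Serialize the structure and its calculator keywords to a JSON string."""
    document = {
        "symbols": atoms.symbols,
        "positions": atoms.positions,
        "cell": atoms.cell,
        "pbc": atoms.pbc,
        "info": atoms.info,
    }
    # sort_keys gives a stable string, which is convenient for comparisons.
    return json.dumps(document, sort_keys=True)


# Keyword names and values below are examples, not a recommended setup.
atoms = SimpleAtoms(
    symbols=["H", "H"],
    positions=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]],
    cell=[[6.0, 0.0, 0.0], [0.0, 6.0, 0.0], [0.0, 0.0, 6.0]],
    pbc=[True, True, True],
    info={"pw": 500, "xc": "PBE", "kpts": [1, 1, 1]},
)
encoding = encode_atoms(atoms)
print(encoding)
```

Because everything ends up as plain lists, numbers, and strings, the result can be stored in the database without any special handling.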

2.3 Recovering an atoms object from encoding

Now that we have a JSON representation of our atoms object, we will be able to send this string to the database and have it stored for future use. Once the calculation gets called up on one of the clusters, however, we will need some way of turning that JSON string back into an atoms object so that the local installation of ASE knows how to use it.

Now we can convert our JSON encoding back into an atoms object complete with the calculation parameters which were attached before they left the computer we generated them on. I do this on the same machine here simply to demonstrate the concept.
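The reverse direction can be sketched in the same spirit. The example below assumes a hypothetical encoding with `symbols`, `positions`, `cell`, `pbc`, and `info` fields and returns plain Python data instead of a real ASE atoms object, so it runs without ASE; `decode_atoms` is a made-up name and not the encode_to_atoms function used later.

```python
import json


def decode_atoms(encoding):
    """Turn a JSON string back into a structure dictionary.

    With ASE available, the returned fields could be passed on to build an
    atoms object; here they are kept as plain Python data.
    """
    document = json.loads(encoding)
    required = {"symbols", "positions", "cell", "pbc", "info"}
    missing = required - set(document)
    if missing:
        raise ValueError(
            "encoding is missing fields: {}".format(sorted(missing)))
    if len(document["symbols"]) != len(document["positions"]):
        raise ValueError("symbols and positions differ in length")
    return document


# A hypothetical encoding, as it might come back from the database.
encoding = json.dumps({
    "symbols": ["H", "H"],
    "positions": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]],
    "cell": [[6.0, 0.0, 0.0], [0.0, 6.0, 0.0], [0.0, 0.0, 6.0]],
    "pbc": [True, True, True],
    "info": {"pw": 500, "xc": "PBE", "kpts": [1, 1, 1]},
})
structure = decode_atoms(encoding)
print(structure["symbols"], structure["info"])
```

Checking the fields at decode time means a malformed entry fails loudly on the cluster before any expensive calculation is started.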

2.4 Performing a relaxation

Now that we have all the tools we need to transfer an atoms object (and its tags) to the database and back again, we're ready to write a simple relaxation script for executing on a generic atoms object.

from ase.io import read
from fw.fwio import atoms_to_encode
from qeio import log_to_atoms
from espresso import espresso


def get_potential_energy(in_file='input.traj'):
    """ Performs an ASE get_potential_energy() call with
    the ase-espresso calculator and the keywords
    defined inside the atoms object information.

    This can be a singlepoint calculation or a
    full relaxation depending on the keywords.
    """
    # Read the input file from the current directory
    atoms = read(in_file)

    # Planewave basis set requires periodic boundary conditions
    atoms.set_pbc([1, 1, 1])

    # Setting up the calculator
    calc = espresso(**atoms.info)
    atoms.set_calculator(calc)

    # Perform the calculation and write trajectory from log.
    atoms.get_potential_energy()
    images = log_to_atoms(out_file='output.traj')

    # Save the calculator to the local disk for later use.
    try:
        calc.save_flev_output()
    except RuntimeError:
        calc.save_output()

    return atoms_to_encode(images)

With this script, we can perform a relaxation test using a script very similar to our first example without ever submitting anything to Fireworks. It would be a good idea to run this using a similar sbatch command on Sherlock to ensure that all of the functions are set up correctly before proceeding to the next step.

2.5 Recovering an existing calculator

From the previous script, we now have a calculation which is a good save point for building new calculations. By loading in the calc.tgz file which was saved, we can restart our calculation from the finished relaxation using the following script.

from espresso import espresso
from ase.io import read


def get_relaxed_calculation(in_file='output.traj'):
    """ Attach a stored calculator in the current directory
    to the provided atoms object.

    Then return the atoms object with the calculator attached.
    """
    # Read the last geometry from the input file
    atoms = read(in_file)

    # Reinitialize the calculator from calc.tgz and attach it.
    calc = espresso(**atoms.info)
    calc.load_flev_output()
    atoms.set_calculator(calc)

    return atoms

If we combine this with a post-processing operation, like collecting the total potential, we can restart our calculation in a new place without having to perform the relaxation a second time. You should be able to perform whatever follow-up operation you want on this compressed version of the calculation, so this is a nice way to store information without performing unnecessary post-processing that you may or may not use later.

3 Running through Fireworks

Finally, we are ready to submit a basic calculation to Fireworks using the tools discussed above. To keep this documentation interactive, I will be pulling my own credentials from a secure file. This will look like the following.

This script assumes you have already initiated Fireworks rapidfire on Sherlock; you can see how to do this in a previous post. Once it is turned on, Sherlock should submit a job to the queue automatically after the script has run.

So what's going on here exactly? Well, for full details about the differences between a PyTask, a Firework, and a Workflow, I would recommend looking into the in-depth documentation available on the Fireworks website. When this calculation runs, it will perform two distinct operations on the calculation node once it is started. The first, t0, is the qefw.encode_to_atoms function demonstrated previously. (NOTE: Make sure the qefw functions are on the PYTHONPATH on the server before you run this example!) This will unpack the encoding which is stored in the database by passing encoding as an argument to the first function. This allows me to run this script from my local machine and still have the encoding uploaded to the database and decoded on the server. This is very useful for me because I enjoy using a heavy version of emacs which does not run well on the servers.

Once the t0 task is performed successfully, the t1 task will begin and perform the relaxation on the calculation node as well. Since the qefw.get_potential_energy function returns the images of the atoms objects encoded into JSON, we can also tell Fireworks to store that output into the 'spec.trajectory' field of the results by assigning stored_data_varname='trajectory'. This is extremely useful since most of the information I am ever looking for is included in the final trajectory file. If you need more specific information from the file, these functions should provide a pretty clear example of how you would go about doing that. Keep in mind that all information must be in a format compatible with the MongoDB database. JSON is a pretty safe bet in that regard.
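To illustrate what a PyTask-style step does conceptually, the sketch below resolves a dotted function name, calls it with the supplied arguments, and files the return value under a chosen key. It is a simplified stand-in and not Fireworks's actual implementation: `run_task` is a hypothetical helper, and `json.dumps` merely stands in for a function such as qefw.get_potential_energy.

```python
import importlib


def run_task(func, args=None, stored_data_varname=None):
    """Import 'module.function', call it, and optionally keep its output."""
    module_name, _, function_name = func.rpartition(".")
    function = getattr(importlib.import_module(module_name), function_name)
    output = function(*(args or []))
    # Only keep the return value when a storage key was requested.
    if stored_data_varname is None:
        return {}
    return {stored_data_varname: output}


# The function is looked up by name on the machine that runs the task,
# which is why its module has to be importable (on PYTHONPATH) there.
stored = run_task("json.dumps", args=[{"energy": -1.0}],
                  stored_data_varname="trajectory")
print(stored)
```

Since only the function name and its arguments need to travel through the database, the script that defines the tasks can live on a different machine from the one that executes them.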

You could also add in a t3 task which performs your own follow-up tasks, such as the get_total_potential example above.

You can monitor the progress of your calculation through the web GUI with the following bash command. For me, lpad is an alias with a -f argument which points to my my_fireworks.yaml file.

lpad webgui

4 Collecting information from Fireworks

Once the calculation has finished, you can collect the final trajectory simply by accessing the ID of the completed calculation from the LaunchPad.

The truly beautiful thing about this code is that it can be run from my local machine, just the same way the previous submission script was. If you set up your calculation inputs and outputs this way, there's no need to interact with anything on the server during regular usage!

This is a wonderful tool and an excellent skill for any computational catalysis researcher to possess. Fireworks also has far more capabilities for automation than what I have covered here. In future posts, I will dive into these features in more detail as I continue to explore them.
