Proposal

Title

A Parallel Perfect Matching Graph Algorithm for Chemical Bond Assignment.

Summary

We are implementing a system for multi-core CPUs that takes in a molecule whose bond orders are unknown and computes the optimal bond order assignment subject to biological constraints. We are building the system in C++ using OpenMP and will be using a parallel perfect matching algorithm to compute the bond assignment.

Background

A molecule is an electrically neutral group of two or more atoms held together by chemical bonds. The bonds between the atoms determine many of the properties of the molecule. One common problem in the field of Computational Biology is determining molecular structure for molecules whose atomic coordinates are known but whose bonds are unknown. Given the atomic coordinates, we can complete the picture for a given molecule by specifying an assignment of bonds to the atoms within the molecule.

We can represent the atoms within a molecule as vertices and the bonds between them as edges. We note however that a single molecule can contain multiple bond types (e.g. single, double, triple). Computing an optimal assignment of these various types of bonds requires determining a perfect weighted matching for the molecule while respecting certain biological constraints (e.g. all carbon atoms can only have one double bond). A number of open source libraries attempt to implement this matching algorithm, but they are exponential in the number of fused rings (cycles) within the molecule. Given that molecules encountered in practice can have hundreds of atoms, an exponential solution is highly impractical.

To solve this problem, we are implementing a parallel perfect weighted matching algorithm which can be used to determine an optimal assignment. The algorithm is detailed in secionts 3.1 and 3.2 of the course notes for the Fall 2004 edition of 15-859. We are also planning to implement a parallel memory-efficient algorithm to compute cofactor matrices, which will speed up the computation of the optimal weighted matching. We plan to follow the algorithm detailed in "A Parallel Algorithm for Calculation of Large Determinants with High Accuracy for GPUs and MPI Clusters" (Gleb Beliakov, Yuri Matiyasevich).

In addition, we are implementing a mechanism which allows the user to specify bonding constraints on different types of atoms. We are meeting with Professor Geoffrey Hutchinson, who proposed the project idea, to discuss which constraint types would be useful to support.

The Challenge

The parallel graph algorithms described in the paper assume that all vertices are identical. However, a given molecule can contain a wide variety of different atoms, some of which are subject to biological constraints. We will need to develop procedures to adapt the input graph so that the results of the algorithm respect the constraints on the molecules.

Our algorithm for computing cofactor matrices relies on Gaussian elimination, so its memory accesses will be very predictable. We'll also be taking advantage of spatial and temportal locality by partitioning the matrix into blocks.

Resources

Starter Code

We will be using starter code provided by Dr. Hutchison in the following repository. For our purposes, we will be using a CPU. We will be using C++ and OpenMp.

Papers

These two papers contain the parallel algorithm for obtaining a perfect matching we want to implement.

Goals and Deliverables

Core Goals

We plan to achieve a significant speedup in the process of chemical bond assignment by making the process parallel across a number of metrics. Currently the baseline implementation runs in exponential time. The algorithm we are implementing will make the runtime polynomial.

Our first core goal is to have this polynomial implementation working very efficiently and scale properly with different number of cores. We want to come up with an implementation and make optimizations that allow us to divide the work among all the cores evenly and utilize the all the available contexts within them.

Our second core goal is to parallelize matrix determinant calculations as this will help in speeding up our first goal. We plan to achieve maximum speedup by figuring out the dependencies among different steps of calculating the determinant and scheduling this work in an optimal way across the available resources.

Demo

We will make a non-interactive demo, however we will show speed-up graphs of how our algorithm performed when scaled across different input sizes and different number of cores.

Stretch Goals

We plan to make a web application that would make our demo interactive. We want to be able to represent the molecules visually and also show the bond adding in real time. We also want to generate the speed up graph for the particular machine through the app itself.

Platform Choice

We have chosen to make this project for the CPU. The reason for choosing CPU is that our program will be mostly run on real-sized systems like laptops. Even though the algorithm is scalable to work on a GPU, we want to implement a version for the CPU because of the needs of the bigger projects our program will be a part of.

Secondly, we are using C++ and OpenMP. The primary reason for using C++ is that we can achieve very high performance. Other than that, it also has a great support of libraries that we can use. The reason for using OpenMP is that it works very well with C++ and gives a lot of primitives that help in writing parallelizable code.

Finally, for our stretch goal, we will use Django to make our web app.

Schedule

Date Objective
Friday, April 8 Complete Python implementation of Matrix cofactor algorithm and weighted matching algorithm.
Friday, April 15 Implement cofactor algorithm and weighted matching algorithm in C++/OpenMP
Friday, April 22 Implement constraint mechanisms in Python (for testing) and C++/OpenMP
Friday, April 29 Finish performance optimizations and implement basic website
Friday, May 6th Finish performance tuning and website