Finding novel or non-standard metabolic pathways, possibly spanning multiple species, has important applications in fields such as metabolic engineering, metabolic network analysis, and metabolic network reconstruction. Traditionally, this has been a manual process, but the large volume of metabolic data now available has created a need for computational tools to automatically identify biologically relevant pathways. This thesis presents new algorithms for automatically finding biologically meaningful linear and branched metabolic pathways in multi-genome scale metabolic networks. These algorithms utilize atom mapping data, which provides the correspondence between atoms in the substrates to atoms in the products of a chemical reaction, to find pathways which conserve a given number of atoms between desired start and target compounds.
The first algorithm presented identifies atom conserving linear pathways by explicitly tracking atoms during an exploration of a graph structure constructed from the atom mapping data. The explicit tracking of atoms enables finding branched pathways because it provides automatic identification of the reactions and compounds through which atoms are lost or gained. The thesis then describes two algorithmic approaches for identifying branched metabolic pathways based upon atom conserving linear pathways. One approach takes one linear pathway at a time and attempts to add branches that connect loss and gain compounds. The other approach takes a group of linear pathways and attempts to merge pathways that move mutually exclusive sets of atoms from the start to the target compounds. Comparisons to known metabolic pathways demonstrate that atom tracking causes the algorithms to avoid many unrealistic connections, often found in previous approaches, and return biologically meaningful pathways. While the theoretical complexity of finding even linear atom conserving pathways is high, by choosing the appropriate representations and heuristics, and perhaps due to the structure of the underlying data, the algorithms in this thesis have practical running times on real data. The results also demonstrate the potential of the algorithms to find novel or non-standard pathways that may span multiple organisms.