Molecular crystals are a class of solids comprising molecular building blocks bound by van der Waals (vdW) interactions. They are used as functional materials in applications including organic electronics and photovoltaics, non-linear optics, and, most prominently, pharmaceuticals, because most drugs are marketed as solid forms of the active ingredient. Owing to the weak nature of vdW interactions, the same molecule may crystallize in several different structures, known as polymorphs. Polymorphs may be very close in energy and yet possess markedly different physical and chemical properties. For device applications, the crystal structure may affect the electronic and optical properties. For pharmaceuticals, the crystal structure may affect the dissolution rate and thus the bioavailability of the drug. The ability to predict all the possible polymorphs of a particular molecule and their properties is therefore critically important.
Molecular crystal structure prediction is extremely challenging because it requires searching a high-dimensional space with quantum mechanical accuracy. To predict the structure of molecular crystals, we have developed the first-principles genetic algorithm (GA) code, GAtor, and its associated structure generation package, Genarris. GAs rely on the evolutionary principle of survival of the fittest to perform global optimization. The target property is mapped onto a fitness function, and structures with a high fitness have an increased probability of “mating” and propagating their structural “genes”. The process repeats iteratively until an optimum is found. In addition to predicting the most stable polymorphs of molecular crystals, we tailor property-based GAs to discover potential polymorphs with enhanced electronic properties.
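The iterative select, mate, and replace cycle described above can be illustrated with a minimal sketch. This is not GAtor's implementation: the function `run_ga`, its parameters, and the one-dimensional toy problem are all hypothetical, chosen only to show the loop structure. In a real crystal structure search, each individual would be a crystal structure and the fitness would come from a first-principles energy evaluation.

```python
import random


def run_ga(fitness, new_individual, crossover, mutate,
           pool_size=30, iterations=500, crossover_rate=0.75, seed=0):
    """Steady-state GA sketch: each iteration creates one child,
    which replaces the worst pool member if it is fitter."""
    rng = random.Random(seed)
    pool = [new_individual(rng) for _ in range(pool_size)]
    for _ in range(iterations):
        scores = [fitness(x) for x in pool]
        # Shift the scores so that every selection weight is positive.
        low = min(scores)
        weights = [s - low + 1e-9 for s in scores]
        if rng.random() < crossover_rate:
            # Fitter individuals are more likely to be chosen as parents
            # (the same individual may be drawn twice in this sketch).
            parent_a, parent_b = rng.choices(pool, weights=weights, k=2)
            child = crossover(parent_a, parent_b, rng)
        else:
            parent = rng.choices(pool, weights=weights, k=1)[0]
            child = mutate(parent, rng)
        worst = min(range(pool_size), key=lambda i: scores[i])
        if fitness(child) > scores[worst]:
            pool[worst] = child
    return max(pool, key=fitness)


# Toy problem (an assumption for illustration): maximize -(x - 3)^2.
def toy_fitness(x):
    return -(x - 3.0) ** 2


def toy_new_individual(rng):
    return rng.uniform(-10.0, 10.0)


def toy_crossover(a, b, rng):
    w = rng.random()
    return w * a + (1.0 - w) * b  # random blend of the two parents


def toy_mutate(x, rng):
    return x + rng.gauss(0.0, 0.5)  # small random perturbation
```

Calling `run_ga(toy_fitness, toy_new_individual, toy_crossover, toy_mutate)` returns a value close to the optimum at x = 3. Replacing only the worst member keeps the best structure found so far in the pool at every step.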
For polymorph prediction, an energy-based fitness function is used, which assigns a higher fitness to structures with lower energy. In addition, evolutionary niching has been implemented in GAtor to perform multimodal optimization by simultaneously evolving several sub-populations. Machine learning is used to dynamically cluster the population by structural similarity. A cluster-based fitness function is then used to steer the GA towards promising under-sampled regions of the configuration space. This reduces the biases introduced by the initial population and by selection (evolutionary drift) and improves the GA performance. An example is shown here for 1,3-dibromo-2-chloro-5-fluorobenzene. Energy-based fitness preferentially samples a basin that contains layered structures. Evolutionary niching enhances sampling in the region of the experimental structure, which has a zigzag packing motif.
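The difference between the two fitness definitions can be sketched as follows. The min-max normalization and the division by cluster size (a form of fitness sharing) are assumptions made for illustration and may differ from the expressions used in GAtor. The cluster labels are taken as given here; in practice they would come from a clustering algorithm applied to structural descriptors.

```python
from collections import Counter


def energy_fitness(energies):
    """Map energies to [0, 1] so that the lowest energy gets fitness 1."""
    e_min, e_max = min(energies), max(energies)
    if e_max == e_min:
        return [1.0] * len(energies)
    return [(e_max - e) / (e_max - e_min) for e in energies]


def cluster_fitness(energies, labels):
    """Scale the energy-based fitness down by the size of each structure's
    cluster, so members of crowded clusters are penalized."""
    base = energy_fitness(energies)
    sizes = Counter(labels)
    return [f / sizes[label] for f, label in zip(base, labels)]
```

For example, with energies `[-10.0, -9.0, -8.0, -9.5]` and cluster labels `[0, 0, 0, 1]`, the energy-based fitness ranks the first structure highest, whereas the cluster-based fitness ranks the fourth structure highest because it is the only member of its cluster.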
GAtor offers many options, including two selection schemes, two crossover schemes, and a variety of mutation operators designed for molecular crystals, which balance exploration and exploitation by breaking or preserving space group symmetries. The user defines the selection scheme (tournament, T, or roulette wheel, R), the crossover scheme (standard, SC, or symmetric, SymC), and the rate of crossover vs. mutation. The recommended practice for crystal structure prediction is to run GAtor several times with different settings. The figure below demonstrates how the experimental structure of tricyano-1,4-dithiino[c]-isothiazole was generated in seven GAtor runs with different settings via different evolutionary routes, starting from initial pool structures.
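The two selection schemes named above are standard GA techniques and can be sketched generically. The function names, signatures, and the default tournament size are hypothetical and do not reflect GAtor's actual options or code.

```python
import random


def tournament_select(fitness, rng, size=3):
    """Draw `size` distinct candidates at random and return the indices
    of the two fittest among them as parents."""
    contenders = rng.sample(range(len(fitness)), size)
    contenders.sort(key=lambda i: fitness[i], reverse=True)
    return contenders[0], contenders[1]


def roulette_select(fitness, rng):
    """Return one index with probability proportional to its fitness.
    Fitness values are assumed to be non-negative."""
    if sum(fitness) <= 0:
        # All weights are zero: fall back to a uniform choice.
        return rng.randrange(len(fitness))
    return rng.choices(range(len(fitness)), weights=fitness, k=1)[0]
```

Tournament selection depends only on the fitness ranking within each random subset, so it is insensitive to how the fitness is scaled. Roulette wheel selection uses the fitness values directly, so a few very fit structures can dominate the choice of parents.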
Genarris generates random structures of molecular crystals in all possible space groups and applies physical constraints on intermolecular distances. Machine learning is used for clustering by structural similarity. For fast energy evaluations, Genarris employs the Harris approximation. The total electron density of the crystal is constructed by superposition of single molecule densities, calculated only once. The total energy is then evaluated by applying density functional theory (DFT) to the Harris density without performing a self-consistency cycle. As shown here for the binding energy curve of a 5-cyano-3-hydroxythiophene dimer, this is a reasonable approximation as long as the molecules are not unphysically close to each other. Genarris considers structural stability and diversity through a series of clustering and selection steps to produce curated populations of structures that can be used to initialize GAtor and/or as training sets for machine learning.
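A constraint on intermolecular distances of the kind mentioned above can be sketched as a simple closeness check. The criterion used here, rejecting any pair of atoms on different molecules that is closer than a fraction `sr` of the sum of their vdW radii, is an assumption for illustration, as are the function name and the default value of `sr`. The radii are Bondi values in Å. For brevity the sketch compares two isolated molecules and ignores periodic images, which a check on a real crystal would have to include.

```python
import math
from itertools import product

# Bondi van der Waals radii in Å for a few common elements.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def too_close(mol_a, mol_b, sr=0.85):
    """Return True if any atom of mol_a is closer to any atom of mol_b
    than sr times the sum of their vdW radii.

    Each molecule is a list of (element, (x, y, z)) tuples in Å.
    """
    for (el_a, pos_a), (el_b, pos_b) in product(mol_a, mol_b):
        cutoff = sr * (VDW_RADII[el_a] + VDW_RADII[el_b])
        if math.dist(pos_a, pos_b) < cutoff:
            return True
    return False
```

A randomly generated structure for which `too_close` returns True for any pair of molecules would be discarded before any energy evaluation. This is also the regime in which the Harris approximation becomes unreliable.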