Genarris 2.0 Beta Documentation

How to use API Documentation

The execution of Genarris is controlled by a configuration file. Execution is broken down into procedures, such as Pygenarris_Structure_Generation, and each procedure has a corresponding section in the configuration file; in this example, the section is pygenarris_structure_generation. Each section contains options which control the operations performed by its procedure.
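As a rough illustration of the section/option layout, the sketch below assumes the configuration file uses the INI-style syntax understood by Python's configparser (section names in square brackets, followed by option = value lines). The section and option names are taken from this document; the values are placeholders chosen only for the example.

```python
import configparser

# Hypothetical configuration fragment. Option names come from this
# document; every value is a placeholder for illustration only.
EXAMPLE_CONF = """
[pygenarris_structure_generation]
molecule_path = relaxed_molecule.in
output_dir = raw_pool
num_structures = 5000
sr = 0.85
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_CONF)

# Each procedure reads its options from the section that bears its name.
section = parser["pygenarris_structure_generation"]
num_structures = section.getint("num_structures")
sr = section.getfloat("sr")
print(num_structures, sr)
```

Note that configparser lowercases option names by default, so a case-sensitive option such as Z would need extra care in a sketch like this one.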

This document details the options for the procedures executed in the Genarris 2.0 Robust workflow. In order, these are Relax_Single_Molecule, Estimate_Unit_Cell_Volume, Pygenarris_Structure_Generation, Run_Rdf_Calc, Affinity_Propagation_Fixed_Clusters, FHI_Aims_Energy_Evaluation, Affinity_Propagation_Fixed_Clusters, and Run_FHI_Aims_Batch. Many options can be specified and modified for each section. All of them are listed in this document under the Configuration File Options heading of each procedure.
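The workflow order can be written down as a simple list. The sketch below assumes that every section name is the lowercased procedure name, which generalizes from the single pygenarris_structure_generation example given above; section_name is a hypothetical helper, not part of Genarris.

```python
# Procedure names in the order given above for the Robust workflow.
ROBUST_WORKFLOW = [
    "Relax_Single_Molecule",
    "Estimate_Unit_Cell_Volume",
    "Pygenarris_Structure_Generation",
    "Run_Rdf_Calc",
    "Affinity_Propagation_Fixed_Clusters",
    "FHI_Aims_Energy_Evaluation",
    "Affinity_Propagation_Fixed_Clusters",
    "Run_FHI_Aims_Batch",
]


def section_name(procedure):
    """Assumed mapping: the section name is the lowercased procedure name."""
    return procedure.lower()


sections = [section_name(p) for p in ROBUST_WORKFLOW]

# Affinity propagation runs twice but maps to one section name, so the
# list of distinct sections is one entry shorter than the workflow.
unique_sections = list(dict.fromkeys(sections))
print(unique_sections)
```

Because Affinity_Propagation_Fixed_Clusters appears twice, its second run is configured through the options ending in _2 that are described under that procedure.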

There are three categories of Configuration File Options: required, optional, and inferred. In the Configuration File Options, the category is stated after the type of the option, such as int, float, or bool. Required options have no category after the type, whereas optional and inferred options are labeled as such. Optional options have default settings that generally perform well; the user may specify them to gain more control over program execution. Inferred options are those that may be present in multiple procedures. For example, the option aims_lib_dir is needed in Relax_Single_Molecule, FHI_Aims_Energy_Evaluation, and Run_FHI_Aims_Batch. Because it is an inferred option, it only needs to be specified once, in the earliest procedure in which it occurs, and it will then be inferred by all later procedures. Inferred options are thus optional in all subsequent sections.
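The lookup behavior for inferred options can be pictured with a small sketch. It assumes an INI-style configuration file read with Python's configparser, and get_inferred is a hypothetical helper written for this example; it is not the lookup routine Genarris itself uses. The path and file names are placeholders.

```python
import configparser

# Hypothetical configuration: aims_lib_dir is given only once, in the
# earliest section that needs it.
EXAMPLE_CONF = """
[relax_single_molecule]
aims_lib_dir = /path/to/aims_lib
molecule_path = molecule.in

[fhi_aims_energy_evaluation]
energy_name = energy

[run_fhi_aims_batch]
energy_name = energy_relaxed
"""


def get_inferred(parser, section, option):
    """Look for `option` in `section`, then in earlier sections."""
    names = parser.sections()  # returned in file order
    idx = names.index(section)
    # Walk backwards from the current section towards the first one.
    for name in reversed(names[: idx + 1]):
        if parser.has_option(name, option):
            return parser.get(name, option)
    raise KeyError(f"{option!r} not found in {section!r} or earlier sections")


parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_CONF)

aims_lib_dir = get_inferred(parser, "run_fhi_aims_batch", "aims_lib_dir")
print(aims_lib_dir)
```

An option set directly in a section takes precedence in this sketch, so energy_name resolves to the value written in run_fhi_aims_batch rather than the one from the earlier section.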

Genarris 2.0 Procedures for Robust Workflow

class Genarris.genarris_master.Genarris(inst_path)

Master class of Genarris. It controls all aspects of the Genarris workflow, which can be executed individually or sequentially. It begins by reading and interpreting the configuration file, then calls the defined procedures with the options specified in the configuration file. Some options may be inferred from previous sections if they are not present in every section.

Arguments
inst_path: str

Path to the configuration file.

Affinity_Propagation_Fixed_Clusters(comm)

Affinity propagation (AP) that explores the setting of the preference in order to generate the desired number of clusters.

Arguments

comm: mpi4py.MPI object

MPI communicator.

Configuration File Options

output_dir: str

Path to the directory where the chosen structures will be stored.

preference_range: list

List of two values as the [min, max] of the range of allowable preference values.

structure_dir: str, inferred

Path to the directory of files to be used for the calculation. Default is to infer this value from the previous section.

dist_mat_input_file: str, inferred

Path to the distance matrix output from the descriptor calculation. Default is to infer this value from the previous sections.

output_format: str, optional

Format in which the structure files should be saved. Default is both.

cluster_on_energy: bool, optional

Uses energy values to determine exemplars. Structures with the lowest energy values from each cluster are selected. Default is False.

plot_histograms: bool, optional

Whether histogram plots of the volume and space groups should be created. Default is False.

num_of_clusters: int or float, optional

If a float, it must be less than 1 and selects that fraction of the structures. If an int, it selects that specific number of structures. Default is 0.1.

num_of_clusters_tolerance: int, optional

The algorithm stops once the number of clusters generated is within this tolerance of the desired number of clusters. Default is 0.

max_sampled_preferences: int, optional

Maximum number of preference values to try.

output_without_success: bool, optional

Whether to perform output procedures if the algorithm has reached the maximum number of sampled preferences without finding the correct number of clusters. Default is False.

affinity_type: list, optional

List of [type of affinity, value] argument for the Scikit-Learn AP algorithm.

affinity_matrix_path: str, optional

Path to the affinity matrix to use for the AP algorithm. Default is affinity_matrix.dat.

damping: float, optional

damping argument for Scikit-Learn AP algorithm. Default is 0.5.

convergence_iter: int, optional

convergence_iter argument for Scikit-Learn AP algorithm. Default is 15.

max_iter: int, optional

max_iter argument for Scikit-Learn AP algorithm. Default is 1000.

preference: int, optional

preference argument for Scikit-Learn AP algorithm. Default is None.

verbose_output: bool, optional

verbose argument for Scikit-Learn AP algorithm. Default is False.

property_key: str, optional

Key under which the AP cluster will be stored in the properties of each structure object. Default is AP_cluster.

output_file: str, optional

Path where info about the AP algorithm execution will be stored. Default is ./AP_cluster.info.

exemplars_output_dir: str, optional

If provided, will output the exemplars of each cluster to this folder. Default is None.

exemplars_output_format: str, optional

File format of structures to be output. Default is both.

structure_suffix: str, optional

Suffix to apply to structure files which are written. Default is .json.

output_dir_2: str, inferred

Code automatically looks for the option output_dir_2 if the output directory already exists. This is how the code currently identifies that AP is running for a second time. Default behavior is to not use this option if output_dir does not already exist.

num_of_clusters_2: int or float, optional

num_of_clusters for second clustering step. Default value is 0.1.

output_file_2: str, inferred

Use if running the AP algorithm twice, such as in the Robust workflow. Default is to use output_file.

exemplars_output_dir_2: str, inferred

Exemplars output directory if the second clustering step is used. Default is to use exemplars_output_dir.

cluster_on_energy_2: str, inferred

How to choose exemplars for the second clustering step. Default is to use the cluster_on_energy value.

energy_name_2: str, inferred

Energy name to use for second clustering step. Default is to use energy_name.
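To show how num_of_clusters, preference_range, num_of_clusters_tolerance, and max_sampled_preferences could interact, here is a minimal sketch of a preference search. It is not the search strategy Genarris implements: it assumes a plain bisection, assumes that the cluster count grows with the preference, and replaces the real AP run with a toy stand-in. The names resolve_target, search_preference, and toy_cluster_count are all hypothetical.

```python
def resolve_target(num_of_clusters, pool_size):
    """Turn num_of_clusters into an absolute cluster count."""
    if isinstance(num_of_clusters, float):
        # A float is read as a fraction of the pool (rounding is assumed).
        return max(1, int(round(num_of_clusters * pool_size)))
    return int(num_of_clusters)


def search_preference(count_clusters, preference_range, target,
                      tolerance=0, max_sampled_preferences=10):
    """Bisect preference_range until the cluster count is acceptable.

    count_clusters is a stand-in callable: preference -> cluster count.
    Returns (preference, n_clusters, success).
    """
    low, high = preference_range
    pref, n_clusters = None, None
    for _ in range(max_sampled_preferences):
        pref = 0.5 * (low + high)
        n_clusters = count_clusters(pref)
        if abs(n_clusters - target) <= tolerance:
            return pref, n_clusters, True
        if n_clusters > target:
            high = pref  # too many clusters: try a lower preference
        else:
            low = pref   # too few clusters: try a higher preference
    return pref, n_clusters, False


def toy_cluster_count(preference):
    """Toy stand-in for an AP run; grows with preference by construction."""
    return int(round(100 + preference))


target = resolve_target(0.1, pool_size=300)
pref, n_clusters, success = search_preference(
    toy_cluster_count, [-100.0, 0.0], target)
print(target, n_clusters, success)
```

When the loop ends without success, the caller still receives the last sampled result, which is the situation the output_without_success option is described as covering.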

Estimate_Unit_Cell_Volume(comm)

Performs volume estimation using a machine-learned model trained on the CSD and based on Monte Carlo volume integration and topological molecular fragments. See the Genarris 2.0 paper for a full description.

Arguments

comm: mpi4py.MPI object

MPI communicator.

Configuration File Options

volume_mean: float, optional

If provided, uses this value as the volume generation mean without using the ML model to estimate the volume.

volume_std: float, optional

If provided, uses this value for structure generation; otherwise, a default value of 0.075 multiplied by the predicted volume per unit cell is used.

Returns

None (None) -- Returns an object of type None.
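The default for volume_std is simple arithmetic, sketched below. resolve_volume_std is a hypothetical helper and the predicted volume is a placeholder number. The final sampling step additionally assumes that unit cell volumes are drawn from a normal distribution with this mean and standard deviation, which the option names suggest but this document does not state.

```python
import random

DEFAULT_STD_FRACTION = 0.075  # default fraction stated above


def resolve_volume_std(predicted_volume, volume_std=None):
    """Return the user-supplied volume_std, or 0.075 * predicted volume."""
    if volume_std is not None:
        return volume_std
    return DEFAULT_STD_FRACTION * predicted_volume


predicted_volume = 1000.0  # placeholder unit cell volume
std = resolve_volume_std(predicted_volume)

# Assumption: target volumes are sampled from a normal distribution.
rng = random.Random(0)
sampled_volume = rng.gauss(predicted_volume, std)
print(std, sampled_volume)
```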

FHI_Aims_Energy_Evaluation(comm, world_comm, MPI_ANY_SOURCE, num_replicas)

Runs Self-Consistent Field calculation on a pool of structures.

Arguments

See Run_FHI_Aims_Batch()

Configuration File Options

See Run_FHI_Aims_Batch()

Returns

None (None)

Pygenarris_Structure_Generation(comm)

Uses the Genarris module written in C to perform structure generation. This module enables generation on special positions.

Arguments

comm: mpi4py.MPI object

MPI communicator.

Configuration File Options

molecule_path: str

Path to the relaxed molecule geometry.

output_format: str

Determines the type of file which will be output for each structure. Can be one of: json, geo, both.

output_dir: str

Path to the directory which will contain all generated structures which pass the intermolecular distance checks.

num_structures: int

Target number of structures to generate.

Z: int

Number of molecules per cell to generate.

volume_mean: float, optional

See Estimate_Unit_Cell_Volume()

volume_std: float, optional

See Estimate_Unit_Cell_Volume()

sr: float, optional

Defines the minimum intermolecular distance that is considered physical by multiplying the sum of the van der Waals radii of the interacting atoms by sr. Default value is 0.85.

tol: float, optional

Tolerance to be used to identify space groups compatible with the input molecule.

max_attempts_per_spg_per_rank: int

Defines the maximum number of attempts the structure generator makes before moving on to the next space group.

num_structures_per_allowed_SG_per_rank: int

Number of structures per space group per rank which will be generated by Pygenarris.

geometry_out_filename: str

Filename where all structures generated by Pygenarris will be found.

omp_num_threads: int

Number of OpenMP threads to pass into Pygenarris.

truncate_to_num_structures: bool

If true, will reduce the pool to exactly the number defined by num_structures.
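The role of sr in the intermolecular distance check can be sketched as follows. The radii are Bondi van der Waals radii in angstroms and serve only as an illustration; the radii used inside Pygenarris may differ. The helpers min_allowed_distance and is_physical are hypothetical, and the sketch ignores periodic images.

```python
import math

# Bondi van der Waals radii in angstroms, used here only for illustration;
# the radii Genarris uses internally may differ.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def min_allowed_distance(elem_a, elem_b, sr=0.85):
    """Shortest distance considered physical for this pair of elements."""
    return sr * (VDW_RADII[elem_a] + VDW_RADII[elem_b])


def is_physical(atom_a, atom_b, sr=0.85):
    """Check one pair of atoms that belong to different molecules.

    Each atom is a tuple (element, (x, y, z)) with positions in angstroms.
    """
    (elem_a, pos_a), (elem_b, pos_b) = atom_a, atom_b
    return math.dist(pos_a, pos_b) >= min_allowed_distance(elem_a, elem_b, sr)


carbon = ("C", (0.0, 0.0, 0.0))
far_oxygen = ("O", (0.0, 0.0, 3.0))
near_oxygen = ("O", (0.0, 0.0, 2.5))

print(min_allowed_distance("C", "O"))
print(is_physical(carbon, far_oxygen), is_physical(carbon, near_oxygen))
```

With the default sr of 0.85, the C and O pair in this sketch must be at least 0.85 × (1.70 + 1.52) = 2.737 angstroms apart, so the contact at 3.0 angstroms passes and the one at 2.5 angstroms is rejected.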

Run_Rdf_Calc(comm)

Runs the RDF calculation for the pool of generated structures and then calculates the structure difference matrix. The RDF descriptor is similar to that described in Behler and Parrinello 2007.

Arguments

comm: mpi4py.MPI object

MPI communicator.

Configuration File Options

structure_dir: str, inferred

Path to the directory of structures to evaluate.

dist_mat_fpath: str

Path to the file to write the distance matrix to.

output_dir: str

Path of the directory to write structures to (it will be created if it does not exist). If 'no_new_output_dir', the input structures will be overwritten.

normalize_rdf_vectors: bool, optional

Whether to normalize the rdf vectors over the columns of the feature matrix before using them to compute the distance matrix. Default is False.

standardize_distance_matrix: bool

If True, standardizes the distance matrix. The method is to divide all elements by the max value in the distance matrix. Because it is a distance matrix and thus all elements are positive, the standardized elements will be in the range [0, 1]. Default is False.

save_envs: bool, optional

Whether to save the environment vectors calculated by the RDF method in the output structure files. Default is False.

cutoff: float, optional

Cutoff radius to apply to the atom centered symmetry function. Default is 12.

n_D_inter: int, optional

Number of dimensions to use for each type of pair-wise interatomic interaction found in the structure. Default is 12.

init_scheme: str, optional

Can be centered or shifted, as described in Gastegger et al. 2018. Default is shifted.

eta_range: list, optional

List of two floats which define the range for the eta parameter in Gastegger et al. 2018. Default is [0.05,0.5].

Rs_range: list, optional

List of two floats which define the range for the Rs parameter in Gastegger et al. 2018. Default is [0.1,12].

pdist_distance_type: str, optional

Input parameter for the pdist function. Default is Euclidean.

Returns

None (None)
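The effect of standardize_distance_matrix can be seen in a small pure-Python sketch. The descriptor vectors are made-up numbers, Euclidean distance is used to match the stated default, and distance_matrix and standardize are hypothetical helpers rather than the routines Genarris calls.

```python
import math


def distance_matrix(vectors):
    """Pairwise Euclidean distances between descriptor vectors."""
    n = len(vectors)
    return [[math.dist(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def standardize(matrix):
    """Divide every element by the largest element of the matrix."""
    max_value = max(max(row) for row in matrix)
    if max_value == 0:
        # All structures are identical; nothing to rescale.
        return [row[:] for row in matrix]
    return [[value / max_value for value in row] for row in matrix]


# Made-up descriptor vectors for three structures.
descriptors = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]

dist = distance_matrix(descriptors)
standardized = standardize(dist)
print(standardized)
```

After standardization the largest distance maps to 1 and the diagonal stays at 0, so all elements lie in the range [0, 1].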

Relax_Single_Molecule(comm, world_comm, MPI_ANY_SOURCE, num_replicas)

Calls run_fhi_aims_batch using the provided single molecule path.

Arguments

See Run_FHI_Aims_Batch()

Configuration File Options

See Run_FHI_Aims_Batch()

Returns

None (None) -- Returns an object of type None.

Run_FHI_Aims_Batch(comm, world_comm, MPI_ANY_SOURCE, num_replicas)

Runs FHI-aims calculations on a pool of structures using num_replicas.

Arguments

comm: mpi4py.MPI object

MPI communicator to pass into aims.

world_comm: mpi4py.MPI object

World MPI communicator.

MPI_ANY_SOURCE: mpi4py.MPI.ANY_SOURCE

MPI ANY_SOURCE object to facilitate communication.

num_replicas: int

Number of replicas to use in calculation.

Configuration File Options

verbose: bool

Controls verbosity of output.

energy_name: str

Property name under which the calculated energy will be stored in the Structure file.

output_dir: str

Path to the directory where the output structure file will be saved.

aims_output_dir: str

Path where the aims calculation will take place.

aims_lib_dir: str, inferred

Path to the directory containing the FHI-aims library file.

molecule_path: str

Path to the geometry.in file of the molecule to be calculated if called using harris_single_molecule_prep or relax_single_molecule.

structure_dir: str, inferred

Path to the directory of structures to be calculated if the calculation was not called using harris_single_molecule_prep or relax_single_molecule.

Z: int, inferred

Number of molecules per cell.

Returns

None (None)

Genarris 2.0 Callable Functions

Genarris.evaluation.run_fhi_aims.run_fhi_aims_batch(comm, world_comm, MPI_ANY_SOURCE, num_replicas, inst=None, sname=None, structure_dir=None, aims_output_dir=None, output_dir=None, aims_lib_dir=None, control_path=None, energy_name='energy', verbose=False)

Performs multiple FHI-aims calculations.

Arguments

comm: mpi4py.MPI object

MPI communicator to pass into aims.

world_comm: mpi4py.MPI object

World MPI communicator.

MPI_ANY_SOURCE: mpi4py.MPI.ANY_SOURCE

Any source object for communication.

num_replicas: int

Number of replicas to perform calculation.

inst: genarris.core.instruct.Instruct

Config Parser object which contains all the configuration file sections and options for calculation.

sname: str

Section name which called run_fhi_aims_batch.

structure_dir: str

Path to directory of structures to perform calculation.

aims_output_dir: str

Path to the directory where FHI-aims calculations should take place.

output_dir: str

Path to the directory where the Structure files should be saved.

aims_lib_dir: str

Path to the directory containing the FHI-aims library file.

control_path: str

Path to the directory containing the control file to use.

energy_name: str

Property name under which the calculated energy will be stored in the Structure file.

verbose: bool

Controls verbosity of output.

Configuration File Options

verbose: bool

Controls verbosity of output.

energy_name: str

Property name under which the calculated energy will be stored in the Structure file.

output_dir: str

Path to the directory where the output structure file will be saved.

aims_output_dir: str

Path where the aims calculation will take place.

aims_lib_dir: str

Path to the directory containing the FHI-aims library file.

molecule_path: str

Path to the geometry.in file of the molecule to be calculated if called using harris_single_molecule_prep or relax_single_molecule.

structure_dir: str

Path to the directory of structures to be calculated if the calculation was not called using harris_single_molecule_prep or relax_single_molecule.

Returns

None (None)