Genarris 2.0 Beta Documentation
How to Use the API Documentation
The execution of Genarris is controlled by a configuration file. The configuration
file breaks the execution of Genarris down into procedures, such as
Pygenarris_Structure_Generation. Each procedure has a corresponding section in the
configuration file; in this example, the section is pygenarris_structure_generation.
The section contains options that control the operations performed by the procedure.
This document details the options for the procedures executed in the Genarris 2.0
Robust workflow. In order, these are: Relax_Single_Molecule, Estimate_Unit_Cell_Volume,
Pygenarris_Structure_Generation, Run_Rdf_Calc, Affinity_Propagation_Fixed_Clusters,
FHI_Aims_Energy_Evaluation, Affinity_Propagation_Fixed_Clusters, Run_FHI_Aims_Batch.
Affinity_Propagation_Fixed_Clusters appears twice because the Robust workflow runs the
AP clustering step a second time.
Each section accepts many options that can be specified and modified. All of them
are listed under the Configuration File Options heading of each procedure.
There are three categories of Configuration File Options: required, optional, and
inferred. The category is written after the option's type, such as int, float, or
bool; required options have no category after the type. Optional options have
defaults that generally perform well, and the user may set them for finer control
over how the program executes.
Inferred options are those that may be used by several different procedures.
For example, the option aims_lib_dir is needed by Relax_Single_Molecule,
FHI_Aims_Energy_Evaluation, and Run_FHI_Aims_Batch. Because it is an inferred option,
it only needs to be specified once, in the earliest procedure in which it occurs, and
all subsequent procedures then infer it. Inferred options are thus optional in all
subsequent sections.
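The three option categories can be illustrated with a small lookup helper. This is a minimal sketch, assuming an INI-style configuration file read by Python's standard configparser and assuming that an inferred option is taken from the nearest earlier section that defines it. The helper name get_option and the aims_lib_dir path are hypothetical, and Genarris's own Instruct class may resolve options differently.

```python
import configparser

# Hypothetical configuration: one section per procedure, in workflow order.
CONFIG_TEXT = """
[relax_single_molecule]
aims_lib_dir = /path/to/aims/lib
molecule_path = molecule.in

[fhi_aims_energy_evaluation]
energy_name = energy

[run_fhi_aims_batch]
output_dir = final_pool
"""

_REQUIRED = object()  # sentinel meaning "no default: the option is required"


def get_option(config, sections, current, option, default=_REQUIRED, inferred=False):
    """Return an option for section `current` following the three categories.

    required: must appear in `current`, otherwise KeyError is raised.
    optional: falls back to `default` when absent.
    inferred: falls back to the nearest earlier section in `sections`
              that defines the option, then to `default`.
    """
    if config.has_option(current, option):
        return config.get(current, option)
    if inferred:
        # Walk backwards through the procedures that precede `current`.
        for name in reversed(sections[: sections.index(current)]):
            if config.has_option(name, option):
                return config.get(name, option)
    if default is _REQUIRED:
        raise KeyError(f"required option '{option}' missing from [{current}]")
    return default


config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)
order = config.sections()  # section names in file (workflow) order

# aims_lib_dir is only set in [relax_single_molecule] but is inferred here.
print(get_option(config, order, "run_fhi_aims_batch", "aims_lib_dir", inferred=True))
# verbose is absent, so the optional default is used.
print(get_option(config, order, "run_fhi_aims_batch", "verbose", default="False"))
```

Searching backwards means that if a later section overrides an inferred option, subsequent sections pick up the override rather than the first value.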
Genarris 2.0 Procedures for Robust Workflow
- class Genarris.genarris_master.Genarris(inst_path)
Master class of Genarris. It controls all aspects of the Genarris workflow, whose procedures can be executed individually or sequentially. It begins by reading and interpreting the configuration file, then calls the defined procedures with the options specified in the configuration file. Some options may be inferred from previous sections if they are not present in every section.
- Arguments
- inst_path: str
Path to the configuration file.
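The master-class behaviour described above (read the configuration file, then call each procedure with its section's options) can be sketched as a simple dispatch loop. This is a hypothetical illustration, not Genarris's code: the [Genarris_master] section with a procedures key, the placeholder handler functions, and the name run_workflow are all assumptions made for the example. Only the convention that each procedure reads a lowercase section of the same name comes from this document.

```python
import configparser
import json

# Hypothetical configuration; the [Genarris_master] layout is an assumption.
CONFIG_TEXT = """
[Genarris_master]
procedures = ["Estimate_Unit_Cell_Volume", "Pygenarris_Structure_Generation"]

[estimate_unit_cell_volume]
volume_mean = 550.0

[pygenarris_structure_generation]
num_structures = 100
Z = 4
"""


def estimate_unit_cell_volume(options):
    # Placeholder for the real procedure: echoes the configured mean.
    return f"volume_mean={options['volume_mean']}"


def pygenarris_structure_generation(options):
    # configparser lowercases option names, so "Z" is stored as "z".
    return f"generate {options['num_structures']} structures with Z={options['z']}"


HANDLERS = {
    "Estimate_Unit_Cell_Volume": estimate_unit_cell_volume,
    "Pygenarris_Structure_Generation": pygenarris_structure_generation,
}


def run_workflow(config_text):
    """Read the configuration, then call each listed procedure in order."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    procedures = json.loads(config.get("Genarris_master", "procedures"))
    results = []
    for name in procedures:
        # Each procedure reads the section named after it, in lowercase.
        results.append(HANDLERS[name](config[name.lower()]))
    return results


for line in run_workflow(CONFIG_TEXT):
    print(line)
```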
- Affinity_Propagation_Fixed_Clusters(comm)
Affinity propagation (AP) that explores the setting of the preference in order to generate the desired number of clusters.
Arguments
- comm: mpi4py.MPI object
MPI communicator.
Configuration File Options
- output_dir: str
Path to the directory where the chosen structures will be stored.
- preference_range: list
List of two values, [min, max], giving the range of allowable preference values.
- structure_dir: str, inferred
Path to the directory of structure files to be used for the calculation. Default is to infer this value from the previous section.
- dist_mat_input_file: str, inferred
Path to the distance matrix output by the descriptor calculation. Default is to infer this value from the previous sections.
- output_format: str, optional
Format in which the structure files are saved. Default is both.
- cluster_on_energy: bool, optional
Uses energy values to determine exemplars: the structure with the lowest energy in each cluster is selected. Default is False.
- plot_histograms: bool, optional
Whether to plot histograms of the volumes and space groups. Default is False.
- num_of_clusters: int or float, optional
If a float, must be less than 1 and selects that fraction of the structures. If an int, selects exactly that number of structures. Default is 0.1.
- num_of_clusters_tolerance: int, optional
The algorithm stops once the number of generated clusters is within this tolerance of the desired number of clusters. Default is 0.
- max_sampled_preferences: int, optional
Maximum number of preference values to try.
- output_without_success: bool, optional
Whether to perform the output procedures if the algorithm reaches the maximum number of sampled preferences without finding the correct number of clusters. Default is False.
- affinity_type: list, optional
List of [type of affinity, value] used to set the affinity argument of the Scikit-Learn AP algorithm.
- affinity_matrix_path: str, optional
Path to the affinity matrix to use for the AP algorithm. Default is affinity_matrix.dat.
- damping: float, optional
damping argument for the Scikit-Learn AP algorithm. Default is 0.5.
- convergence_iter: int, optional
convergence_iter argument for the Scikit-Learn AP algorithm. Default is 15.
- max_iter: int, optional
max_iter argument for the Scikit-Learn AP algorithm. Default is 1000.
- preference: int, optional
preference argument for the Scikit-Learn AP algorithm. Default is None.
- verbose_output: bool, optional
verbose argument for the Scikit-Learn AP algorithm. Default is False.
- property_key: str, optional
Key under which the AP cluster is stored in the properties of each structure object. Default is AP_cluster.
- output_file: str, optional
Path where information about the AP algorithm execution is stored. Default is ./AP_cluster.info.
- exemplars_output_dir: str, optional
If provided, the exemplars of each cluster are written to this folder. Default is None.
- exemplars_output_format: str, optional
File format of the structures to be output. Default is both.
- structure_suffix: str, optional
Suffix applied to the structure files that are written. Default is .json.
- output_dir_2: str, inferred
If output_dir already exists, the code automatically looks for the option output_dir_2; this is how the code currently detects that AP is running for the second time. If output_dir does not already exist, this option is not used.
- num_of_clusters_2: int or float, optional
num_of_clusters for the second clustering step. Default is 0.1.
- output_file_2: str, inferred
Use if running the AP algorithm twice, as in the Robust workflow. Default is to use output_file.
- exemplars_output_dir_2: str, inferred
Exemplars output directory for the second clustering step. Default is to use exemplars_output_dir.
- cluster_on_energy_2: str, inferred
How to choose exemplars in the second clustering step. Default is to use the cluster_on_energy value.
- energy_name_2: str, inferred
Energy name to use for the second clustering step. Default is to use energy_name.
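The preference search can be pictured as a bisection over preference_range: in affinity propagation a larger preference generally produces more clusters, so the interval can be narrowed until the count lands within num_of_clusters_tolerance or max_sampled_preferences is exhausted. The sketch below assumes bisection as the search strategy and a rounding rule for fractional num_of_clusters; Genarris may search differently. The function names are hypothetical. sklearn_cluster_count shows one way to plug in scikit-learn's AffinityPropagation with a precomputed similarity matrix (for example, negated distances) and is not called in the toy usage.

```python
def target_cluster_count(num_of_clusters, num_structures):
    """Convert num_of_clusters into a cluster count (assumed rounding rule).

    A float below 1 is treated as a fraction of the pool; an int is used as is.
    """
    if isinstance(num_of_clusters, float):
        return max(1, round(num_of_clusters * num_structures))
    return num_of_clusters


def search_preference(count_clusters, target, preference_range,
                      tolerance=0, max_sampled_preferences=10):
    """Bisect the AP preference until the cluster count is within tolerance.

    count_clusters(preference) -> int runs AP once. Returns
    (preference, n_clusters, success); on failure, the closest attempt.
    """
    low, high = preference_range
    best = None
    for _ in range(max_sampled_preferences):
        preference = (low + high) / 2.0
        n_clusters = count_clusters(preference)
        if best is None or abs(n_clusters - target) < abs(best[1] - target):
            best = (preference, n_clusters)
        if abs(n_clusters - target) <= tolerance:
            return preference, n_clusters, True
        if n_clusters < target:
            low = preference   # too few clusters: raise the preference
        else:
            high = preference  # too many clusters: lower the preference
    return best[0], best[1], False


def lowest_energy_exemplars(labels, energies):
    """cluster_on_energy-style selection: lowest-energy member of each cluster."""
    best = {}
    for index, (label, energy) in enumerate(zip(labels, energies)):
        if label not in best or energy < energies[best[label]]:
            best[label] = index
    return sorted(best.values())


def sklearn_cluster_count(similarity, preference, damping=0.5,
                          convergence_iter=15, max_iter=1000):
    """Possible count_clusters backend using scikit-learn (imported lazily)."""
    from sklearn.cluster import AffinityPropagation
    ap = AffinityPropagation(damping=damping, convergence_iter=convergence_iter,
                             max_iter=max_iter, preference=preference,
                             affinity="precomputed", random_state=0)
    ap.fit(similarity)
    return len(ap.cluster_centers_indices_)


# Toy usage: a stand-in where the cluster count simply grows with preference.
target = target_cluster_count(0.1, 370)
preference, n_found, ok = search_preference(lambda p: int(p), target, (0.0, 100.0))
```

With real data, count_clusters would be something like `lambda p: sklearn_cluster_count(S, p)` for a similarity matrix S.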
- Estimate_Unit_Cell_Volume(comm)
Performs volume estimation using a machine-learned model trained on the CSD, based on Monte Carlo volume integration and topological molecular fragments. See the Genarris 2.0 paper for a full description.
Arguments
- comm: mpi4py.MPI object
MPI communicator.
Configuration File Options
- volume_mean: float, optional
If provided, this value is used as the volume generation mean and the ML model is not used to estimate the volume.
- volume_std: float, optional
If provided, this value is used for structure generation; otherwise the default is 0.075 multiplied by the predicted unit cell volume.
- Returns
None (None) -- Returns an object of type None.
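The trained ML model is not reproduced here. The sketch below only illustrates the Monte Carlo volume integration idea (estimating the volume enclosed by a molecule's van der Waals spheres by random sampling) together with the documented volume_std default. The function names are hypothetical, and applying the 0.075 factor to a user-supplied volume_mean is an assumption.

```python
import random


def monte_carlo_volume(atoms, n_samples=100_000, seed=0):
    """Estimate the volume of a union of spheres by Monte Carlo sampling.

    atoms: list of (x, y, z, radius) tuples, e.g. van der Waals spheres.
    Points are sampled uniformly in the bounding box; the fraction that
    falls inside at least one sphere, times the box volume, is the estimate.
    """
    rng = random.Random(seed)
    lo = [min(a[i] - a[3] for a in atoms) for i in range(3)]
    hi = [max(a[i] + a[3] for a in atoms) for i in range(3)]
    box_volume = (hi[0] - lo[0]) * (hi[1] - lo[1]) * (hi[2] - lo[2])
    inside = 0
    for _ in range(n_samples):
        p = [rng.uniform(lo[i], hi[i]) for i in range(3)]
        for x, y, z, r in atoms:
            if (p[0] - x) ** 2 + (p[1] - y) ** 2 + (p[2] - z) ** 2 <= r * r:
                inside += 1
                break  # count each point once even if spheres overlap
    return box_volume * inside / n_samples


def generation_volume(predicted_volume, volume_mean=None, volume_std=None):
    """Return (mean, std) used for structure generation.

    An explicit volume_mean replaces the prediction; volume_std defaults to
    0.075 times the mean in use (assumed to apply to an explicit mean too).
    """
    mean = predicted_volume if volume_mean is None else volume_mean
    std = 0.075 * mean if volume_std is None else volume_std
    return mean, std


# Toy usage: one sphere of radius 1.5 (exact volume 4/3*pi*1.5**3, about 14.14).
single = monte_carlo_volume([(0.0, 0.0, 0.0, 1.5)])
```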
- FHI_Aims_Energy_Evaluation(comm, world_comm, MPI_ANY_SOURCE, num_replicas)
Runs a self-consistent field (SCF) calculation on each structure in a pool.
Arguments
Configuration File Options
- Returns
None (None)
- Pygenarris_Structure_Generation(comm)
Uses the Genarris module written in C to perform structure generation. This module enables generation on special positions.
Arguments
- comm: mpi4py.MPI object
MPI communicator.
Configuration File Options
- molecule_path: str
Path to the relaxed molecule geometry.
- output_format: str
Type of file output for each structure. Can be one of: json, geo, both.
- output_dir: str
Path to the directory that will contain all generated structures that pass the intermolecular distance checks.
- num_structures: int
Target number of structures to generate.
- Z: int
Number of molecules per cell to generate.
- volume_mean: float, optional
- volume_std: float, optional
- sr: float, optional
Defines the minimum intermolecular distance considered physical: the sum of the van der Waals radii of the interacting atoms multiplied by sr. Default is 0.85.
- tol: float, optional
Tolerance used to identify space groups compatible with the input molecule.
- max_attempts_per_spg_per_rank: int
Maximum number of attempts the structure generator makes before moving on to the next space group.
- num_structures_per_allowed_SG_per_rank: int
Number of structures per space group per rank that Pygenarris will generate.
- geometry_out_filename: str
Name of the file containing all structures generated by Pygenarris.
- omp_num_threads: int
Number of OpenMP threads to pass to Pygenarris.
- truncate_to_num_structures: bool
If True, reduces the pool to exactly num_structures structures.
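The sr criterion can be written as a pairwise check between two molecules: every intermolecular atom pair must be at least sr times the sum of the two van der Waals radii apart. The sketch below is an illustration under stated assumptions: it uses Bondi radii for H, C, N, and O (Pygenarris may use a different radius table), ignores periodic images, and the function name is hypothetical.

```python
import math

# Bondi van der Waals radii in angstrom for a few common elements.
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def passes_distance_check(mol_a, mol_b, sr=0.85, radii=VDW_RADII):
    """Return True if no intermolecular atom pair is closer than allowed.

    Each molecule is a list of (element, x, y, z). A pair is rejected when
    its distance is below sr * (r_vdw(a) + r_vdw(b)).
    """
    for elem_a, *pos_a in mol_a:
        for elem_b, *pos_b in mol_b:
            limit = sr * (radii[elem_a] + radii[elem_b])
            if math.dist(pos_a, pos_b) < limit:
                return False
    return True


# Two H atoms 1.5 angstrom apart: the limit is 0.85 * 2.40 = 2.04 angstrom.
too_close = passes_distance_check([("H", 0.0, 0.0, 0.0)], [("H", 1.5, 0.0, 0.0)])
```

Lowering sr relaxes the check, so more (and more compressed) candidate structures are accepted.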
- Run_Rdf_Calc(comm)
Runs the RDF calculation for the pool of generated structures, using an RDF descriptor similar to that described in Behler and Parrinello 2007, and then calculates the structure difference matrix.
Arguments
- comm: mpi4py.MPI object
MPI communicator.
Configuration File Options
- structure_dir: str, inferred
Path to the directory of structures to evaluate.
- dist_mat_fpath: str
Path to the file to write the distance matrix to.
- output_dir: str
Path of the directory to write structures to (created if it does not exist). If 'no_new_output_dir', the input structures are overwritten.
- normalize_rdf_vectors: bool, optional
Whether to normalize the RDF vectors over the columns of the feature matrix before using them to compute the distance matrix. Default is False.
- standardize_distance_matrix: bool
If True, standardizes the distance matrix by dividing all elements by its maximum value. Because all elements of a distance matrix are non-negative, the standardized elements lie in the range [0, 1]. Default is False.
- save_envs: bool, optional
Whether to save the environment vectors calculated by the RDF method in the output structure files. Default is False.
- cutoff: float, optional
Cutoff radius applied to the atom-centered symmetry functions. Default is 12.
- n_D_inter: int, optional
Number of dimensions to use for each type of pairwise interatomic interaction found in the structure. Default is 12.
- init_scheme: str, optional
Can be centered or shifted, as described in Gastegger et al. 2018. Default is shifted.
- eta_range: list, optional
List of two floats defining the range of the eta parameter in Gastegger et al. 2018. Default is [0.05, 0.5].
- Rs_range: list, optional
List of two floats defining the range of the Rs parameter in Gastegger et al. 2018. Default is [0.1, 12].
- pdist_distance_type: str, optional
Distance metric passed to the pdist function. Default is Euclidean.
- Returns
None (None)
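The descriptor and distance-matrix steps can be sketched with Gaussian radial symmetry functions, G = sum over neighbours of exp(-eta (r - Rs)^2) f_c(r), using the Behler-Parrinello cosine cutoff f_c. This is an illustration, not Run_Rdf_Calc itself: it takes precomputed pair distances instead of periodic crystals, and pairing evenly spaced eta and Rs values element-wise is an assumption that may not match either init_scheme of Gastegger et al. 2018. The pure-Python distance_matrix stands in for scipy.spatial.distance.pdist with the 'euclidean' metric followed by squareform.

```python
import math


def cosine_cutoff(r, cutoff):
    """Behler-Parrinello cosine cutoff: decays smoothly to 0 at r = cutoff."""
    if r > cutoff:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / cutoff) + 1.0)


def linspace(start, stop, n):
    """n evenly spaced values from start to stop inclusive."""
    if n == 1:
        return [start]
    step = (stop - start) / (n - 1)
    return [start + i * step for i in range(n)]


def rdf_vector(pair_distances, cutoff=12.0, n_d_inter=12,
               eta_range=(0.05, 0.5), rs_range=(0.1, 12.0)):
    """Concatenate n_d_inter Gaussian radial features per interaction type.

    pair_distances maps a pair label such as "C-H" to a list of distances.
    Every structure must use the same labels so that the vectors line up.
    """
    etas = linspace(*eta_range, n_d_inter)
    centers = linspace(*rs_range, n_d_inter)
    features = []
    for pair in sorted(pair_distances):
        for eta, rs in zip(etas, centers):
            features.append(sum(math.exp(-eta * (r - rs) ** 2) * cosine_cutoff(r, cutoff)
                                for r in pair_distances[pair]))
    return features


def distance_matrix(vectors, standardize=False):
    """Square matrix of Euclidean distances between feature vectors."""
    n = len(vectors)
    d = [[math.dist(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]
    if standardize:
        largest = max(max(row) for row in d)
        if largest > 0:
            # Divide by the maximum so every entry lies in [0, 1].
            d = [[x / largest for x in row] for row in d]
    return d


v1 = rdf_vector({"C-C": [1.5, 2.5], "C-H": [1.1]})
v2 = rdf_vector({"C-C": [1.4, 2.6], "C-H": [1.1]})
v3 = rdf_vector({"C-C": [3.0], "C-H": [2.0]})
D = distance_matrix([v1, v2, v3], standardize=True)
```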
- Relax_Single_Molecule(comm, world_comm, MPI_ANY_SOURCE, num_replicas)
Calls run_fhi_aims_batch using the provided single molecule path.
Arguments
Configuration File Options
- Returns
None (None) -- Returns an object of type None.
- Run_FHI_Aims_Batch(comm, world_comm, MPI_ANY_SOURCE, num_replicas)
Runs FHI-aims calculations on a pool of structures using num_replicas replicas.
Arguments
- comm: mpi4py.MPI object
MPI communicator to pass to FHI-aims.
- world_comm: mpi4py.MPI object
World MPI communicator.
- MPI_ANY_SOURCE: mpi4py.MPI.ANY_SOURCE
MPI ANY_SOURCE object to facilitate communication.
- num_replicas: int
Number of replicas to use in the calculation.
Configuration File Options
- verbose: bool
Controls the verbosity of the output.
- energy_name: str
Property name under which the calculated energy is stored in the Structure file.
- output_dir: str
Path to the directory where the output Structure files will be saved.
- aims_output_dir: str
Path where the FHI-aims calculations will take place.
- aims_lib_dir: str, inferred
Path to the directory containing the FHI-aims library file.
- molecule_path: str
Path to the geometry.in file of the molecule to be calculated when called through harris_single_molecule_prep or relax_single_molecule.
- structure_dir: str, inferred
Path to the directory of structures to be calculated when not called through harris_single_molecule_prep or relax_single_molecule.
- Z: int, inferred
Number of molecules per cell.
- Returns
None (None)
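One common way to form num_replicas independent FHI-aims groups from the world communicator is to give each rank a "color" and split the communicator, so each replica gets its own comm. Whether Genarris partitions ranks exactly this way is an assumption; the hypothetical helper below only computes the color, so it runs without MPI. The mpi4py calls in the comment (Get_rank, Get_size, Split) are real Comm methods.

```python
def replica_color(rank, world_size, num_replicas):
    """Index of the replica (sub-communicator) a world rank belongs to.

    Ranks are split into num_replicas contiguous, nearly equal blocks;
    the first world_size % num_replicas blocks get one extra rank.
    """
    if not 1 <= num_replicas <= world_size:
        raise ValueError("num_replicas must be between 1 and the number of ranks")
    base, extra = divmod(world_size, num_replicas)
    boundary = extra * (base + 1)  # ranks below this belong to the larger blocks
    if rank < boundary:
        return rank // (base + 1)
    return extra + (rank - boundary) // base


# With mpi4py (not executed here):
#   rank = world_comm.Get_rank()
#   color = replica_color(rank, world_comm.Get_size(), num_replicas)
#   comm = world_comm.Split(color, rank)  # per-replica communicator for FHI-aims
colors = [replica_color(r, 8, 3) for r in range(8)]
```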
Genarris 2.0 Callable Functions
- Genarris.evaluation.run_fhi_aims.run_fhi_aims_batch(comm, world_comm, MPI_ANY_SOURCE, num_replicas, inst=None, sname=None, structure_dir=None, aims_output_dir=None, output_dir=None, aims_lib_dir=None, control_path=None, energy_name='energy', verbose=False)
Performs multiple FHI-aims calculations.
Arguments
- comm: mpi4py.MPI object
MPI communicator to pass to FHI-aims.
- world_comm: mpi4py.MPI object
World MPI communicator.
- MPI_ANY_SOURCE: mpi4py.MPI.ANY_SOURCE
ANY_SOURCE object used for communication.
- num_replicas: int
Number of replicas to use for the calculation.
- inst: genarris.core.instruct.Instruct
ConfigParser object containing all the configuration file sections and options for the calculation.
- sname: str
Name of the section that called run_fhi_aims_batch.
- structure_dir: str
Path to the directory of structures on which to perform the calculation.
- aims_output_dir: str
Path to the directory where the FHI-aims calculations should take place.
- output_dir: str
Path to the directory where the Structure files should be saved.
- aims_lib_dir: str
Path to the directory containing the FHI-aims library file.
- control_path: str
Path to the directory containing the control file to use.
- energy_name: str
Property name under which the calculated energy is stored in the Structure file. Default is 'energy'.
- verbose: bool
Controls the verbosity of the output. Default is False.
Configuration File Options
- verbose: bool
Controls the verbosity of the output.
- energy_name: str
Property name under which the calculated energy is stored in the Structure file.
- output_dir: str
Path to the directory where the output Structure files will be saved.
- aims_output_dir: str
Path where the FHI-aims calculations will take place.
- aims_lib_dir: str
Path to the directory containing the FHI-aims library file.
- molecule_path: str
Path to the geometry.in file of the molecule to be calculated when called through harris_single_molecule_prep or relax_single_molecule.
- structure_dir: str
Path to the directory of structures to be calculated when not called through harris_single_molecule_prep or relax_single_molecule.
- Returns
None (None)
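run_fhi_aims_batch can receive its settings either as keyword arguments or through inst and sname. A plausible precedence rule, sketched here as an assumption rather than the function's confirmed behaviour, is: an explicit argument wins, otherwise the option is read from section sname, otherwise a default applies. The helper resolve_setting is hypothetical, and a plain configparser.ConfigParser stands in for Instruct. Because energy_name and verbose have non-None defaults in the real signature, the sketch uses None to mean "not given".

```python
import configparser


def resolve_setting(value, inst, sname, option, default=None):
    """Explicit argument > option in section `sname` > default (assumed order)."""
    if value is not None:
        return value
    if inst is not None and sname is not None and inst.has_option(sname, option):
        return inst.get(sname, option)
    return default


inst = configparser.ConfigParser()
inst.read_string("""
[run_fhi_aims_batch]
aims_output_dir = aims_runs
energy_name = energy_tier2
""")

# Not passed explicitly: read from the configuration section.
aims_output_dir = resolve_setting(None, inst, "run_fhi_aims_batch", "aims_output_dir")
# Passed explicitly: overrides the configuration value.
energy_name = resolve_setting("energy_pbe", inst, "run_fhi_aims_batch", "energy_name", "energy")
# Absent everywhere: falls back to the default.
verbose = resolve_setting(None, inst, "run_fhi_aims_batch", "verbose", False)
```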