Amons-based QML

In the section, we cover

  • basic I/O, i.e., reading/writing info from xyz files and output files from quantum chemistry programs such as orca4

  • generation of molecular representations: SLATM and its local conterpart

  • run KRR training/test based on SLATM

Again, one has to cd to the main directory of aqml.

Basic I/O

Reading from xyz files

import aqml.io2 as io2
# get amons data used for training, as stored in xyz files
fs = io2.cmdout('ls demo/example/reference/g7/f*z')
fs
['../../demo/example/reference/g7/frag_01_c00001.xyz',
 '../../demo/example/reference/g7/frag_02_c00001.xyz',
 '../../demo/example/reference/g7/frag_03_c00001.xyz',
 '../../demo/example/reference/g7/frag_04_c00001.xyz',
 '../../demo/example/reference/g7/frag_05_c00001.xyz',
 '../../demo/example/reference/g7/frag_06_c00001.xyz',
 '../../demo/example/reference/g7/frag_07_c00001.xyz',
 '../../demo/example/reference/g7/frag_07_c00002.xyz',
 '../../demo/example/reference/g7/frag_08_c00001.xyz',
 '../../demo/example/reference/g7/frag_09_c00001.xyz',
 '../../demo/example/reference/g7/frag_10_c00001.xyz',
 '../../demo/example/reference/g7/frag_11_c00001.xyz',
 '../../demo/example/reference/g7/frag_11_c00002.xyz',
 '../../demo/example/reference/g7/frag_12_c00001.xyz',
 '../../demo/example/reference/g7/frag_12_c00002.xyz',
 '../../demo/example/reference/g7/frag_13_c00001.xyz']

A typical xyz file (to be used by aqml) looks like:

!cat demo/example/reference/g7/frag_01_c00001.xyz
5
alpha=13.42 b3lypvdz=-40.480429212044
     C     -1.54626100      0.73185600      0.67141100
     H     -1.73565800      0.47084700      1.72356700
     H     -1.64277500      1.82028800      0.53911200
     H     -0.52974600      0.41588200      0.39233400
     H     -2.27556000      0.21702800      0.02997500

where alpha and b3lypvdz are the polarizability (in Bohr\(^3\)) and total energy (in Hartree) of CH4 molecule. b3lypvdz indicates that all properties were calculated at the level of theory b3lyp/cc-pvdz (by orca4, if not otherwise stated).

To read geometry together with all properties of one molecule (say, CH4), do the following

import aqml.cheminfo.core as cc

mol = cc.molecule(fs[0], ['a'])
mol.props
{'alpha': array([13.42]), 'b3lypvdz': array([-25401.85283769])}

Note that atomic units for energetic properties would be automaically converted to units that are used more favorably by Chemists, namely, kcal/mol.

To read geometries and all properties of multiple molecules (say, the first 5 mols):

mols = cc.molecules(fs[:5], ['a'])
mols.props
{'alpha': array([[13.42],
        [21.68],
        [13.57],
        [33.96],
        [25.19]]), 'b3lypvdz': array([[-25401.85283769],
        [-49280.26416003],
        [-71817.46145015],
        [-73935.93419366],
        [-96479.43688458]])}

Note that the second entry of function cc.molecule and cc.molecules is the list of names of properties. If it’s set to ['a'], then all properties would be read and it’s equivalent to specify the second entry to ['alpha','b3lypvdz'].

Reading from orca output file

import aqml.io2.orca as orca

#from importlib import reload
#reload(orca)
fs = io2.cmdout('ls ../../demo/example/reference/g7/f*out')
obj = orca.orcajob(fs[0])

Note that only serial mode is supported, i.e., one output file each time.

Now get the method and total energy from the output file:

obj.e
{'b3lypvdz': -40.480429212044}

Output file produced by other programs such as Molpro, Gaussian 09 are also supported and the corresponding usages will be documented in the near future.

Generation of SLATM and ML (KRR)

import cml.algo.aqml as aq
reload(aq)
<module 'cml.algo.aqml' from '/home/bing/anaconda3/lib/python3.7/site-packages/cml/algo/aqml.py'>
T, F = True, False

# root directory storing all relevant data
root = '../../demo/example/reference/'

# amons data are stored in in xyz files under folder `g7`
train = [root + 'g7/']

# test data are stored in xyz file under folder `target/`
test = [root + 'target/']

# representation generation and krr can be done within one line of commmand
obj = aq.calculator(iaml=T,
                 train=train, test=test, lambdas=[4.0], coeffs=[1.], kernel='g',
                 rcut=4.8, p='b3lypvdz', nprocs=1, i_disp_eaq=F, debug=F)

Now run training & test

obj.run()