Scripts

 

abstract2seq


Run Jay Ponder's 'abstract' program from his Sleuth package on a pdb file and convert the result to a sequence


Abstract calculates Kabsch and Sander secondary structure, polar and buried surfaces

Note: This will fail if the pdb file contains disordered atoms


Flags:

-c_<string> Chain to abstract (all = all chains [default], blank = blank chains)

-force Force overwrite of existing output



adf


Calculate angular distribution functions of atoms about the centre of mass of a group


At the moment, this script is extremely simplified. It only takes one reference group and one target group per invocation, and includes no support for periodic boundary conditions.


Flags:

-i_<file> Index file

-r_<string> Reference group (present in index file)

-t_<string> Target group (present in index file)

-s_<num> Bin size (angle in degrees)

-n_<num> Number of bins

-O_<string> Output file name



atom_rename


Change a single atom name or rename a single atom in a particular residue type


Flags:

-eres_<string> Existing residue name

-ename_<string> Existing atom name

-enum_<number> Existing atom number

-new_<string> New atom name



autocorr_angle


Flags:

-g1_<string> Group 1 (present in the index file)

-g2_<string> Group 2 (present in the index file)

-i_<file> Index file

-ts_<num> Time step between molecules (ps)

-m_<num> Maximum interval (delta-t) as proportion of the entire window (default 0.1)

-O_<string> Output file name



autocorr_distance


Flags:

-g_<string> Group (present in the index file)

-i_<file> Index file

-ts_<num> Time step between molecules (ps)

-m_<num> Maximum interval (delta-t) as proportion of the entire window (default 0.1)

-O_<string> Output file name



average2


Reads in two or more whitespace-delimited files and averages their contents.


Each file is assumed to have two data columns, and comment lines escaped using "#". Of the data columns, the one on the left is the independent variable, and that on the right the dependent.

At least two files are required.
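
The averaging described above can be sketched in Python as follows. This is an illustration only, not the script's own code: the function names are hypothetical, and it assumes that rows are matched on the value of the independent variable (the script may instead match rows by position).

```python
from collections import defaultdict


def parse_two_column(text):
    """Return (x, y) pairs from whitespace-delimited text, skipping '#' comments."""
    pairs = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        x, y = line.split()[:2]
        pairs.append((float(x), float(y)))
    return pairs


def average_columns(datasets):
    """Average the dependent column over several datasets, matched on x (assumption)."""
    if len(datasets) < 2:
        raise ValueError("at least two datasets are required")
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pairs in datasets:
        for x, y in pairs:
            sums[x] += y
            counts[x] += 1
    return [(x, sums[x] / counts[x]) for x in sorted(sums)]
```

To use it, read each file's contents, pass them through parse_two_column, and hand the resulting list of datasets to average_columns.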



backbone


Delete all non-backbone atoms from a molecule.

Uses atom labels to determine which atoms should be kept. Default set is {N, CA, C, O}. This can be further reduced to CA atoms only by use of the -ca flag, as below.


Flags:

-ca Keep only CA atoms



bilayer_builder


Build an approximate lipid bilayer using periodic boundary conditions. Molecules can be embedded in the bilayer.


By default, the bilayer is built in the XZ plane, using instances of a supplied lipid file. In this case, prior to use in bilayer building, the lipid should already be oriented along the Y-axis, with its head in the positive-Y direction.


To make bilayer builder more compatible with MD packages that require anisotropic simulations to have the bilayer in the XY plane, use the -xy flag.


Bilayers can be built with multiple constituent species. To do this, multiple passes are used. That is, instances of one species are built into a bilayer, and this bilayer is then "embedded" into a bilayer of the second species, and so on.


The distance between the two halves of the bilayer is controlled by the -s flag. This should be set to approximately the length of the lipid in the Y dimension.


Usage:

bilayer_builder <lipid_file> -x <cellx> -y <celly> -z <cellz> -n1 <number of lipids in first layer> [ -n2 <number of lipids in second layer> -e <molecule to embed> -s <separation between layers of lipids> ]


Flags:

-n1_<number> Number of instances of lipid to use to create monolayer 1 (top)

-n2_<number> Number of instances of lipid to use to create monolayer 2 (bottom)

-x_<number> |

-y_<number> | Dimensions of the box in each direction

-z_<number> |

-s_<number> Distance between the mid-points of the bilayers

-d_<number> Minimum distance between molecule heavy atoms

-r Randomise lipid dihedrals by this number of steps

-mt Maximum number of attempts to pack molecule (default 1000)

-ih Ignore hydrogen atoms when packing lipids (default)

-e_<filename> Embed molecule filename

-not Do not translate the embedded molecule to the centre

-xy Build bilayer in the xy plane

-test Test resulting bilayer for clashes



bsurface


Find number of atoms buried in a protein-ligand intermolecular interaction

Ligand atoms are classified as being buried if they are within <cutoff> distance of the protein and vice versa


Flags:

-p_<filename> Protein file name

-c_<number> H-bond distance cutoff
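
The burial test described above can be sketched in Python as follows. This is a minimal illustration with hypothetical names; it assumes atoms are given as plain (x, y, z) tuples and that "within" means a distance less than or equal to the cutoff.

```python
import math


def buried_atoms(ligand, protein, cutoff):
    """Return indices of ligand and protein atoms within cutoff of the other molecule."""

    def near(atom, others):
        return any(math.dist(atom, other) <= cutoff for other in others)

    ligand_buried = [i for i, atom in enumerate(ligand) if near(atom, protein)]
    protein_buried = [i for i, atom in enumerate(protein) if near(atom, ligand)]
    return ligand_buried, protein_buried
```

The all-pairs loop is quadratic in the number of atoms; a real implementation would probably use a grid or cell list for large proteins.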



calc_time


Turns a specified number of seconds into human-readable time


Usage:

calc_time <time 1> [ <time 2> ... ]
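
A conversion of this kind can be sketched in Python as follows. The function name and the output format (for example "1h 1m 1s") are assumptions made for illustration; the script's actual output format is not documented here.

```python
def human_time(seconds):
    """Convert a number of seconds into a compact days/hours/minutes/seconds string."""
    seconds = int(seconds)
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    parts = [(days, "d"), (hours, "h"), (minutes, "m"), (secs, "s")]
    shown = [f"{value}{unit}" for value, unit in parts if value]
    return " ".join(shown) if shown else "0s"
```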



check_complex


Check to see if a ligand overlaps with a protein

Use: check_complex <ligand> -p <proteinfile>



checkbox


Test to see if molecule(s) is (are) inside the box defined by box.pdb


Use: checkbox <ligandfile> -b<boxfile>


Used for programs like DOCK which do not always dock a molecule INSIDE the specified box. The default (Dock format) box is 'box.pdb'. 'Good' structures are written to <filebase>_ibox.fmt and structures with atoms outside the box are written to <filebase>_obox.fmt


Flags:

-ih Ignore hydrogen atoms

-b box location (default box.pdb)

-force Force overwriting of output files
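
The inside/outside test can be sketched in Python as follows. This is an illustration under stated assumptions: the box is axis-aligned and given by its minimum and maximum corners, atoms are (element, (x, y, z)) pairs, and all names are hypothetical.

```python
def inside_box(atoms, box_min, box_max, ignore_h=False):
    """Return True if every (considered) atom lies within the axis-aligned box."""
    for element, xyz in atoms:
        if ignore_h and element == "H":
            continue  # mirrors the -ih flag: hydrogens are not tested
        if any(c < lo or c > hi for c, lo, hi in zip(xyz, box_min, box_max)):
            return False
    return True
```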



cluster


Cluster molecules using the very simple 'cluster' subroutine.

Use: cluster <file> [<flags>]

Writes out a representative structure from each cluster.


Flags:

-t_<number> Clustering threshold (default: 2)

-a Treat all input files as one set



cluster_fp


Cluster compounds based on a molecular fingerprint using the Tanimoto coefficient

Uses a very simplistic clustering algorithm

Use: cluster_fp <structurefile>

One compound is retained from each cluster. The retained compounds are biased towards lower molecular weight (strictly number of heavy atoms)


Flags:

-cutoff Tanimoto cutoff value (default 0.50)

-fmax Max fragment size

-fmin Min fragment size

-at Fragment atom typing [element (simple typing by atomic element, default), none (all atoms have the same type)]

-bt Bond typer [simple (single, triple and double/aromatic bonds), none (all bonds have the same type) ]

-h Include hydrogens in fragments [none (default), polar, all]



dipole_moment


Calculate a dipole moment


Flags:

-e Calculate electrostatic dipole moment

-f Spec file for custom dipole moments

-n Index file

-ts Timestep between files (ps)

-i Time at first file (ps)

-w Write out molecule and charge-centres in mol2 files

Notes:

-e and -f can be used together, but at least one must be used.

-n must be used if -f is used.



druglike


Discard compounds that do not fit a set of Lipinski-like criteria for druglike properties

mol_characterise must be run first


Flags:

-nh_<number> Maximum number of heavy atoms (default 30)

-fnh_<text> SDF field containing number of heavy atoms (default ".NUMHEAVY")

-nr_<number> Maximum number of rotatable bonds (default 7)

-fnr_<text> SDF field containing number of rotatable bonds (default "SILICO.NUMROT")

-nd_<number> Minimum number of Hydrogen bond donors (default 1)

-fnd_<text> SDF field containing number of Hydrogen bond donors (default "SILICO.NUMDON")

-na_<number> Minimum number of Hydrogen bond acceptors (default 3)

-fna_<text> SDF field containing number of Hydrogen bond acceptors (default "SILICO.NUMACC")

-l_<number> Maximum Log-P value (default 5.0)

-fl_<text> SDF field containing Log-P value (default "LOGP")

-no_<number> Maximum number of non-{C,H,N,O,P,S} atoms (default 0)

-fno_<text> SDF field containing number of non-{C,H,N,O,P,S} atoms (default "SILICO.NUMOTHER")



extract_lig_prot


Split pdb file into protein and non-covalently bound ligands

Structures are split up using connectivity

Molecules with fewer than 'maxatoms' and more than 'minatoms' atoms are assumed to be ligands (This approach is a little simplistic. It assumes that there are no breaks in the protein chain. However it IS able to extract peptide ligands from proteins)


Flags:

-maxatoms (Maximum ligand size)

-minatoms (Minimum ligand size)



file_rename


Rename files using a perl regular expression substitution


For example file_rename -a _new <filename> will replace the string '_new' with '' and the file clozapine_new.mol2 becomes clozapine.mol2


Use:

file_rename <inputfiles> -a <exp1> -b <exp2> where <exp1> and <exp2> are strings or perl regular expressions

<exp2> is '' by default


Examples:

file_rename file_new.mol2 -a _new will rename file_new.mol2 to file.mol2

file_rename file.mol2 -a .mol2 -b .bak will rename file.mol2 to file.bak

file_rename ssss.mol2 -a 's*' -b x -re will rename ssss.mol2 to x.mol2

file_rename ssss.mol2 -a 's' -b x -re -s will rename ssss.mol2 to xsss.mol2


Flags:

-a_<string> string to replace

-b_<string> string to insert ('' by default)

-s Make only a single substitution of string a (not a global one)

-re treat string a as a regular expression

-force overwrite existing files
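
The renaming rule can be sketched in Python as follows. This is only an approximation of the behaviour described above: the function name is hypothetical, and Python's re module stands in for Perl regular expressions, whose syntax and matching details differ in places.

```python
import re


def new_name(filename, a, b="", regex=False, single=False):
    """Replace string (or pattern) a with b, globally unless single is set."""
    pattern = a if regex else re.escape(a)  # literal strings are escaped
    count = 1 if single else 0              # 0 means replace every match
    # A function is used as the replacement so that b is inserted literally.
    return re.sub(pattern, lambda match: b, filename, count=count)
```

The regex and single arguments correspond to the -re and -s flags; actually renaming a file on disk would additionally need os.rename and a check for existing files (the -force behaviour).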



find_aggregate


Identify hydrophobic aggregates of molecules (eg components of micelles) in a periodic system. Molecules are identified by connectivity and can contain multiple residues. Two molecules are defined as belonging to the same aggregate if they have carbon atoms within the cutoff distance (-t flag).


The -move option moves all atoms into the unit cell (molecules are split when they cross the unit cell boundaries). This is useful for generating pictures in programs like Pymol


The script writes out a Gromacs format index file (ie atoms are indexed starting from 1).

Summary data is written to <first_filename>.out


Flags:

-t_<number> Maximum distance between C atoms in aggregate (default 4)

-move Move all atoms into the unit cell (range 0 -> cell size)

-x_<number> }

-y_<number> } Cell dimensions

-z_<number> }
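
The aggregation rule (two molecules share an aggregate if any pair of their carbon atoms lies within the cutoff) can be sketched in Python as follows. This is an illustration, not the script's algorithm: it assumes an orthorhombic cell, uses the minimum-image convention for distances, takes each molecule as a list of carbon coordinates, and merges molecules with a simple union-find. All names are hypothetical.

```python
import math


def min_image_dist(p, q, cell):
    """Distance between p and q under the minimum-image convention (orthorhombic cell)."""
    total = 0.0
    for a, b, length in zip(p, q, cell):
        d = abs(a - b) % length
        d = min(d, length - d)
        total += d * d
    return math.sqrt(total)


def find_aggregates(molecules, cell, cutoff=4.0):
    """Group molecule indices into aggregates; molecules is a list of carbon-coordinate lists."""
    parent = list(range(len(molecules)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(molecules)):
        for j in range(i + 1, len(molecules)):
            if any(min_image_dist(p, q, cell) <= cutoff
                   for p in molecules[i] for q in molecules[j]):
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(molecules)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```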



find_amino_acids


Test script to find amino acids within a molecule by comparing residues to amino acid templates

Note: Requires that hydrogen atoms are present


Flags:

-w Write out files containing the newly labelled structure



find_close_water


Find water molecules that are directly hydrogen bonded to a protein



find_groups


Count functional groups in a molecule


find_max


Find the maximum and minimum extents of a molecule and make a box to enclose it

Box file is written as box_<filebase><counter>.pdb


find_rings


Test script to find rings, planar rings and aromatic rings in a molecule. Temperature factor and occupancies of the output pdb file are set. The Mol2 output file contains the rings, aromatic rings and planar rings in static sets.


Flags:

-r Maximum depth of search for rings

-o Write out molecule structure files with rings marked

-timing

-debug rings



find_similar


Identify duplicate or similar compounds in two sets of molecules using Tanimoto or Euclidian comparisons


Usage: find_similar file1 file2 file3 ....


Designed to be used to filter docking results that are ordered from best to worst.

Fragments are generated using the silico fragment routines

Tanimoto coeff = Num common fragments / (Num fragments in mol1 + Num fragments in mol2 - Num common fragments)

Euclidian coeff =


Flags:

-s Scoring method (Tanimoto or Euclidian)

-cut Cutoff value for duplicate compound (default 0.80)

-dup Find duplicates (sets cutoff to 0.999)

-max Max fragment size

-min Min fragment size

-noh Ignore hydrogens in fingerprints (default)

-o<format> Output format
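
The Tanimoto comparison can be sketched in Python as follows. This is a minimal illustration that treats each fingerprint as a set of fragment strings (the silico fragment routines may instead count repeated fragments), and the filtering function is a hypothetical example of keeping the first, best-ranked member of each group of similar compounds.

```python
def tanimoto(frags1, frags2):
    """Tanimoto coefficient: common / (n1 + n2 - common), using fragment sets."""
    a, b = set(frags1), set(frags2)
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def drop_similar(fingerprints, cutoff=0.80):
    """Return indices to keep, scanning in input order (assumed best to worst)."""
    kept = []
    for i, fp in enumerate(fingerprints):
        if all(tanimoto(fp, fingerprints[k]) < cutoff for k in kept):
            kept.append(i)
    return kept
```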


fix_rings


Attempts to find and fix rings with bonds through them


This assumes that all bonds are present in the structure file and that the structure is minimised to begin with.


The script focuses on bond length as a method of detection and finishes up with a second minimisation.


Flags:

-min_<number> Minimise with Sybyl for some number of steps (0 disables minimisation)

-r Rename molecules, giving them the filebase as the new name



fix_types


Attempts to fix Sybyl atom types that have been broken in a minimisation, looking especially for atoms with aromatic bonds that have not been given aromatic types.


Flags:

-r Rename each molecule in a file, giving it the filename



flatten


Squash molecules flat into the XY plane



formatdoc


Print formatted comments from silico files


Flags:

-h Write HTML output to a file

-s Generate subroutine descriptions



get_near_res


Work-alike for Dock get_near_res and invertPDB


Only takes one argument at a time.


Flags:

-p_<file> Protein filename

-c_<dist> Cutoff (default 15 Angstroms)



getcell


Make a box from cell coordinates



hydrodynamic_radius


Calculates the hydrodynamic radius for a series of molecules


Flags:

-ts Timestep between files (ps)

-i Time at initial file (ps)



libclean


Clean up structures taken from the Available Chemicals Directory or other ISIS databases (A poor man's Concord)


Assumes that molecules are 'flat' and have ISIS chirality descriptors. Sybyl is used to convert 2D structures to 3D by minimisation using either the Sybyl or MMFF forcefield. Produces an output file with a descriptive name if an error occurs. Retains molecular data from SDF files


Steps:


1. Discard counter ions and small molecules by retaining only the largest molecule in the input structure


2. Scale bonds to approximate sensible values


3. Delete any _polar_ atoms so that an approximate physiological protonation state can be produced


4. Check that all atom elements are real elements (Not Du, R, etc)


5. Add hydrogens using Sybyl (known to have problems with nitro groups and other things) or silico which approximates physiological conditions for common functional groups - carboxylic acids are deprotonated, aliphatic amines are protonated. Hydrogens added using Silico have approximate geometries and the resulting structures should probably be minimised.


6. Attempt to produce approximately correct stereochemistry for atoms marked as chiral (this usually fails for complex molecules)


7. Randomise atom Z positions slightly to stop Sybyl getting stuck on saddle points


8. Minimise molecule with Sybyl (optional)


9. Check for gross problems with the molecule


10. Write out the result <filebase>_cl_<number>.<ext>


Flags:


-flat Use this flag if the input structures are 2D

-addh Add hydrogens with 'silico' or 'sybyl'

-min <steps> Number of steps for minimisation

-ff Sybyl forcefield: Tripos FF and Gasteiger-Marsili charges (gm) or Merck FF and charges (merck)

-mhadd Minimise only if change in number of hydrogen atoms (Silico H addition only)

-sybexe <sybyl_exe> Location/name of Sybyl executable. This is usually found automatically by checking TA_ROOT

-force Force overwriting of output files

-rename Rename molecules using SDF data field

-o <format> Output format


Files:

sleep A file called sleep in the operating directory will cause the job to go to sleep while it is there

stop A stop file will cause the job to stop



make_box


Produce a box file '<filebase>_box.pdb'. If the input file has a unit cell predefined, that will be used. Otherwise, a box will be made large enough to enclose the first molecule.

Unit cells are encoded in some file types, for example Gromacs, Mol2, PDB


Flags:

-i Ignore existing unit cell data



make_index


Create an index file in Gromacs or DCD format.


Can use a versatile atom selection language and will also write out files containing selected atoms


Flags:


-z Number atoms from zero, as for DCD files (Note: if you wish to use this file with catdcd, you will need to edit the resulting file to contain only a single index group and remove all lines containing square brackets)

-as_<string> Use Atom Specifier (see below)

-g Create an index group containing all atoms

-a Create a separate index group for each atom

-e Create an index group for each element

-r Create an index group for each residue

-seg Create an index group for each segment

-w Create separate index groups for water and not water

-set Create an index group for each Mol2 atom set

-an_<atom_names> Create index groups for listed atom names

-rn_<res_names> Create index groups for listed residue names

-d Analyse the molecule as for a Starmaker dendrimer (create index groups for COR, GAA, GAB,...)


-write Write out a file containing atoms in index groups. The output file contains structures corresponding to each index group (except the one containing all atoms)


Using atom specifiers


Atom specifiers are supplied using the -as flag

Several flags are supplied as atom specifier shortcuts

-back Backbone atoms

-ca CA atoms

-cacb CA and CB atoms

-heavy Nonhydrogen atoms


Atom specifier examples:


ANAME:CA All atoms called 'CA'.

ANAME:CA,CB,CG Returns all atoms called 'CA' or 'CB' or 'CG'.

ANAME:CA,RESNAME:TRP All atoms called 'CA' in all residues called TRP

ANAME:CA,SUBID:4 All atoms called 'CA' in residue number 4

ELEMENT:!H All nonhydrogen atoms

SEGID:PROT,SEGID:LIG All atoms with the SEGID set to PROT or LIG

Successive atom specifications can be made. Each is separated by a '|'

ANAME:CB|ANAME:CA|ANAME:CD Returns all atoms called 'CB', 'CA', 'CD'.

ANAME:CA,RESID:1|ANAME:CA,RESID:4 Returns 'CA' atoms from residue 1 and 4.

Atom specifiers are case sensitive.
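
One way to read the specifier examples above is sketched below in Python. This is an interpretation, not the Silico parser: it assumes that values listed for the same field are ORed, that different fields are ANDed, that a leading '!' negates a value, and that '|' takes the union of successive specifications. Atoms are represented as dictionaries keyed by field name, and all function names are hypothetical.

```python
def parse_specifier(spec):
    """Turn 'ANAME:CA,CB,RESNAME:TRP' into {'ANAME': ['CA', 'CB'], 'RESNAME': ['TRP']}."""
    fields = {}
    current = None
    for token in spec.split(","):
        if ":" in token:
            current, token = token.split(":", 1)  # a new FIELD:value pair
        if current is None:
            raise ValueError(f"no field name before {token!r}")
        fields.setdefault(current, []).append(token)
    return fields


def value_matches(actual, values):
    """Case-sensitive match; values starting with '!' exclude, the rest are ORed."""
    positives = [v for v in values if not v.startswith("!")]
    negatives = [v[1:] for v in values if v.startswith("!")]
    if actual in negatives:
        return False
    return not positives or actual in positives


def select(atoms, spec_string):
    """Return indices of atoms matching any of the '|'-separated specifications."""
    chosen = []
    for spec in spec_string.split("|"):
        fields = parse_specifier(spec)
        for idx, atom in enumerate(atoms):
            if idx not in chosen and all(
                value_matches(str(atom.get(field)), values)
                for field, values in fields.items()
            ):
                chosen.append(idx)
    return chosen
```

Under these assumptions, indices are returned in the order in which the successive specifications match them.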



merge_residue


Merge a single residue into another. The residue to be merged will be given the same name as the target residue.


Flags:

-e_<number> Existing residue number (A = all residues)

-n_<number> New residue number

-a_<string> New residue name (optional)



mol2cns


Convert a molecule to be suitable for input to CNS

Deletes pseudoatoms from a file (atom name starts with Q)

Adds hydrogens

Generates correct hydrogen names


Flags:


-f File containing atom names and connectivities (default $SILICO_HOME/data/cns_amino_acid_atoms.dat)

-addh Add hydrogens

-del Delete unknown hydrogens



mol2seq


Extract amino acid sequence from molecule

Output file <filebase>.xxx

Flags:

-ih Include HETATMs

-c Combine all sequences into a single file (all.seq)

-o_<format> Output format



mol2split


Fast split program to divide a mol2 file into smaller files without parsing the file

Default is 100 structures per file


Flags:

-s <style> Output style [numbered, fnumbered] (default: fnumbered)

-n <number> Number of structures in each output file (default: 100)

-d <dir> Output directory (default: working directory)
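
Splitting without parsing can be sketched in Python as follows. The sketch relies only on the fact that each structure in a mol2 file begins with an "@<TRIPOS>MOLECULE" record; the function name is hypothetical, and file naming and output directories are left out.

```python
def split_mol2(text, per_file=100):
    """Yield chunks of text, each holding up to per_file mol2 structures."""
    marker = "@<TRIPOS>MOLECULE"
    records, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith(marker) and current:
            records.append("".join(current))  # close the previous structure
            current = []
        current.append(line)
    if current:
        records.append("".join(current))
    for start in range(0, len(records), per_file):
        yield "".join(records[start:start + per_file])
```

Any comment lines that precede the first structure stay attached to it. Because the text is only scanned for the record marker, no atoms or bonds are interpreted, which is what keeps this approach fast.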



mol_add_h


Add hydrogens to a molecule


By default, a protonation state is produced which approximates the physiological state. Using the -v flag will instead fill all valences, i.e. it will add one hydrogen to a carboxylic acid and 3 hydrogens to ammonia


Adds only polar hydrogens if the 'polar' flag is used. Adds hydrogens to carbon only if 'nonpolar' flag is used


Flags:

-polar Add only polar hydrogens

-nonpolar Add only nonpolar hydrogens (hydrogens on carbon)

-v Fill valence

-d File containing atom names and connectivities ($SILICO_HOME/data/amino_acid_atoms.dat)

-check Run mol_check on generated structures



mol_add_lp


Add lone pairs to a molecule


Flags:

-o_<format> Output format

-O_<filename> Output filename



mol_amides


Constructs a plot of dihedral angles w and w' for secondary amides


Flags:

-p Print a hardcopy instead of producing an output file

--print Ditto

-o <format> Output file format (default: ps)

--output-file-format=<format>

-g Path to grace executable

--grace-executable Ditto



mol_centre


Translate a molecule to 0,0,0



mol_characterise


Calculate molecular weight, molecule extents and other molecular properties


Properties:


Number of atoms

Number of bonds

Number of rotatable bonds (see subroutine mol_count_rot_bonds)

Molecular weight

Number of C, H, N, O, P, S, halogen and other atoms.

Number of rings up to size 10 (Note that the number of rings is not quite the way a chemist would see it - eg Naphthalene has 3 rings)

Number of planar rings

Number of H-bond donors (see subroutine mol_find_donors_acceptors)

Number of H-bond acceptors

Molecule name


Adds hydrogens first by default


Data is written to a file '<filebase>_mc.out' in tab delimited format and into SDF data fields if -sdf flag is set


Flags:

-sdf Write out an sdf file containing molecule structure and calculated data (on by default)

-hadd Add hydrogens (on by default)

-t Write data to tab delimited text file

-replace Overwrite original sdf file (only if -sdf is set)

-force Overwrite preexisting output files



mol_charge


Find the total partial and formal charges on a molecule


Flags:

-formal Calculate formal charges

-r Do not provide a charge breakdown by residue

-s Do not provide a charge breakdown by segment



mol_check


Run a series of sanity checks on a molecule


Calls ensemble_check (this principally checks the integrity of the internal Silico data structures), mol_check_atom_overlap (to find badly overlapping atoms), mol_check_valences (to find atoms with an incorrect number of connected atoms)

Writes out a mol2 file with subsets ERROR, OVERLAP, BONDLENGTH and VALENCE containing any atoms with errors


It would be desirable to check for poor bond lengths and angles as well


Flags:

-ce Check atom elements (default on)

-co Check for atom overlap (default off)

-cv Check atom valences (default on)

-ca Check for aromatic bonds in non-aromatic systems (default on)

-cb Check bondlengths (default on)

-cr Check for distorted aromatic rings (default on)

-amide Check for cis-amides and distorted trans amides (default on)

-a Run all checks

-noconnect Do not create connection table

-nofile Do not write an output .mol2 file for each input file



mol_chop_box


Chop off bits of a protein that are outside the box defined in box.pdb


Protein chains are terminated using chemically sensible groups.

Output files <filebase>_inbox.pdb and <filebase>_exclude.pdb


Used for dock setup



mol_combine


Combine separate molecules into a single molecule


By default combines all the molecules on the command line into one single molecule. Each molecule is given a separate SEGID


If the -p option is used to specify a 'parent' molecule, then each molecule specified on the command line will be combined separately with the parent molecule. This is useful for combining many ligands with a single parent protein to produce complexes.


The -glide option will combine glide _pv.maegz files to produce multiple receptor/ligand structures in individual files. If the -o pdb option is chosen then the ligand will have HETATM records to be compatible with ligplot.



Flags:

-o_<format> Output file format

-p_<parent protein> Combine all molecules with parent (usually protein) molecule

-glide Combine all molecules with first molecule in file. Use with glide .pv or .raw files

-ra Renumber atoms in output file

-rr Renumber residues in output file



mol_connect


Test script for bond creation routines. All methods should give the same results.


Flags:

-o output format (default pdb)

-c connect atoms routine to use



mol_cubic_crystal


Generate all neighbours of a cubic crystal


Applies a 180 degree rotation about a specified axis and 27 translations to give a total of 54 molecules.


Flags:

-x_<val> }

-y_<val> } cell lengths

-z_<val> }

-rx }

-ry } Rotate by 180 degrees about each of these axes

-rz } (can be used together)
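
The count of 54 follows from 27 translations (shifts of -1, 0 or +1 cell lengths along each axis) applied to both the original molecule and its rotated copy. A Python sketch is given below; it is an illustration only, assuming the rotation is taken about an axis through the origin and that a single axis is chosen. The function names are hypothetical.

```python
from itertools import product


def rotate_180(coords, axis):
    """Rotate 180 degrees about x, y or z: the other two coordinates change sign."""
    signs = {"x": (1, -1, -1), "y": (-1, 1, -1), "z": (-1, -1, 1)}[axis]
    return [tuple(s * c for s, c in zip(signs, xyz)) for xyz in coords]


def cubic_neighbours(coords, cell, axis="z"):
    """Return 54 images: original and rotated copies, each under 27 translations."""
    images = []
    for base in (list(coords), rotate_180(coords, axis)):
        for shift in product((-1, 0, 1), repeat=3):
            images.append([
                tuple(c + n * length for c, n, length in zip(xyz, shift, cell))
                for xyz in base
            ])
    return images
```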



mol_del_atoms


Delete atoms from a file


Atoms are selected on the basis of their names, their elements or their residue names.

Name, element and residue name options accept comma separated values, which are ORed. For example, -e C,N would mean "delete carbons or nitrogens".


The * wildcard can be used in atom or residue names, but it must be enclosed in quotes to escape the shell. For example, mol_del_atoms -a N'*' file.mol2 will delete atoms whose names start with N in all molecules in file.mol2.


The various criteria are ANDed. For example, -r HOH -e O,H would mean "delete all atoms whose residue name is HOH and which are oxygens or hydrogens".


If any given criterion is left blank, any value for that criterion is considered acceptable.


Flags:

-a Atom names (comma separated list)

-e Atom elements (comma separated list)

-r Residue names (comma separated list)
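
The selection logic (values ORed within a criterion, criteria ANDed, blank criteria accepting anything) can be sketched in Python as follows. This is an illustration with hypothetical names; atoms are dictionaries with "name", "element" and "resname" keys, and fnmatchcase is used for the * wildcard, which also accepts other shell-style patterns such as '?'.

```python
from fnmatch import fnmatchcase


def matches(value, patterns):
    """True if no patterns were given, or if value matches any one of them."""
    return not patterns or any(fnmatchcase(value, p) for p in patterns)


def delete_atoms(atoms, names=(), elements=(), resnames=()):
    """Return the atoms that survive deletion; an atom is deleted if all criteria match."""
    return [
        atom for atom in atoms
        if not (matches(atom["name"], names)
                and matches(atom["element"], elements)
                and matches(atom["resname"], resnames))
    ]
```

Note that, under this reading, leaving every criterion blank matches (and therefore deletes) every atom.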



mol_del_dummy


Delete all dummy atoms from a file (eg lone pairs).


Output file <filebase>_nodu.<ext>


Flags:

-con Force regeneration of connection table



mol_del_duplicate_atoms


Delete atoms which occupy the same point in space.



mol_del_excess_solv


Remove excess solvent from a file.


Flags:

-d Distance from molecule to leave solvated (defaults to 10 Angstroms)



mol_del_h

Delete all hydrogens from a file.

Output file <filebase>_noh.<ext>


Flags:

-res Delete hydrogens on a particular residue name

-n Delete all nonpolar hydrogens



mol_del_nonpolar_h


Delete nonpolar hydrogens from a file


Nonpolar hydrogens are defined as those attached to carbon

Any charges on the hydrogens being deleted are transferred on to the parent atom.

Output file <filebase>_polarh.<ext>


Flags:

-con Force regeneration of connection table



mol_del_res


Script to delete residues from each molecule in a file.


Residues may be identified either by number or by name, or both. If both name and number are used, residues will be deleted if either criterion is matched, unless the -b flag is used.


Flags:

-a Residue names to delete (comma separated list)

-n Residue numbers to delete (comma separated list, ranges accepted)

-b Both name and number must be matched to delete a residue

-v Inverse operation (i.e., keep matching residues and delete everything else)



mol_del_water


Delete water molecules from a file (or alternatively delete nonwaters)


Writes out _dry file containing unsolvated molecules or _wat file containing waters

Note: Water is defined as having a residue name that starts with TIP or HOH


Flags:

-n Negate. ie write out waters instead of nonwaters



mol_divide


Separate a multi-molecule file (eg Tripos mol2 or Schrodinger mae) into individual files, each containing one molecule.


Files are put into a directory called <filebase>.dir. Each molecule is renamed with an Insight-safe name. The default behaviour renames the output file to match the molecule name. Other file naming styles can be selected using the -s flag


To separate a single molecule into multiple structures see 'mol_split'


For really big files (multiple thousands of structures) consider using sdfsplit, mol2split or pdbsplit which do not parse the file and are much faster


For convenience the script makes a pymol load script 'load.pml' in the output directory. Run this script in pymol to load all the files


For more control over splitting PDB files by chain, waters etc see 'pdbsplit'


Flags:

-stride_<val> Write out structure every 'val' steps

-n Starting number for renumbering

-s Output style for file names. molname: molecule name (not checked for duplicates!). molname_i 'insight_safe' molecule name. numbered: numbered, no leading zeros, fnumbered: numbered, leading zeros

-o Output format

-force Overwrite existing output

-p Make pymol load script 'load.pml' in output directory



mol_ensemble_average


Given a number of molecules of the same composition as input, write out a molecule where the position of each atom is averaged over the whole ensemble.


Flags:

-l Consider the largest fragment only

-t Translate the molecule (or largest fragment if -l) to centre of mass
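
The averaging step can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name; it assumes every structure lists the same atoms in the same order and does not superimpose the structures first.

```python
def ensemble_average(structures):
    """Average each atom's position over a list of structures (lists of xyz tuples)."""
    n = len(structures)
    if n == 0 or any(len(s) != len(structures[0]) for s in structures):
        raise ValueError("structures must be non-empty and have equal atom counts")
    return [
        tuple(sum(s[i][k] for s in structures) / n for k in range(3))
        for i in range(len(structures[0]))
    ]
```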



mol_extents


Calculate the maximum and minimum X, Y and Z coordinates of a molecule and the centre point.

Values are written to STDOUT



mol_filter


Filter a set of molecules by SDF property and/or molecular weight. Can be used to retain molecules with a unique SDF_CODE


Molecules meeting criteria are written out to <filebase>_flt.<ext>


Flags:

-p SDF property name

-max Property maximum value

-min Property minimum value

-mwmax MW maximum value

-mwmin MW minimum value

-druglike Select druglike compounds

-leadlike Select leadlike compounds

-fragmentlike Select fragmentlike compounds

-u Retain only a single representative with this ID

-np Use noparse option (much faster, but currently only available for sdf files)



mol_fp


Generate silico fragments for a molecule


Fragments are written to <filebase>_frag.dat


Flags:

-fmax Max fragment size

-fmin Min fragment size

-at Fragment atom typing [element (simple typing by atomic element, default), none (all atoms have the same type)]

-bt Bond typer [simple (single, triple and double/aromatic bonds), none (all bonds have the same type) ]

-h Include hydrogens in fragments [none (default), polar, all]



mol_get_name


Extract molecule names from a file and print them to the screen


Flags:

-f_<string> SDF field to use for molecule name



mol_hydrogen_bonds


Find all hydrogen bonds in a system.


Uses the hydrogen bond definition developed by McDonald and Thornton (J. Mol. Biol. 1994, 238, 777-793) which specifies maximum distances and angles for A...H-D and A..D.


Note that to increase the H-bond distance, you must increase both the A..H-D and A..D distances.


The default output filename is derived from the first. This can be changed using the -O flag.


Flags:


Hydrogen bond parameters


-d_<val> Maximum Donor-Acceptor distance (default 3.9 Ang)

-h_<val> Maximum Hydrogen-Acceptor distance (default 2.5 Ang)

-a_<val> Minimum Donor-Hydrogen-Acceptor angle (default 90 deg)

-b_<val> Minimum Hydrogen-Acceptor-Substituent angle (default 90 deg)


Timestep parameters


-ts_<val> Timestep between files (ps)

-i_<val> Time at first file (ps)

Atom/molecule selection options

-ignh Ignore hydrogen atoms [do not use: not yet implemented]

-wat Include water molecules


Input file options


-copy Copy data from first molecule to subsequent molecule. This is good for MD trajectories and series of PDB files

Output file options

-energy Print molecular energies to output file

-helix Print numbers of i-i+3 and i-i+4 H-bonds for each structure

-write Write out a file containing each input structure. Atoms involved in hydrogen bonds are contained in Sybyl sets

-writehb_<val> Write out only those molecules containing >= val H-bonds. Atoms involved in hydrogen bonds are contained in Sybyl sets (It may be useful to use the -o mol2 flag with this option)

-list List all hydrogen bonds in each structure

-ens Summarise the number of times each H-bond was found in output file. Assumes that all input molecules are members of an ensemble
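
The geometric test implied by the four hydrogen bond parameters can be sketched in Python as follows. This is an illustration using the default values listed above; the function names are hypothetical, coordinates are (x, y, z) tuples in Angstroms, "substituent" means an atom bonded to the acceptor, and the use of non-strict inequalities is an assumption.

```python
import math


def angle(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def is_hbond(donor, hydrogen, acceptor, substituent,
             max_da=3.9, max_ha=2.5, min_dha=90.0, min_has=90.0):
    """Apply the distance and angle criteria (defaults as for -d, -h, -a, -b)."""
    return (math.dist(donor, acceptor) <= max_da
            and math.dist(hydrogen, acceptor) <= max_ha
            and angle(donor, hydrogen, acceptor) >= min_dha
            and angle(hydrogen, acceptor, substituent) >= min_has)
```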



mol_label_fg


Test script to test the subroutine 'mol_label_functional_group'


Flags:

-aa Label aa backbone

-het Label heterocycles



mol_merge


Merge all molecules in a file into a single molecule


Flags:

-n Do not rename residues



mol_mw


Calculate molecular weight and molecule extents


MW data is calculated for the parent molecule (molecule with most atoms in file)

Values are printed to standard output and added to output file as SDF_DATA


Flags:

-addh Add hydrogens

-v Fill valence

-print Print data to a file 'mw.txt'

-min Minimum molecular weight to output (ie skip molecules below this mass)

-max Maximum molecular weight to output (ie skip molecules above this mass)



mol_rama


Constructs a Ramachandran plot for a molecule containing Alpha Amino acids


Flags:

-force Force overwrite of existing output files (default: off)

-o Output format (default: PostScript)

-print Print a hardcopy (default: off)

-residue Make one plot for each residue

-debug Print extra debugging information (default: off)



mol_rename


Rename molecules


Can also:

change the molecule name to the filename

change the molecule filename to match the molecule

rename the molecule using a specified SDF data field.

The -r<datafield> and -c flags can be used together to change the filename to the specified SDF data field.

Can generate 'insight_safe' names. i.e. So that they do not contain spaces, punctuation, start with an underscore or a digit and are of limited length.


Flags:


Multiple changes

-g_<base> Generate new name using <base>, add a number and change SDF Name field (same as -b, -n, -s)

-mips_<start> Generate new MIPS code and set SDF Code field starting from supplied number

-b Set this molecule base name

-f Change molecule name to filename

-r_<string> Rename molecules using the <string> SDF Data field

Options that modify the molecule name

-safe Use insight-safe names (ie that do not contain spaces, punctuation, start with an underscore or a digit and are of limited length)

-n Add a number to the end of the name


Options that modify molecule data (SDF data)

-s Transfer the molecule name to the SDF Data fields 'NAME' and 'title'

-sdfield_<field> Change specified SDF field to molecule name


Options that modify the filename

-c Change output filename to name of first molecule

-k Keep the same filename, overwriting the input file


Other options

-np Use noparse option (much faster, but currently only available for sdf files)




mol_renumber


Renumber and/or rename atoms and/or residues in a molecule


Flags:

-a <Starting atom number> Renumber atoms starting from this number

-s <Starting residue number> Renumber residues starting from this number

-c <Starting chain letter> Relabel chains starting from this letter

-ra Rename all atoms. Heavy atoms are numbered from 1; hydrogens are named according to the heavy atom they are connected to

-rr Rename atoms within each residue. Heavy atoms are numbered from 1; hydrogens are named according to the heavy atom they are connected to

-simple Rename atoms in residues. All atom elements are renumbered from 1



mol_rescale_bonds


Rescale a molecule so that the carbon-carbon bonds have a reasonable average bond length.


The target length defaults to 1.5 Angstroms but can be adjusted with the -l flag. Alternatively, a scaling factor can be supplied with the -f flag.

This is useful to clean up files that have come out of Isis databases before they are minimised by some other program (eg Insight).


Flags:

-f <factor> Scaling factor (not to be used with -l)

-l <length> Target C-C bond length (not to be used with -f)

-o <format> Output format

-split Split multiple molecule files into a separate directory. Each molecule is renamed with an "Insight safe" name
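
The rescaling described above can be sketched in a few lines. This is only an illustration, not the script's own code: the function names, the coordinate/bond data layout and the use of a simple arithmetic mean over C-C bonds are all assumptions.

```python
import math

def cc_scale_factor(coords, elements, bonds, target=1.5):
    """Factor that brings the mean C-C bond length to `target` (Angstroms).

    coords: list of (x, y, z); elements: list of element symbols;
    bonds: list of (i, j) index pairs. All hypothetical data structures.
    """
    lengths = [
        math.dist(coords[i], coords[j])
        for i, j in bonds
        if elements[i] == "C" and elements[j] == "C"
    ]
    if not lengths:
        raise ValueError("no carbon-carbon bonds found")
    return target / (sum(lengths) / len(lengths))

def rescale(coords, factor):
    """Multiply every coordinate by the same factor."""
    return [tuple(factor * c for c in xyz) for xyz in coords]

# A fragment drawn with a C-C distance of 0.8 arbitrary units
coords = [(0.0, 0.0, 0.0), (0.8, 0.0, 0.0), (-0.3, 0.5, 0.0)]
elements = ["C", "C", "H"]
bonds = [(0, 1), (0, 2)]

factor = cc_scale_factor(coords, elements, bonds)
scaled = rescale(coords, factor)
```

Passing a fixed value to rescale() directly corresponds to the -f style of use, while cc_scale_factor() corresponds to the -l style.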



mol_rot


Rotate a molecule about any vector


-x_<number> }

-y_<number> } Vector to rotate around (assumed to pass through the origin)

-z_<number> }

-test Testing routine: rotate the first molecule only about the axis by 60 degrees

-a_<number> Angle to rotate molecule through

-random_<number> Generate this number of randomly rotated molecules

-maximise Approximately maximise the extents of the molecule along the X and Y axes (ie align the major axis along the X axis and the medium axis along the Y axis).
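
As an illustration of rotating about an arbitrary vector through the origin, the sketch below uses Rodrigues' rotation formula. Whether the script uses this formula internally is not stated here; the function name and argument layout are hypothetical.

```python
import math

def rotate_about_axis(point, axis, angle_deg):
    """Rotate `point` by `angle_deg` about `axis` (a vector through the origin).

    Rodrigues' formula: v' = v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t)),
    where k is the unit vector along the axis.
    """
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("rotation axis must be non-zero")
    kx, ky, kz = ax / norm, ay / norm, az / norm
    x, y, z = point
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    dot = kx * x + ky * y + kz * z
    cx, cy, cz = ky * z - kz * y, kz * x - kx * z, kx * y - ky * x
    return (
        x * c + cx * s + kx * dot * (1.0 - c),
        y * c + cy * s + ky * dot * (1.0 - c),
        z * c + cz * s + kz * dot * (1.0 - c),
    )

# Rotate a point on the X axis by 90 degrees about the Z axis
rotated = rotate_about_axis((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 90.0)
```

Applying the same function to every atom rotates the whole molecule; points lying on the axis are left unchanged.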




mol_rot_bond


Set the torsion angle between four atoms to a specified value.


The four atoms need not be bonded to each other, although it makes more chemical sense if they are (provided the middle two are not in a ring). If the middle two atoms are in a ring, nothing will be done.


Flags:

-a_<number> Atom A

-b_<number> Atom B

-c_<number> Atom C

-d_<number> Atom D

-w_<number> Desired torsion angle (degrees)
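
For reference, the torsion angle defined by four atoms A, B, C and D can be measured as in the sketch below, which also works out how far the D side would need to be rotated about the B-C bond to reach a desired value. The helper names are hypothetical and the sign convention is the common atan2 form; the script's own convention may differ.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion_deg(a, b, c, d):
    """Signed A-B-C-D torsion angle in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def rotation_needed(current, desired):
    """Smallest signed rotation (degrees) taking `current` to `desired`."""
    return (desired - current + 180.0) % 360.0 - 180.0

# A cis arrangement (0 degrees) and the rotation required to make it trans
cis = torsion_deg((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
delta = rotation_needed(cis, 180.0)
```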



mol_rotrans


Apply rotations and/or translations to a file


The rotation (about the X, Y or Z axis) is applied first, followed by the translation


Flags:

-x_<number> }

-y_<number> } X, Y and Z rotation angles in degrees

-z_<number> }

-a_<number> }

-b_<number> } X, Y and Z translation distances in Angstroms

-c_<number> }



mol_segment


Split all molecules in file to separate molecules (based on connectivities) and recombine them into a single molecule. Each molecule is placed in a separate segment (M001 ... MXXX).



mol_size_shape


Calculate the size and shape of each molecule (by connectivity) in a file

Default is to exclude small molecules (< 10 atoms)


Flags:

-s Minimum size of molecules to include (default 10)



mol_smiles


Generate a SMILES string for a molecule


The SMILES string is added to the 'smiles' record of SDF_DATA in the output file


Flags:

-b Include explicit bond orders

-h Include explicit hydrogen atoms

-k Use Kekule bonds and non-aromatic atom symbols



mol_solvate


Make a solvated box around a molecule with a density of 1. The result needs to be minimised!


Only one file may be supplied as an argument.


Default water residue name is HOH with atoms labelled OH2, H1, H2. Using the -amber flag produces residue name WAT with atom names O, H1, H2. Using the -gromacs flag produces residue name SOL with atom names OW, HW1, HW2.


Note that using the default density of 1 g/mL to solvate proteins or bilayers will probably overestimate the number of water molecules required to produce a realistic total system density


Usage: mol_solvate <file> [<flags>]


Flags:

-x_<number> }

-y_<number> } Dimensions of the box in each direction

-z_<number> }

-f_<file> File containing solvent molecule to add. Water will be used if no file is supplied.

-r_<string> Residue name to call solvent molecules (default is read from file, or HOH)

-g_<number> Margin to add around molecule

-n_<number> Add this number of solvent molecules

-d_<number> Required density (default 1 g/mL)

-p_<number> Min packing distance between solvent molecules

-i Observe packing distance only for solvent-solute distances, not solvent-solvent

-t Translate solute molecule to coordinate origin

-b_<number> Solvate a bilayer, i.e. do not place solvent molecules within <number> Angstroms of the bilayer plane. The default plane is XZ; the XY plane is used if the -xy flag is given

-xy Solvate bilayer in the xy plane

-amber Give water molecules AMBER names (resname WAT, Oxygen O, hydrogens H1, H2), and also adds TER after each water molecule

-gromacs Give water molecules GROMACS names (resname SOL, oxygen OW, hydrogens HW1, HW2)

-chain Each solvent molecule in its own chain. (-amber will also enable this option.)
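
To give a feel for the numbers involved, the sketch below estimates how many solvent molecules fill an empty box at a given density. It is a back-of-envelope calculation under stated assumptions, not the script's algorithm: it ignores the volume occupied by the solute (which is why a density of 1 g/mL overestimates the water count for proteins or bilayers), and the function name and default molar mass of water are illustrative choices.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
CM3_PER_A3 = 1.0e-24       # one cubic Angstrom in cubic centimetres

def solvent_count(x, y, z, density=1.0, molar_mass=18.015):
    """Number of solvent molecules needed to fill an x*y*z box (Angstroms).

    density is in g/mL (equivalently g/cm^3), molar_mass in g/mol.
    """
    volume_cm3 = x * y * z * CM3_PER_A3
    return round(density * volume_cm3 * AVOGADRO / molar_mass)

# A 30 x 30 x 30 Angstrom box of water at 1 g/mL
n_waters = solvent_count(30.0, 30.0, 30.0)
```

With these inputs the estimate comes out at roughly 900 water molecules.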



mol_sort


Script to reorder the atoms in a file


Atoms are sorted by chain, residue number, atom number

Contains an option to reorder residues within a file. This option resets the CHAIN and SEGID identifiers to prevent undesired effects in the sorting routine.


Flags:

-r Rearrange residue order (takes input from command line)



mol_sort_fp


Calculate average Tanimoto coefficients of molecules within a set of compounds and sort the output by average Tanimoto coefficient


Flags:

-w Write fragments into SDF Data

-fmax Max fragment size

-fmin Min fragment size

-at Fragment atom typing [element (simple typing by atomic element, default), none (all atoms have the same type)]

-bt Bond typer [simple (single, triple and double/aromatic bonds), none (all bonds have the same type) ]

-h Include hydrogens in fragments [none (default), polar, all]
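
The Tanimoto coefficient between two fragment sets is the size of their intersection divided by the size of their union. The sketch below averages it over all other molecules and sorts the result. The fragment strings, the function names and the descending sort order are assumptions made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two sets: |A & B| / |A | B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def sort_by_average_tanimoto(fingerprints):
    """Return (name, average coefficient) pairs, most similar first.

    fingerprints maps a molecule name to a set of fragment identifiers.
    """
    averages = {}
    for name, fp in fingerprints.items():
        others = [tanimoto(fp, other_fp)
                  for other, other_fp in fingerprints.items()
                  if other != name]
        averages[name] = sum(others) / len(others) if others else 0.0
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Hypothetical fragment sets for three molecules
fingerprints = {
    "a": {"CC", "CO", "CN"},
    "b": {"CC", "CO"},
    "c": {"CC"},
}
ranked = sort_by_average_tanimoto(fingerprints)
```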



mol_split


Split a molecule file into separate molecules based on connectivities.


Molecules are named using the RESIDUE name of the first residue by default

To separate a multi-molecule file into individual structures see 'mol_divide'

Known bugs: the -d option does not work properly with pdb files


Flags:

-s Name molecules by SEGID

-l Keep each molecule's largest fragment only

-d Write each output molecule as a separate file

-min Retain only molecules with at least this number of atoms



mol_split_segid


Split a molecule file into separate molecules based on SEGID. Molecules are written out to a single file. Molecules with no defined SEGID are assigned to the SEGID 'NONE'.



mol_wrap_cell


Wrap all molecules back into a unit cell. An atom to centre the system around can be selected using the -a flag. Otherwise, the centre of the largest fragment will be used. Useful for molecular dynamics output where some molecules have wandered out of the unit cell


Flags:

-ignore Ignore unit cell dimensions in individual files

-x_<number> Default unit cell's X dimension

-y_<number> Default unit cell's Y dimension

-z_<number> Default unit cell's Z dimension

-i_<file> Index file

-a_<number> Atom number to centre the system around

-g_<string> Name of index group to centre the system around

-t Translate the centre to (0,0,0)
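
The wrapping operation can be pictured as shifting each molecule by a whole number of cell lengths so that its centre falls inside the cell. The sketch below assumes an orthorhombic cell and uses the unweighted mean of the atom positions as the molecule centre; both choices, and the function name, are illustrative assumptions.

```python
import math

def wrap_molecule(atoms, cell, centre=(0.0, 0.0, 0.0)):
    """Translate a whole molecule so its centre lies in the cell around `centre`.

    atoms: list of (x, y, z); cell: (Lx, Ly, Lz) edge lengths in Angstroms.
    The molecule is moved rigidly, so bonds are never broken.
    """
    n = len(atoms)
    mol_centre = [sum(atom[i] for atom in atoms) / n for i in range(3)]
    shift = [
        -length * math.floor((c - c0) / length + 0.5)
        for c, length, c0 in zip(mol_centre, cell, centre)
    ]
    return [tuple(atom[i] + shift[i] for i in range(3)) for atom in atoms]

# A diatomic that has drifted one cell length along X
wrapped = wrap_molecule([(12.0, 0.0, 0.0), (13.0, 0.0, 0.0)], (10.0, 10.0, 10.0))
```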



namd_fix_back


Set the occupancy field of a pdb file for use as a NAMD constraint file:

Backbone atoms are constrained with the force constant given by -c.

All other atoms are free.


Flags:

-c_<number> Constraint (kcal/mol/Ang^2) (default 1)
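
The occupancy field occupies columns 55-60 of a fixed-column PDB ATOM record, so the edit amounts to overwriting that slice. The sketch below is a minimal illustration: the backbone atom names (N, CA, C, O), the use of 0.00 for free atoms, and both function names are assumptions rather than details taken from the script.

```python
BACKBONE_NAMES = {"N", "CA", "C", "O"}  # assumption: protein backbone atom names

def pdb_atom(serial, name, resname, chain, resseq, x, y, z):
    """Build a fixed-column PDB ATOM record (occupancy 1.00, B-factor 0.00)."""
    return (f"ATOM  {serial:5d}  {name:<3s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}")

def set_backbone_constraints(lines, force_constant=1.0):
    """Write the force constant into the occupancy column of backbone atoms."""
    out = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            padded = line.rstrip("\n").ljust(66)
            is_backbone = (padded.startswith("ATOM")
                           and padded[12:16].strip() in BACKBONE_NAMES)
            value = force_constant if is_backbone else 0.0
            # Occupancy is columns 55-60, i.e. the slice [54:60]
            line = padded[:54] + f"{value:6.2f}" + padded[60:]
        out.append(line)
    return out

records = [
    pdb_atom(1, "N", "ALA", "A", 1, 11.104, 6.134, -6.504),
    pdb_atom(2, "CB", "ALA", "A", 1, 12.560, 6.000, -6.100),
]
constrained = set_backbone_constraints(records, force_constant=5.0)
```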



namd_fix_heavy


Set the occupancy field of a pdb file for use as a NAMD constraint file.


Hydrogen and water atoms are free.


Sodium and Chlorine (ie salt) atoms are also free.


All other atoms are constrained with the force constant given by -c.


Flags:

-c_<number> Constraint (kcal/mol/Ang^2) (default 1)



namd_write_consref


A script to write out a constraint reference file (PDB format) for any molecule

Also includes a force constant to use


Flags:

-a_<integer> Reference Atom Number (defaults to 1)

-k_<number> Force Constant (defaults to 0.09)

-d_<number> Dihedral Force Constant for amides (defaults to 1000)



name_atoms_simple


Rename atoms in a molecule using a simple scheme



pdb_rename_hydrogens


Rename hydrogen atoms so that they have the correct PDB nomenclature. Using the -charmm flag will produce charmm27 atom names. Using the -cyana flag will produce cyana2 names.


Note: Particular attention must be paid to the delta-carbon of isoleucine residues, which is also renamed. The -charmm flag will name ILE CD as CD. The -cyana flag will name ILE CD as CD1.

C-terminus and N-terminus hydrogen names are currently not generated.


Flags:

-charmm Generate charmm27 atom names

--charmm-atom-names

-cyana Generate cyana2 atom names

--cyana-atom-names

-f <filename> Filename containing atom names and connectivities

--datafile=<filename>

-d Delete any hydrogens that can not be given a name

--delete-unknown-h

-o <format> Output file format (default PDB)

--output-file-format=<format>

David K. Chalmers, 3 February 2000



pdbsplit


Split a PDB file into smaller pieces based on TER or MODEL records without parsing the file


Use:

pdbsplit file.pdb -s fnumbered produces file_00001.pdb, file_00002.pdb, etc

pdbsplit file.pdb -s numbered produces file_1.pdb, file_2.pdb


Flags:

-s Filename output style

-d <dirname> Output directory

-end Split on END records

-endmdl Split on ENDMDL records

-ter Split on TER records

-chain Split at end of each chain/SEGID. Name output files by chain

-all Split at all of the above (default but unset by selecting one of the above)
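
Splitting without parsing means working on the raw text lines and looking only at the record name in the first six columns. The sketch below illustrates the idea together with the two filename styles shown above. The function names are hypothetical, and details such as keeping the terminating record inside each piece are assumptions.

```python
def split_pdb_text(text, records=("TER", "ENDMDL", "END")):
    """Split PDB text into pieces, ending a piece at each listed record."""
    def has_atoms(lines):
        return any(l.startswith(("ATOM", "HETATM")) for l in lines)

    pieces, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line[:6].strip() in records:
            if has_atoms(current):   # skip pieces holding no coordinates
                pieces.append(current)
            current = []
    if has_atoms(current):
        pieces.append(current)
    return pieces

def output_name(base, index, style="fnumbered"):
    """file_00001.pdb for 'fnumbered', file_1.pdb for 'numbered'."""
    if style == "fnumbered":
        return f"{base}_{index:05d}.pdb"
    return f"{base}_{index}.pdb"

text = "\n".join([
    "ATOM      1  N   ALA A   1",
    "TER",
    "ATOM      2  N   GLY B   1",
    "TER",
    "END",
])
pieces = split_pdb_text(text)
names = [output_name("file", i + 1) for i in range(len(pieces))]
```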



radius_of_gyration


Calculates the radius of gyration for a series of molecules


Flags:

-ts_<number> Timestep between files (ps)

-i_<number> Time at initial file (ps)

-a Use all fragments (not just the largest one) to calculate radius of gyration

-g_<string> Use only atoms in index group "string"

-n_<file> Index file in which to find the group
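
The radius of gyration is the root-mean-square distance of the atoms from their common centre. The sketch below supports optional mass weighting; whether the script weights by mass is not stated here, so that behaviour and the function name are assumptions.

```python
import math

def radius_of_gyration(coords, masses=None):
    """Rg = sqrt( sum(m_i * |r_i - r_c|^2) / sum(m_i) ).

    With masses=None every atom has unit weight and r_c is the
    geometric centre; otherwise r_c is the centre of mass.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    centre = [
        sum(m * xyz[i] for m, xyz in zip(masses, coords)) / total
        for i in range(3)
    ]
    weighted = sum(m * math.dist(xyz, centre) ** 2
                   for m, xyz in zip(masses, coords))
    return math.sqrt(weighted / total)

# Two equal particles two Angstroms apart
rg = radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
```

Calling the function once per file and pairing each value with initial time plus file index times the timestep gives a time series of the kind the -i and -ts flags describe.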



random_box


Fill a cell with molecules in random orientation. An existing molecule file can also be embedded in the box (solute).


Use:

random_box -x 10 -y 10 -z 10 -n <number of molecules> <solvent filename> -e <molecule to embed>

random_box -d <required density of system> <solvent filename>

random_box -d <required density of system> -w <weight percent of solvent> <solvent filename>



Flags:

-x }

-y } Dimensions of the box (Angstroms)

-z }


-cx }

-cy } Centre of the box (Angstroms)

-cz }


-e_<filename> Embed molecule filename

-n_<number> Number of molecules to add

-d_<number> Calculate number and add molecules to give this density of entire molecular system (including solute)

-wp_<weight percent> Weight percent of solvent (this modifies the density value: final density = density * weight-percent/100)

-o_<format> Output format (default: Gromacs .gro)

-ignore_h Ignore hydrogen atoms when placing molecules

-t Translate embed molecule to centre (default on)

-md Minimum distance between molecule heavy atoms

-mc Maximum number of clashes allowed when adding new molecules

-mci Increment maximum number of clashes after this number of unsuccessful trials
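
The placement step can be illustrated with a simple rejection scheme: propose a random position, and accept it only if it is no closer than the minimum distance to anything already placed. The sketch below is a heavily simplified stand-in that places single points rather than whole, randomly oriented molecules, ignores periodic images and allows no clashes; all names and defaults are hypothetical.

```python
import math
import random

def place_points(n, box, min_dist, max_trials=10000, seed=0):
    """Place up to n points in a box, rejecting any closer than min_dist.

    box: (x, y, z) edge lengths in Angstroms. Stops early if max_trials
    proposals have been made, so fewer than n points may be returned.
    """
    rng = random.Random(seed)
    placed = []
    trials = 0
    while len(placed) < n and trials < max_trials:
        trials += 1
        candidate = tuple(rng.uniform(0.0, length) for length in box)
        if all(math.dist(candidate, p) >= min_dist for p in placed):
            placed.append(candidate)
    return placed

points = place_points(20, (10.0, 10.0, 10.0), min_dist=1.5)
```

Relaxing the acceptance test after a number of failed proposals is the role the -mc and -mci flags appear to play in the script itself.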




randomise_conformation


Produce a random conformation of a molecule by performing arbitrary rotations about bonds.


Flags:

-n <number> Maximum number of rotations to perform

--number-of-rotations=<number>

-d <distance> Distance below which two atoms clash

--clash-distance=<distance>

-min <number> Maximum number of steps to minimise in Sybyl

--minimisation-steps=<number>

-r Calculate RMS to original molecule after each rotation

--rms-values

-o <format> Output format

--output-file-format=<format>



read_write_charmm_rtf


Generate CHARMm topology file (rtf) from input file.


Currently designed to work on residues as separate molecules. Separate residues are converted to GROUPS

Also writes pdb and mol2 format files


Flags:

-p,--parameter-file CHARMm parameter (prm) file



read_write_cml


Read any silico format and write a Chemical Markup Language format file.

Under development and incomplete



read_write_merck


Read any Silico format and write a Merck format file


Flags:

-b,--regenerate-bondorders Regenerate bondorders



read_write_mmod


Read a molecule and write a Macromodel format file.



read_write_mol


Test script to read any Silico format and write (by default) in the same format.


Flags:

-o_<format> Output format

-O_<filename> Output filename

-check Run molecule check



read_write_mol2


Read any silico format and write a mol2 file.


Flags:

-r Rename molecules using SDF data field for conversion from SDF to mol2

-mm Write output in MolMol Mol2 format.

-p Write output in Mol2 Protein format.

--single Use single molecule read/write routines

-dr Print additional debugging information for rings



read_write_mopac


Read any Silico format and write a MOPAC cartesian file.



read_write_pdb


Read any Silico format and write a pdb file


Flags:

-d Delete disordered (ALT) atoms

--single Use single molecule read/write routines

-debug



read_write_rtf


Test script to read and write CHARMm rtf files



read_write_sdf


Read any silico format and write an sdf file.


Optionally add a 'name' field, rename the molecule or remove SDF data


Flags:

-s Starting structure number

-n Number of structures to read

-r SDF data field to use if renaming molecules

-a Add 'name' field to SDF_DATA using molecule name encoded in the first line of the file

-clean Remove all sdf data (except name)

-noparse Do not parse SDF data (Only works with SDF input files)

--single Use single molecule read/write routines



read_write_seq


Sequence format test script


Silico protein/DNA sequence format routines are under development and incomplete


Flags:

-c,--combine Combine sequences from all files to a single file

-n,--number-of-residues Number of residues per line in output



renumber_residues


Renumber residues in a file. Number SUBCOUNTs sequentially from 1, and make the SUBID for any atom the same as the SUBCOUNT for that atom. Optionally, use a different start and a different increment.


Molecule is sorted before renumbering. All hydrogens are forced to have the same residue name, residue number, chain and segid as their parent heavy atom


Flags:

-s_<number> New starting residue number (default 1)

-i_<number> New increment (default 1)



residue_rename


Change a single residue name


Rename a single residue type


Flags:

-a Rename all residues

-e_<residue_names> List of residue names to change

-n_<numbers> List of residue numbers to change

-r_<residue_name> New residue name



rms


Calculate RMS distances between molecules without superimposition


The first structure in the first file is used as the reference structure. The RMS distance is calculated to all subsequent molecules. Output is written to <ref_file>.rms


Heavy atom RMS is calculated by default. The -a flag can be used to include hydrogens in the calculations


Flags:

-a Use all atoms including hydrogens to calculate RMS. Uses heavy atoms by default

-s Sort atoms into smiles order before doing RMS comparison. This may be useful if molecules have different atom orders.

-w Write out file containing RMS as SDF_DATA
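
Without superimposition, the RMS distance is simply the root-mean-square of the distances between corresponding atoms in their existing coordinate frames. The sketch below shows this with an optional heavy-atom filter. The function name and data layout are assumptions, and atoms are matched purely by position in the list.

```python
import math

def rms_distance(ref, other, elements=None, all_atoms=False):
    """RMS distance between two structures with identical atom ordering.

    ref, other: lists of (x, y, z). If elements is given and all_atoms
    is False, hydrogens (element 'H') are left out of the calculation.
    """
    if len(ref) != len(other):
        raise ValueError("structures must contain the same number of atoms")
    indices = range(len(ref))
    if elements is not None and not all_atoms:
        indices = [i for i in indices if elements[i] != "H"]
    if not indices:
        raise ValueError("no atoms selected")
    total = sum(math.dist(ref[i], other[i]) ** 2 for i in indices)
    return math.sqrt(total / len(indices))

reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 1.0, 0.0)]
moved = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (1.5, 1.0, 3.0)]
heavy_rms = rms_distance(reference, moved, elements=["C", "O", "H"])
```

Because no fitting is done, a rigid translation of the second structure contributes directly to the value.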



scale


Scale a molecule by a factor


Flags:

-f <number> Scale factor

-o <format> Output file format (default: input format)



sdf_add_id


Script to add an identifier field (SILICO.ID) to an sdf file.


By default the identifier has the format XXdddddddd, where XX are random letters and dddddddd is an eight-digit integer starting from 00000001


Flags:

-i <field> SDF DATA field containing identifier data

-r <field> SDF DATA field to use when renaming molecules



sdfsplit


Fast split program to divide an sdf file into smaller files without parsing the file


Use:


sdfsplit file.sdf -s fnumbered produces file_00001.sdf, file_00002.sdf, etc

sdfsplit file.sdf -s numbered produces file_1.sdf, file_2.sdf

sdfsplit -r <CODE> renames molecules using the data in field <CODE>



Flags:

-r <datafield> Rename molecules using the given SDF data field

-s (numbered / fnumbered) Output style

-n <num> Number of structures to write to each output file (default 1)

-d <dirname> Output directory



seq_similarity


Calculate pairwise identity and strong and weak homologies for a set of sequences

Uses the strong and weak conserved groups described in the ClustalX documentation corresponding to the *, : and . labels output by Clustal


Note: Silico protein/DNA sequence format routines are under development and incomplete


Flags:

-c Combine sequences from all files to a single file (all.XXX)

--combine Equivalent to -c

-o Output format

--output-format Equivalent to -o



shape_tensor


Calculates the gyration tensor, the shape ellipsoid, and three shape descriptors of a series of molecules.


Flags:

-ts Timestep between files (ps)

-i Time at initial file (ps)

-e Write mol2 file containing molecule and shape ellipse
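
The gyration tensor is the average outer product of the atom positions measured from their centre. Which three shape descriptors the script reports is not stated here; asphericity, acylindricity and relative shape anisotropy are a common set. The sketch below computes an unweighted tensor and only the relative shape anisotropy, because that one can be obtained from tensor invariants without an eigenvalue solver. All names are hypothetical.

```python
def gyration_tensor(coords):
    """3x3 gyration tensor S_ab = mean((r_a - c_a) * (r_b - c_b))."""
    n = len(coords)
    centre = [sum(p[i] for p in coords) / n for i in range(3)]
    return [
        [sum((p[a] - centre[a]) * (p[b] - centre[b]) for p in coords) / n
         for b in range(3)]
        for a in range(3)
    ]

def relative_shape_anisotropy(tensor):
    """kappa^2 = 1 - 3 (l1*l2 + l2*l3 + l3*l1) / (l1 + l2 + l3)^2.

    The pairwise eigenvalue sum equals (tr(S)^2 - tr(S^2)) / 2, so the
    eigenvalues themselves are never needed. 0 is spherical, 1 is a rod.
    """
    trace = sum(tensor[i][i] for i in range(3))
    if trace == 0.0:
        raise ValueError("all points coincide")
    trace_of_square = sum(tensor[a][b] ** 2 for a in range(3) for b in range(3))
    pair_sum = (trace ** 2 - trace_of_square) / 2.0
    return 1.0 - 3.0 * pair_sum / trace ** 2

rod = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
kappa_rod = relative_shape_anisotropy(gyration_tensor(rod))
kappa_oct = relative_shape_anisotropy(gyration_tensor(octahedron))
```

The trace of the tensor is the squared radius of gyration, which ties this output to that of radius_of_gyration.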



slurp


Test for slurp routine which reads molecule files into simple text strings



stacking


Calculates the extent of planar ring stacking in a molecule system


Flags:

-f_<string> Output text file suffix (defaults to _stacking.dat)

-ts_<number> Time between files (ps)

-i_<number> Time at first file (ps)

-sd_<number> Stacking distance cutoff (Angstroms)

-sa_<number> Stacking angle cutoff (degrees)

-cd_<number> Clustering distance cutoff (Angstroms)

-size Maximum stack size to record in a separate column

-write Write out a Mol2 file for each input file

Ring type flags (at least one must be used)

-p Count stacks of all planar rings

-b Count benzene stacks

-n Count naphthalene stacks



starmaker


All-purpose dendrimer builder.


Written by David Chalmers and Ben Roberts


Use: starmaker5 core.mol2 monomer.mol2 monomer.mol2 monomer.mol2 ... cap.mol2


Each mol2 file should contain the monomer for that generation. Each monomer needs to have attachment points. These are hydrogen atoms named Q1 (point to branch FROM) or Q2 (point to attach TO). The first (core) subunit should contain at least one Q1 hydrogen. A standard monomer should contain one Q2 hydrogen and at least one Q1 hydrogen. A capping group should contain one Q2 hydrogen.


Outputs a mol2 file 'final_dendrimer.mol2' and intermediate states as layer_XX.mol2

Main features:


The dendrimer is built layer by layer. The first residue in the list is named COR. Subsequent residues are named GAA, GAB, GAC, etc. Amide bonds in each monomer are recognised and converted to a trans geometry. A repulsion potential and random torsional search are used to force monomers to grow in an extended geometry.


At the completion of each layer the molecule is minimised using Sybyl (although this can be turned off using a flag).


Additional features:


A file 'stop' in the working directory will terminate the program

Flags:

-rc Force first residue to be renamed to 'COR'.

-clash Distance below which two atoms clash

-noh Do not include hydrogens in geometry optimisation (default)

-cut Distance above which the repulsive potential is ignored

-iter Maximum number of iterations when optimising geometry (no clashes)

-conv Number of steps with the same best coordinates before optimisation stops

-min Minimise with Sybyl for this maximum number of steps (default 5000)

-wc Write out structure after each monomer to file current.mol2

-wi Write out structure after every optimisation step



statistics


Reads in and calculates statistics (various measures of centre and spread) for a column of values in each of one or more files.



tabulate


Output data from SDF_DATA fields to a tab delimited text file and produce histograms of data values using gnuplot


Designed to act principally on SDF files but will also work with some data (DOCK and PMFSCORE output) read from comment lines in mol2 format



unit_cell_size


Report the individual volumes and mean volume of the unit cells of a series of molecules.


Flags:

-i Initial time (default 0)

-ts Time step between files



water_to_ion


Replace random water molecules with (single-atom) counter ions. Defaults to Na ions. Covers a very large range of inorganic ions.


Flags:

-e_<string> Ion element (default Na). Now handles a large range of single-atom cations and anions.

-n_<number> Number of ions to add

-r_<string> Residue name of added ions (optional)

-a_<string> Atom name to use for ions (optional)

-amber Add PDB TER record after each ion to produce input for Amber



write_mol


Test script to read any Silico format and write (by default) in the same format.


Flags:

-check Check integrity of molecules



write_seq


Sequence format test script


Silico protein/DNA sequence format routines are under development


Flags:

-combine Combine sequences from all files to a single file (all.XXX)

-nodup Remove duplicate sequences

-nostop Remove any amino acid sequences containing a stop ($)

-clean Clean up sequences (equivalent to -nodup -nostop)

-nl_<number> Number of residues per line in output

-split Split sequences into individual files

-sort Sort sequences alphabetically by name