Scripts

abstract2seq

Run Jay Ponder's 'abstract' program from his Sleuth package on a pdb file and convert the result to a sequence

Abstract calculates Kabsch and Sander secondary structure, polar and buried surfaces

Note: This will fail if the pdb file contains disordered atoms

Flags:

-c_<string> Chain to abstract (all = all chains [default], blank = blank chains)

-force Force overwrite of existing output

adf

Calculate angular distribution functions of atoms about the centre of mass of a group

At the moment, this script is extremely simplified. It only takes one reference group and one target group per invocation, and includes no support for periodic boundary conditions.

Flags:

-i_<file> Index file

-r_<string> Reference group (present in index file)

-t_<string> Target group (present in index file)

-s_<num> Bin size (angle in degrees)

-n_<num> Number of bins

-O_<string> Output file name

atom_rename

Change a single atom name or rename a single atom in a particular residue type

Flags:

-eres_<string> Existing residue name

-ename_<string> Existing atom name

-enum_<number> Existing atom number

-new_<string> New atom name

autocorr_angle

Flags:

-g1_<string> Group 1 (present in the index file)

-g2_<string> Group 2 (present in the index file)

-i_<file> Index file

-ts_<num> Time step between molecules (ps)

-m_<num> Maximum interval (delta-t) as proportion of the entire window (default 0.1)

-O_<string> Output file name

autocorr_distance

Flags:

-g_<string> Group (present in the index file)

-i_<file> Index file

-ts_<num> Time step between molecules (ps)

-m_<num> Maximum interval (delta-t) as proportion of the entire window (default 0.1)

-O_<string> Output file name

average2

Reads in two or more whitespace-delimited files and averages their contents.

Each file is assumed to have two data columns, and comment lines escaped using "#". Of the data columns, the one on the left is the independent variable, and that on the right the dependent.

At least two files are required.
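
The averaging described above can be sketched in Python as follows. This is an illustration only, not the script's own code: the function names are hypothetical, and it assumes that every file lists the same independent values in the same order.

```python
def parse_columns(text):
    """Return (x, y) pairs from whitespace-delimited text, skipping '#' comments."""
    rows = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        x, y = line.split()[:2]
        rows.append((float(x), float(y)))
    return rows


def average_datasets(datasets):
    """Average the dependent (right-hand) column of two or more datasets."""
    if len(datasets) < 2:
        raise ValueError("at least two datasets are required")
    averaged = []
    for rows in zip(*datasets):
        # Assumption: the independent value must agree across all files.
        if len({x for x, _ in rows}) != 1:
            raise ValueError("independent values differ between files")
        averaged.append((rows[0][0], sum(y for _, y in rows) / len(rows)))
    return averaged
```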

backbone

Delete all non-backbone atoms from a molecule.

Uses atom labels to determine which atoms should be kept. Default set is {N, CA, C, O}. This can be further reduced to CA atoms only by use of the -ca flag, as below.

Flags:

-ca Keep only CA atoms

bilayer_builder

Build an approximate lipid bilayer using periodic boundary conditions. Molecules can be embedded in the bilayer.

By default, the bilayer is built in the XZ plane, using instances of a supplied lipid file. In this case, prior to use in bilayer building, the lipid should already be oriented along the Y-axis, with its head in the positive-Y direction.

To make bilayer builder more compatible with MD packages that require anisotropic simulations to have the bilayer in the XY plane, use the -xy flag.

Bilayers can be built with multiple constituent species. To do this, multiple passes are used. That is, instances of one species are built into a bilayer, and this bilayer is then "embedded" into a bilayer of the second species, and so on.

The distance between the two halves of the bilayer is controlled by the -s flag. This should be set to approximately the length of the lipid in the Y dimension.

Usage: bilayer_builder <lipid_file> -x <cellx> -y <celly> -z <cellz> -n1 <number of lipids in first layer> [ -n2 <number of lipids in second layer> -e <molecule to embed> -s <separation between layers of lipids> ]

Flags:

-n1_<number> Number of instances of lipid to use to create monolayer 1 (top)

-n2_<number> Number of instances of lipid to use to create monolayer 2 (bottom)

-x_<number> |

-y_<number> | Dimensions of the box in each direction

-z_<number> |

-s_<number> Distance between the mid-points of the two monolayers

-d_<number> Minimum distance between molecule heavy atoms

-r Randomise lipid dihedrals by this number of steps

-mt Maximum number of attempts to pack molecule (default 1000)

-ih Ignore hydrogen atoms when packing lipids (default)

-e_<filename> Embed molecule filename

-not Do not translate the embedded molecule to the centre

-xy Build bilayer in the xy plane

-test Test resulting bilayer for clashes

bsurface

Find number of atoms buried in a protein-ligand intermolecular interaction

Ligand atoms are classified as being buried if they are within <cutoff> distance of the protein and vice versa

Flags:

-p_<filename> Protein file name

-c_<number> H-bond distance cutoff
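
The burial test described above amounts to a pairwise distance check in both directions. A minimal Python sketch, assuming atoms are given as (x, y, z) tuples and that "within" includes the cutoff itself; the function name is hypothetical and this is not the script's own code:

```python
import math


def buried_atoms(ligand, protein, cutoff):
    """Return the indices of ligand and protein atoms that lie within
    cutoff of at least one atom of the other molecule."""
    buried_ligand, buried_protein = set(), set()
    for i, lig_xyz in enumerate(ligand):
        for j, prot_xyz in enumerate(protein):
            if math.dist(lig_xyz, prot_xyz) <= cutoff:
                buried_ligand.add(i)
                buried_protein.add(j)
    return buried_ligand, buried_protein
```

The double loop is quadratic in the number of atoms; a real implementation on large proteins would probably use a grid or cell list.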

calc_time

Turns a specified number of seconds into human-readable time

Usage:

calc_time <time 1> [ <time 2> ... ]
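
The conversion is repeated integer division. A small Python sketch; the output format shown here is an assumption and may not match what calc_time prints:

```python
def human_time(seconds):
    """Format a number of seconds as days, hours, minutes and seconds."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours}h {minutes}m {secs}s"
```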

check_complex

Check to see if a ligand overlaps with a protein

Use: check_complex <ligand> -p <proteinfile>

checkbox

Test to see if molecule(s) is (are) inside the box defined by box.pdb

Use: checkbox <ligandfile> -b<boxfile>

Used for programs like DOCK which do not always dock a molecule INSIDE the specified box. The default (Dock format) box is 'box.pdb'. 'Good' structures are written to <filebase>_ibox.fmt and structures with atoms outside the box are written to <filebase>_obox.fmt

Flags:

-ih Ignore hydrogen atoms

-b box location (default box.pdb)

-force Force overwriting of output files
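
The inside/outside decision can be sketched as a bounds check. This Python illustration assumes the box is axis-aligned and is described by its corner points; the function names are hypothetical, and hydrogen filtering (-ih) is left to the caller:

```python
def box_limits(corners):
    """Return ((xmin, ymin, zmin), (xmax, ymax, zmax)) for the box corners."""
    xs, ys, zs = zip(*corners)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def inside_box(atoms, corners):
    """True only if every atom lies within the box limits on all three axes."""
    low, high = box_limits(corners)
    return all(low[k] <= atom[k] <= high[k] for atom in atoms for k in range(3))
```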

cluster

Cluster molecules using the very simple 'cluster' subroutine.

Use: cluster <file> [<flags>]

Writes out a representative structure from each cluster.

Flags:

-t_<number> Clustering threshold (default: 2)

-a Treat all input files as one set

cluster_fp

Cluster compounds based on a molecular fingerprint using the Tanimoto coefficient

Uses a very simplistic clustering algorithm

Use: cluster_fp <structurefile>

One compound is retained from each cluster. The retained compounds are biased towards lower molecular weight (strictly number of heavy atoms)

Flags:

-cutoff Tanimoto cutoff value (default 0.50)

-fmax Max fragment size

-fmin Min fragment size

-at Fragment atom typing [element (simple typing by atomic element, default), none (all atoms have the same type)]

-bt Bond typer [simple (single, triple and double/aromatic bonds), none (all bonds have the same type) ]

-h Include hydrogens in fragments [none (default), polar, all]

dipole_moment

Calculate a dipole moment

Flags:

-e Calculate electrostatic dipole moment

-f Spec file for custom dipole moments

-n Index file

-ts Timestep between files (ps)

-i Time at first file (ps)

-w Write out molecule and charge-centres in mol2 files

Notes:

-e and -f can be used together, but at least one must be used.

-n must be used if -f is used.

druglike

Discard compounds that do not fit a set of Lipinski-like criteria for druglike properties

mol_characterise must be run first

Flags:

-nh_<number> Maximum number of heavy atoms (default 30)

-fnh_<text> SDF field containing number of heavy atoms (default ".NUMHEAVY")

-nr_<number> Maximum number of rotatable bonds (default 7)

-fnr_<text> SDF field containing number of rotatable bonds (default "SILICO.NUMROT")

-nd_<number> Minimum number of Hydrogen bond donors (default 1)

-fnd_<text> SDF field containing number of Hydrogen bond donors (default "SILICO.NUMDON")

-na_<number> Minimum number of Hydrogen bond acceptors (default 3)

-fna_<text> SDF field containing number of Hydrogen bond acceptors (default "SILICO.NUMACC")

-l_<number> Maximum Log-P value (default 5.0)

-fl_<text> SDF field containing Log-P value (default "LOGP")

-no_<number> Maximum number of non-{C,H,N,O,P,S} atoms (default 0)

-fno_<text> SDF field containing number of non-{C,H,N,O,P,S} atoms (default "SILICO.NUMOTHER")

extract_lig_prot

Split pdb file into protein and non-covalently bound ligands

Structures are split up using connectivity

Molecules with fewer than 'maxatoms' and more than 'minatoms' atoms are assumed to be ligands. (This approach is a little simplistic: it assumes that there are no breaks in the protein chain. However, it IS able to extract peptide ligands from proteins.)

Flags:

-maxatoms (Maximum ligand size)

-minatoms (Minimum ligand size)

file_rename

Rename files using a perl regular expression substitution

For example, file_rename -a _new <filename> will replace the string '_new' with '', so the file clozapine_new.mol2 becomes clozapine.mol2

Use:

file_rename <inputfiles> -a <exp1> -b <exp2> where <exp1> and <exp2> are strings or perl regular expressions

<exp2> is '' by default

Examples:

file_rename file_new.mol2 -a _new will rename file_new.mol2 to file.mol2

file_rename file.mol2 -a .mol2 -b .bak will rename file.mol2 to file.bak

file_rename ssss.mol2 -a 's*' -b x -re will rename ssss.mol2 to x.mol2

file_rename ssss.mol2 -a 's' -b x -re -s will rename ssss.mol2 to xsss.mol2

Flags:

-a_<string> string to replace

-b_<string> string to insert ('' by default)

-s Make only a single substitution of string a (not a global one)

-re treat string a as a regular expression

-force overwrite existing files
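
The effect of the -a, -b, -s and -re flags on a single filename can be mimicked in Python as below. This is a sketch under stated assumptions: the function name is hypothetical, the real script uses Perl substitution, and Python's regular expression dialect differs from Perl's in places (notably in how patterns that can match the empty string are handled globally).

```python
import re


def new_name(filename, a, b="", regex=False, single=False):
    """Return filename with string (or regular expression) a replaced by b."""
    pattern = a if regex else re.escape(a)  # plain strings are matched literally
    count = 1 if single else 0              # count=0 replaces every match
    # A callable replacement inserts b literally, with no backslash processing.
    return re.sub(pattern, lambda _match: b, filename, count=count)
```

For instance, new_name("file.mol2", ".mol2", ".bak") mirrors the second example above, and passing regex=True with single=True mirrors the -re -s combination.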

find_aggregate

Identify hydrophobic aggregates of molecules (eg components of micelles) in a periodic system. Molecules are identified by connectivity and can contain multiple residues. Two molecules are defined as belonging to the same aggregate if they have carbon atoms within the cutoff distance (-t flag).

The -move option moves all atoms into the unit cell (molecules are split when they cross the unit cell boundaries). This is useful for generating pictures in programs like Pymol

The script writes out a Gromacs format index file (ie atoms are indexed starting from 1).

Summary data is written to <first_filename>.out

Flags:

-t_<number> Maximum distance between C atoms in aggregate (default 4)

-move Move all atoms into the unit cell (range 0 -> cell size)

-x_<number> }

-y_<number> } Cell dimensions

-z_<number> }
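
The aggregate definition above (molecules joined whenever any pair of their carbon atoms is within the cutoff) is a connected-components problem. The Python sketch below uses a union-find structure to group molecules. It is an illustration only: the function name and data layout are hypothetical, and unlike the script it ignores periodic images, so it is only valid for non-periodic coordinates.

```python
import math


def find_aggregates(molecules, cutoff=4.0):
    """Group molecules whose carbon atoms approach within cutoff.

    molecules: list of molecules, each a list of (element, (x, y, z)).
    Returns a sorted list of lists of molecule indices.
    """
    parent = list(range(len(molecules)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    carbons = [[xyz for el, xyz in mol if el == "C"] for mol in molecules]
    for i in range(len(molecules)):
        for j in range(i + 1, len(molecules)):
            if any(math.dist(a, b) <= cutoff
                   for a in carbons[i] for b in carbons[j]):
                parent[find(i)] = find(j)  # merge the two aggregates

    groups = {}
    for i in range(len(molecules)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Because membership is transitive, two molecules can end up in the same aggregate without being in direct contact, provided a chain of contacts links them.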

find_amino_acids

Test script to find amino acids within a molecule by comparing residues to amino acid templates

Note: Requires that hydrogen atoms are present

Flags:

-w Write out files containing the newly labelled structure

find_close_water

Find water molecules that are directly hydrogen bonded to a protein

find_groups

Count functional groups in a molecule

find_max

Find the maximum and minimum extents of a molecule and make a box to enclose it

Box file is written as box_<filebase><counter>.pdb

find_rings

Test script to find rings, planar rings and aromatic rings in a molecule. Temperature factor and occupancies of the output pdb file are set. The Mol2 output file contains the rings, aromatic rings and planar rings in static sets.

Flags:

-r Maximum depth of search for rings

-o Write out molecule structure files with rings marked

-timing

-debug rings

find_similar

Identify duplicate or similar compounds in two sets of molecules using Tanimoto or Euclidian comparisons

Usage: find_similar file1 file2 file3 ....

Designed to be used to filter docking results that are ordered from best to worst.

Fragments are generated using the silico fragment routines

Tanimoto coeff = Num common fragments / (Num fragments in mol1 + Num fragments in mol2 - Num common fragments)

Euclidian coeff =

Flags:

-s Scoring method (Tanimoto or Euclidian)

-cut Cutoff value for duplicate compound (default 0.80)

-dup Find duplicates (sets cutoff to 0.999)

-max Max fragment size

-min Min fragment size

-noh Ignore hydrogens in fingerprints (default)

-o<format> Output format
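
The Tanimoto coefficient used by find_similar is the number of common fragments divided by the number of distinct fragments across both molecules. The Python sketch below treats each fingerprint as a set of fragment identifiers, which is an assumption (the silico fragment routines may count repeated fragments), and the function names are hypothetical. The filter keeps the first member of each group of similar molecules, which suits input ordered from best to worst.

```python
def tanimoto(frags1, frags2):
    """Tanimoto coefficient between two collections of fragment identifiers."""
    a, b = set(frags1), set(frags2)
    common = len(a & b)
    total = len(a) + len(b) - common
    return common / total if total else 0.0


def filter_similar(fingerprints, cutoff=0.80):
    """Return indices of molecules kept after discarding any molecule whose
    similarity to an earlier kept molecule reaches the cutoff."""
    kept = []
    for i, fingerprint in enumerate(fingerprints):
        if all(tanimoto(fingerprint, fingerprints[k]) < cutoff for k in kept):
            kept.append(i)
    return kept
```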

fix_rings

Attempts to find and fix rings with bonds through them

This assumes that all bonds are present in the structure file and that the structure is minimised to begin with.

The script focuses on bond length as a method of detection and finishes up with a second minimisation.

Flags:

-min_<number> Minimise with Sybyl for some number of steps (0 disables minimisation)

-r Rename molecules, giving them the filebase as the new name

fix_types

Attempts to fix Sybyl atom types that have been broken in a minimisation, looking especially for atoms with aromatic bonds that have not been given aromatic types.

Flags:

-r Rename each molecule in a file, giving it the filename

flatten

Squash molecules flat into the XY plane

formatdoc

Print formatted comments from silico files

Flags:

-h Write HTML output to a file

-s Generate subroutine descriptions

get_near_res

Work-alike for Dock get_near_res and invertPDB

Only takes one argument at a time.

Flags:

-p_<file> Protein filename

-c_<dist> Cutoff (default 15 Angstroms)

getcell

Make a box from cell coordinates

hydrodynamic_radius

Calculates the hydrodynamic radius for a series of molecules

Flags:

-ts Timestep between files (ps)

-i Time at initial file (ps)

libclean

Clean up structures taken from the Available Chemicals Directory or other ISIS databases (A poor man's Concord)

Assumes that molecules are 'flat' and have ISIS chirality descriptors. Sybyl is used to convert 2D structures to 3D by minimisation using either the Sybyl or MMFF forcefield. Produces an output file with a descriptive name if an error occurs. Retains molecular data from SDF files

Steps:

1. Discard counter ions and small molecules by retaining only the largest molecule in the input structure

2. Scale bonds to approximate sensible values

3. Delete any _polar_ atoms so that an approximate physiological protonation state can be produced

4. Check that all atom elements are real elements (Not Du, R, etc)

5. Add hydrogens using Sybyl (known to have problems with nitro groups and other things) or silico which approximates physiological conditions for common functional groups - carboxylic acids are deprotonated, aliphatic amines are protonated. Hydrogens added using Silico have approximate geometries and the resulting structures should probably be minimised.

6. Attempt to produce approximately correct stereochemistry for atoms marked as chiral (this usually fails for complex molecules)

7. Randomise atom Z positions slightly to stop Sybyl getting stuck on saddle points

8. Minimise molecule with Sybyl (optional)

9. Check for gross problems with the molecule

10. Write out the result <filebase>_cl_<number>.<ext>

Flags:

-flat Use this flag if the input structures are 2D

-addh Add hydrogens with 'silico' or 'sybyl'

-min <steps> Number of steps for minimisation

-ff Sybyl forcefield: Tripos FF and Gasteiger-Marsili charges (gm) or Merck FF and charges (merck)

-mhadd Minimise only if change in number of hydrogen atoms (Silico H addition only)

-sybexe <sybyl_exe> Location/name of Sybyl executable. This is usually found automatically by checking TA_ROOT

-force Force overwriting of output files

-rename Rename molecules using SDF data field

-o <format> Output format

Files:

sleep A file called sleep in the operating directory will cause the job to go to sleep while it is there

stop A stop file will cause the job to stop

make_box

Produce a box file '<filebase>_box.pdb'. If the input file has a unit cell predefined, that will be used. Otherwise, a box will be made large enough to enclose the first molecule.

Unit cells are encoded in some file types, for example Gromacs, Mol2, PDB

Flags:

-i Ignore existing unit cell data

make_index

Create an index file in Gromacs or DCD format.

Can use a versatile atom selection language and will also write out files containing selected atoms

Flags:

-z Number atoms from zero, as for DCD files (Note: if you wish to use this file with catdcd, you will need to edit the resulting file to contain only a single index group and remove all lines containing square brackets)

-as_<string> Use Atom Specifier (see below)

-g Create an index group containing all atoms

-a Create a separate index group for each atom

-e Create an index group for each element

-r Create an index group for each residue

-seg Create an index group for each segment

-w Create separate index groups for water and not water

-set Create an index group for each Mol2 atom set

-an_<atom_names> Create index groups for listed atom names

-rn_<res_names> Create index groups for listed residue names

-d Analyse the molecule as for a Starmaker dendrimer (create index groups for COR, GAA, GAB,...)

-write Write out a file containing atoms in index groups. The output file contains structures corresponding to each index group (except the one containing all atoms)

Using atom specifiers

Atom specifiers are supplied using the -as flag

Several flags are supplied as atom specifier shortcuts

-back Backbone atoms

-ca CA atoms

-cacb CA and CB atoms

-heavy Nonhydrogen atoms

Atom specifier examples:

ANAME:CA All atoms called 'CA'.

ANAME:CA,CB,CG Returns all atoms called 'CA' or 'CB' or 'CG'.

ANAME:CA,RESNAME:TRP All atoms called 'CA' in all residues called TRP

ANAME:CA,SUBID:4 All atoms called 'CA' in residue number 4

ELEMENT:!H All nonhydrogen atoms

SEGID:PROT,SEGID:LIG All atoms with the SEGID set to PROT or LIG

Successive atom specifications can be made. Each is separated by a '|'

ANAME:CB|ANAME:CA|ANAME:CD Returns all atoms called 'CB', 'CA', 'CD'.

ANAME:CA,RESID:1|ANAME:CA,RESID:4 Returns 'CA' atoms from residue 1 and 4.

Atom specifiers are case sensitive.

merge_residue

Merge a single residue into another. The residue to be merged will be given the same name as the target residue.

Flags:

-e_<number> Existing residue number (A = all residues)

-n_<number> New residue number

-a_<string> New residue name (optional)

mol2cns

Convert a molecule to be suitable for input to CNS

Deletes pseudoatoms from a file (atom name starts with Q)

Adds hydrogens

Generates correct hydrogen names

Flags:

-f File containing atom names and connectivities (default $SILICO_HOME/data/cns_amino_acid_atoms.dat)

-addh Add hydrogens

-del Delete unknown hydrogens

mol2seq

Extract amino acid sequence from molecule

Output file <filebase>.xxx

Flags:

-ih Include HETATMs

-c Combine all sequences into a single file (all.seq)

-o_<format> Output format

mol2split

Fast split program to divide a mol2 file into smaller files without parsing the file

Default is 100 structures per file

Flags:

-s <style> Output style [numbered, fnumbered] (default: fnumbered)

-n <number> Number of structures in each output file (default: 100)

-d <dir> Output directory (default: working directory)
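
Splitting without parsing is possible because every structure in a Tripos mol2 file begins with an @<TRIPOS>MOLECULE record line. The Python sketch below groups lines at those markers and then batches the structures. It is not the script's own code: the function name is hypothetical, and writing the chunks to numbered files is left out.

```python
def split_mol2(text, per_file=100):
    """Split mol2 text into chunks of at most per_file structures,
    looking only at the record marker and never at atom or bond data."""
    marker = "@<TRIPOS>MOLECULE"
    records, current = [], []
    seen_molecule = False
    for line in text.splitlines(keepends=True):
        if line.startswith(marker):
            if seen_molecule:
                records.append("".join(current))
                current = []
            seen_molecule = True
        current.append(line)  # leading comments stay with the first structure
    if current:
        records.append("".join(current))
    return ["".join(records[i:i + per_file])
            for i in range(0, len(records), per_file)]
```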

mol_add_h

Add hydrogens to a molecule

By default, a protonation state is produced which approximates physiological state. Using the -v flag will fill all valences. ie Will add one hydrogen to a carboxylic acid and 3 hydrogens to ammonia

Adds only polar hydrogens if the 'polar' flag is used. Adds hydrogens to carbon only if 'nonpolar' flag is used

Flags:

-polar Add only polar hydrogens

-nonpolar Add only nonpolar hydrogens (hydrogens on carbon)

-v Fill valence

-d File containing atom names and connectivities ($SILICO_HOME/data/amino_acid_atoms.dat)

-check Run mol_check on generated structures

mol_add_lp

Add lone pairs to a molecule

Flags:

-o_<format> Output format

-O_<filename> Output filename

mol_amides

Constructs a plot of dihedral angles w and w' for secondary amides

Flags:

-p Print a hardcopy instead of producing an output file

--print Ditto

-o <format> Output file format (default: ps)

--output-file-format=<format>

-g Path to grace executable

--grace-executable Ditto

mol_centre

Translate a molecule to 0,0,0

mol_characterise

Calculate molecular weight, molecule extents and other molecular properties

Properties:

Number of atoms

Number of bonds

Number of rotatable bonds (see subroutine mol_count_rot_bonds)

Molecular weight

Number of C, H, N, O, P, S, halogen and other atoms.

Number of rings up to size 10 (Note that the number of rings is not quite the way a chemist would see it - eg Naphthalene has 3 rings)

Number of planar rings

Number of H-bond donors (see subroutine mol_find_donors_acceptors)

Number of H-bond acceptors

Molecule name

Adds hydrogens first by default

Data is written to a file '<filebase>_mc.out' in tab delimited format and into SDF data fields if -sdf flag is set

Flags:

-sdf Write out an sdf file containing molecule structure and calculated data (on by default)

-hadd Add hydrogens (on by default)

-t Write data to tab delimited text file

-replace Overwrite original sdf file (only if -sdf is set)

-force Overwrite preexisting output files

mol_charge

Find the total partial and formal charges on a molecule

Flags:

-formal Calculate formal charges

-r Do not provide a charge breakdown by residue

-s Do not provide a charge breakdown by segment

mol_check

Run a series of sanity checks on a molecule

Calls ensemble_check (this principally checks the integrity of the internal Silico data structures), mol_check_atom_overlap (to find badly overlapping atoms), mol_check_valences (to find atoms with an incorrect number of connected atoms)

Writes out a mol2 file with subsets ERROR, OVERLAP, BONDLENGTH and VALENCE containing any atoms with errors

It would be desirable to check for poor bond lengths and angles as well

Flags:

-ce Check atom elements (default on)

-co Check for atom overlap (default off)

-cv Check atom valences (default on)

-ca Check for aromatic bonds in non-aromatic systems (default on)

-cb Check bondlengths (default on)

-cr Check for distorted aromatic rings (default on)

-amide Check for cis-amides and distorted trans amides (default on)

-a Run all checks

-noconnect Do not create connection table

-nofile Do not write an output .mol2 file for each input file

mol_chop_box

Chop off bits of a protein that are outside the box defined in box.pdb

Protein chains are terminated using chemically sensible groups.

Output files <filebase>_inbox.pdb and <filebase>_exclude.pdb

Used for dock setup

mol_combine

Combine separate molecules into a single molecule

By default combines all the molecules on the command line into one single molecule. Each molecule is given a separate SEGID

If the -p option is used to specify a 'parent' molecule, then each molecule specified on the command line will be combined separately with the parent molecule. This is useful for combining many ligands with a single parent protein to produce complexes.

The -glide option will combine glide _pv.maegz files to produce multiple receptor/ligand structures in individual files. If the -o pdb option is chosen then the ligand will have HETATM records to be compatible with ligplot.

Flags:

-o_<format> Output file format

-p_<parent protein> Combine all molecules with parent (usually protein) molecule

-glide Combine all molecules with first molecule in file. Use with glide .pv or .raw files

-ra Renumber atoms in output file

-rr Renumber residues in output file

mol_connect

Test script for bond creation routines. All methods should give the same results.

Flags:

-o output format (default pdb)

-c connect atoms routine to use

mol_cubic_crystal

Generate all neighbours of a cubic crystal

Applies a 180 degree rotation about a specified axis and 27 translations to give a total of 54 molecules.

Flags:

-x_<val> }

-y_<val> } cell lengths

-z_<val> }

-rx }

-ry } Rotate by 180 degrees about each of these axes

-rz } (can be used together)
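
The count of 54 comes from two orientations (the original and the 180 degree rotated copy), each placed at the 27 combinations of -1, 0 and +1 cell translations along x, y and z. A Python sketch of that bookkeeping follows; it assumes the rotation axis passes through the origin, handles a single rotation axis only, and uses a hypothetical function name.

```python
from itertools import product


def crystal_images(coords, cell, axis="z"):
    """Return 54 copies of a molecule: the original and its 180 degree
    rotation about one axis, each under 27 lattice translations."""
    # A 180 degree rotation about an axis negates the other two coordinates.
    flips = {"x": (1, -1, -1), "y": (-1, 1, -1), "z": (-1, -1, 1)}
    sx, sy, sz = flips[axis]
    rotated = [(sx * x, sy * y, sz * z) for x, y, z in coords]
    images = []
    for molecule in (coords, rotated):
        for i, j, k in product((-1, 0, 1), repeat=3):
            dx, dy, dz = i * cell[0], j * cell[1], k * cell[2]
            images.append([(x + dx, y + dy, z + dz) for x, y, z in molecule])
    return images
```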

mol_del_atoms

Delete atoms from a file

Atoms are selected on the basis of their names, their elements or their residue names.

Name, element and residue name options accept comma separated values, which are ORed. For example, -e C,N would mean "delete carbons or nitrogens".

The * wildcard can be used in atom or residue names, but it must be enclosed in quotes to escape the shell. For example, mol_del_atoms -a N'*' file.mol2 will delete atoms whose names start with N in all molecules in file.mol2.

The various criteria are ANDed. For example, -r HOH -e O,H would mean "delete all atoms whose residue name is HOH and which are oxygens or hydrogens".

If any given criterion is left blank, any value for that criterion is considered acceptable.

Flags:

-a Atom names (comma separated list)

-e Atom elements (comma separated list)

-r Residue names (comma separated list)
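
The selection logic above (values ORed within a criterion, criteria ANDed, blank criteria accepting anything) can be sketched in Python as follows. The function names and the dictionary layout for an atom are hypothetical. Python's fnmatch is used for the * wildcard, which is an approximation: it also interprets ? and [...] patterns, which the script may not.

```python
from fnmatch import fnmatchcase


def matches(value, patterns):
    """True if the criterion is blank or value matches any listed pattern."""
    return not patterns or any(fnmatchcase(value, p) for p in patterns)


def should_delete(atom, names=(), elements=(), resnames=()):
    """Apply all three criteria; every non-blank criterion must match."""
    return (matches(atom["name"], names)
            and matches(atom["element"], elements)
            and matches(atom["resname"], resnames))
```

Note that with all three criteria left blank this sketch selects every atom; how the script treats that case is not stated here.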

mol_del_dummy

Delete all dummy atoms from a file (eg lone pairs).

Output file <filebase>_nodu.<ext>

Flags:

-con Force regeneration of connection table

mol_del_duplicate_atoms

Delete atoms which occupy the same point in space.

mol_del_excess_solv

Remove excess solvent from a file.

Flags:

-d Distance from molecule to leave solvated (defaults to 10 Angstroms)

mol_del_h

Delete all hydrogens from a file.

Output file <filebase>_noh.<ext>

Flags:

-res Delete hydrogens on a particular residue name

-n Delete all nonpolar hydrogens

mol_del_nonpolar_h

Delete nonpolar hydrogens from a file

Nonpolar hydrogens are defined as those attached to carbon

Any charges on the hydrogens being deleted are transferred on to the parent atom.

Output file <filebase>_polarh.<ext>

Flags:

-con Force regeneration of connection table
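
The charge transfer described above keeps the total molecular charge unchanged when hydrogens are removed. A minimal Python sketch, assuming atoms are (element, charge) pairs and bonds are pairs of atom indices; the function name is hypothetical and renumbering of the remaining bonds is omitted:

```python
def delete_nonpolar_h(atoms, bonds):
    """Remove hydrogens bonded to carbon, adding each one's charge to its
    parent carbon. Returns kept atoms as (original_index, element, charge)."""
    charges = [charge for _, charge in atoms]
    doomed = set()
    for i, j in bonds:
        for hydrogen, parent in ((i, j), (j, i)):
            if atoms[hydrogen][0] == "H" and atoms[parent][0] == "C":
                charges[parent] += charges[hydrogen]
                doomed.add(hydrogen)
    return [(index, element, charges[index])
            for index, (element, _) in enumerate(atoms)
            if index not in doomed]
```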

mol_del_res

Script to delete residues from each molecule in a file.

Residues may be identified either by number or by name, or both. If both name and number are used, residues will be deleted if either criterion is matched, unless the -b flag is used.

Flags:

-a Residue names to delete (comma separated list)

-n Residue numbers to delete (comma separated list, ranges accepted)

-b Both name and number must be matched to delete a residue

-v Inverse operation (i.e., keep matching residues and delete everything else)
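
How the -a, -n, -b and -v flags combine can be expressed as a small truth function. The Python sketch below is illustrative: the function names are hypothetical, and the hyphenated range syntax (for example 1-3,7) is an assumption, since the accepted range format is not spelled out above.

```python
def parse_numbers(spec):
    """Expand a comma separated list such as '1-3,7' into a set of integers."""
    numbers = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-", 1)
            numbers.update(range(int(start), int(end) + 1))
        else:
            numbers.add(int(part))
    return numbers


def delete_residue(resname, resnum, names=(), numbers=(), both=False, invert=False):
    """True if the residue should be deleted under the given flags."""
    name_hit = resname in names
    number_hit = resnum in numbers
    hit = (name_hit and number_hit) if both else (name_hit or number_hit)
    return hit != invert  # -v keeps the matches and deletes everything else
```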

mol_del_water

Delete water molecules from a file (or alternatively delete nonwaters)

Writes out _dry file containing unsolvated molecules or _wat file containing waters

Note: Water is defined as having a residue name that starts with TIP or HOH

Flags:

-n Negate. ie write out waters instead of nonwaters

mol_divide

Separate a multi-molecule file (eg Tripos mol2 or Schrodinger mae) into individual files, each containing one molecule.

Files are put into a directory called <filebase>.dir. Each molecule is renamed with an Insight-safe name. The default behaviour renames the output file to match the molecule name. Other file naming styles can be selected using the -s flag

To separate a single molecule into multiple structures see 'mol_split'

For really big files (multiple thousands of structures) consider using sdfsplit, mol2split or pdbsplit which do not parse the file and are much faster

For convenience the script makes a pymol load script 'load.pml' in the output directory. Run this script in pymol to load all the files

For more control over splitting PDB files by chain, waters etc see 'pdbsplit'

Flags:

-stride_<val> Write out structure every 'val' steps

-n Starting number for renumbering

-s Output style for file names. molname: molecule name (not checked for duplicates!), molname_i: 'insight_safe' molecule name, numbered: numbered, no leading zeros, fnumbered: numbered, leading zeros

-o Output format

-force Overwrite existing output

-p Make pymol load script 'load.pml' in output directory

mol_ensemble_average

Given a number of molecules of the same composition as input, write out a molecule where the position of each atom is averaged over the whole ensemble.

Flags:

-l Consider the largest fragment only

-t Translate the molecule (or largest fragment if -l) to centre of mass

mol_extents

Calculate the maximum and minimum X, Y and Z coordinates of a molecule and the centre point.

Values are written to STDOUT

mol_filter

Filter a set of molecules by SDF property and/or molecular weight. Can be used to retain molecules with a unique SDF_CODE

Molecules meeting criteria are written out to <filebase>_flt.<ext>

Flags:

-p SDF property name

-max Property maximum value

-min Property minimum value

-mwmax MW maximum value

-mwmin MW minimum value

-druglike Select druglike compounds

-leadlike Select leadlike compounds

-fragmentlike Select fragmentlike compounds

-u Retain only a single representative with this ID

-np Use noparse option (much faster, but currently only available for sdf files)

mol_fp

Generate silico fragments for a molecule

Fragments are written to <filebase>_frag.dat

Flags:

-fmax Max fragment size

-fmin Min fragment size

-at Fragment atom typing [element (simple typing by atomic element, default), none (all atoms have the same type)]

-bt Bond typer [simple (single, triple and double/aromatic bonds), none (all bonds have the same type) ]

-h Include hydrogens in fragments [none (default), polar, all]

mol_get_name

Extract molecule names from a file and print them to the screen

Flags:

-f_<string> SDF field to use for molecule name

mol_hydrogen_bonds

Find all hydrogen bonds in a system.

Uses the hydrogen bond definition developed by McDonald and Thornton (J. Mol. Biol. 1994, 238, 777-793) which specifies maximum distances and angles for A...H-D and A..D.

Note that to increase the H-bond distance, you must increase both the A..H-D and A..D distances.

The default output filename is derived from the first input filename. This can be changed using the -O flag.

Flags:

Hydrogen bond parameters

-d_<val> Maximum Donor-Acceptor distance (default 3.9 Ang)

-h_<val> Maximum Hydrogen-Acceptor distance (default 2.5 Ang)

-a_<val> Minimum Donor-Hydrogen-Acceptor angle (default 90 deg)

-b_<val> Minimum Hydrogen-Acceptor-Substituent angle (default 90 deg)

Timestep parameters

-ts_<val> Timestep between files (ps)

-i_<val> Time at first file (ps)

Atom/molecule selection options

-ignh Ignore hydrogen atoms [do not use: not yet implemented]

-wat Include water molecules

Input file options

-copy Copy data from first molecule to subsequent molecule. This is good for MD trajectories and series of PDB files

Output file options

-energy Print molecular energies to output file

-helix Print numbers of i-i+3 and i-i+4 H-bonds for each structure

-write Write out a file containing each input structure. Atoms involved in hydrogen bonds are contained in Sybyl sets

-writehb_<val> Write out only those molecules containing >= val H-bonds. Atoms involved in hydrogen bonds are contained in Sybyl sets (It may be useful to use the -o mol2 flag with this option)

-list List all hydrogen bonds in each structure

-ens Summarise the number of times each H-bond was found in output file. Assumes that all input molecules are members of an ensemble
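
The four geometric parameters (-d, -h, -a and -b) combine into a single test per donor/hydrogen/acceptor/substituent set. The Python sketch below applies the default thresholds listed above. It is not the script's own code: the function names are hypothetical, and treating each threshold as inclusive is an assumption.

```python
import math


def angle(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    v1 = [a[k] - b[k] for k in range(3)]
    v2 = [c[k] - b[k] for k in range(3)]
    dot = sum(p * q for p, q in zip(v1, v2))
    cosine = dot / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


def is_hbond(donor, hydrogen, acceptor, substituent,
             max_da=3.9, max_ha=2.5, min_dha=90.0, min_haa=90.0):
    """Apply the distance and angle criteria to one candidate hydrogen bond.
    substituent is an atom bonded to the acceptor."""
    return (math.dist(donor, acceptor) <= max_da
            and math.dist(hydrogen, acceptor) <= max_ha
            and angle(donor, hydrogen, acceptor) >= min_dha
            and angle(hydrogen, acceptor, substituent) >= min_haa)
```

Because both distance tests must pass, loosening only one of them has little effect, which is why both the donor-acceptor and hydrogen-acceptor limits need to be raised together.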

mol_label_fg

Test script to test the subroutine 'mol_label_functional_group'

Flags:

-aa Label aa backbone

-het Label heterocycles

mol_merge

Merge all molecules in a file into a single molecule

Flags:

-n Do not rename residues

mol_mw

Calculate molecular weight and molecule extents

MW data is calculated for the parent molecule (molecule with most atoms in file)

Values are printed to standard output and added to output file as SDF_DATA

Flags:

-addh Add hydrogens

-v Fill valence

-print Print data to a file 'mw.txt'

-min Minimum molecular weight to output (ie skip molecules below this mass)

-max Maximum molecular weight to output (ie skip molecules above this mass)

mol_rama

Constructs a Ramachandran plot for a molecule containing Alpha Amino acids

Flags:

-force Force overwrite of existing output files (default: off)

-o Output format (default: PostScript)

-print Print a hardcopy (default: off)

-residue Make one plot for each residue

-debug Print extra debugging information (default: off)

mol_rename

Rename molecules

Can also:

change the molecule name to the filename

change the molecule filename to match the molecule

rename the molecule using a specified SDF data field.

The -r<datafield> and -c flags can be used together to change the filename to the value of the specified SDF data field.

Can generate 'insight_safe' names, i.e. names that do not contain spaces or punctuation, do not start with an underscore or a digit, and are of limited length.

Flags:

Multiple changes

-g_<base> Generate new name using <base>, add a number and change SDF Name field (same as -b, -n, -s)

-mips_<start> Generate new MIPS code and set SDF Code field starting from supplied number

-b Set this molecule base name

-f Change molecule name to filename

-r_<string> Rename molecules using the <string> SDF Data field

Options that modify the molecule name

-safe Use insight-safe names (ie that do not contain spaces, punctuation, start with an underscore or a digit and are of limited length)

-n Add a number to the end of the name

Options that modify molecule data (SDF data)

-s Transfer the molecule name to the SDF Data fields 'NAME' and 'title'

-sdfield_<field> Change specified SDF field to molecule name

Options that modify the filename

-c Change output filename to name of first molecule

-k Keep the same filename, overwriting the input file

Other options

-np Use noparse option (much faster, but currently only available for sdf files)

mol_renumber

Renumber and/or rename atoms and/or residues in a molecule

Flags:

-a <Starting atom number> Renumber atoms starting from this number

-s <Starting residue number> Renumber residues starting from this number

-c <Starting chain letter> Relabel chains starting from this letter

-ra Rename all atoms. Heavy atoms are numbered from 1; hydrogens are named according to the heavy atom they are connected to

-rr Rename atoms within each residue. Heavy atoms are numbered from 1; hydrogens are named according to the heavy atom they are connected to

-simple Rename atoms in residues. All atom elements are renumbered from 1

mol_rescale_bonds

Rescale a molecule so that the carbon-carbon bonds have a reasonable average bond length.

This defaults to 1.5 Angstroms, but it can be adjusted with the -l flag. Alternatively, a scaling factor can be supplied with the -f flag.

This is useful to clean up files that have come out of Isis databases before they are minimised by some other program (eg Insight).

Flags:

-f <factor> Scaling factor (not to be used with -l)

-l <length> Target C-C bond length (not to be used with -f)

-o <format> Output format

-split Split multiple molecule files into a separate directory. Each molecule is renamed with an "Insight safe" name
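
The rescaling described above can be sketched as follows: measure the mean carbon-carbon bond length, derive a factor that brings it to the target (as with -l), or use an explicit factor (as with -f), then multiply every coordinate by it. The data layout (coordinate tuples, element symbols, bonds as index pairs) and the function names are assumptions made for this example.

```python
import math


def mean_cc_bond_length(coords, elements, bonds):
    """Average length of all carbon-carbon bonds; bonds are (i, j) index pairs."""
    lengths = [
        math.dist(coords[i], coords[j])
        for i, j in bonds
        if elements[i] == "C" and elements[j] == "C"
    ]
    if not lengths:
        raise ValueError("no carbon-carbon bonds found")
    return sum(lengths) / len(lengths)


def rescale(coords, elements, bonds, target=1.5, factor=None):
    """Scale by an explicit factor, or so that the mean C-C bond equals target."""
    if factor is None:
        factor = target / mean_cc_bond_length(coords, elements, bonds)
    return [tuple(factor * c for c in xyz) for xyz in coords]


# A two-carbon fragment drawn with a 3.0 unit bond is shrunk to 1.5.
scaled = rescale([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], ["C", "C"], [(0, 1)])
print(scaled)
```

Scaling about the coordinate origin changes absolute positions as well as bond lengths, which is harmless when the structure is minimised afterwards.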

mol_rot

Rotate a molecule about any vector

-x_<number> }

-y_<number> } Vector to rotate around (assumed to pass through the origin)

-z_<number> }

-test Testing routine: rotate the first molecule only about the axis by 60 degrees

-a_<number> Angle to rotate molecule through

-random_<number> Generate this number of randomly rotated molecules

-maximise Approximately maximise the extents of the molecule along the X and Y axes (ie align the major axis along the X axis and the medium axis along the Y axis).
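
Rotation of a point about an arbitrary axis through the origin is commonly done with Rodrigues' rotation formula. The sketch below is one way to write it and is not necessarily how the script is implemented; the function name and the right-handed sign convention are assumptions.

```python
import math


def rotate_about_axis(point, axis, angle_deg):
    """Rotate point about an axis through the origin (right-handed, degrees)."""
    ax, ay, az = axis
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("rotation axis must have non-zero length")
    kx, ky, kz = ax / norm, ay / norm, az / norm
    x, y, z = point
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    dot = kx * x + ky * y + kz * z
    # Cross product k x v
    cx, cy, cz = ky * z - kz * y, kz * x - kx * z, kx * y - ky * x
    # v*cos(t) + (k x v)*sin(t) + k*(k.v)*(1 - cos(t))
    return (
        x * c + cx * s + kx * dot * (1.0 - c),
        y * c + cy * s + ky * dot * (1.0 - c),
        z * c + cz * s + kz * dot * (1.0 - c),
    )


print(rotate_about_axis((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), 90.0))
```

Applying the function to every atom rotates the whole molecule; points lying on the axis are left unchanged.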

mol_rot_bond

Set the torsion angle between four atoms to a specified value.

The four atoms are not necessarily bonded to each other, however it makes more chemical sense if they are (provided the middle two are not in a ring). If the middle two atoms are in a ring, nothing will be done.

Flags:

-a_<number> Atom A

-b_<number> Atom B

-c_<number> Atom C

-d_<number> Atom D

-w_<number> Desired torsion angle (degrees)
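
To set a torsion to a given value, the current A-B-C-D angle must first be measured and the difference to the desired angle worked out. The sketch below shows only that measurement step, using the usual atan2 form of the dihedral formula; the function names are assumptions, and the rotation of the atoms beyond the B-C bond is not shown.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion(a, b, c, d):
    """Signed torsion angle A-B-C-D in degrees, in the range -180 to 180."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def rotation_needed(current, desired):
    """Smallest signed rotation (degrees) that takes current to desired."""
    delta = (desired - current) % 360.0
    return delta - 360.0 if delta > 180.0 else delta


# A planar cis arrangement has a torsion of 0 degrees.
print(torsion((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))
```

The value returned by rotation_needed is the angle through which the atoms on the D side of the B-C bond would be rotated about that bond.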

mol_rotrans

Apply rotations and/or translations to a file

The rotation (about the X, Y or Z axis) is applied first, followed by the translation

Flags:

-x_<number> }

-y_<number> } X, Y and Z rotation angles in degrees

-z_<number> }

-a_<number> }

-b_<number> } X, Y and Z translation distances in Angstroms

-c_<number> }

mol_segment

Split all molecules in file to separate molecules (based on connectivities) and recombine them into a single molecule. Each molecule is placed in a separate segment (M001 ... MXXX).

mol_size_shape

Calculate the size and shape of each molecule (by connectivity) in a file

Default is to exclude small molecules (< 10 atoms)

Flags:

-s Minimum size of molecules to include (default 10)

mol_smiles

Generate a SMILES string for a molecule

Smiles string is added to 'smiles' record of SDF_DATA in output file

Flags:

-b Include explicit bond orders

-h Include explicit hydrogen atoms

-k Use Kekule bonds and non-aromatic atom symbols

mol_solvate

Make a solvated box around a molecule with a density of 1. The result needs to be minimised!

Only one file may be supplied as an argument.

Default water residue name is HOH with atoms labelled OH2, H1, H2. Using the -amber flag produces residue name WAT with atom names O, H1, H2. Using the -gromacs flag produces residue name SOL with atom names OW, HW1, HW2.

Note that using the default density of 1 g/mL to solvate proteins or bilayers will probably overestimate the number of water molecules required to produce a realistic total system density

Usage: mol_solvate <file> [<flags>]

Flags:

-x_<number> }

-y_<number> } Dimensions of the box in each direction

-z_<number> }

-f_<file> File containing solvent molecule to add. Water will be used if no file is supplied.

-r_<string> Residue name to call solvent molecules (default is read from file, or HOH)

-g_<number> Margin to add around molecule

-n_<number> Add this number of solvent molecules

-d_<number> Required density (default 1 g/mL)

-p_<number> Min packing distance between solvent molecules

-i Observe packing distance only for solvent-solute distances, not solvent-solvent

-t Translate solute molecule to coordinate origin

-b_<number> Solvate bilayer, i.e. don't put solvent molecules within <number> A of the bilayer plane. The default plane is XZ; the XY plane is used if the -xy flag is given

-xy Solvate bilayer in the xy plane

-amber Give water molecules AMBER names (resname WAT, Oxygen O, hydrogens H1, H2), and also adds TER after each water molecule

-gromacs Give water molecules GROMACS names (resname SOL, oxygen OW, hydrogens HW1, HW2)

-chain Each solvent molecule in its own chain. (-amber will also enable this option.)
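
The relationship between box size, density and the number of solvent molecules can be estimated as below. This is a back-of-the-envelope sketch, not the script's own algorithm: it fills the whole box volume and ignores the space taken up by the solute, which is one reason a density of 1 g/mL tends to overestimate the water count for proteins or bilayers. The function name and default molar mass (water) are assumptions.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
A3_TO_ML = 1.0e-24         # one cubic Angstrom expressed in mL (cm^3)


def solvent_count(x, y, z, density=1.0, molar_mass=18.015):
    """Number of solvent molecules needed to fill an x*y*z Angstrom box.

    density is in g/mL and molar_mass in g/mol (default: water).
    """
    volume_ml = x * y * z * A3_TO_ML
    return round(density * volume_ml * AVOGADRO / molar_mass)


# A 30 x 30 x 30 Angstrom box of pure water at 1 g/mL
print(solvent_count(30.0, 30.0, 30.0))
```

For water at 1 g/mL this works out to roughly 0.033 molecules per cubic Angstrom.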

mol_sort

Script to reorder the atoms in a file

Atoms are sorted by chain, residue number, atom number

Contains an option to reorder residues within a file. This option resets the CHAIN and SEGID identifiers to prevent undesired effects in the sorting routine.

Flags:

-r Rearrange residue order (takes input from command line)

mol_sort_fp

Calculate average Tanimoto coefficients of molecules within a set of compounds and sort the output by average Tanimoto coefficient

Flags:

-w Write fragments into SDF Data

-fmax Max fragment size

-fmin Min fragment size

-at Fragment atom typing [element (simple typing by atomic element, default), none (all atoms have the same type)]

-bt Bond typer [simple (single, triple and double/aromatic bonds), none (all bonds have the same type) ]

-h Include hydrogens in fragments [none (default), polar, all]
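
The Tanimoto coefficient of two fragment sets is the size of their intersection divided by the size of their union. The sketch below averages it for each molecule against all others and sorts the result. How the script builds its fragments and in which direction it sorts is not stated here, so the fragment keys, the function names and the descending order are all assumptions.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two collections of fragment keys."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def sort_by_average_tanimoto(fingerprints):
    """Return (name, average coefficient) pairs, highest average first.

    fingerprints maps a molecule name to its collection of fragment keys.
    """
    names = list(fingerprints)
    result = []
    for name in names:
        scores = [
            tanimoto(fingerprints[name], fingerprints[other])
            for other in names
            if other != name
        ]
        average = sum(scores) / len(scores) if scores else 0.0
        result.append((name, average))
    return sorted(result, key=lambda pair: pair[1], reverse=True)


example = {"a": {"C-C", "C-O"}, "b": {"C-C", "C-O"}, "c": {"C-N"}}
for name, average in sort_by_average_tanimoto(example):
    print(name, average)
```

A molecule with a low average shares few fragments with the rest of the set.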

mol_split

Split a molecule file into separate molecules based on connectivities.

Molecules are named using the RESIDUE name of the first residue by default

To separate a multi-molecule file into individual structures see 'mol_divide'

Known bugs: the -d option does not work properly with pdb files

Flags:

-s Name molecules by SEGID

-l Keep each molecule's largest fragment only

-d Write each output molecule as a separate file

-min Retain only molecules with at least this number of atoms

mol_split_segid

Split a molecule file into separate molecules based on SEGID. Molecules are written out to a single file. Molecules with no defined SEGID are assigned to the SEGID 'NONE'.

mol_wrap_cell

Wrap all molecules back into a unit cell. A given atom can be selected to centre the system around using the -a flag. Otherwise, the centre of the largest fragment will be used. Useful for molecular dynamics output where some molecules have wandered out of the unit cell

Flags:

-ignore Ignore unit cell dimensions in individual files

-x_<number> Default unit cell's X dimension

-y_<number> Default unit cell's Y dimension

-z_<number> Default unit cell's Z dimension

-i_<file> Index file

-a_<number> Atom number to centre the system around

-g_<string> Name of index group to centre the system around

-t Translate the centre to (0,0,0)
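
Wrapping can be pictured as shifting each molecule by a whole number of cell lengths so that its centre falls inside the cell. The sketch below assumes an orthorhombic cell and uses the unweighted centroid of the atoms; both are simplifications chosen for this example, and the function name is hypothetical.

```python
import math


def wrap_molecule(coords, cell, centre=(0.0, 0.0, 0.0)):
    """Shift a whole molecule so its centroid lies in the cell around centre.

    coords is a list of (x, y, z) tuples; cell holds the X, Y and Z lengths.
    Every atom receives the same shift, so the molecule stays intact.
    """
    n = len(coords)
    centroid = [sum(atom[k] for atom in coords) / n for k in range(3)]
    shift = [
        -cell[k] * math.floor((centroid[k] - centre[k]) / cell[k] + 0.5)
        for k in range(3)
    ]
    return [tuple(atom[k] + shift[k] for k in range(3)) for atom in coords]


# A molecule that has drifted one cell length along X is moved back.
print(wrap_molecule([(12.0, 0.0, 0.0), (13.0, 0.0, 0.0)], (10.0, 10.0, 10.0)))
```

Shifting by the centroid rather than atom by atom avoids splitting a molecule across opposite faces of the cell.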

namd_fix_back

Set the occupancy field of a pdb file for use as a NAMD constraint file:

Backbone atoms are constrained with the force constant given by -c.

All other atoms are free.

Flags:

-c_<number> Constraint (kcal/mol/Ang^2) (default 1)
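
In the PDB format the occupancy value occupies columns 55 to 60 of each ATOM record, so writing a constraint file amounts to overwriting that field. The sketch below shows the idea. The definition of "backbone" as the atom names N, CA, C and O is an assumption made for this example, as is the function name; the script may use a different selection.

```python
# Assumed backbone atom names; the script's own selection may differ.
BACKBONE = {"N", "CA", "C", "O"}


def set_backbone_occupancy(lines, force_constant=1.0):
    """Return PDB lines with occupancy set to force_constant on backbone atoms.

    All other atoms get 0.00 (free). Lines are expected without newlines.
    """
    out = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            line = line.ljust(66)
            name = line[12:16].strip()        # atom name, columns 13-16
            constrained = line.startswith("ATOM") and name in BACKBONE
            value = force_constant if constrained else 0.0
            # Occupancy field, columns 55-60
            line = f"{line[:54]}{value:6.2f}{line[60:]}"
        out.append(line)
    return out
```

Restricting the test to ATOM records keeps a HETATM calcium ion named CA from being treated as an alpha carbon.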

namd_fix_heavy

Set the occupancy field of a pdb file for use as a NAMD constraint file.

Hydrogen and water atoms are free.

Sodium and Chlorine (ie salt) atoms are also free.

All other atoms are constrained with the force constant given by -c.

Flags:

-c_<number> Constraint (kcal/mol/Ang^2) (default 1)

namd_write_consref

A script to write out a constraint reference file (PDB format) for any molecule

Also includes a force constant to use

Flags:

-a_<integer> Reference Atom Number (defaults to 1)

-k_<number> Force Constant (defaults to 0.09)

-d_<number> Dihedral Force Constant for amides (defaults to 1000)

name_atoms_simple

Rename atoms in a molecule using a simple scheme

pdb_rename_hydrogens

Rename hydrogen atoms so that they have the correct PDB nomenclature. Using the -charmm flag will produce charmm27 atom names. Using the -cyana flag will produce cyana2 names.

Note: Particular attention must be paid to the delta-carbon of isoleucine residues, which is also renamed. The -charmm flag will name ILE CD as CD. The -cyana flag will name ILE CD as CD1.

C-terminus and N-terminus hydrogen names are currently not generated.

Flags:

-charmm Generate charmm27 atom names

--charmm-atom-names

-cyana Generate cyana2 atom names

--cyana-atom-names

-f <filename> Filename containing atom names and connectivities

--datafile=<filename>

-d Delete any hydrogens that can not be given a name

--delete-unknown-h

-o <format> Output file format (default PDB)

--output-file-format=<format>

David K. Chalmers, 3 February 2000

pdbsplit

Split a PDB file into smaller pieces based on TER or MODEL records without parsing the file

Use:

pdbsplit file.pdb -s fnumbered produces file_00001.pdb, file_00002.pdb, etc

pdbsplit file.pdb -s numbered produces file_1.pdb, file_2.pdb

Flags:

-s Filename output style

-d <dirname> Output directory

-end Split on END records

-endmdl Split on ENDMDL records

-ter Split on TER records

-chain Split at end of each chain/SEGID. Name output files by chain

-all Split at all of the above (default but unset by selecting one of the above)
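
Splitting without parsing means treating the file as plain lines and cutting wherever a separator record appears. The sketch below does this for TER records only and reproduces the two filename styles shown above. Keeping the TER line with the preceding piece and discarding trailing pieces that contain no atoms are assumptions made for this example, and the function names are hypothetical.

```python
def split_on_ter(lines):
    """Yield lists of lines, each ending at a TER record."""
    piece = []
    for line in lines:
        piece.append(line)
        if line.startswith("TER"):
            yield piece
            piece = []
    # Keep a final piece only if it actually contains atoms
    if any(l.startswith(("ATOM", "HETATM")) for l in piece):
        yield piece


def output_name(stem, index, style="fnumbered"):
    """Build an output filename in the 'fnumbered' or 'numbered' style."""
    if style == "fnumbered":
        return f"{stem}_{index:05d}.pdb"
    if style == "numbered":
        return f"{stem}_{index}.pdb"
    raise ValueError(f"unknown style: {style}")


lines = ["ATOM  first", "TER", "ATOM  second", "TER", "END"]
for i, piece in enumerate(split_on_ter(lines), start=1):
    print(output_name("file", i), len(piece))
```

Because no coordinates are interpreted, this approach stays fast on very large files.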

radius_of_gyration

Calculates the radius of gyration for a series of molecules

Flags:

-ts_<number> Timestep between files (ps)

-i_<number> Time at initial file (ps)

-a Use all fragments (not just the largest one) to calculate radius of gyration

-g_<string> Use only atoms in index group "string"

-n_<file> Index file in which to find the group
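
The radius of gyration is the root-mean-square distance of the atoms from their common centre. The sketch below uses the mass-weighted definition when masses are supplied and equal weights otherwise; which of the two the script uses is not stated here, so treat the weighting as an assumption.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Radius of gyration of a set of (x, y, z) points.

    If masses is None every atom is given unit weight.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    centre = [
        sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)
    ]
    weighted_sq = sum(
        m * sum((c[k] - centre[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total)


# Two equal atoms one unit either side of the origin
print(radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]))
```

Evaluating this for each file in a series, with the time taken from -i and -ts, gives the radius of gyration as a function of time.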

random_box

Fill a cell with molecules in random orientation. An existing molecule file can also be embedded in the box (solute).

Use

random_box -x 10 -y 10 -z 10 -n <number of molecules> <solvent filename> -e <molecule to embed>

random_box -d <required density of system> <solvent filename>

random_box -d <required density of system> -w <weight percent of solvent> <solvent filename>

Flags:

-x }

-y } Dimensions of the box (Angstroms)

-z }

-cx }

-cy } Centre of the box (Angstroms)

-cz }

-e_<filename> Embed molecule filename

-n_<number> Number of molecules to add

-d_<number> Calculate number and add molecules to give this density of entire molecular system (including solute)

-wp_<weight percent> Weight percent of solvent (this modifies the density value: final density = density * weight-percent/100)

-o_<format> Output format (default: Gromacs .gro)

-ignore_h Ignore hydrogen atoms when placing molecules

-t Translate embed molecule to centre (default on)

-md Minimum distance between molecule heavy atoms

-mc Maximum number of clashes allowed when adding new molecules

-mci Increment maximum number of clashes after this number of unsuccessful trials
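
Filling a box under a minimum-distance constraint is often done by trial insertion: propose a random position, reject it if it lies too close to anything already placed, and give up after a set number of failed trials. The sketch below is a heavily simplified illustration of that idea. Each molecule is reduced to a single point, there is no random orientation or periodic boundary, and the clash allowance controlled by -mc and -mci is not reproduced; the function name and parameters are hypothetical.

```python
import math
import random


def place_points(n, box, min_dist, max_trials=1000, seed=None):
    """Place up to n points in a box so that no pair is closer than min_dist.

    box holds the X, Y and Z lengths. Returns the points that could be placed.
    """
    rng = random.Random(seed)
    placed = []
    for _ in range(n):
        for _ in range(max_trials):
            trial = tuple(rng.uniform(0.0, side) for side in box)
            if all(math.dist(trial, p) >= min_dist for p in placed):
                placed.append(trial)
                break
        else:
            # No acceptable position found within max_trials: stop early
            break
    return placed


points = place_points(20, (10.0, 10.0, 10.0), 1.5, seed=1)
print(len(points))
```

Relaxing the acceptance test after repeated failures, as -mci does for the clash count, is one way to keep such a loop from stalling in a crowded box.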

randomise_conformation

Produce a random conformation of a molecule by performing arbitrary rotations about bonds.

Flags:

-n <number> Maximum number of rotations to perform

--number-of-rotations=<number>

-d <distance> Distance below which two atoms clash

--clash-distance=<distance>

-min <number> Maximum number of steps to minimise in Sybyl

--minimisation-steps=<number>

-r Calculate RMS to original molecule after each rotation

--rms-values

-o <format> Output format

--output-file-format=<format>

read_write_charmm_rtf

Generate CHARMm topology file (rtf) from input file.

Currently designed to work on residues as separate molecules. Separate residues are converted to GROUPS

Also writes pdb and mol2 format files

Flags:

-p,--parameter-file CHARMm parameter (prm) file

read_write_cml

Read any silico format and write a Chemical Markup Language format file.

Under development and incomplete

read_write_merck

Read any Silico format and write a Merck format file

Flags:

-b,--regenerate-bondorders Regenerate bondorders

read_write_mmod

Read a molecule and write a Macromodel format file.

read_write_mol

Test script to read any Silico format and write (by default) in the same format.

Flags:

-o_<format> Output format

-O_<filename> Output filename

-check Run molecule check

read_write_mol2

Read any silico format and write a mol2 file.

Flags:

-r Rename molecules using SDF data field for conversion from SDF to mol2

-mm Write output in MolMol Mol2 format.

-p Write output in Mol2 Protein format.

--single Use single molecule read/write routines

-dr Print additional debugging information for rings

read_write_mopac

Read any Silico format and write a MOPAC cartesian file.

read_write_pdb

Read any Silico format and write a pdb file

Flags:

-d Delete disordered (ALT) atoms

--single Use single molecule read/write routines

-debug

read_write_rtf

Test script to read and write CHARMm rtf files

read_write_sdf

Read any silico format and write an sdf file.

Optionally add a 'name' field, rename the molecule or remove SDF data

Flags:

-s Starting structure number

-n Number of structures to read

-r SDF data field to use if renaming molecules

-a Add 'name' field to SDF_DATA using molecule name encoded in the first line of the file

-clean Remove all sdf data (except name)

-noparse Do not parse SDF data (Only works with SDF input files)

--single Use single molecule read/write routines

read_write_seq

Sequence format test script

Silico protein/DNA sequence format routines are under development and incomplete

Flags:

-c,--combine Combine sequences from all files to a single file

-n,--number-of-residues Number of residues per line in output

renumber_residues

Renumber residues in a file. Number SUBCOUNTs sequentially from 1, and make the SUBID for any atom the same as the SUBCOUNT for that atom. Optionally, use a different start and a different increment.

Molecule is sorted before renumbering. All hydrogens are forced to have the same residue name, residue number, chain and segid as their parent heavy atom

Flags:

-s_<number> New starting residue number (default 1)

-i_<number> New increment (default 1)

residue_rename

Change a single residue name

Rename a single residue type

Flags:

-a Rename all residues

-e_<residue_names> List of residue names to change

-n_<numbers> List of residue numbers to change

-r_<residue_name> New residue name

rms

Calculate RMS distances between molecules without superimposition

The first structure in the first file is used as the reference structure. The RMS distance is calculated to all subsequent molecules. Output is written to <ref_file>.rms

Heavy atom RMS is calculated by default. The -a flag can be used to include hydrogens in the calculations

Flags:

-a Use all atoms including hydrogens to calculate RMS. Uses heavy atoms by default

-s Sort atoms into smiles order before doing RMS comparison. This may be useful if molecules have different atom orders.

-w Write out file containing RMS as SDF_DATA
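
Without superimposition, the RMS distance is simply the root of the mean squared displacement between matching atoms in their original coordinate frames. A minimal sketch, assuming both structures list the same atoms in the same order (the function name is hypothetical):

```python
import math


def rms_distance(reference, other):
    """RMS distance between two equal-length lists of (x, y, z) coordinates."""
    if len(reference) != len(other):
        raise ValueError("structures must contain the same number of atoms")
    if not reference:
        raise ValueError("no atoms to compare")
    total = sum(
        (a - b) ** 2 for p, q in zip(reference, other) for a, b in zip(p, q)
    )
    return math.sqrt(total / len(reference))


ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rms_distance(ref, moved))
```

Because no fitting is performed, a rigid translation of the whole molecule contributes fully to the result, as the example shows.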

scale

Scale a molecule by a factor

Flags:

-f <number> Scale factor

-o <format> Output file format (default: input format)

sdf_add_id

Script to add an identifier field (SILICO.ID) to an sdf file.

By default the identifier has the format XXdddddddd where XX are random letters and dddddddd is an eight digit integer starting from 00000001

Flags:

-i <field> SDF DATA field containing identifier data

-r <field> SDF DATA field to use when renaming molecules
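
Identifiers in the XXdddddddd format could be produced as sketched below. The use of uppercase letters, and of one random prefix shared by every molecule in a run, are assumptions made for this example; the function name is hypothetical.

```python
import random
import string


def id_generator(start=1, prefix=None, rng=random):
    """Yield identifiers such as AB00000001, AB00000002, ...

    If no prefix is given, two random uppercase letters are chosen once.
    """
    if prefix is None:
        prefix = "".join(rng.choice(string.ascii_uppercase) for _ in range(2))
    number = start
    while True:
        yield f"{prefix}{number:08d}"
        number += 1


ids = id_generator(prefix="AB")
print(next(ids), next(ids))
```

Each value would be stored in the SILICO.ID field of the corresponding molecule.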

sdfsplit

Fast split program to divide an sdf file into smaller files without parsing the file

Use:

sdfsplit file.sdf -s fnumbered produces file_00001.sdf, file_00002.sdf, etc

sdfsplit file.sdf -s numbered produces file_1.sdf, file_2.sdf

sdfsplit -r <CODE> renames molecules using the data in field <CODE>

Flags:

-r <datafield> Rename molecules using the given SDF data field

-s (numbered / fnumbered) Output style

-n <num> Number of structures to write to each output file (default 1)

-d <dirname> Output directory
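
An SDF file ends each record with a line containing only $$$$, which makes it possible to split the file without interpreting any molecule data. The sketch below cuts a list of lines into records and groups them n per output file, as the -n flag describes. The function names are hypothetical and file writing is left out.

```python
def split_sdf_records(lines):
    """Yield one list of lines per SDF record, ending at the $$$$ delimiter."""
    record = []
    for line in lines:
        record.append(line)
        if line.strip() == "$$$$":
            yield record
            record = []
    # Keep trailing content only if it is not just blank lines
    if any(l.strip() for l in record):
        yield record


def chunk_records(records, n=1):
    """Group records into lists of at most n records each."""
    chunk = []
    for record in records:
        chunk.append(record)
        if len(chunk) == n:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


lines = ["mol1", "M  END", "$$$$", "mol2", "M  END", "$$$$", "mol3", "M  END", "$$$$"]
for group in chunk_records(split_sdf_records(lines), n=2):
    print(len(group))
```

Both functions are generators, so a large file can be processed one record at a time.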

seq_similarity

Calculate pairwise identity and strong and weak homologies for a set of sequences

Uses the strong and weak conserved groups described in the ClustalX documentation corresponding to the *, : and . labels output by Clustal

Note: Silico protein/DNA sequence format routines are under development and incomplete

Flags:

-c Combine sequences from all files to a single file (all.XXX)

--combine Equivalent to -c

-o Output format

--output-format Equivalent to -o

shape_tensor

Calculates the gyration tensor, the shape ellipsoid, and three shape descriptors of a series of molecules.

Flags:

-ts Timestep between files (ps)

-i Time at initial file (ps)

-e Write mol2 file containing molecule and shape ellipse

slurp

Test for slurp routine which reads molecule files into simple text strings

stacking

Calculates the extent of planar ring stacking in a molecule system

Flags:

-f_<string> Output text file suffix (defaults to _stacking.dat)

-ts_<number> Time between files (ps)

-i_<number> Time at first file (ps)

-sd_<number> Stacking distance cutoff (Angstroms)

-sa_<number> Stacking angle cutoff (degrees)

-cd_<number> Clustering distance cutoff (Angstroms)

-size Maximum stack size to record in a separate column

-write Write out a Mol2 file for each input file

Ring type flags (at least one must be used)

-p Count stacks of all planar rings

-b Count benzene stacks

-n Count naphthalene stacks

starmaker

All-purpose dendrimer builder.

Written by David Chalmers and Ben Roberts

Use: starmaker5 core.mol2 monomer.mol2 monomer.mol2 monomer.mol2 ... cap.mol2

Each mol2 file should contain the monomer for that generation. Each monomer needs to have attachment points. These are hydrogen atoms named Q1 (point to branch FROM) or Q2 (point to attach TO). The first (core) subunit should contain at least one Q1 hydrogen. A standard monomer should contain one Q2 hydrogen and at least one Q1 hydrogen. A capping group should contain one Q2 hydrogen.

Outputs a mol2 file 'final_dendrimer.mol2' and intermediate states as layer_XX.mol2

Main features:

The dendrimer is built layer by layer. The first residue in the list is named COR. Subsequent residues are named GAA, GAB, GAC, etc. Amide bonds in each monomer are recognised and converted to a trans geometry. A repulsion potential and random torsional search is used to force monomers to grow in an extended geometry.

At the completion of each layer the molecule is minimised using Sybyl (although this can be turned off using a flag).

Additional features:

A file 'stop' in the working directory will terminate the program

Flags:

-rc Force first residue to be renamed to 'COR'.

-clash Distance below which two atoms clash

-noh Do not include hydrogens in geometry optimisation (default)

-cut Distance above which the repulsive potential is ignored

-iter Maximum number of iterations when optimising geometry (no clashes)

-conv Number of steps with the same best coordinates before optimisation stops

-min Minimise with Sybyl for this maximum number of steps (default 5000)

-wc Write out structure after each monomer to file current.mol2

-wi Write out structure after every optimisation step

statistics

Reads in and calculates statistics (various measures of centre and spread) for a column of values in each of one or more files.

tabulate

Output data from SDF_DATA fields to a tab delimited text file and produce histograms of data values using gnuplot

Designed to act principally on SDF files but will also work with some data (DOCK and PMFSCORE output) read from comment lines in mol2 format

unit_cell_size

Report the individual volumes and mean volume of the unit cells of a series of molecules.

Flags:

-i Initial time (default 0)

-ts Time step between files

water_to_ion

Replace random water molecules with (single-atom) counter ions. Defaults to Na ions. Covers a very large range of inorganic ions.

Flags:

-e_<string> Ion element (default Na). Now handles a large range of single-atom cations and anions.

-n_<number> Number of ions to add

-r_<string> Residue name of added ions (optional)

-a_<string> Atom name to use for ions (optional)

-amber Add PDB TER record after each ion to produce input for Amber

write_mol

Test script to read any Silico format and write (by default) in the same format.

Flags:

-check Check integrity of molecules

write_seq

Sequence format test script

Silico protein/DNA sequence format routines are under development

Flags:

-combine Combine sequences from all files to a single file (all.XXX)

-nodup Remove duplicate sequences

-nostop Remove any amino acid sequences containing a stop ($)

-clean Clean up sequences (equivalent to -nodup -nostop)

-nl_<number> Number of residues per line in output

-split Split sequences into individual files

-sort Sort sequences alphabetically by name