Scripts
abstract2seq
Run Jay Ponder's 'abstract' program from his Sleuth package on a pdb file and convert the result to a sequence
Abstract calculates Kabsch and Sander secondary structure, polar and buried surfaces
Note: This will fail if the pdb file contains disordered atoms
Flags:
-c_<string> Chain to abstract (all = all chains [default], blank = blank chains)
-force Force overwrite of existing output
adf
Calculate angular distribution functions of atoms about the centre of mass of a group
At the moment, this script is extremely simplified. It only takes one reference group and one target group per invocation, and includes no support for periodic boundary conditions.
Flags:
-i_<file> Index file
-r_<string> Reference group (present in index file)
-t_<string> Target group (present in index file)
-s_<num> Bin size (angle in degrees)
-n_<num> Number of bins
-O_<string> Output file name
atom_rename
Change a single atom name or rename a single atom in a particular residue type
Flags:
-eres_<string> Existing residue name
-ename_<string> Existing atom name
-enum_<number> Existing atom number
-new_<string> New atom name
autocorr_angle
Flags:
-g1_<string> Group 1 (present in the index file)
-g2_<string> Group 2 (present in the index file)
-i_<file> Index file
-ts_<num> Time step between molecules (ps)
-m_<num> Maximum interval (delta-t) as proportion of the entire window (default 0.1)
-O_<string> Output file name
autocorr_distance
Flags:
-g_<string> Group (present in the index file)
-i_<file> Index file
-ts_<num> Time step between molecules (ps)
-m_<num> Maximum interval (delta-t) as proportion of the entire window (default 0.1)
-O_<string> Output file name
average2
Reads in two or more whitespace-delimited files and averages their contents.
Each file is assumed to contain two data columns, with comment lines marked by "#". The left-hand column is the independent variable and the right-hand column is the dependent variable.
At least two files are required.
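A minimal Python sketch of the averaging step. It assumes every file lists the same x values in the same order; how average2 itself handles mismatched rows is not documented here. The names parse_xy and average_xy are hypothetical.

```python
def parse_xy(text):
    """Parse two whitespace-delimited columns, skipping blank lines and '#' comment lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        x, y = line.split()[:2]
        rows.append((float(x), float(y)))
    return rows


def average_xy(datasets):
    """Average the dependent (right-hand) column across data sets sharing the same x values."""
    if len(datasets) < 2:
        raise ValueError("at least two data sets are required")
    n_rows = len(datasets[0])
    if any(len(d) != n_rows for d in datasets):
        raise ValueError("all data sets must have the same number of rows")
    averaged = []
    for rows in zip(*datasets):
        xs = {x for x, _ in rows}
        if len(xs) != 1:
            raise ValueError(f"independent variable differs between files: {sorted(xs)}")
        averaged.append((rows[0][0], sum(y for _, y in rows) / len(rows)))
    return averaged


a = parse_xy("# time value\n0 1.0\n1 2.0\n")
b = parse_xy("0 3.0\n1 4.0\n")
print(average_xy([a, b]))  # [(0.0, 2.0), (1.0, 3.0)]
```

Refusing rows whose x values differ is a deliberate choice here: silently averaging misaligned rows would give plausible-looking but wrong output.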
backbone
Delete all non-backbone atoms from a molecule.
Uses atom labels to determine which atoms should be kept. Default set is {N, CA, C, O}. This can be further reduced to CA atoms only by use of the -ca flag, as below.
Flags:
-ca Keep only CA atoms
bilayer_builder
Build an approximate lipid bilayer using periodic boundary conditions. Molecules can be embedded in the bilayer.
By default, the bilayer is built in the XZ plane, using instances of a supplied lipid file. In this case, prior to use in bilayer building, the lipid should already be oriented along the Y-axis, with its head in the positive-Y direction.
To make bilayer_builder more compatible with MD packages that require anisotropic simulations to have the bilayer in the XY plane, use the -xy flag.
Bilayers can be built with multiple constituent species. To do this, multiple passes are used. That is, instances of one species are built into a bilayer, and this bilayer is then "embedded" into a bilayer of the second species, and so on.
The distance between the two halves of the bilayer is controlled by the -s flag. This should be set to approximately the length of the lipid in the Y dimension.
Usage: bilayer_builder <lipid_file> -x <cellx> -y <celly> -z <cellz> -n1 <number of lipids in first layer> [ -n2 <number of lipids in second layer> -e <molecule to embed> -s <separation between layers of lipids> ]
Flags:
-n1_<number> Number of instances of lipid to use to create monolayer 1 (top)
-n2_<number> Number of instances of lipid to use to create monolayer 2 (bottom)
-x_<number> |
-y_<number> | Dimensions of the box in each direction
-z_<number> |
-s_<number> Distance between the mid-points of the two monolayers
-d_<number> Minimum distance between molecule heavy atoms
-r Randomise lipid dihedrals by this number of steps
-mt Maximum number of attempts to pack molecule (default 1000)
-ih Ignore hydrogen atoms when packing lipids (default)
-e_<filename> Embed molecule filename
-not Do not translate the embedded molecule to the centre
-xy Build bilayer in the xy plane
-test Test resulting bilayer for clashes
bsurface
Find number of atoms buried in a protein-ligand intermolecular interaction
Ligand atoms are classified as being buried if they are within <cutoff> distance of the protein and vice versa
Flags:
-p_<filename> Protein file name
-c_<number> H-bond distance cutoff
calc_time
Turns a specified number of seconds into human-readable time
Usage:
calc_time <time 1> [ <time 2> ... ]
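The conversion is a chain of integer divisions. This sketch assumes a "1d 2h 3m 4s" output style; the format calc_time actually prints is not shown here, and human_time is a hypothetical name.

```python
def human_time(seconds):
    """Convert a number of seconds into a string such as '1d 2h 3m 4s'."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    total = int(round(seconds))
    days, rem = divmod(total, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    # Only non-zero units are shown
    parts = [f"{value}{unit}" for value, unit in
             ((days, "d"), (hours, "h"), (minutes, "m"), (secs, "s")) if value]
    return " ".join(parts) or "0s"


for t in (93784, 3600, 59):
    print(t, "->", human_time(t))
```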
check_complex
Check to see if a ligand overlaps with a protein
Use: check_complex <ligand> -p <proteinfile>
checkbox
Test whether molecules are inside the box defined by box.pdb
Use: checkbox <ligandfile> -b <boxfile>
Used for programs like DOCK, which do not always dock a molecule INSIDE the specified box. The default (DOCK format) box is 'box.pdb'. 'Good' structures are written to <filebase>_ibox.fmt and structures with atoms outside the box are written to <filebase>_obox.fmt
Flags:
-ih Ignore hydrogen atoms
-b box location (default box.pdb)
-force Force overwriting of output files
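The core test can be sketched as follows. The sketch assumes the box is axis-aligned and already reduced to its minimum and maximum corners; reading the DOCK box.pdb file is left out. Both function names are hypothetical.

```python
def in_box(atoms, box_min, box_max, ignore_h=False):
    """Return True if every atom lies inside the axis-aligned box.

    atoms is a list of (element, (x, y, z)) pairs.
    """
    for element, xyz in atoms:
        if ignore_h and element == "H":
            continue  # mirrors the -ih flag
        if not all(lo <= c <= hi for c, lo, hi in zip(xyz, box_min, box_max)):
            return False
    return True


def partition_by_box(molecules, box_min, box_max, ignore_h=False):
    """Split molecules into those fully inside the box ('ibox') and the rest ('obox')."""
    inside, outside = [], []
    for mol in molecules:
        (inside if in_box(mol, box_min, box_max, ignore_h) else outside).append(mol)
    return inside, outside
```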
cluster
Cluster molecules using the very simple 'cluster' subroutine.
Use: cluster <file> [<flags>]
Writes out a representative structure from each cluster.
Flags:
-t_<number> Clustering threshold (default: 2)
-a Treat all input files as one set
cluster_fp
Cluster compounds based on a molecular fingerprint using the Tanimoto coefficient
Uses a very simplistic clustering algorithm
Use: cluster_fp <structurefile>
One compound is retained from each cluster. The retained compounds are biased towards lower molecular weight (strictly, a lower number of heavy atoms)
Flags:
-cutoff Tanimoto cutoff value (default 0.50)
-fmax Max fragment size
-fmin Min fragment size
-at Fragment atom typing [element (simple typing by atomic element, default), none (all atoms have the same type)]
-bt Bond typer [simple (single, triple and double/aromatic bonds), none (all bonds have the same type) ]
-h Include hydrogens in fragments [none (default), polar, all]
dipole_moment
Calculate a dipole moment
Flags:
-e Calculate electrostatic dipole moment
-f Spec file for custom dipole moments
-n Index file
-ts Timestep between files (ps)
-i Time at first file (ps)
-w Write out molecule and charge-centres in mol2 files
Notes:
-e and -f can be used together, but at least one must be used.
-n must be used if -f is used.
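For a set of point charges the dipole moment is mu = sum_i q_i (r_i - r_0). This sketch assumes charges in units of e and coordinates in Angstroms, and converts the magnitude to Debye using 1 e*Angstrom ≈ 4.8032 D. The script's own units and its handling of the -f spec file are not described here, and dipole_moment is a hypothetical name. For a system with a net charge, the result depends on the reference point r_0.

```python
import math

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*Angstrom is approximately 4.8032 Debye


def dipole_moment(charges, coords, origin=(0.0, 0.0, 0.0)):
    """Return (dipole vector in e*Angstrom, magnitude in Debye) for point charges."""
    if len(charges) != len(coords):
        raise ValueError("need one charge per coordinate")
    mu = [0.0, 0.0, 0.0]
    for q, r in zip(charges, coords):
        for k in range(3):
            mu[k] += q * (r[k] - origin[k])
    magnitude = math.sqrt(sum(c * c for c in mu))
    return mu, magnitude * E_ANGSTROM_TO_DEBYE


# +1 and -1 charges 1 Angstrom apart along z
vector, debye = dipole_moment([1.0, -1.0], [(0, 0, 0.5), (0, 0, -0.5)])
```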
druglike
Discard compounds that do not fit a set of Lipinski-like criteria for druglike properties
mol_characterise must be run first
Flags:
-nh_<number> Maximum number of heavy atoms (default 30)
-fnh_<text> SDF field containing number of heavy atoms (default ".NUMHEAVY")
-nr_<number> Maximum number of rotatable bonds (default 7)
-fnr_<text> SDF field containing number of rotatable bonds (default "SILICO.NUMROT")
-nd_<number> Minimum number of Hydrogen bond donors (default 1)
-fnd_<text> SDF field containing number of Hydrogen bond donors (default "SILICO.NUMDON")
-na_<number> Minimum number of Hydrogen bond acceptors (default 3)
-fna_<text> SDF field containing number of Hydrogen bond acceptors (default "SILICO.NUMACC")
-l_<number> Maximum Log-P value (default 5.0)
-fl_<text> SDF field containing Log-P value (default "LOGP")
-no_<number> Maximum number of non-{C,H,N,O,P,S} atoms (default 0)
-fno_<text> SDF field containing number of non-{C,H,N,O,P,S} atoms (default "SILICO.NUMOTHER")
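The filter reduces to comparing SDF data fields against limits. This sketch takes the field names and defaults from the flags above. Two behaviours are assumptions: limits are treated as inclusive, and a missing field counts as a failure. is_druglike and DEFAULT_CRITERIA are hypothetical names.

```python
# Default limits from the flags above: (SDF field, "max" or "min", limit)
DEFAULT_CRITERIA = [
    (".NUMHEAVY", "max", 30),
    ("SILICO.NUMROT", "max", 7),
    ("SILICO.NUMDON", "min", 1),
    ("SILICO.NUMACC", "min", 3),
    ("LOGP", "max", 5.0),
    ("SILICO.NUMOTHER", "max", 0),
]


def is_druglike(sdf_fields, criteria=DEFAULT_CRITERIA):
    """Return True if a molecule's SDF data fields satisfy every criterion."""
    for field, kind, limit in criteria:
        if field not in sdf_fields:
            return False  # assumption: missing data counts as a failure
        value = float(sdf_fields[field])
        if kind == "max" and value > limit:
            return False
        if kind == "min" and value < limit:
            return False
    return True
```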
extract_lig_prot
Split pdb file into protein and non-covalently bound ligands
Structures are split up using connectivity
Molecules with fewer than 'maxatoms' and more than 'minatoms' atoms are assumed to be ligands. (This approach is a little simplistic: it assumes that there are no breaks in the protein chain. However, it IS able to extract peptide ligands from proteins.)
Flags:
-maxatoms (Maximum ligand size)
-minatoms (Minimum ligand size)
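Splitting by connectivity amounts to finding connected components of the bond graph and then filtering by size. This is a minimal sketch with atoms as integer indices and bonds as index pairs; the function names are hypothetical and PDB parsing is omitted.

```python
from collections import defaultdict, deque


def connected_components(n_atoms, bonds):
    """Return lists of atom indices, one per covalently connected fragment."""
    neighbours = defaultdict(list)
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    seen = [False] * n_atoms
    components = []
    for start in range(n_atoms):
        if seen[start]:
            continue
        seen[start] = True
        queue, component = deque([start]), []
        while queue:  # breadth-first search over bonds
            atom = queue.popleft()
            component.append(atom)
            for other in neighbours[atom]:
                if not seen[other]:
                    seen[other] = True
                    queue.append(other)
        components.append(sorted(component))
    return components


def split_ligands(n_atoms, bonds, minatoms, maxatoms):
    """Fragments with more than minatoms and fewer than maxatoms atoms are treated as ligands."""
    ligands, others = [], []
    for component in connected_components(n_atoms, bonds):
        (ligands if minatoms < len(component) < maxatoms else others).append(component)
    return ligands, others
```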
file_rename
Rename files using a perl regular expression substitution
For example, file_rename -a _new <filename> will replace the string '_new' with '', so the file clozapine_new.mol2 becomes clozapine.mol2
Use:
file_rename <inputfiles> -a <exp1> -b <exp2> where <exp1> and <exp2> are strings or perl regular expressions
<exp2> is '' by default
Examples:
file_rename file_new.mol2 -a _new will rename file_new.mol2 to file.mol2
file_rename file.mol2 -a .mol2 -b .bak will rename file.mol2 to file.bak
file_rename ssss.mol2 -a 's*' -b x -re will rename ssss.mol2 to x.mol2
file_rename ssss.mol2 -a 's' -b x -re -s will rename ssss.mol2 to xsss.mol2
Flags:
-a_<string> string to replace
-b_<string> string to insert ('' by default)
-s Make only a single substitution of string a (not a global one)
-re treat string a as a regular expression
-force overwrite existing files
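The -a, -b, -s and -re options map naturally onto a regex substitution. This is a Python sketch, not the Perl script itself, so Python's re syntax applies rather than Perl's. The replacement text is treated literally, and new_name and rename_file are hypothetical names.

```python
import os
import re


def new_name(filename, a, b="", regex=False, single=False):
    """Return filename with a replaced by b (b is inserted literally)."""
    pattern = a if regex else re.escape(a)  # -re: treat a as a regular expression
    count = 1 if single else 0              # -s: first match only; 0 means replace all
    return re.sub(pattern, lambda _match: b, filename, count=count)


def rename_file(path, a, b="", regex=False, single=False, force=False):
    """Rename a file on disk, refusing to overwrite unless force is set (cf. -force)."""
    directory, base = os.path.split(path)
    target = os.path.join(directory, new_name(base, a, b, regex, single))
    if target != path:
        if os.path.exists(target) and not force:
            raise FileExistsError(target)
        os.rename(path, target)
    return target
```

Only the file's base name is rewritten, so a pattern cannot accidentally alter directory components.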
find_aggregate
Identify hydrophobic aggregates of molecules (eg components of micelles) in a periodic system. Molecules are identified by connectivity and can contain multiple residues. Two molecules are defined as belonging to the same aggregate if they have carbon atoms within the cutoff distance (-t flag).
The -move option moves all atoms into the unit cell (molecules are split when they cross the unit cell boundaries). This is useful for generating pictures in programs like PyMOL.
The script writes out a Gromacs format index file (i.e. atoms are indexed starting from 1).
Summary data is written to <first_filename>.out
Flags:
-t_<number> Maximum distance between C atoms in aggregate (default 4)
-move Move all atoms into the unit cell (range 0 -> cell size)
-x_<number> }
-y_<number> } Cell dimensions
-z_<number> }
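The aggregation rule can be sketched with minimum-image distances and a union-find structure. This assumes an orthorhombic cell given by -x, -y and -z, with each molecule reduced to a list of its carbon coordinates. Union-find is one reasonable way to merge the pairs and is not necessarily what the script uses; the function names are hypothetical.

```python
import itertools
import math


def min_image_distance(p, q, cell):
    """Distance between p and q under the minimum-image convention for an orthorhombic cell."""
    d2 = 0.0
    for a, b, length in zip(p, q, cell):
        d = a - b
        d -= length * round(d / length)  # shift into the nearest periodic image
        d2 += d * d
    return math.sqrt(d2)


def find_aggregates(carbons_per_molecule, cell, cutoff=4.0):
    """Group molecules whose carbon atoms come within cutoff of each other.

    Returns a sorted list of aggregates, each a sorted list of molecule indices.
    """
    parent = list(range(len(carbons_per_molecule)))

    def find(i):
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in itertools.combinations(range(len(carbons_per_molecule)), 2):
        if find(i) == find(j):
            continue  # already in the same aggregate
        if any(min_image_distance(p, q, cell) <= cutoff
               for p in carbons_per_molecule[i] for q in carbons_per_molecule[j]):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(carbons_per_molecule)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Because aggregation is transitive, two molecules that never touch directly still end up in the same aggregate if a chain of close contacts links them.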
find_amino_acids
Test script to find amino acids within a molecule by comparing residues to amino acid templates
Note: Requires that hydrogen atoms are present
Flags:
-w Write out files containing the newly labelled structure
find_close_water
Find water molecules that are directly hydrogen bonded to a protein
find_groups
Count functional groups in a molecule
find_max
Find the maximum and minimum extents of a molecule and make a box to enclose it
Box file is written as box_<filebase><counter>.pdb
find_rings
Test script to find rings, planar rings and aromatic rings in a molecule. Temperature factor and occupancies of the output pdb file are set. The Mol2 output file contains the rings, aromatic rings and planar rings in static sets.
Flags:
-r Maximum depth of search for rings
-o Write out molecule structure files with rings marked
-timing
-debug rings
find_similar
Identify duplicate or similar compounds in two sets of molecules using Tanimoto or Euclidean comparisons
Usage: find_similar file1 file2 file3 ...
Designed to be used to filter docking results that are ordered from best to worst.
Fragments are generated using the silico fragment routines
Tanimoto coeff = Num common fragments / (Num fragments in mol1 + Num fragments in mol2 - Num common fragments)
Euclidean coeff =
Flags:
-s Scoring method (Tanimoto or Euclidian)
-cut Cutoff value for duplicate compound (default 0.80)
-dup Find duplicates (sets cutoff to 0.999)
-max Max fragment size
-min Min fragment size
-noh Ignore hydrogens in fingerprints (default)
-o<format> Output format
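The Tanimoto formula and the "best first" filtering can be sketched as below. Fragments are treated as sets of hashable labels, which is an assumption: the silico fragment routines may count repeated fragments differently. The fragment strings and function names are hypothetical. Passing cutoff=0.999 corresponds to what -dup is described as doing.

```python
def tanimoto(frags1, frags2):
    """Tanimoto coefficient: common / (n1 + n2 - common)."""
    a, b = set(frags1), set(frags2)
    common = len(a & b)
    denominator = len(a) + len(b) - common
    return common / denominator if denominator else 0.0  # two empty sets: treated as 0


def remove_similar(fragment_sets, cutoff=0.80):
    """Walk molecules in input order (best first) and keep each one unless its
    Tanimoto coefficient to an already-kept molecule is >= cutoff.
    Returns the indices of the kept molecules."""
    kept = []
    for i, frags in enumerate(fragment_sets):
        if all(tanimoto(frags, fragment_sets[j]) < cutoff for j in kept):
            kept.append(i)
    return kept


m1 = {"C-C", "C-O", "C=O"}
m2 = {"C-C", "C-O", "C-N"}
print(tanimoto(m1, m2))  # 2 / (3 + 3 - 2) = 0.5
```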
fix_rings
Attempts to find and fix rings with bonds through them
This assumes that all bonds are present in the structure file and that the structure is minimised to begin with.
The script focuses on bond length as a method of detection and finishes up with a second minimisation.
Flags:
-min_<number> Minimise with Sybyl for some number of steps (0 disables minimisation)
-r Rename molecules, giving them the filebase as the new name
fix_types
Attempts to fix Sybyl atom types that have been broken in a minimisation, looking especially for atoms with aromatic bonds that have not been given aromatic types.
Flags:
-r Rename each molecule in a file, giving it the filename
flatten
Squash molecules flat into the XY plane
formatdoc
Print formatted comments from silico files
Flags:
-h Write HTML output to a file
-s Generate subroutine descriptions
get_near_res
Work-alike for Dock get_near_res and invertPDB
Only takes one argument at a time.
Flags:
-p_<file> Protein filename
-c_<dist> Cutoff (default 15 Angstroms)
getcell
Make a box from cell coordinates
hydrodynamic_radius
Calculates the hydrodynamic radius for a series of molecules
Flags:
-ts Timestep between files (ps)
-i Time at initial file (ps)
libclean
Clean up structures taken from the Available Chemicals Directory or other ISIS databases (A poor man's Concord)
Assumes that molecules are 'flat' and have ISIS chirality descriptors. Sybyl is used to convert 2D structures to 3D by minimisation using either the Sybyl or MMFF forcefield. Produces an output file with a descriptive name if an error occurs. Retains molecular data from SDF files
Steps:
1. Discard counter ions and small molecules by retaining only the largest molecule in the input structure
2. Scale bonds to approximate sensible values
3. Delete any _polar_ hydrogen atoms so that an approximate physiological protonation state can be produced
4. Check that all atom elements are real elements (Not Du, R, etc)
5. Add hydrogens using Sybyl (known to have problems with nitro groups and other groups) or Silico, which approximates physiological conditions for common functional groups: carboxylic acids are deprotonated and aliphatic amines are protonated. Hydrogens added using Silico have approximate geometries, so the resulting structures should probably be minimised.
6. Attempt to produce approximately correct stereochemistry for atoms marked as chiral (this usually fails for complex molecules)
7. Randomise atom Z positions slightly to stop Sybyl getting stuck on saddle points
8. Minimise molecule with Sybyl (optional)
9. Check for gross problems with the molecule
10. Write out the result <filebase>_cl_<number>.<ext>
Flags:
-flat Use this flag if the input structures are 2D
-addh Add hydrogens with 'silico' or 'sybyl'
-min <steps> Number of steps for minimisation
-ff Sybyl forcefield: Tripos FF and Gasteiger-Marsili charges (gm) or Merck FF and charges (merck)
-mhadd Minimise only if change in number of hydrogen atoms (Silico H addition only)
-sybexe <sybyl_exe> Location/name of Sybyl executable. This is usually found automatically by checking TA_ROOT
-force Force overwriting of output files
-rename Rename molecules using SDF data field
-o <format> Output format
Files:
sleep A file called sleep in the operating directory will cause the job to go to sleep while it is there
stop A stop file will cause the job to stop
make_box
Produce a box file '<filebase>_box.pdb'. If the input file has a unit cell predefined, that will be used. Otherwise, a box will be made large enough to enclose the first molecule.
Unit cells are encoded in some file types, for example Gromacs, Mol2, PDB
Flags:
-i Ignore existing unit cell data
make_index
Create an index file in Gromacs or DCD format.
Can use a versatile atom selection language and will also write out files containing selected atoms
Flags:
-z Number atoms from zero, as for DCD files. (Note: if you wish to use this file with catdcd, you will need to edit the resulting file so that it contains only a single index group, and remove all lines containing square brackets.)
-as_<string> Use Atom Specifier (see below)
-g Create an index group containing all atoms
-a Create a separate index group for each atom
-e Create an index group for each element
-r Create an index group for each residue
-seg Create an index group for each segment
-w Create separate index groups for water and not water
-set Create an index group for each Mol2 atom set
-an_<atom_names> Create index groups for listed atom names
-rn_<res_names> Create index groups for listed residue names
-d Analyse the molecule as for a Starmaker dendrimer (create index groups for COR, GAA, GAB,...)
-write Write out a file containing atoms in index groups. The output file contains structures corresponding to each index group (except the one containing all atoms)
Using atom specifiers
Atom specifiers are supplied using the -as flag
Several flags are supplied as atom specifier shortcuts
-back Backbone atoms
-ca CA atoms
-cacb CA and CB atoms
-heavy Nonhydrogen atoms
Atom specifier examples:
ANAME:CA All atoms called 'CA'.
ANAME:CA,CB,CG Returns all atoms called 'CA', 'CB' or 'CG'.
ANAME:CA,RESNAME:TRP All atoms called 'CA' in all residues called TRP
ANAME:CA,SUBID:4 All atoms called 'CA' in residue number 4
ELEMENT:!H All nonhydrogen atoms
SEGID:PROT,SEGID:LIG All atoms with the SEGID set to PROT or LIG
Successive atom specifications can be made. Each is separated by a '|'
ANAME:CB|ANAME:CA|ANAME:CD Returns all atoms called 'CB', 'CA', 'CD'.
ANAME:CA,RESID:1|ANAME:CA,RESID:4 Returns 'CA' atoms from residues 1 and 4.
Atom specifiers are case sensitive.
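The examples above suggest a set of matching rules, which this sketch implements as an assumption; the real parser may differ. The rules are:

- '|' separates alternatives, which are ORed.
- Within one alternative, values for the same field are ORed and different fields are ANDed.
- A value without a field name continues the previous field.
- A leading '!' excludes a value.
- Matching is case sensitive.

Atoms are plain dictionaries purely for illustration, and the function names are hypothetical.

```python
def parse_specifier(spec):
    """Parse e.g. 'ANAME:CA,CB|ELEMENT:!H' into a list of {field: [values]} alternatives."""
    alternatives = []
    for alternative in spec.split("|"):
        fields, current = {}, None
        for token in alternative.split(","):
            if ":" in token:
                current, value = token.split(":", 1)
            else:
                value = token  # bare value: same field as the previous token
            if current is None:
                raise ValueError(f"value {token!r} has no field name")
            fields.setdefault(current, []).append(value)
        alternatives.append(fields)
    return alternatives


def _field_matches(actual, values):
    """Values for one field are ORed; '!X' excludes X. Comparison is case sensitive."""
    wanted = [v for v in values if not v.startswith("!")]
    excluded = [v[1:] for v in values if v.startswith("!")]
    if actual in excluded:
        return False
    return not wanted or actual in wanted


def atom_matches(atom, alternatives):
    """Different fields are ANDed within an alternative; alternatives are ORed."""
    return any(
        all(_field_matches(str(atom.get(field)), values) for field, values in fields.items())
        for fields in alternatives
    )
```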
merge_residue
Merge a single residue into another. The residue to be merged will be given the same name as the target residue.
Flags:
-e_<number> Existing residue number (A = all residues)
-n_<number> New residue number
-a_<string> New residue name (optional)
mol2cns
Convert a molecule to be suitable for input to CNS
Deletes pseudoatoms from a file (atom name starts with Q)
Adds hydrogens
Generates correct hydrogen names
Flags:
-f File containing atom names and connectivities (default $SILICO_HOME/data/cns_amino_acid_atoms.dat)
-addh Add hydrogens
-del Delete unknown hydrogens
mol2seq
Extract amino acid sequence from molecule
Output file <filebase>.xxx
Flags
-ih Include HETATMs
-c Combine all sequences into a single file (all.seq)
-o_<format> Output format
mol2split
Fast split program to divide a mol2 file into smaller files without parsing the file
Default is 100 structures per file
Flags:
-s <style> Output style [numbered, fnumbered] (default: fnumbered)
-n <number> Number of structures in each output file (default: 100)
-d <dir> Output directory (default: working directory)
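Splitting without parsing only requires spotting the start of each record. This sketch assumes every structure begins with a '@<TRIPOS>MOLECULE' line, as in the Tripos mol2 format. The chunk naming scheme is hypothetical and stands in for the numbered/fnumbered styles, as are both function names.

```python
def split_mol2_text(text, per_file=100):
    """Group mol2 records into chunks of per_file structures without parsing atoms or bonds.

    A new record starts at each '@<TRIPOS>MOLECULE' line; anything before the first
    such line stays with the first record.
    """
    records, current, in_record = [], [], False
    for line in text.splitlines(keepends=True):
        if line.startswith("@<TRIPOS>MOLECULE"):
            if in_record:
                records.append("".join(current))
                current = []
            in_record = True
        current.append(line)
    if in_record:
        records.append("".join(current))
    return ["".join(records[i:i + per_file]) for i in range(0, len(records), per_file)]


def chunk_names(base, n_chunks, style="fnumbered"):
    """Hypothetical naming: 'fnumbered' pads with leading zeros, 'numbered' does not."""
    width = len(str(n_chunks)) if style == "fnumbered" else 1
    return [f"{base}_{i:0{width}d}.mol2" for i in range(1, n_chunks + 1)]
```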
mol_add_h
Add hydrogens to a molecule
By default, a protonation state approximating physiological conditions is produced. The -v flag fills all valences, i.e. it will add one hydrogen to a carboxylic acid and 3 hydrogens to ammonia.
Adds only polar hydrogens if the -polar flag is used, and adds hydrogens to carbon only if the -nonpolar flag is used.
Flags:
-polar Add only polar hydrogens
-nonpolar Add only nonpolar hydrogens (hydrogens on carbon)
-v Fill valence
-d File containing atom names and connectivities ($SILICO_HOME/data/amino_acid_atoms.dat)
-check Run mol_check on generated structures
mol_add_lp
Add lone pairs to a molecule
Flags:
-o_<format> Output format
-O_<filename> Output filename
mol_amides
Constructs a plot of dihedral angles w and w' for secondary amides
Flags:
-p Print a hardcopy instead of producing an output file
--print Ditto
-o <format> Output file format (default: ps)
--output-file-format=<format>
-g Path to grace executable
--grace-executable Ditto
mol_centre
Translate a molecule to 0,0,0
mol_characterise
Calculate molecular weight, molecule extents and other molecular properties
Properties:
Number of atoms
Number of bonds
Number of rotatable bonds (see subroutine mol_count_rot_bonds)
Molecular weight
Number of C, H, N, O, P, S, halogen and other atoms.
Number of rings up to size 10 (Note that the number of rings is not quite the way a chemist would see it - eg Naphthalene has 3 rings)
Number of planar rings
Number of H-bond donors (see subroutine mol_find_donors_acceptors)
Number of H-bond acceptors
Molecule name
Adds hydrogens first by default
Data is written to a file '<filebase>_mc.out' in tab delimited format and into SDF data fields if -sdf flag is set
Flags
-sdf Write out an sdf file containing molecule structure and calculated data (on by default)
-hadd Add hydrogens (on by default)
-t Write data to tab delimited text file
-replace Overwrite original sdf file (only if -sdf is set)
-force Overwrite preexisting output files
mol_charge
Find the total partial and formal charges on a molecule
Flags:
-formal Calculate formal charges
-r Do not provide a charge breakdown by residue
-s Do not provide a charge breakdown by segment
mol_check
Run a series of sanity checks on a molecule
Calls ensemble_check (this principally checks the integrity of the internal Silico data structures), mol_check_atom_overlap (to find badly overlapping atoms), mol_check_valences (to find atoms with an incorrect number of connected atoms)
Writes out a mol2 file with subsets ERROR, OVERLAP, BONDLENGTH and VALENCE containing any atoms with errors
It would be desirable to check for poor bond lengths and angles as well
Flags
-ce Check atom elements (default on)
-co Check for atom overlap (default off)
-cv Check atom valences (default on)
-ca Check for aromatic bonds in non-aromatic systems (default on)
-cb Check bondlengths (default on)
-cr Check for distorted aromatic rings (default on)
-amide Check for cis-amides and distorted trans amides (default on)
-a Run all checks
-noconnect Do not create connection table
-nofile Do not write an output .mol2 file for each input file
mol_chop_box
Chop off bits of a protein that are outside the box defined in box.pdb
Protein chains are terminated using chemically sensible groups.
Output files <filebase>_inbox.pdb and <filebase>_exclude.pdb
Used for dock setup
mol_combine
Combine separate molecules into a single molecule
By default, all the molecules on the command line are combined into one single molecule. Each molecule is given a separate SEGID.
If the -p option is used to specify a 'parent' molecule, then each molecule specified on the command line will be combined separately with the parent molecule. This is useful for combining many ligands with a single parent protein to produce complexes.
The -glide option will combine glide _pv.maegz files to produce multiple receptor/ligand structures in individual files. If the -o pdb option is chosen then the ligand will have HETATM records to be compatible with ligplot.
Flags:
-o_<format> Output file format
-p_<parent protein> Combine all molecules with parent (usually protein) molecule
-glide Combine all molecules with first molecule in file. Use with glide .pv or .raw files
-ra Renumber atoms in output file
-rr Renumber residues in output file
mol_connect
Test script for bond creation routines. All methods should give the same results.
Flags:
-o output format (default pdb)
-c connect atoms routine to use
mol_cubic_crystal
Generate all neighbours of a cubic crystal
Applies a 180 degree rotation about a specified axis and 27 translations to give a total of 54 molecules.
Flags:
-x_<val> }
-y_<val> } cell lengths
-z_<val> }
-rx }
-ry } Rotate by 180 degrees about each of these axes
-rz } (can be used together)
mol_del_atoms
Delete atoms from a file
Atoms are selected on the basis of their names, their elements or their residue names.
Name, element and residue name options accept comma separated values, which are ORed. For example, -e C,N would mean "delete carbons or nitrogens".
The * wildcard can be used in atom or residue names, but it must be enclosed in quotes to escape the shell. For example, mol_del_atoms -a N'*' file.mol2 will delete atoms whose names start with N in all molecules in file.mol2.
The various criteria are ANDed. For example, -r HOH -e O,H would mean "delete all atoms whose residue name is HOH and which are oxygens or hydrogens".
If any given criterion is left blank, any value for that criterion is considered acceptable.
Flags:
-a Atom names (comma separated list)
-e Atom elements (comma separated list)
-r Residue names (comma separated list)
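The selection logic described above can be expressed with shell-style wildcard matching. This sketch uses Python's fnmatch.fnmatchcase. Rejecting a call with no criteria at all is an assumption, made to avoid deleting every atom; atoms are plain dictionaries and should_delete is a hypothetical name.

```python
from fnmatch import fnmatchcase


def _any_match(value, patterns):
    """A blank criterion (no patterns) accepts any value; otherwise patterns are ORed."""
    return not patterns or any(fnmatchcase(value, p) for p in patterns)


def should_delete(atom, names=(), elements=(), residues=()):
    """The three criteria are ANDed; '*' wildcards follow shell-style matching."""
    if not (names or elements or residues):
        raise ValueError("at least one criterion is required")  # assumption
    return (_any_match(atom["name"], names)
            and _any_match(atom["element"], elements)
            and _any_match(atom["resname"], residues))


# Equivalent of: -r HOH -e O,H
water_o = {"name": "OW", "element": "O", "resname": "HOH"}
print(should_delete(water_o, elements=("O", "H"), residues=("HOH",)))
```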
mol_del_dummy
Delete all dummy atoms from a file (eg lone pairs).
Output file <filebase>_nodu.<ext>
Flags:
-con Force regeneration of connection table
mol_del_duplicate_atoms
Delete atoms which occupy the same point in space.
mol_del_excess_solv
Remove excess solvent from a file.
Flags:
-d Distance from molecule to leave solvated (defaults to 10 Angstroms)
mol_del_h
Delete all hydrogens from a file.
Output file <filebase>_noh.<ext>
Flags:
-res Delete hydrogens on a particular residue name
-n Delete all nonpolar hydrogens
mol_del_nonpolar_h
Delete nonpolar hydrogens from a file
Nonpolar hydrogens are defined as those attached to carbon
Any charges on the hydrogens being deleted are transferred on to the parent atom.
Output file <filebase>_polarh.<ext>
Flags:
-con Force regeneration of connection table
mol_del_res
Script to delete residues from each molecule in a file.
Residues may be identified either by number or by name, or both. If both name and number are used, residues will be deleted if either criterion is matched, unless the -b flag is used.
Flags:
-a Residue names to delete (comma separated list)
-n Residue numbers to delete (comma separated list, ranges accepted)
-b Both name and number must be matched to delete a residue
-v Inverse operation (i.e., keep matching residues and delete everything else)
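The name/number logic and the -b and -v switches can be sketched as follows. The "a-b" range syntax for -n is an assumption, since the accepted range format is not spelled out above. Both function names are hypothetical.

```python
def parse_numbers(spec):
    """Parse '1,3-5,10' into {1, 3, 4, 5, 10} (the 'a-b' range syntax is an assumption)."""
    numbers = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            numbers.update(range(start, end + 1))
        else:
            numbers.add(int(part))
    return numbers


def delete_residue(resname, resnum, names=frozenset(), numbers=frozenset(),
                   both=False, invert=False):
    """Decide whether one residue is deleted."""
    by_name = resname in names
    by_number = resnum in numbers
    match = (by_name and by_number) if both else (by_name or by_number)  # -b
    return not match if invert else match  # -v keeps matches and deletes the rest
```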
mol_del_water
Delete water molecules from a file (or alternatively delete nonwaters)
Writes out _dry file containing unsolvated molecules or _wat file containing waters
Note: Water is defined as having a residue name that starts with TIP or HOH
Flags:
-n Negate. ie write out waters instead of nonwaters
mol_divide
Separate a multi-molecule file (eg Tripos mol2 or Schrodinger mae) into individual files, each containing one molecule.
Files are put into a directory called <filebase>.dir. Each molecule is renamed with an Insight-safe name. The default behaviour renames the output file to match the molecule name. Other file naming styles can be selected using the -s flag
To separate a single molecule into multiple structures see 'mol_split'
For really big files (multiple thousands of structures) consider using sdfsplit, mol2split or pdbsplit which do not parse the file and are much faster
For convenience the script makes a pymol load script 'load.pml' in the output directory. Run this script in pymol to load all the files
For more control over splitting PDB files by chain, waters etc see 'pdbsplit'
Flags:
-stride_<val> Write out structure every 'val' steps
-n Starting number for renumbering
-s Output style for file names: molname (molecule name, not checked for duplicates!), molname_i ('insight_safe' molecule name), numbered (numbered, no leading zeros) or fnumbered (numbered, leading zeros)
-o Output format
-force Overwrite existing output
-p Make pymol load script 'load.pml' in output directory
mol_ensemble_average
Given a number of molecules of the same composition as input, write out a molecule where the position of each atom is averaged over the whole ensemble.
Flags:
-l Consider the largest fragment only
-t Translate the molecule (or largest fragment if -l) to centre of mass
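The averaging itself is a per-atom mean over the ensemble. This sketch assumes the structures are already in a common frame with atoms in the same order. No superposition is done here, and it is not stated whether mol_ensemble_average performs one. The -l and -t options are not covered, and ensemble_average is a hypothetical name.

```python
def ensemble_average(structures):
    """Average atom positions over an ensemble.

    structures: list of structures, each a list of (x, y, z) tuples in the same atom order.
    """
    if not structures:
        raise ValueError("need at least one structure")
    n_atoms = len(structures[0])
    if any(len(s) != n_atoms for s in structures):
        raise ValueError("all structures must have the same number of atoms")
    m = len(structures)
    return [tuple(sum(s[i][k] for s in structures) / m for k in range(3))
            for i in range(n_atoms)]
```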
mol_extents
Calculate the maximum and minimum X, Y and Z coordinates of a molecule and the centre point.
Values are written to STDOUT
mol_filter
Filter a set of molecules by SDF property and/or molecular weight. Can be used to retain molecules with a unique SDF_CODE
Molecules meeting the criteria are written out to <filebase>_flt.<ext>
Flags:
-p SDF property name
-max Property maximum value
-min Property minimum value
-mwmax MW maximum value
-mwmin MW minimum value
-druglike Select druglike compounds
-leadlike Select leadlike compounds
-fragmentlike Select fragmentlike compounds
-u Retain only a single representative with this ID
-np Use noparse option (much faster, but currently only available for sdf files)
mol_fp
Generate silico fragments for a molecule
Fragments are written to <filebase>_frag.dat
Flags:
-fmax Max fragment size
-fmin Min fragment size
-at Fragment atom typing [element (simple typing by atomic element, default), none (all atoms have the same type)]
-bt Bond typer [simple (single, triple and double/aromatic bonds), none (all bonds have the same type) ]
-h Include hydrogens in fragments [none (default), polar, all]
mol_get_name
Extract molecule names from a file and print them to the screen
Flags:
-f_<string> SDF field to use for molecule name
mol_hydrogen_bonds
Find all hydrogen bonds in a system.
Uses the hydrogen bond definition developed by McDonald and Thornton (J. Mol. Biol. 1994, 238, 777-793), which specifies maximum distances and minimum angles for A...H-D and A...D.
Note that to increase the H-bond distance, you must increase both the A...H-D and A...D distances.
The default output filename is derived from the first input filename. This can be changed using the -O flag.
Flags:
Hydrogen bond parameters
-d_<val> Maximum Donor-Acceptor distance (default 3.9 Ang)
-h_<val> Maximum Hydrogen-Acceptor distance (default 2.5 Ang)
-a_<val> Minimum Donor-Hydrogen-Acceptor angle (default 90 deg)
-b_<val> Minimum Hydrogen-Acceptor-Substituent angle (default 90 deg)
Timestep parameters
-ts_<val> Timestep between files (ps)
-i_<val> Time at first file (ps)
Atom/molecule selection options
-ignh Ignore hydrogen atoms [do not use: not yet implemented]
-wat Include water molecules
Input file options
-copy Copy data from first molecule to subsequent molecule. This is good for MD trajectories and series of PDB files
Output file options
-energy Print molecular energies to output file
-helix Print numbers of i-i+3 and i-i+4 H-bonds for each structure
-write Write out a file containing each input structure. Atoms involved in hydrogen bonds are contained in Sybyl sets
-writehb_<val> Write out only those molecules containing >= val H-bonds. Atoms involved in hydrogen bonds are contained in Sybyl sets (It may be useful to use the -o mol2 flag with this option)
-list List all hydrogen bonds in each structure
-ens Summarise the number of times each H-bond was found in output file. Assumes that all input molecules are members of an ensemble
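The geometric test behind the -d, -h, -a and -b parameters can be sketched for a single donor/hydrogen/acceptor triple. The defaults are the ones listed above; coordinates are in Angstroms. Finding candidate donors and acceptors, which the script does from the molecule, is not shown, and is_hbond is a hypothetical name.

```python
import math


def _angle(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    u = [x - y for x, y in zip(a, b)]
    v = [x - y for x, y in zip(c, b)]
    cos_t = sum(x * y for x, y in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))  # clamp rounding error


def is_hbond(donor, hydrogen, acceptor, acceptor_substituents=(),
             d_max=3.9, h_max=2.5, dha_min=90.0, has_min=90.0):
    """Geometric H-bond test with the default limits listed above."""
    if math.dist(donor, acceptor) > d_max:           # -d
        return False
    if math.dist(hydrogen, acceptor) > h_max:        # -h
        return False
    if _angle(donor, hydrogen, acceptor) < dha_min:  # -a
        return False
    # -b: every substituent of the acceptor must make a wide enough H...A-substituent angle
    return all(_angle(hydrogen, acceptor, s) >= has_min for s in acceptor_substituents)
```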
mol_label_fg
Test script to test the subroutine 'mol_label_functional_group'
Flags:
-aa Label aa backbone
-het Label heterocycles
mol_merge
Merge all molecules in a file into a single molecule
Flags:
-n Do not rename residues
mol_mw
Calculate molecular weight and molecule extents
MW data is calculated for the parent molecule (molecule with most atoms in file)
Values are printed to standard output and added to output file as SDF_DATA
Flags:
-addh Add hydrogens
-v Fill valence
-print Print data to a file 'mw.txt'
-min Minimum molecular weight to output (ie skip molecules below this mass)
-max Maximum molecular weight to output (ie skip molecules above this mass)
mol_rama
Constructs a Ramachandran plot for a molecule containing alpha amino acids
Flags:
-force Force overwrite of existing output files (default: off)
-o Output format (default: PostScript)
-print Print a hardcopy (default: off)
-residue Make one plot for each residue
-debug Print extra debugging information (default: off)
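A Ramachandran plot uses phi, the C(i-1)-N-CA-C torsion, and psi, the N-CA-C-N(i+1) torsion. This sketch computes a torsion angle with the standard atan2 formulation in the IUPAC sign convention. How mol_rama selects backbone atoms and drives grace is not shown, and dihedral is a hypothetical name.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [x / norm for x in b1]
    # Components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, [_dot(b0, b1) * x for x in b1])
    w = _sub(b2, [_dot(b2, b1) * x for x in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```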
mol_rename
Rename molecules
Can also:
change the molecule name to the filename
change the molecule filename to match the molecule
rename the molecule using a specified SDF data field.
The -r<datafield> and -c flags can be used together to change the filename to the specified SDF data field.
Can generate 'insight_safe' names, i.e. names that do not contain spaces or punctuation, do not start with an underscore or a digit, and are of limited length.
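One way to apply those rules is sketched below. The maximum length is not stated here, so max_len=30 is an assumption. Prefixing 'M' to names that start with a digit is also only a hypothetical way to satisfy the rule, and insight_safe is a hypothetical name.

```python
import re


def insight_safe(name, max_len=30, prefix="M"):
    """Make a name without spaces or punctuation that does not start with '_' or a digit."""
    safe = re.sub(r"[^A-Za-z0-9_]+", "_", name)  # runs of other characters become '_'
    safe = safe.strip("_")                        # no leading (or trailing) underscores
    if not safe or safe[0].isdigit():
        safe = prefix + safe                      # hypothetical fix for a leading digit
    return safe[:max_len]
```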
Flags:
Multiple changes
-g_<base> Generate new name using <base>, add a number and change SDF Name field (same as -b, -n, -s)
-mips_<start> Generate new MIPS code and set SDF Code field starting from supplied number
-b Set this molecule base name
-f Change molecule name to filename
-r_<string> Rename molecules using the <string> SDF Data field
Options that modify the molecule name
-safe Use insight-safe names (ie that do not contain spaces, punctuation, start with an underscore or a digit and are of limited length)
-n Add a number to the end of the name
Options that modify molecule data (SDF data)
-s Transfer the molecule name to the SDF Data fields 'NAME' and 'title'
-sdfield_<field> Change specified SDF field to molecule name
Options that modify the filename
-c Change output filename to name of first molecule
-k Keep the same filename, overwriting the input file
Other options
-np Use noparse option (much faster, but currently only available for sdf files)
mol_renumber
Renumber and/or rename atoms and/or residues in a molecule
Flags:
-a <Starting atom number> Renumber atoms starting from this number
-s <Starting residue number> Renumber residues starting from this number
-c <Starting chain letter> Relabel chains starting from this letter
-ra Rename all atoms. Heavy atoms are numbered from 1; hydrogens are named according to the heavy atom they are connected to
-rr Rename atoms within each residue. Heavy atoms are numbered from 1; hydrogens are named according to the heavy atom they are connected to
-simple Rename atoms in residues. All atom elements are renumbered from 1
mol_rescale_bonds
Rescale a molecule so that the carbon-carbon bonds have a reasonable average bond length.
This defaults to 1.5 Angstroms, however, it can be adjusted through use of the -l flag. Alternatively, a scaling factor can be used by means of the -f flag.
This is useful to clean up files that have come out of Isis databases before they are minimised by some other program (eg Insight).
Flags:
-f <factor> Scaling factor (not to be used with -l)
-l <length> Target C-C bond length (not to be used with -f)
-o <format> Output format
-split Split multiple molecule files into a separate directory. Each molecule is renamed with an "Insight safe" name
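The rescaling amounts to a single multiplication of all coordinates. A minimal Python sketch (not Silico's own code), assuming coordinates, element symbols and a bond list of atom-index pairs are already available; `rescale_cc_bonds` is a hypothetical helper. Passing `factor` mimics -f, otherwise `target` plays the role of -l:

```python
import math

def rescale_cc_bonds(coords, elements, bonds, target=1.5, factor=None):
    """Return (scaled coordinates, factor used).

    If factor is None it is chosen so that the mean C-C bond length
    becomes target (Angstroms); otherwise the given factor is applied,
    mirroring the mutually exclusive -l and -f flags.
    """
    if factor is None:
        cc = [math.dist(coords[i], coords[j]) for i, j in bonds
              if elements[i] == "C" and elements[j] == "C"]
        if not cc:
            raise ValueError("molecule has no C-C bonds to measure")
        factor = target / (sum(cc) / len(cc))
    # Scaling every coordinate about the origin multiplies every
    # interatomic distance by the same factor.
    return [tuple(factor * x for x in xyz) for xyz in coords], factor
```

Scaling about the origin is an assumption; scaling about any other fixed point would change bond lengths identically.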
mol_rot
Rotate a molecule about an arbitrary vector
Flags:
-x_<number> }
-y_<number> } Vector to rotate around (assumed to pass through the origin)
-z_<number> }
-test Testing routine: rotate the first molecule only about the axis by 60 degrees
-a_<number> Angle to rotate molecule through
-random_<number> Generate this number of randomly rotated molecules
-maximise Approximately maximise the extents of the molecule along the X and Y axes (ie align the major axis along the X axis and the medium axis along the Y axis).
SF
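Rotation about an axis through the origin can be written with Rodrigues' rotation formula. This Python sketch is illustrative only and is not Silico's implementation; `rotate_about_axis` is a hypothetical helper, with the axis given by the -x, -y, -z components and the angle in degrees as for -a:

```python
import math

def rotate_about_axis(point, axis, angle_deg):
    """Rotate point by angle_deg about an axis through the origin.

    Rodrigues' formula: v' = v cos t + (k x v) sin t + k (k . v)(1 - cos t),
    where k is the unit vector along the axis.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0:
        raise ValueError("rotation axis must be non-zero")
    kx, ky, kz = (a / norm for a in axis)
    px, py, pz = point
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    cross = (ky * pz - kz * py, kz * px - kx * pz, kx * py - ky * px)
    dot = kx * px + ky * py + kz * pz
    return tuple(p * c + q * s + k * dot * (1 - c)
                 for p, q, k in zip(point, cross, (kx, ky, kz)))
```

Applying the same function to every atom rotates the whole molecule rigidly.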
mol_rot_bond
Set the torsion angle between four atoms to a specified value.
The four atoms need not be bonded to each other; however, it makes more chemical sense if they are (provided the middle two are not in a ring). If the middle two atoms are in a ring, nothing is done.
Flags:
-a_<number> Atom A
-b_<number> Atom B
-c_<number> Atom C
-d_<number> Atom D
-w_<number> Desired torsion angle (degrees)
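Setting a torsion requires measuring the current A-B-C-D angle and then rotating the atoms on the D side of the B-C bond by the difference. A Python sketch of those two steps under that assumption (hypothetical helpers, not Silico's code; choosing which atoms move needs the bond graph and is omitted):

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def torsion(a, b, c, d):
    """Torsion angle A-B-C-D in degrees, between -180 and 180."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def required_rotation(current, target):
    """Smallest rotation about B-C (degrees) that turns current into target."""
    delta = (target - current) % 360.0
    return delta - 360.0 if delta > 180.0 else delta
```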
mol_rotrans
Apply rotations and/or translations to a file
The rotation (about the X, Y or Z axis) is applied first, followed by the translation
Flags:
-x_<number> }
-y_<number> } X, Y and Z rotation angles in degrees
-z_<number> }
-a_<number> }
-b_<number> } X, Y and Z translation distances in Angstroms
-c_<number> }
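The rotate-then-translate order can be sketched per atom as below. The order of the three axis rotations (X, then Y, then Z) and the right-handed sign convention are assumptions, as the text does not specify them; `rotrans` is a hypothetical helper:

```python
import math

def rotrans(point, rx=0.0, ry=0.0, rz=0.0, tx=0.0, ty=0.0, tz=0.0):
    """Rotate about X, Y, then Z (degrees), then translate (Angstroms)."""
    x, y, z = point
    a = math.radians(rx)  # rotation about the X axis
    y, z = y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a)
    b = math.radians(ry)  # rotation about the Y axis
    x, z = x * math.cos(b) + z * math.sin(b), -x * math.sin(b) + z * math.cos(b)
    c = math.radians(rz)  # rotation about the Z axis
    x, y = x * math.cos(c) - y * math.sin(c), x * math.sin(c) + y * math.cos(c)
    return (x + tx, y + ty, z + tz)
```

Because rotations about different axes do not commute, swapping the assumed order would give different results for combined -x/-y/-z values.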
mol_segment
Split all molecules in a file into separate molecules (based on connectivity) and recombine them into a single molecule. Each molecule is placed in a separate segment (M001 ... MXXX).
mol_size_shape
Calculate the size and shape of each molecule (by connectivity) in a file
Default is to exclude small molecules (< 10 atoms)
Flags:
-s Minimum size of molecules to include (default 10)
mol_smiles
Generate a SMILES string for a molecule
The SMILES string is added to the 'smiles' field of SDF_DATA in the output file
Flags:
-b Include explicit bond orders
-h Include explicit hydrogen atoms
-k Use Kekule bonds and non-aromatic atom symbols
mol_solvate
Make a solvated box around a molecule with a density of 1 g/mL. The result needs to be minimised!
Only one file may be supplied as an argument.
Default water residue name is HOH with atoms labelled OH2, H1, H2. Using the -amber flag produces residue name WAT with atom names O, H1, H2. Using the -gromacs flag produces residue name SOL with atom names OW, HW1, HW2.
Note that using the default density of 1 g/mL to solvate proteins or bilayers will probably overestimate the number of water molecules required to produce a realistic total system density.
Usage: mol_solvate <file> [<flags>]
Flags:
-x_<number> }
-y_<number> } Dimensions of the box in each direction
-z_<number> }
-f_<file> File containing solvent molecule to add. Water will be used if no file is supplied.
-r_<string> Residue name to call solvent molecules (default is read from file, or HOH)
-g_<number> Margin to add around molecule
-n_<number> Add this number of solvent molecules
-d_<number> Required density (default 1 g/mL)
-p_<number> Min packing distance between solvent molecules
-i Observe packing distance only for solvent-solute distances, not solvent-solvent
-t Translate solute molecule to coordinate origin
-b_<number> Solvate bilayer. ie Don't put solvent molecules within <number> A of bilayer plane. Default is XZ. XY plane if -xy flag is used
-xy Solvate bilayer in the xy plane
-amber Give water molecules AMBER names (resname WAT, oxygen O, hydrogens H1, H2), and also add TER after each water molecule
-gromacs Give water molecules GROMACS names (resname SOL, oxygen OW, hydrogens HW1, HW2)
-chain Each solvent molecule in its own chain. (-amber will also enable this option.)
mol_sort
Script to reorder the atoms in a file
Atoms are sorted by chain, residue number, atom number
Contains an option to reorder residues within a file. This option resets the CHAIN and SEGID identifiers to prevent undesired effects in the sorting routine.
Flags:
-r Rearrange residue order (takes input from command line)
mol_sort_fp
Calculate average Tanimoto coefficients of molecules within a set of compounds and sort the output by average Tanimoto coefficient
Flags:
-w Write fragments into SDF Data
-fmax Max fragment size
-fmin Min fragment size
-at Fragment atom typing [element (simple typing by atomic element, default), none (all atoms have the same type)]
-bt Bond typer [simple (single, triple and double/aromatic bonds), none (all bonds have the same type) ]
-h Include hydrogens in fragments [none (default), polar, all]
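The averaging and sorting step can be sketched as follows, assuming each molecule's fingerprint is represented as a set of fragment keys (presence/absence only). The helpers are hypothetical and the fragment generation controlled by -fmin, -fmax, -at, -bt and -h is not shown:

```python
def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| of two fragment sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def sort_by_average_tanimoto(fingerprints):
    """Return (name, mean Tanimoto to every other molecule), highest first."""
    names = list(fingerprints)
    averages = []
    for name in names:
        others = [tanimoto(fingerprints[name], fingerprints[o])
                  for o in names if o != name]
        averages.append((name, sum(others) / len(others) if others else 0.0))
    return sorted(averages, key=lambda item: item[1], reverse=True)
```

Molecules at the top of the sorted list are the most "central" members of the set.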
mol_split
Split a molecule file into separate molecules based on connectivities.
Molecules are named using the RESIDUE name of the first residue by default
To separate a multi-molecule file into individual structures see 'mol_divide'
Known bug: the -d option does not work properly with pdb files
Flags:
-s Name molecules by SEGID
-l Keep each molecule's largest fragment only
-d Write each output molecule as a separate file
-min Retain only molecules with at least this number of atoms
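Splitting on connectivity is a connected-components search over the bond graph. A Python sketch using breadth-first search, assuming atoms are numbered 0..n-1 and bonds are index pairs; the helper names are hypothetical, and `largest_only` and `min_atoms` loosely mirror -l and -min:

```python
from collections import defaultdict, deque

def fragments(n_atoms, bonds):
    """Group atom indices into connected fragments (breadth-first search)."""
    neighbours = defaultdict(list)
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)
    seen, result = set(), []
    for start in range(n_atoms):
        if start in seen:
            continue
        seen.add(start)
        queue, frag = deque([start]), []
        while queue:
            atom = queue.popleft()
            frag.append(atom)
            for nb in neighbours[atom]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        result.append(sorted(frag))
    return result

def split_molecule(n_atoms, bonds, largest_only=False, min_atoms=0):
    """Fragments with at least min_atoms atoms, or only the largest one."""
    frags = [f for f in fragments(n_atoms, bonds) if len(f) >= min_atoms]
    if largest_only and frags:
        return [max(frags, key=len)]
    return frags
```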
mol_split_segid
Split a molecule file into separate molecules based on SEGID. Molecules are written out to a single file. Molecules with no defined SEGID are assigned to the SEGID 'NONE'.
mol_wrap_cell
Wrap all molecules back into a unit cell. A given atom can be selected to centre the system around using the -a flag; otherwise, the centre of the largest fragment is used. Useful for molecular dynamics output where some molecules have wandered out of the unit cell.
Flags:
-ignore Ignore unit cell dimensions in individual files
-x_<number> Default unit cell's X dimension
-y_<number> Default unit cell's Y dimension
-z_<number> Default unit cell's Z dimension
-i_<file> Index file
-a_<number> Atom number to centre the system around
-g_<string> Name of index group to centre the system around
-t Translate the centre to (0,0,0)
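Wrapping whole molecules means shifting each fragment by an integer number of box lengths so that its centroid falls inside the cell. A Python sketch assuming an orthorhombic cell centred on `centre` (for example the -a atom position); `wrap_fragments` is a hypothetical helper and triclinic cells are not handled:

```python
def wrap_fragments(fragments, box, centre=(0.0, 0.0, 0.0)):
    """Shift each fragment by whole box lengths so its centroid lies in the
    orthorhombic cell of edge lengths box centred on centre."""
    wrapped = []
    for coords in fragments:
        n = len(coords)
        centroid = [sum(c[k] for c in coords) / n for k in range(3)]
        # round() picks the nearest periodic image of the centroid
        shift = [-box[k] * round((centroid[k] - centre[k]) / box[k])
                 for k in range(3)]
        wrapped.append([tuple(c[k] + shift[k] for k in range(3)) for c in coords])
    return wrapped
```

Shifting whole fragments, rather than individual atoms, keeps molecules intact across the cell boundary.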
namd_fix_back
Set the occupancy field of a pdb file for use as a NAMD constraint file:
Backbone atoms are constrained with the force constant given by -c.
All other atoms are free.
Flags:
-c_<number> Constraint (kcal/mol/Ang^2) (default 1)
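In a PDB ATOM/HETATM record the occupancy occupies columns 55-60, so setting the constraint is a fixed-column rewrite of each line. A Python sketch, assuming the backbone atom names are N, CA, C and O (the script's actual selection is not documented here); `set_backbone_constraint` is hypothetical:

```python
BACKBONE = {"N", "CA", "C", "O"}  # assumed backbone atom names

def set_backbone_constraint(line, k=1.0):
    """Write k (backbone) or 0.00 (all other atoms) into the occupancy
    field, columns 55-60, of an ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    name = line[12:16].strip()            # atom name, columns 13-16
    value = k if name in BACKBONE else 0.0
    record = line.rstrip("\n").ljust(60)  # pad short records
    return record[:54] + f"{value:6.2f}" + record[60:]
```

The returned line has no trailing newline; non-atom records pass through unchanged.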
namd_fix_heavy
Set the occupancy field of a pdb file for use as a NAMD constraint file.
Hydrogen and water atoms are free.
Sodium and chloride (ie salt) ions are also free.
All other atoms are constrained with the force constant given by -c.
Flags:
-c_<number> Constraint (kcal/mol/Ang^2) (default 1)
namd_write_consref
A script to write out a constraint reference file (PDB format) for any molecule
Also includes a force constant to use
Flags:
-a_<integer> Reference Atom Number (defaults to 1)
-k_<number> Force Constant (defaults to 0.09)
-d_<number> Dihedral Force Constant for amides (defaults to 1000)
name_atoms_simple
Rename atoms in a molecule using a simple scheme
pdb_rename_hydrogens
Rename hydrogen atoms so that they have the correct PDB nomenclature. Using the -charmm flag will produce charmm27 atom names. Using the -cyana flag will produce cyana2 names.
Note: Particular attention must be paid to the delta-carbon of isoleucine residues, which is also renamed. The -charmm flag will name ILE CD as CD. The -cyana flag will name ILE CD as CD1.
C-terminus and N-terminus hydrogen names are currently not generated.
Flags:
-charmm, --charmm-atom-names Generate charmm27 atom names
-cyana, --cyana-atom-names Generate cyana2 atom names
-f <filename>, --datafile=<filename> Filename containing atom names and connectivities
-d, --delete-unknown-h Delete any hydrogens that cannot be given a name
-o <format>, --output-file-format=<format> Output file format (default PDB)
David K. Chalmers, 3 February 2000
pdbsplit
Split a PDB file into smaller pieces based on TER or MODEL records without parsing the file
Use:
pdbsplit file.pdb -s fnumbered produces file_00001.pdb, file_00002.pdb, etc
pdbsplit file.pdb -s numbered produces file_1.pdb, file_2.pdb
Flags:
-s Filename output style
-d <dirname> Output directory
-end Split on END records
-endmdl Split on ENDMDL records
-ter Split on TER records
-chain Split at end of each chain/SEGID. Name output files by chain
-all Split at all of the above (default but unset by selecting one of the above)
radius_of_gyration
Calculates the radius of gyration for a series of molecules
Flags:
-ts_<number> Timestep between files (ps)
-i_<number> Time at initial file (ps)
-a Use all fragments (not just the largest one) to calculate radius of gyration
-g_<string> Use only atoms in index group "string"
-n_<file> Index file in which to find the group
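The radius of gyration is Rg = sqrt(sum m_i |r_i - r_cm|^2 / sum m_i). A Python sketch of that formula; whether the script mass-weights the atoms is not stated, so masses are optional here and default to 1 (hypothetical helper):

```python
import math

def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; unit masses if masses is None."""
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    centre = [sum(m * c[k] for m, c in zip(masses, coords)) / total
              for k in range(3)]
    sq = sum(m * sum((c[k] - centre[k]) ** 2 for k in range(3))
             for m, c in zip(masses, coords))
    return math.sqrt(sq / total)
```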
random_box
Fill a cell with molecules in random orientation. An existing molecule file can also be embedded in the box (solute).
Use:
random_box -x 10 -y 10 -z 10 -n <number of molecules> <solvent filename> -e <molecule to embed>
random_box -d <required density of system> <solvent filename>
random_box -d <required density of system> -w <weight percent of solvent> <solvent filename>
Flags:
-x }
-y } Dimensions of the box (Angstroms)
-z }
-cx }
-cy } Centre of the box (Angstroms)
-cz }
-e_<filename> Embed molecule filename
-n_<number> Number of molecules to add
-d_<number> Calculate number and add molecules to give this density of entire molecular system (including solute)
-wp_<number> Weight percent of solvent (modifies the density value: final density = density * weight-percent/100)
-o_<format> Output format (default: Gromacs .gro)
-ignore_h Ignore hydrogen atoms when placing molecules
-t Translate embed molecule to centre (default on)
-md Minimum distance between molecule heavy atoms
-mc Maximum number of clashes allowed when adding new molecules
-mci Increment maximum number of clashes after this number of unsuccessful trials
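The -d and -wp options imply a simple mass balance: the box mass is density * volume, and solvent fills whatever the solute does not. A Python sketch of that arithmetic, assuming the solute mass is subtracted after applying the weight-percent scaling and that the count is rounded down (both assumptions); `molecules_for_density` is hypothetical:

```python
AVOGADRO = 6.02214076e23
A3_TO_ML = 1e-24  # 1 cubic Angstrom = 1e-24 mL

def molecules_for_density(box, density, solvent_mw, solute_mw=0.0,
                          weight_percent=100.0):
    """Number of solvent molecules giving `density` (g/mL) for the whole box.

    box is (x, y, z) in Angstroms; molecular weights are in g/mol.
    """
    volume_ml = box[0] * box[1] * box[2] * A3_TO_ML
    target = density * weight_percent / 100.0
    # Required mass in g/mol units (grams * Avogadro), minus the solute
    mass_needed = target * volume_ml * AVOGADRO - solute_mw
    return max(0, int(mass_needed // solvent_mw))
```

For example, a 30 Angstrom cube of water (18.015 g/mol) at 1 g/mL needs about 902 molecules.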
randomise_conformation
Produce a random conformation of a molecule by performing arbitrary rotations about bonds.
Flags:
-n <number>, --number-of-rotations=<number> Maximum number of rotations to perform
-d <distance>, --clash-distance=<distance> Distance below which two atoms clash
-min <number>, --minimisation-steps=<number> Maximum number of steps to minimise in Sybyl
-r, --rms-values Calculate RMS to original molecule after each rotation
-o <format>, --output-file-format=<format> Output format
read_write_charmm_rtf
Generate CHARMm topology file (rtf) from input file.
Currently designed to work on residues as separate molecules. Separate residues are converted to GROUPS
Also writes pdb and mol2 format files
Flags:
-p,--parameter-file CHARMm parameter (prm) file
read_write_cml
Read any Silico format and write a Chemical Markup Language format file.
Under development and incomplete
read_write_merck
Read any Silico format and write a Merck format file
Flags:
-b,--regenerate-bondorders Regenerate bondorders
read_write_mmod
Read a molecule and write a Macromodel format file.
read_write_mol
Test script to read any Silico format and write (by default) in the same format.
Flags:
-o_<format> Output format
-O_<filename> Output filename
-check Run molecule check
read_write_mol2
Read any Silico format and write a mol2 file.
Flags:
-r Rename molecules using SDF data field for conversion from SDF to mol2
-mm Write output in MolMol Mol2 format.
-p Write output in Mol2 Protein format.
--single Use single molecule read/write routines
-dr Print additional debugging information for rings
read_write_mopac
Read any Silico format and write a MOPAC cartesian file.
read_write_pdb
Read any Silico format and write a pdb file
Flags:
-d Delete disordered (ALT) atoms
--single Use single molecule read/write routines
-debug
read_write_rtf
Test script to read and write CHARMm rtf files
read_write_sdf
Read any Silico format and write an sdf file.
Optionally add a 'name' field, rename the molecule or remove SDF data
Flags:
-s Starting structure number
-n Number of structures to read
-r SDF data field to use if renaming molecules
-a Add 'name' field to SDF_DATA using molecule name encoded in the first line of the file
-clean Remove all sdf data (except name)
-noparse Do not parse SDF data (Only works with SDF input files)
--single Use single molecule read/write routines
read_write_seq
Sequence format test script
Silico protein/DNA sequence format routines are under development and incomplete
Flags:
-c,--combine Combine sequences from all files to a single file
-n,--number-of-residues Number of residues per line in output
renumber_residues
Renumber residues in a file. Number SUBCOUNTs sequentially from 1, and make the SUBID for any atom the same as the SUBCOUNT for that atom. Optionally, use a different start and a different increment.
The molecule is sorted before renumbering. All hydrogens are forced to have the same residue name, residue number, chain and segid as their parent heavy atom.
Flags:
-s_<number> New starting residue number (default 1)
-i_<number> New increment (default 1)
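After sorting, renumbering reduces to issuing a new number each time the residue identity changes. A Python sketch, assuming each atom carries a residue key such as (chain, old number); `renumber` is hypothetical and the SUBCOUNT/SUBID bookkeeping is not modelled:

```python
def renumber(residue_keys, start=1, increment=1):
    """New residue number for each atom, given per-atom residue keys in
    sorted order; a new number is issued whenever the key changes."""
    numbers, current, previous = [], start - increment, object()
    for key in residue_keys:
        if key != previous:
            current += increment
            previous = key
        numbers.append(current)
    return numbers
```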
residue_rename
Change a single residue name
Rename a single residue type
Flags:
-a Rename all residues
-e_<residue_names> List of residue names to change
-n_<numbers> List of residue numbers to change
-r_<residue_name> New residue name
rms
Calculate RMS distances between molecules without superimposition
The first structure in the first file is used as the reference structure. The RMS distance is calculated to all subsequent molecules. Output is written to <ref_file>.rms
Heavy atom RMS is calculated by default. The -a flag can be used to include hydrogens in the calculations.
Flags:
-a Use all atoms including hydrogens to calculate RMS. Uses heavy atoms by default
-s Sort atoms into smiles order before doing RMS comparison. This may be useful if molecules have different atom orders.
-w Write out file containing RMS as SDF_DATA
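Without superimposition, the RMS distance is simply the root-mean-square of the distances between corresponding atoms in their current positions. A Python sketch, assuming both molecules list their atoms in the same order (which -s is meant to help with); `rms_no_fit` is hypothetical:

```python
import math

def rms_no_fit(ref, other, elements, include_h=False):
    """RMS distance between matching atoms, without superimposition.

    Hydrogens are skipped unless include_h is True (compare the -a flag).
    """
    if len(ref) != len(other):
        raise ValueError("molecules have different numbers of atoms")
    sq = [sum((a - b) ** 2 for a, b in zip(p, q))
          for p, q, el in zip(ref, other, elements)
          if include_h or el != "H"]
    return math.sqrt(sum(sq) / len(sq))
```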
scale
Scale a molecule by a factor
Flags:
-f <number> Scale factor
-o <format> Output file format (default: input format)
sdf_add_id
Script to add an identifier field (SILICO.ID) to an sdf file.
By default the identifier has the format XXdddddddd, where XX are random letters and dddddddd is an eight-digit integer starting from 00000001
Flags:
-i <field> SDF DATA field containing identifier data
-r <field> SDF DATA field to use when renaming molecules
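The default identifier format can be sketched as below. Whether the two letters are drawn once per run or per molecule, and whether they are upper case, is not stated; this sketch assumes one upper-case prefix per run. `make_ids` is hypothetical:

```python
import random
import string

def make_ids(count, rng=None):
    """Identifiers of the form XXdddddddd: an assumed per-run pair of random
    capital letters followed by an eight-digit counter from 00000001."""
    rng = rng or random.Random()
    prefix = "".join(rng.choice(string.ascii_uppercase) for _ in range(2))
    return [f"{prefix}{i:08d}" for i in range(1, count + 1)]
```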
sdfsplit
Fast split program to divide an sdf file into smaller files without parsing the file
Use:
sdfsplit file.sdf -s fnumbered produces file_00001.sdf, file_00002.sdf, etc
sdfsplit file.sdf -s numbered produces file_1.sdf, file_2.sdf
sdfsplit -r <CODE> renames molecules using the data in field <CODE>
Flags:
-r <datafield> Rename molecules using the given SDF data field
-s (numbered / fnumbered) Output style
-n <num> Number of structures to write to each output file (default 1)
-d <dirname> Output directory
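Splitting without parsing works because each SDF record ends with a `$$$$` line. A Python sketch of the record grouping (-n) and the two naming styles shown above; the helpers are hypothetical, and any text after the last `$$$$` is ignored:

```python
def split_sdf(text, n=1):
    """Group the records of an SDF text into chunks of n records each."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.rstrip("\r\n") == "$$$$":
            records.append("".join(current))
            current = []
    return ["".join(records[i:i + n]) for i in range(0, len(records), n)]

def output_name(stem, index, style="numbered", ext="sdf"):
    """file_00001.sdf for 'fnumbered', file_1.sdf for 'numbered'."""
    if style == "fnumbered":
        return f"{stem}_{index:05d}.{ext}"
    return f"{stem}_{index}.{ext}"
```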
seq_similarity
Calculate pairwise identity and strong and weak homologies for a set of sequences
Uses the strong and weak conservation groups described in the ClustalX documentation, which correspond to the ':' and '.' labels in Clustal output ('*' marks identical residues)
Note: Silico protein/DNA sequence format routines are under development and incomplete
Flags:
-c,--combine Combine sequences from all files to a single file (all.XXX)
-o,--output-format Output format
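A pairwise version of the Clustal classification can be sketched by checking whether two aligned residues fall in a common strong or weak group. The group lists below are transcribed from the ClustalX documentation; treating the categories as mutually exclusive and skipping gapped positions are assumptions of this sketch, and `compare` is hypothetical:

```python
# Conservation groups as listed in the ClustalX documentation
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
        "NEQHRK", "FVLIM", "HFY"]

def _same_group(x, y, groups):
    return any(x in g and y in g for g in groups)

def compare(seq_a, seq_b, gap="-"):
    """Count identical, strongly and weakly similar positions in two aligned
    sequences; positions with a gap in either sequence are skipped."""
    counts = {"identical": 0, "strong": 0, "weak": 0, "compared": 0}
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if gap in (x, y):
            continue
        counts["compared"] += 1
        if x == y:
            counts["identical"] += 1
        elif _same_group(x, y, STRONG):
            counts["strong"] += 1
        elif _same_group(x, y, WEAK):
            counts["weak"] += 1
    return counts
```

Dividing each count by `compared` gives percentage identity and homology.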
shape_tensor
Calculates the gyration tensor, the shape ellipsoid, and three shape descriptors of a series of molecules.
Flags:
-ts Timestep between files (ps)
-i Time at initial file (ps)
-e Write mol2 file containing molecule and shape ellipse
slurp
Test for slurp routine which reads molecule files into simple text strings
stacking
Calculates the extent of planar ring stacking in a molecule system
Flags:
-f_<string> Output text file suffix (defaults to _stacking.dat)
-ts_<number> Time between files (ps)
-i_<number> Time at first file (ps)
-sd_<number> Stacking distance cutoff (Angstroms)
-sa_<number> Stacking angle cutoff (degrees)
-cd_<number> Clustering distance cutoff (Angstroms)
-size Maximum stack size to record in a separate column
-write Write out a Mol2 file for each input file
Ring type flags (at least one must be used)
-p Count stacks of all planar rings
-b Count benzene stacks
-n Count naphthalene stacks
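The stacking test is not spelled out here; a plausible pairwise criterion, assumed for this sketch, is that two rings stack when their centroids are within the -sd distance and their planes are within the -sa angle of parallel. Clustering of stacks (-cd) is omitted and the helpers are hypothetical:

```python
import math

def _centroid(ring):
    return [sum(p[k] for p in ring) / len(ring) for k in range(3)]

def _normal(ring, c):
    # Cross product of two centroid-to-atom vectors (adequate for planar rings)
    u = [ring[0][k] - c[k] for k in range(3)]
    v = [ring[1][k] - c[k] for k in range(3)]
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def is_stacked(ring_a, ring_b, max_distance, max_angle):
    """True if ring centroids are within max_distance (Angstroms) and the
    ring planes are within max_angle (degrees) of parallel."""
    ca, cb = _centroid(ring_a), _centroid(ring_b)
    if math.dist(ca, cb) > max_distance:
        return False
    na, nb = _normal(ring_a, ca), _normal(ring_b, cb)
    cos_t = sum(x * y for x, y in zip(na, nb)) / (math.hypot(*na) * math.hypot(*nb))
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))
    return min(angle, 180.0 - angle) <= max_angle
```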
starmaker
All-purpose dendrimer builder.
Written by David Chalmers and Ben Roberts
Use: starmaker5 core.mol2 monomer.mol2 monomer.mol2 monomer.mol2 ... cap.mol2
Each mol2 file should contain the monomer for that generation. Each monomer needs to have attachment points. These are hydrogen atoms named Q1 (point to branch FROM) or Q2 (point to attach TO). The first (core) subunit should contain at least one Q1 hydrogen. A standard monomer should contain one Q2 hydrogen and at least one Q1 hydrogen. A capping group should contain one Q2 hydrogen.
Outputs a mol2 file 'final_dendrimer.mol2' and intermediate states as layer_XX.mol2
Main features:
The dendrimer is built layer by layer. The first residue in the list is named COR. Subsequent residues are named GAA, GAB, GAC, etc. Amide bonds in each monomer are recognised and converted to a trans geometry. A repulsion potential and random torsional search are used to force monomers to grow in an extended geometry.
At the completion of each layer the molecule is minimised using Sybyl (although this can be turned off using a flag).
Additional features:
A file 'stop' in the working directory will terminate the program
Flags:
-rc Force first residue to be renamed to 'COR'.
-clash Distance below which two atoms clash
-noh Do not include hydrogens in geometry optimisation (default)
-cut Distance above which the repulsive potential is ignored
-iter Maximum number of iterations when optimising geometry (no clashes)
-conv Number of steps with the same best coordinates before optimisation stops
-min Minimise with Sybyl for this maximum number of steps (default 5000)
-wc Write out structure after each monomer to file current.mol2
-wi Write out structure after every optimisation step
statistics
Reads in and calculates statistics (various measures of centre and spread) for a column of values in each of one or more files.
tabulate
Output data from SDF_DATA fields to a tab delimited text file and produce histograms of data values using gnuplot
Designed to act principally on SDF files but will also work with some data (DOCK and PMFSCORE output) read from comment lines in mol2 format
unit_cell_size
Report the individual volumes and mean volume of the unit cells of a series of molecules.
Flags:
-i Initial time (default 0)
-ts Time step between files
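A Python sketch of the per-file volume and the mean, assuming each cell is given as three lattice vectors in Angstroms (the volume is the absolute scalar triple product, which reduces to x*y*z for a rectangular box). The helpers are hypothetical; `initial_time` and `time_step` loosely mirror -i and -ts:

```python
def cell_volume(a, b, c):
    """Volume (cubic Angstroms) of the cell spanned by a, b, c: |a . (b x c)|."""
    cross = (b[1] * c[2] - b[2] * c[1],
             b[2] * c[0] - b[0] * c[2],
             b[0] * c[1] - b[1] * c[0])
    return abs(sum(x * y for x, y in zip(a, cross)))

def volume_series(cells, initial_time=0.0, time_step=1.0):
    """(time, volume) for each cell, plus the mean volume."""
    rows = [(initial_time + i * time_step, cell_volume(*cell))
            for i, cell in enumerate(cells)]
    return rows, sum(v for _, v in rows) / len(rows)
```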
water_to_ion
Replace randomly chosen water molecules with single-atom counter ions. Defaults to Na ions. Covers a very large range of inorganic ions.
Flags:
-e_<string> Ion element (default Na). Now handles a large range of single-atom cations and anions.
-n_<number> Number of ions to add
-r_<string> Residue name of added ions (optional)
-a_<string> Atom name to use for ions (optional)
-amber Add PDB TER record after each ion to produce input for Amber
write_mol
Test script to read any Silico format and write (by default) in the same format.
Flags:
-check Check integrity of molecules
write_seq
Sequence format test script
Silico protein/DNA sequence format routines are under development
Flags:
-combine Combine sequences from all files to a single file (all.XXX)
-nodup Remove duplicate sequences
-nostop Remove any amino acid sequences containing a stop ($)
-clean Clean up sequences (equivalent to -nodup -nostop)
-nl_<number> Number of residues per line in output
-split Split sequences into individual files
-sort Sort sequences alphabetically by name
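The filtering options can be sketched on (name, sequence) pairs as below. Judging duplicates on sequence text and keeping the first occurrence is an assumption, as is the record representation; the helper names are hypothetical:

```python
def clean_sequences(records, nodup=True, nostop=True, sort=False):
    """Filter (name, sequence) pairs in the spirit of -nodup, -nostop, -sort."""
    seen, kept = set(), []
    for name, seq in records:
        if nostop and "$" in seq:  # '$' marks a stop
            continue
        if nodup and seq in seen:
            continue
        seen.add(seq)
        kept.append((name, seq))
    return sorted(kept, key=lambda r: r[0]) if sort else kept

def wrap(seq, per_line):
    """Split a sequence into lines of per_line residues (compare -nl)."""
    return [seq[i:i + per_line] for i in range(0, len(seq), per_line)]
```

With both filters on, `clean_sequences` corresponds to the -clean option.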