Molecular Modeling
Computer Exercises in Protein X-ray Crystallography and Molecular Mechanics
Paul D. Swartz and Carla Mattos
Department of Molecular and Structural Biochemistry
Before You Start
Your UNIX (Linux, Irix, Unicos, SUNIX, AIX, etc.) Account
Software Source and Availability:
Overview of Protein Structure.
Higher Order Secondary Structure:
Tutorial
0. The data files and preliminary analysis.
Step 0: The Scalepak Output file
Step 2: The Matthews Coefficient
Step 0: Create a working directory for refining the protein structure
Step 1: The molecular replacement model from the Protein Data Bank
Step 2: Convert data format of the diffraction data so that CNS can read the data
Step 3: Create a truncated data set
Step 4: Create a Molecular Topology File (.mtf file)
Step 5: Initial fitting of a structure to the electron density
Step 5a: The big guns of rigid refinement - Rotating
Step 5b: The big guns of rigid refinement - Translating
Step 5c: Don't get mad - get CCP4
Step 6: The non-rigid refinement
Step 8: Coot manual refinement of the protein structure - now the fun part!
Step 9: CNS refinement of protein structure
Step 10: Manual refinement of crystallographic water
Step 10a. If it was necessary to use the translation and rotation algorithms to get a good fit
Step 10b. Copy the solvent.pdb file into the o directory using the command
Step 11: Rinse and repeat - More CNS refinement
Step 12. Adding organic solvent molecules - HICUP
Before You Start
Your UNIX (Linux, Irix, Unicos, SUNIX, AIX, etc.) Account
UNIX (UNIX is a registered trademark in the
UNIX is a very powerful operating system developed by
The laboratory has 9 UNIX computers to which you will have access. The computers each have a name.
Time to log on to your account. Pick your computer and have a seat. Type in your userid (user name) and press Enter. Fill in your assigned password and press Enter again. After a short time, a desktop will appear on the screen. Some setup has been done for your account to create a button for Netscape and another for opening a window called a shell. Left click on the button for opening a shell and select shell from the list. A window with a cursor will appear.
Notes on typography in this tutorial: commands that are to be typed into the computer are preceded by an arrow (>) and variables in the commands are in italics.
The password that you are assigned is generic and must be changed by you to something unique. The following are passages from the manual pages concerning password selection and protection.
Remember the following two principles:
Protect your password. Do not write down your password - memorize it. In particular, do not write it down and leave it anywhere, and do not place it in an unencrypted file! Use unrelated passwords for systems controlled by different organizations. Do not give or share your password, in particular to someone claiming to be from computer support or a vendor. Do not let anyone watch you enter your password.
Choose a hard-to-guess password. passwd will try to prevent you from choosing a really bad password, but it is not foolproof; create your password wisely. Do not use something you can find in a dictionary (in any language or jargon). Do not use a name (including that of a spouse, parent, child, pet, fantasy character, famous person, and location) or any variation of your personal or account name. Do not use accessible information about you (such as your phone number, license plate, or social security number) or your environment. Do not use a birthday or a simple pattern (such as backwards, followed by a digit, or preceded by a digit). Instead, use a mixture of upper and lower case letters, as well as digits or punctuation. When choosing a new password, make sure it is unrelated to any previous password. Use long passwords (say 8 characters long). You might use a word pair with punctuation inserted, a passphrase (an understandable sequence of words), or the first letter of each word in a passphrase.
These principles are partially enforced by the system, but only partly so. Vigilance on your part will make the system much more secure.
To change your password use the passwd command:
>passwd
>(current) UNIX password: (enter your current password and press Enter)
>New password: (enter your new password and press Enter)
>Retype new password:
If you choose a password that UNIX finds in its dictionary, the following response will appear:
>BAD PASSWORD: it is based on a dictionary word
New password:
At the prompt, enter a different password and press Enter.
You will have an account on the laboratory computer you are assigned and will have to set your password on that computer only.
SSH
ssh - Secure shell - a command that allows you to log on to another computer remotely.
Type the command ssh followed by a computer name. A one-time message will appear asking if you want to develop a close personal relationship with the remote computer (that is, whether you trust the remote computer and want to continue connecting). Type the response yes and press Enter. The remote computer will ask for your password. Provide the password and press Enter. Now the shell in which you are working is the same as a shell that you would have opened if you had seated yourself in front of the remote computer, logged on, and opened a shell. The passwd command can be used in this remote shell, so you can change your password remotely. When you are finished using the remote computer, the command logout will close the connection. Always remember to log out of an ssh connection. Repeat the process on all of the computers in the above list. You can use the same password on all of the computers if you wish.
LS
ls - list - lists the files found in a directory. Syntax is ls (for the current directory) or ls pathname.
CP
cp - copy - creates a copy of a file under a new file name. Syntax is cp file.name newfile.name.
CD
cd - Change directory - a command that allows you to enter another directory. The syntax is cd directory-name. Abbreviations are available. The command cd ../ will move one level up the directory tree (to the parent directory), cd ../../ goes up two levels, etc. The command cd by itself returns to the home directory.
MKDIR
mkdir - make directory - allows creation of directories. The syntax is mkdir directory-name.
RM
rm - remove - allows removal of files and directories. The syntax is rm filename for removal of files and rm -rf directory-name for removal of directories. Note that the files and directories are truly removed, not copied to a trash directory that would allow retrieval.
MV
mv - move - allows movement of files or directories from one location to another and renaming of files and directories. Syntax is mv filename newpath/filename (or mv filename new-filename for renaming).
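The file commands described so far (ls, cp, cd, mkdir, rm and mv) can be tried out safely in a throwaway directory. The following is a minimal sketch, not part of the original exercises: the directory and file names (scratch, project, notes.txt, and so on) are made up for illustration, and the commands are written without the > prompt so that they can be pasted into a shell or saved as a script.

```shell
# Work in a throwaway directory so nothing important can be lost.
# All names below (project, notes.txt, ...) are hypothetical examples.
scratch=$(mktemp -d)            # create a temporary directory, remember its path
cd "$scratch"                   # enter it

mkdir project                   # make a directory
echo "first line" > notes.txt   # create a small file to work with
cp notes.txt backup.txt         # copy the file under a new name
mv notes.txt project/           # move the original into the directory
mv backup.txt saved.txt         # rename the copy
ls                              # lists: project and saved.txt
ls project                      # lists: notes.txt

rm saved.txt                    # remove a file (no trash to recover it from)
rm -rf project                  # remove a directory and everything in it
remaining=$(ls | wc -l)         # count what is left in the scratch directory
cd ..                           # step up to the parent directory
rm -rf "$scratch"               # clean up the scratch directory itself
```

Because rm cannot be undone, it is worth running ls first to confirm what is in a directory before removing anything from it.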
TAIL
tail - view the end of the file. Syntax is tail filename. Given no other information, this command will show the last 10 lines of the file named filename. Given the flag -n i (tail -n i filename), where i is an integer, it will show the last i lines of the file. The flag -f (tail -f filename) will show new lines as they appear in the file named filename. This last flag is very useful for watching what is happening in an output file. Exit from the tail -f filename command using Ctrl+C.
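As an illustration of tail, the sketch below builds a small 30-line file and looks at the end of it. The file name refine.log and its contents are invented for the example. On current systems tail prints the last 10 lines when no count is given, and the -n flag sets the count explicitly.

```shell
# Build a hypothetical 30-line log file in a scratch directory.
scratch=$(mktemp -d)
log="$scratch/refine.log"               # example file name only
seq 1 30 | sed 's/^/cycle /' > "$log"   # lines "cycle 1" ... "cycle 30"

tail "$log"                             # default: the last 10 lines
last_three=$(tail -n 3 "$log")          # only the last 3 lines
echo "$last_three"

# To watch a file that a running program is still writing, use:
#   tail -f "$log"
# and press Ctrl+C to stop watching.

rm -rf "$scratch"                       # clean up
```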
CAT
cat - concatenate - concatenates files. The syntax is cat file1 file2 >> file3. The result is that the contents of file1 and file2 are added to the bottom of file3 in consecutive order. This allows quick and easy addition of data to files. Concatenation can also be used to send a file to print, e.g. cat file | lp pipes (|) the concatenated file(s) to the printer (lp).
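The append behaviour of cat with >> can be checked with three tiny files. This is a sketch with made-up file contents; it runs in a temporary directory and cleans up after itself.

```shell
scratch=$(mktemp -d)
cd "$scratch"
printf 'line A\n' > file1      # hypothetical contents
printf 'line B\n' > file2
printf 'existing\n' > file3

cat file1 file2 >> file3       # append file1, then file2, to the end of file3
cat file3                      # print the result to the screen

combined=$(cat file3)          # keep a copy of the result for inspection
cd ..
rm -rf "$scratch"              # clean up
```

Note that a single > in place of >> would overwrite file3 rather than add to it.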
EDIT
nedit - a graphical editor available on the computers to which you have access. It works very much like all of the other editors that you have used. The syntax nedit filename will bring up an editor window with the document. For those who want to be generic, the non-graphical vi editor is available on all UNIX operating systems and is a very handy thing to know how to use, since some of the supercomputing platforms do not have graphical editors available. Information on how to use the vi editor is available on the computer using the command line
>man vi
Organization of computer system:
The computer lab consists of 9 Linux workstations with the hard drives
cross mounted so that all hard drives are accessible from each computer. The
Software Source and Availability:
* CNS (Crystallography and NMR System) is available for no fee from: http://cns.csb.yale.edu/v1.1/
* Ono - http://xray.bmc.uu.se/~alwyn/Distribution/distrib_frameset.html, O is distributed by anonymous ftp server at
* Dejavu - http://alpha2.bmc.uu.se/~gerard/manuals/dejavu_man.html
* Rave - http://xray.bmc.uu.se/usf/rave_man.html
* Xutil - http://alpha2.bmc.uu.se/usf/xutil.html
* PyMol - http://pymol.sourceforge.net/
* APBS - http://apbs.sourceforge.net/
* Coot - http://www.ysbl.york.ac.uk/~emsley/software/
Overview of Protein Structure.
The primary structure of a protein is its sequence of amino acids. Biological proteins are made of 20 amino acids (19 amino acids and 1 imino acid) that fall into 7 general groups, each group having particular characteristics or chemistries. For most proteins, the sequence of amino acids contains the information necessary to refold from the denatured form; however, at this point in time, we do not have the knowledge to define the three-dimensional structure of a protein based only on its sequence.
Secondary structure refers to the 3-dimensional shape and size of
individual amino acids and organized groups of amino acids.
Amino acids
* Aliphatic Sidechain Amino Acids
* Alcoholic Sidechain Amino Acids
* Thiolate Sidechain Amino Acids
* Acidic and Acid Amide Sidechain Amino Acids
* Basic Amino Acids
* Aromatic Sidechain Amino Acids
* Imino Acid
Amino acids are polymerized in a dehydration reaction to form
dipeptides, polypeptides and proteins.
The condensation dehydration reaction removes water from an amine group
and a carboxylate group to form an amide bond.

The resulting amide bond has two resonance structures that give the C-N bond a partial double bond character and form a semi-rigid plane of the atoms O-C-N-H.


The planarity of the peptide bond means that the 3-dimensional structure of the protein backbone can be described as a series of φ and ψ angles. The Ramachandran plot is a plot of φ and ψ angles for either a single protein or a large group of proteins. It reveals that the values of the angles are clustered, and that specific clusters of angles correspond to specific types of protein secondary structure called α-helices, β-sheets and β-turns.

All amino acids except glycine are chiral, which means that proteins are chiral as well. A molecule is chiral if it is not superimposable on its mirror image. A chiral bio-molecule is one which has a tetravalent carbon atom with four different groups bonded to it. If its reflected image is rotated 180 degrees on its axis, the reflected molecule does not superimpose on the original molecule.

A characteristic of chiral molecules is that they can rotate plane polarized light. Proteins do so, and protein crystals can rotate polarized light as well. The direction in which a chiral molecule rotates plane polarized light is described as dextrorotatory (to the right) or levorotatory (to the left). The D and L designations derive from these terms, but for amino acids they denote the configuration at the chiral carbon relative to glyceraldehyde rather than the measured direction of rotation. The two isomers are called enantiomers, and a mixture of equal amounts of the two enantiomers is called a racemic mixture. Biological systems are particular about chirality and, for the most part, proteins use only the L enantiomer of amino acids.
Protein Stabilizing Elements:
Hydrogen bond: A hydrogen bond is an attractive electrostatic interaction between an electron withdrawing atom that has a hydrogen bonded to it and an electron donating atom that does not. The atom that carries the hydrogen is called the donor; the electron donating atom is called the acceptor because it accepts the hydrogen that is bonded to the donor atom. The primary consideration for hydrogen bonding is the distance between the donor and acceptor atoms, but an important secondary consideration is the angle, α, described by the donor, hydrogen, and acceptor. Unfortunately, in crystallography the location of the hydrogen atom is rarely known (typically only in neutron diffraction experiments), so the distance is the primary consideration. Donors and acceptors that are between 2.5 Å and 3.6 Å apart can be considered to be in a hydrogen bonded conformation.
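The distance criterion can be applied by hand to any pair of donor and acceptor coordinates. The sketch below is an illustration only: hbond_candidate is a made-up helper name, the coordinates are invented rather than taken from a real structure, and the check uses nothing but the 2.5 to 3.6 Angstrom window quoted above, ignoring the angle entirely. It uses awk for the arithmetic because the shell itself cannot handle decimal numbers.

```shell
# Print the donor-acceptor distance and "yes" if it lies within the
# 2.5 to 3.6 Angstrom window, otherwise "no".
# Arguments: x1 y1 z1 x2 y2 z2 (coordinates in Angstroms).
# hbond_candidate is a hypothetical helper, not part of any package.
hbond_candidate() {
  awk -v x1="$1" -v y1="$2" -v z1="$3" \
      -v x2="$4" -v y2="$5" -v z2="$6" 'BEGIN {
    dx = x1 - x2; dy = y1 - y2; dz = z1 - z2
    d = sqrt(dx*dx + dy*dy + dz*dz)        # straight-line distance
    verdict = "no"
    if (d >= 2.5 && d <= 3.6) verdict = "yes"
    printf "%.2f %s\n", d, verdict
  }'
}

hbond_candidate 0.0 0.0 0.0 2.9 0.0 0.0    # prints: 2.90 yes
hbond_candidate 0.0 0.0 0.0 3.0 3.0 0.0    # prints: 4.24 no
```

A pair that passes this test is only a candidate; whether it is a real hydrogen bond also depends on which atoms are involved and on the geometry around them.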

The hydrogen bond is an important and pervasive force in proteins and is responsible for the existence of most of the secondary and tertiary structure exhibited by proteins.
Disulfide Bonds: Disulfide bonds are reversible covalent bonds between two cysteine residues that can be greatly separated in the primary structure. In many proteins they make important contributions to protein stability and in some they are critical to the functioning of the protein. Insulin (2BN3.pdb), a small peptide hormone, is synthesized as a precursor (proinsulin) that has a central segment removed post-translationally, and the two remaining chains are held together by disulfide bonds. The SGCI (1KIO.pdb) serine protease inhibitor is a peptide with practically no secondary structure that is entirely stabilized by disulfides. Disulfide bonds are formed between two cysteine residues via an oxidative elimination of two protons and two electrons.

Salt bridge: The salt bridge is an electrostatic bond formed between two sidechains of opposite charge (an acidic and a basic amino acid). Salt bridges generally cannot survive solvation or access to ionic small molecules, so the salt bridges that are seen are mostly in the protein's interior. A salt bridge can be thought of as a mechanism by which charged sidechains can be solvated within the low dielectric protein interior. Identifying a salt bridge is a proximity issue, as in hydrogen bonding, but unlike hydrogen bonding there is really no angular consideration.
Higher Order Secondary Structure: