Molecular Modeling

 

Computer Exercises in

 

Protein X-ray Crystallography and Molecular Mechanics

 

 

 

Paul D. Swartz and Carla Mattos

Department of Molecular and Structural Biochemistry

North Carolina State University

Raleigh, NC

 

 

 

Outline

 

Before You Start …

UNIX and Computer Basics

            Your UNIX (Linux, Irix, Unicos, SUNIX, AIX, etc.) Account

            The Password

            UNIX Commands

 

 

Protein X-ray Crystallography

     Software Source and Availability:

 

Overview of Protein Structure.

Primary Structure:

Amino Acid Sequence:

Secondary Structure:

Amino Acid Structure:

The Peptide Bond:

Stereoisomers:

Protein Stabilizing Elements:

Hydrogen bond:   

Disulfide Bonds: 

Salt bridge: 

Higher Order Secondary Structure:

The Helix:

Alpha Helix

3-10 Helix

Pi Helix

The Sheet:

The Turn:

Supersecondary structure:

Alpha-Alpha Structure: 

Alpha-beta Structure: 

Beta-Beta Structure: 

Tertiary Structure:

 

Tutorial 0.  The data files and preliminary analysis.

Step 0:            The Scalepak Output file

Step 1:            The Bravais Lattice

Step 2:            The Matthews Coefficient

 

Tutorial 1.

Step 0:             Create a working directory for refining the protein structure…

Step 1:             The molecular replacement model – from the Protein Data Bank…

Step 2:             Convert data format of the diffraction data so that CNS can read the data…

Step 3:             Create a truncated data set …

Step 4:             Create a Molecular Topology File (.mtf file)….

Step 5:            Initial fitting of a structure to the electron density

   Step 5a:       The big guns of rigid refinement - Rotating…

   Step 5b:       The big guns of rigid refinement - Translating ….

   Step 5c:        Don’t get mad - get CCP4 …

Step 6:            The non-rigid refinement…

Questions for Tutorial 1.

 

Tutorial 2b.

Step 7:            Generate MTZ file

Step 8:             Coot – manual refinement of the protein structure – now the fun part!

Step 9:             CNS refinement of protein structure ….

Questions for Tutorial 2b.

 

Tutorial 3.

Step 10:           Manual refinement of crystallographic water …..

   Step 10a.      If it was necessary to use the translation and rotation algorithms to get a good fit

   Step 10b.     Copy the solvent.pdb file into the ‘o’ directory using the command ….

Step 11:           Rinse and repeat ….  More CNS refinement

Questions for Tutorial 3.

 

Tutorial 4a.

Step 12.           Adding organic solvent molecules - HICUP….

Questions for Tutorial 4.

 

 

 

Before You Start

 

UNIX and Computer Basics

 

Your UNIX (Linux, Irix, Unicos, SUNIX, AIX, etc.) Account

 

UNIX (UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company Ltd.) is case sensitive.  Commands typed in capital letters do not work, and 1RPH.pdb is a different file name from 1RPH.PDB.

 

UNIX is a very powerful operating system developed by Bell Laboratories in the 1970s.  It has since been extended by many different individuals and companies and has become the pre-eminent operating system for scientific computing.  The operating system combines the point-and-click mode of operation familiar from Windows and MacOS with traditional command line operation and is designed to be a powerful networking system.  Many of the most powerful computers in the world use variations of UNIX as their operating system (Cray with Unicos, and IBM RS6000 Superclusters with AIX), so if you want to continue in computational sciences it is a good idea to become familiar with UNIX in its standard form.  The operating system you will encounter is Red Hat Linux, version 9.0.

 

The laboratory has 9 UNIX computers to which you will have access.  The computers each have a name.

 

Time to log on to your account.  Pick your computer and have a seat.  Type in your userid (user name) and press Enter.  Fill in your assigned password and press Enter again.  After a short time, a desktop will appear on the screen.  Some setup has been done for your account to create a button for Netscape and another for opening a window called a shell.  Left click on the button for opening a shell and select ‘shell’ from the list.  A window with a cursor will appear.

 

Notes on typography in this tutorial.  Commands that are to be typed into the computer are preceded by an arrow “>” and variables in the commands are in italics.

 

The Password

 

The password that you are assigned is generic and must be changed by you to something unique.  The following are passages from the manual pages concerning password selection and protection…..

 

Remember the following two principles

Protect your password.  Do not write down your password - memorize it.  In particular, do not write it down and leave it anywhere, and do not place it in an unencrypted file!  Use unrelated passwords for systems controlled by different organizations.  Do not give or share your password, in particular to someone claiming to be from computer support or a vendor.  Do not let anyone watch you enter your password.

 

Choose a hard-to-guess password.  passwd will try to prevent you from choosing a really bad password, but it is not foolproof; create your password wisely.  Do not use something you can find in a dictionary (in any language or jargon).  Do not use a name (including that of a spouse, parent, child, pet, fantasy character, famous person, and location) or any variation of your personal or account name.  Do not use accessible information about you (such as your phone number, license plate, or social security number) or your environment.  Do not use a birthday or a simple pattern (such as backwards, followed by a digit, or preceded by a digit).  Instead, use a mixture of upper and lower case letters, as well as digits or punctuation.  When choosing a new password, make sure it is unrelated to any previous password.  Use long passwords (say, 8 characters long).  You might use a word pair with punctuation inserted, a passphrase (an understandable sequence of words), or the first letter of each word in a passphrase.

 

These  principles are partially enforced by the system, but only partly so.  Vigilance on your part will make the system much more secure.

 

To change your password use the “passwd” command:

 

>passwd

>(current) UNIX password:   (enter your current password and press Enter)

>New password:   (enter your new password and press Enter)

>Retype new password:

 

If you choose a password that the system finds in its dictionary, the following response will appear:

 

      >BAD PASSWORD: it is based on a dictionary word

        New password:

 

At the prompt, enter a different password and press Enter.

 

You will have an account on the laboratory computer you are assigned and will have to set your password on that computer only. 

 

 

 

 

UNIX Commands

 

SSH

 

ssh – Secure shell - a command that allows you to log on to another computer remotely.

 

Type the command ssh followed by a computer name.  A one time message will appear asking if you want to develop a close personal relationship with the remote computer (it asks whether you are sure you want to continue connecting).  Type the response “yes” and press Enter.  The remote computer will ask for your password.  Provide the password and press Enter.  Now the shell in which you are working is the same as a shell that you would have opened if you had seated yourself in front of the remote computer, logged on, and opened a shell.  The passwd command can be used in this remote shell and you can remotely change your password.  When you are finished using the remote computer, the command ‘logout’ will close the connection.  Always remember to log out of an ssh connection.  Repeat the process on all of the computers in the above list.  You can use the same password on all of the computers if you wish.

 

 

LS

 

ls – list – lists the files found in the directory.  Syntax is “ls” or “ls pathname”.

 

 

CP

 

cp – copy – creates a copy of a file under a new file name.  Syntax is “cp file.name newfile.name”.

 

 

CD

 

cd – Change directory – a command that allows you to enter another directory.  The syntax is “cd directory-name”.  Abbreviations are available.  The command “cd ../” will change directory to one level up the directory tree (the parent directory), “cd ../../” goes up two levels, etc.  The command “cd” with no directory name returns to the home directory.

 

 

MKDIR

 

mkdir – make directory – allows creation of directories.  The syntax is “mkdir directory-name”.

 

 

RM

 

rm – remove – allows removal of files and directories.  The syntax is “rm filename” for removal of files and “rm -rf directory-name” for removal of directories.  Note that the files and directories are truly removed, not copied to a trash directory that will allow retrieval.

 

 

MV

 

mv – move – allows movement of files or directories from one location to another and renaming of files and directories.  Syntax is “mv filename newpath/filename” (or new-filename for renaming).
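The file commands described above (ls, cp, cd, mkdir, rm and mv) can be tried together in a short practice session.  The following is only a sketch: the directory and file names (scratch_demo, notes.txt, archive) are made up for illustration, and the session runs inside a temporary directory created with mktemp so that nothing of value can be removed by accident.

```shell
# Work inside a throwaway directory so nothing important can be lost.
# All names below (scratch_demo, notes.txt, archive) are hypothetical.
workdir=$(mktemp -d)            # create a temporary directory, remember its path
cd "$workdir"

mkdir scratch_demo              # make a new directory
cd scratch_demo                 # enter it
echo "test line" > notes.txt    # create a small file to practice on
cp notes.txt notes_copy.txt     # copy the file to a new name
mkdir archive
mv notes_copy.txt archive/      # move the copy into the archive directory
mv notes.txt renamed.txt        # rename the original in place
listing=$(ls)                   # capture the listing: archive and renamed.txt
echo "$listing"
ls archive                      # lists: notes_copy.txt
rm renamed.txt                  # remove a single file (there is no undo)
cd ../                          # go up one level to the parent directory
rm -rf scratch_demo             # remove the directory and everything in it
```

Because rm does not use a trash directory, it is worth running ls first to confirm what is about to be removed.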

 

TAIL

tail – view the end of the file.  Syntax is “tail filename”.  Given no other information, this command will show the last 10 lines of the file named filename.  Given the flag -i, where i is an integer, “tail -i filename” (or equivalently “tail -n i filename”) will show the last i lines of the file.  The flag -f, “tail -f filename”, will show new lines as they appear in the file named filename.  This last flag is very useful for watching what is happening in an output file.  Exit from the “tail -f filename” command using Ctrl+C.
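A quick way to see how tail behaves is to build a numbered practice file and look at its end.  This is a minimal sketch; the file is a temporary one created with mktemp and filled by seq, neither of which is part of the tutorial data.

```shell
# Build a hypothetical 30-line practice file, one number per line.
logfile=$(mktemp)
seq 1 30 > "$logfile"

tail "$logfile"                  # no flag: prints the last 10 lines (21 to 30)
tail -n 3 "$logfile"             # prints the last 3 lines: 28, 29, 30
last_three=$(tail -n 3 "$logfile")

# "tail -f $logfile" would keep the file open and print new lines as they
# are appended; press Ctrl+C to stop it.
```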

 

CAT

cat – concatenates files.  The syntax is “cat file1 file2 >> file3”.  The result is that the contents of file1 and file2 are appended to the bottom of file3 in consecutive order.  This allows quick and easy addition of data to files.  Concatenation can also be used to send a file to the printer, e.g., “cat file | lp” pipes (|) the concatenated file(s) to the printer (lp).
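The effect of “>>” is easiest to see with three small files.  The sketch below uses made-up file contents in a temporary directory; only the cat command itself comes from the description above.

```shell
# Create three small practice files in a throwaway directory.
tmpdir=$(mktemp -d)
cd "$tmpdir"
printf 'line from file1\n' > file1
printf 'line from file2\n' > file2
printf 'existing line in file3\n' > file3

cat file1 file2 >> file3     # append file1, then file2, to the end of file3
combined=$(cat file3)        # file3 now holds its old line plus the two new ones
echo "$combined"
```

Note that “>>” appends to file3, whereas a single “>” would overwrite it, so the two are not interchangeable when the existing contents matter.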

 

EDIT

 

nedit – a graphical editor available on the computers to which you have access.  Works very much like all of the other editors that you have used.  Syntax “nedit filename” will bring up an editor window with the document.  For those who want to be generic, the non-graphical vi editor is available on all UNIX operating systems and is a very handy thing to know how to use, since some of the supercomputing platforms do not have graphical editors available.  Information on how to use the vi editor is available on the computer using the command line …

 

>man vi

 

Organization of computer system:  The computer lab consists of 9 Linux workstations with the hard drives cross mounted so that all hard drives are accessible from each computer.  The mount points for the hard drives are in /home.  Using the hard drive of another computer is as easy as changing directories.

 

 

Protein X-ray Crystallography

 

Software Source and Availability:

     CNS –        Crystallography and NMR System – is available for no fee from:  http://cns.csb.yale.edu/v1.1/

     Ono -          http://xray.bmc.uu.se/~alwyn/Distribution/distrib_frameset.html, O is distributed by anonymous ftp server at

     Dejavu -     http://alpha2.bmc.uu.se/~gerard/manuals/dejavu_man.html

     Rave -         http://xray.bmc.uu.se/usf/rave_man.html

     Xutil -         http://alpha2.bmc.uu.se/usf/xutil.html

     PyMol -      http://pymol.sourceforge.net/

     APBS -       http://apbs.sourceforge.net/

     Coot  -        http://www.ysbl.york.ac.uk/~emsley/software/

Structure refinement

Overview of Protein Structure.

 

Primary Structure:

    

Amino Acid Sequence:

The primary structure of a protein is its sequence of amino acids.  Biological proteins are made of 20 amino acids (19 amino acids and 1 imino acid) that fall into 7 general groups, each group having particular characteristics or chemistries.  For most proteins, the sequence of amino acids contains the information necessary to refold from the denatured form; however, at this point in time, we do not have the knowledge to define the three-dimensional structure of a protein based only on its sequence.

 

Secondary Structure:

     Secondary structure refers to the 3-dimensional shape and size of individual amino acids and organized groups of amino acids. 

Amino Acid Structure:

     Amino acids

Aliphatic Sidechain Amino Acids

Glycine

Alanine

Valine

Leucine

Isoleucine

Alcoholic Sidechain Amino Acids

Serine

Threonine

Sulfur-Containing Sidechain Amino Acids

Cysteine

Methionine

Acidic and Acidamide Sidechain Amino Acids

Aspartic Acid

Asparagine

Glutamic Acid

Glutamine

Basic Amino Acids

Arginine

Lysine

Histidine

Aromatic Sidechain Amino Acids

Phenylalanine

Tyrosine

Tryptophan

Imino Acid

Proline

 

 

The Peptide Bond:

 

     Amino acids are polymerized in a dehydration reaction to form dipeptides, polypeptides and proteins.  This condensation (dehydration) reaction removes water from an amine group and a carboxylate group to form an amide bond.

 

The resulting amide bond has two resonance structures that give the C-N bond a partial double bond character and form a semi-rigid plane of the atoms O-C-N-H.

 

 

 

The planarity of the peptide bond means that the three-dimensional structure of the protein backbone can be described as a series of Φ and Ψ angles.  The Ramachandran plot is a plot of Φ and Ψ angles for either a single protein or a large group of proteins.  It reveals that the values of the angles are clustered, and that specific clusters of angles correspond to specific types of protein secondary structure called α-helices, β-sheets and β-turns.

 

coot

 

Stereoisomers:

 

     All amino acids except glycine are chiral, and that means that proteins are as well.  A molecule is chiral if it is not superimposable on its mirror image.  A chiral bio-molecule typically has a tetravalent carbon atom that has four different groups bonded to it.  Even if its reflected image is rotated 180 degrees on its axis, the reflected molecule does not superimpose on the original molecule.

 

 

A characteristic of chiral molecules is that they can rotate plane polarized light; proteins do, and protein crystals can rotate polarized light as well.  A molecule that rotates plane polarized light to the right is called dextrorotatory, and one that rotates it to the left is called levorotatory.  The D and L labels used for amino acids grew out of this convention, but they denote the configuration of the molecule relative to glyceraldehyde, so an L amino acid is not necessarily levorotatory.  The two isomers are called enantiomers and a mixture of equal amounts of the two enantiomers is called a racemic mixture.  Biological systems are particular about chirality and, for the most part, only the L enantiomer of each amino acid is used in proteins.

 

Protein Stabilizing Elements:

 

Hydrogen bond:   A hydrogen bond is an attractive electrostatic interaction between an electron-withdrawing atom that has a hydrogen bonded to it and an electron-donating atom that does not.  The electron-donating atom is called the acceptor because it accepts the hydrogen that is bonded to the donor atom.  The primary consideration for hydrogen bonding is the distance between the donor and acceptor atoms, but an important further consideration is the angle, α, described by the donor, hydrogen, and acceptor.  Unfortunately, in crystallography the location of the hydrogen atom is rarely known (generally only in neutron diffraction experiments), so the distance is the primary consideration.  Donor and acceptor atoms that are between 2.5 Å and 3.6 Å apart can be considered to be in a hydrogen-bonded conformation.
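The distance criterion can be checked by hand from the coordinates of two atoms.  The sketch below is only an illustration: the donor and acceptor coordinates are invented (they are not taken from any structure in this tutorial), and awk is used simply as a calculator to compute the straight-line distance and compare it against the 2.5 to 3.6 Angstrom window given above.

```shell
# Hypothetical x, y, z coordinates (in Angstroms) for a donor N and an acceptor O.
donor="12.000 8.500 3.200"
acceptor="14.100 9.900 4.000"

# awk computes the Euclidean distance and applies the 2.5-3.6 Angstrom window.
result=$(echo "$donor $acceptor" | awk '{
    dx = $4 - $1; dy = $5 - $2; dz = $6 - $3
    d = sqrt(dx*dx + dy*dy + dz*dz)
    verdict = (d >= 2.5 && d <= 3.6) ? "possible hydrogen bond" : "no hydrogen bond"
    printf "%.2f %s", d, verdict
}')
echo "$result"    # prints: 2.65 possible hydrogen bond
```

A distance inside the window only makes a hydrogen bond plausible; as noted above, the angle cannot usually be checked because the hydrogen position is not known.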

 

 

 

The hydrogen bond is an important and pervasive force in proteins and is responsible for the existence of most of the secondary and tertiary structure exhibited by proteins.

 

 

Disulfide Bonds:  Disulfide bonds are reversible covalent bonds between two cysteine residues that can be greatly separated in the primary structure.  In many proteins they make important contributions to protein stability and in some they are critical to the functioning of the protein.  Insulin (2BN3.pdb), a small peptide hormone, is synthesized as a precursor (proinsulin) that has a central segment removed post-translationally, and the two resulting chains are held together by disulfide bonds.  The SGCI (1KIO.pdb) serine protease inhibitor is a peptide with practically no secondary structure that is entirely stabilized by disulfides.

     Disulfide bonds are formed between two cysteine residues via an oxidation that eliminates two protons and two electrons.

 

 

Salt bridge:  The salt bridge is an electrostatic bond formed between two sidechains of opposite charge (an acidic and a basic amino acid).  Salt bridges generally cannot survive solvation or access to ionic small molecules, so the salt bridges that are seen are mostly in the protein's interior.  A salt bridge can be thought of as a mechanism by which charged sidechains can be solvated within the low dielectric protein interior.  Identifying a salt bridge is a proximity issue, as in hydrogen bonding, but unlike hydrogen bonding there is really no angular consideration.

 

Higher Order Secondary Structure:

 

The Helix: