Tuesday, September 4, 2012

Major Biological Databases

Major Biological Databases Available Via the World Wide Web


________________________________________________________________________________

AceDB Genome database for Caenorhabditis elegans
www.acedb.org

________________________________________________________________________________
DDBJ Primary nucleotide sequence database in Japan
www.ddbj.nig.ac.jp

________________________________________________________________________________
EMBL Primary nucleotide sequence database in Europe
www.ebi.ac.uk/embl/index.html

________________________________________________________________________________
Entrez NCBI portal for a variety of biological databases
www.ncbi.nlm.nih.gov/gquery/gquery.fcgi

________________________________________________________________________________
ExPASy Proteomics database
http://us.expasy.org

________________________________________________________________________________
FlyBase A database of the Drosophila genome
http://flybase.bio.indiana.edu/

________________________________________________________________________________
GenBank Primary nucleotide sequence database in NCBI
www.ncbi.nlm.nih.gov/Genbank

________________________________________________________________________________
HIV databases HIV sequence data and related immunologic information
www.hiv.lanl.gov/content/index

________________________________________________________________________________
Microarray gene expression database DNA microarray data and analysis tools
www.ebi.ac.uk/microarray

________________________________________________________________________________
OMIM Genetic information of human diseases
www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM

________________________________________________________________________________
PIR Annotated protein sequences
http://pir.georgetown.edu/pirwww/pirhome3.shtml

________________________________________________________________________________
PubMed Biomedical literature information
www.ncbi.nlm.nih.gov/PubMed

________________________________________________________________________________
Ribosomal database project Ribosomal RNA sequences and phylogenetic trees derived from the sequences
http://rdp.cme.msu.edu/html

________________________________________________________________________________
SRS General sequence retrieval system
http://srs6.ebi.ac.uk

________________________________________________________________________________
Swiss-Prot Curated protein sequence database
www.ebi.ac.uk/swissprot/access.html

________________________________________________________________________________
TAIR Arabidopsis information database
www.arabidopsis.org

Sunday, September 2, 2012

Homology modeling of protein using Modeller Software


Step 1: Installing Modeller Software

Download and install the latest version of the Modeller software: http://salilab.org/modeller/download_installation.html
After registration, you will receive an academic licence key, which is needed for the installation.

Step 2: Download and install Python: http://www.python.org/download/releases/2.4/

Step 3: Paste your raw protein sequence and search it against the PDB database using the BLASTP program

http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastp&BLAST_PROGRAMS=blastp&PAGE_TYPE=BlastSearch&SHOW_DEFAULTS=on&LINK_LOC=blasthome
Note: Set the search database to Protein Data Bank (pdb) and find the best template hit. The template should share more than 35% sequence identity with your query. Note down its PDB ID (e.g. 1BDM).
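
The template choice can also be made from a saved hit table rather than by eye. The following is only a sketch: it assumes the hits were exported in the 12-column tab-separated BLAST format (subject ID in column 2, percent identity in column 3, E-value in column 11), and the names Hit, parse_tabular and best_template, as well as the numbers in EXAMPLE_HITS, are made up for illustration.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    subject_id: str   # identifier of the database (template) sequence
    identity: float   # percent identity reported by BLAST
    evalue: float     # expectation value of the hit


def parse_tabular(text: str) -> List[Hit]:
    """Parse tab-separated hits; assumes the 12-column tabular layout."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        hits.append(Hit(cols[1], float(cols[2]), float(cols[10])))
    return hits


def best_template(hits: List[Hit], min_identity: float = 35.0) -> Optional[Hit]:
    """Return the hit with the highest identity above the cutoff, or None."""
    candidates = [h for h in hits if h.identity > min_identity]
    if not candidates:
        return None
    return max(candidates, key=lambda h: h.identity)


# Made-up example rows (not real BLAST results)
EXAMPLE_HITS = (
    "TvLDH\t1BDM_A\t44.5\t310\t160\t5\t1\t305\t2\t308\t1e-70\t230\n"
    "TvLDH\tXXXX_B\t28.0\t300\t200\t8\t3\t300\t5\t299\t2e-20\t95\n"
)

if __name__ == "__main__":
    print(best_template(parse_tabular(EXAMPLE_HITS)))
```

If no hit passes the cutoff, best_template returns None, which is a sign that homology modelling with a single template may not be a good idea for that sequence.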

Step 4: Download the PDB file of the template from the PDB database: http://www.rcsb.org/pdb/home/home.do

Step 5: Go to the Modeller folder under C:\Program Files\

Copy the downloaded PDB file into the Modeller folder and rename it to "1bdm.pdb".

Step 6: Open a new Notepad file, then copy and paste the following sequence entry

>P1;TvLDH
sequence:TvLDH:::::::0.00: 0.00
MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGFVATTDPKA
AFKDIDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTNCEIAMLHAKNLKPEN
FSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEGKTQKVVDVLDHDYVFDTFFKKI
GHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPVPEGNPYGIKPGVVFSFPCNVDKEGKIHVV
EGFKVNDWLREKLDFTEKDLFHEKEIALNHLAQGG*
Now replace the example sequence with your own. Only the sequence lines should change; keep the two header lines and the terminating "*". Save the Notepad file as "TvLDH.ali", typing the file name inside double quotes so that Notepad does not append ".txt".
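
If you prefer not to edit the file by hand, the same entry can be generated from a raw sequence. This is a minimal sketch, not part of Modeller: the function name make_pir_entry is hypothetical, and the 75-character line width simply follows the example above.

```python
def make_pir_entry(code: str, sequence: str, width: int = 75) -> str:
    """Build a PIR-style entry like the TvLDH example from a raw sequence."""
    # Remove whitespace and line breaks, use upper case, drop a trailing "*"
    seq = "".join(sequence.split()).upper().rstrip("*")
    if not seq:
        raise ValueError("sequence is empty")
    lines = [">P1;%s" % code, "sequence:%s:::::::0.00: 0.00" % code]
    # Wrap the sequence into fixed-width lines
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    # The entry must end with "*"
    lines[-1] += "*"
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_pir_entry("TvLDH", "MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLLD"))
```

Writing the returned text to a file named TvLDH.ali gives a file with the same layout as the one saved from Notepad.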

Step 7: Open a new Notepad file, then copy and paste the following program

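from modeller import *

env = environ()
aln = alignment(env)
# Read chain A of the template structure 1bdm
mdl = model(env, file='1bdm', model_segment=('FIRST:A','LAST:A'))
# Add the template, then the target sequence, to the alignment
aln.append_model(mdl, align_codes='1bdmA', atom_files='1bdm.pdb')
aln.append(file='TvLDH.ali', align_codes='TvLDH')
# Align the target sequence to the template structure
aln.align2d()
# Write the alignment in PIR format (used for modelling) and PAP format (easy to read)
aln.write(file='TvLDH-1bdmA.ali', alignment_format='PIR')
aln.write(file='TvLDH-1bdmA.pap', alignment_format='PAP')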

Save the Notepad file as "align.py", typing the file name inside double quotes so that it is saved as a Python script.
Run "align.py" by double-clicking it. It creates the files 'TvLDH-1bdmA.ali' and 'TvLDH-1bdmA.pap'. The run time depends on the sequence length.

Step 8: Open a new Notepad file, then copy and paste the following program

from modeller import *
from modeller.automodel import *

env = environ()
# Model TvLDH on the 1bdmA template and score each model with DOPE and GA341
a = automodel(env, alnfile='TvLDH-1bdmA.ali',
              knowns='1bdmA', sequence='TvLDH',
              assess_methods=(assess.DOPE, assess.GA341))
# Build models number 1 to 5
a.starting_model = 1
a.ending_model = 5
a.make()

Save the Notepad file as "model.py", typing the file name inside double quotes.
Run "model.py" by double-clicking it. It generates five PDB models.
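
With five models it helps to pick one before validation. Since the script asks for DOPE assessment, one common choice is the model with the lowest DOPE score (lower is better). The sketch below works on plain dictionaries. The keys 'name', 'failure' and 'DOPE score' are an assumption about how the per-model summaries look, the function name pick_best_model is hypothetical, and the scores in EXAMPLE_OUTPUTS are invented.

```python
def pick_best_model(outputs, key="DOPE score"):
    """Return the successfully built model with the lowest score for `key`."""
    # Keep only models that were built without an error
    ok_models = [m for m in outputs if m.get("failure") is None]
    if not ok_models:
        raise ValueError("no model was built successfully")
    # DOPE is an energy-like score, so the lowest value is the best
    return min(ok_models, key=lambda m: m[key])


# Invented example summaries (not real Modeller output)
EXAMPLE_OUTPUTS = [
    {"name": "TvLDH.B99990001.pdb", "failure": None, "DOPE score": -38000.0},
    {"name": "TvLDH.B99990002.pdb", "failure": None, "DOPE score": -38500.0},
    {"name": "TvLDH.B99990003.pdb", "failure": "error", "DOPE score": -99999.0},
]

if __name__ == "__main__":
    best = pick_best_model(EXAMPLE_OUTPUTS)
    print("Top model: %s (DOPE score %.1f)" % (best["name"], best["DOPE score"]))
```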
Now check the quality and accuracy of the model by validating it. Go to the SAVES server, http://nihserver.mbi.ucla.edu/SAVES/, upload the modelled PDB file, and run all the programs.
Check the Ramachandran plot value (should be greater than 95%) and the ERRAT value (should be greater than 90%). If either is lower, carry out model optimization, energy minimization, and protein (molecular dynamics) simulation.
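
When several models are validated, the two cutoffs above can be applied in a small helper. This is only an illustrative sketch: the function name failed_checks is hypothetical, and the values passed in are whatever you read off the SAVES report yourself.

```python
def failed_checks(ramachandran_pct, errat_pct,
                  min_ramachandran=95.0, min_errat=90.0):
    """Return the names of the validation checks that a model fails."""
    failed = []
    # Both values must be strictly greater than their cutoffs
    if not ramachandran_pct > min_ramachandran:
        failed.append("Ramachandran")
    if not errat_pct > min_errat:
        failed.append("ERRAT")
    return failed


if __name__ == "__main__":
    # Invented example values, not taken from a real report
    print(failed_checks(96.2, 88.4))
```

An empty list means the model passes both checks; otherwise the listed checks point to a model that needs further refinement.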

Installing AutoDock in Ubuntu Linux: Easy Steps




Open a terminal and type: sudo apt-get install autodocktools
After a few seconds, it will ask whether to continue with the installation (yes or no); type yes.
Go to http://autodock.scripps.edu/ and, under the download option, click ADT to reach the MGLTools downloads page, which has the latest version.
Download the file MGLTools-1.5.2-Linux-x86-Install to your home directory (e.g. /home/cepe/Desktop).
If you do not know where your home directory is, open a terminal window by going to Applications --> Accessories --> Terminal; type cd ~ and press Enter; type pwd and press Enter; your home directory will be printed on the screen.
Open a terminal window by going to Applications --> Accessories --> Terminal.
Type chmod +x MGLTools-1.5.2-Linux-x86-Install and press Enter.
Type ./MGLTools-1.5.2-Linux-x86-Install and press Enter.
An installation window should appear on the screen. Press Next, answer yes, and press Next, Next, Finish.
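
For readers who script their setup, the chmod +x step can also be done from Python. This is a small sketch using only the standard library; the function name make_executable is hypothetical, and it sets the execute bit for user, group and others, which is roughly what chmod +x does under a typical umask.

```python
import os
import stat


def make_executable(path):
    """Add execute permission for user, group and others; return the new mode."""
    mode = os.stat(path).st_mode
    new_mode = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    os.chmod(path, new_mode)
    return stat.S_IMODE(new_mode)


if __name__ == "__main__":
    installer = "MGLTools-1.5.2-Linux-x86-Install"
    if os.path.exists(installer):
        print(oct(make_executable(installer)))
    else:
        print("Installer not found in the current directory")
```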


Friday, August 31, 2012

How to run BLAST?

Basics of BLAST

BLAST is an acronym for Basic Local Alignment Search Tool and refers to a suite of programs used to generate alignments between a nucleotide or protein sequence, referred to as the “query”, and nucleotide or protein sequences within a database, referred to as “subject” sequences. The original BLAST program used a protein “query” sequence to scan a protein sequence database. A version operating on nucleotide “query” sequences and a nucleotide sequence database soon followed. The introduction of an intermediate layer, in which nucleotide sequences are translated into their corresponding protein sequences according to a specified genetic code, allows cross-comparisons between nucleotide and protein sequences. Specialized variants of BLAST allow fast searches of nucleotide databases with very large query sequences, or the generation of alignments between a single pair of sequences.

Both the standalone and web versions of BLAST are available from the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov). The web version provides searches of the complete genomes of Homo sapiens as well as those of many model organisms, including mouse, rat, fruit fly, and Arabidopsis thaliana, allowing BLAST alignments to be seen in a full genomic context.


Query and Database Sequence Formats

BLAST “query” sequences are given as character strings of single-letter nucleotide or amino acid codes, preceded by a definition line that begins with a “>” symbol and contains identifiers and descriptive information. This format is known as FASTA. BLAST databases are constructed from concatenated FASTA-formatted sequences using a program called “formatdb”, which produces a mixture of binary- and ASCII-encoded files containing the sequences and the indexing information used during the BLAST search.
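
To make the format concrete, here is a minimal sketch of a FASTA reader. It is not how BLAST or formatdb read their input; the function name parse_fasta is hypothetical, and it simply splits the text into (definition line, sequence) pairs.

```python
def parse_fasta(text):
    """Split FASTA text into a list of (definition line, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new definition line closes the previous record
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before a '>' definition line")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = ">seq1 example protein\nMSEAAH\nVLITG\n>seq2\nACGT\n"
    for definition, sequence in parse_fasta(example):
        print(definition, len(sequence))
```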


Overview of the Algorithm

BLAST begins a search by indexing all character strings of a certain length within the “query” by their starting position in the query. The length of the string to index, called the “wordsize”, is configurable by the user. The allowable range for the “wordsize” varies according to the BLAST program used; typical values are 3 for protein-to-protein sequence searches and 11 for nucleotide-to-nucleotide searches. BLAST then scans the database looking for matches between the “words” indexed in the “query” and strings found within the database sequences. For nucleotide-to-nucleotide searches, these matches must be exact; for protein-to-protein searches, the score of the match, as determined using a substitution matrix, must exceed a specified threshold. When a word match is found (two nearby word matches, in the case of protein searches), BLAST attempts to extend both forward and backward from the match to produce an alignment. BLAST will continue this extension as long as the alignment score continues to increase or until it drops by a critical amount owing to the negative scores given by mismatches. This critical amount is known as the “dropoff.” The methods BLAST uses to initiate refine alignments
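
The index, scan and extend idea described above can be sketched for the simple nucleotide case. This is a toy illustration under stated assumptions, not the real BLAST implementation: it uses exact word matches only, ungapped extension, and made-up scoring values (match +1, mismatch -3, dropoff 5). All function names are hypothetical.

```python
from collections import defaultdict


def index_words(query, wordsize):
    """Map every word of length `wordsize` in the query to its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - wordsize + 1):
        index[query[i:i + wordsize]].append(i)
    return index


def find_seeds(index, subject, wordsize):
    """Scan the subject and return (query position, subject position) word matches."""
    seeds = []
    for j in range(len(subject) - wordsize + 1):
        for i in index.get(subject[j:j + wordsize], ()):
            seeds.append((i, j))
    return seeds


def _extend_one_way(query, subject, i, j, step, match, mismatch, dropoff):
    """Extend from (i, j) in one direction; return (best gain, positions kept)."""
    score = best = 0
    walked = kept = 0
    while 0 <= i < len(query) and 0 <= j < len(subject):
        score += match if query[i] == subject[j] else mismatch
        walked += 1
        if score > best:
            best, kept = score, walked
        elif best - score >= dropoff:
            break  # score fell too far below the best seen so far
        i += step
        j += step
    return best, kept


def extend_seed(query, subject, qpos, spos, wordsize,
                match=1, mismatch=-3, dropoff=5):
    """Ungapped extension of a seed; returns (score, q_start, q_end, s_start)."""
    right_gain, right_len = _extend_one_way(
        query, subject, qpos + wordsize, spos + wordsize, 1,
        match, mismatch, dropoff)
    left_gain, left_len = _extend_one_way(
        query, subject, qpos - 1, spos - 1, -1, match, mismatch, dropoff)
    score = wordsize * match + left_gain + right_gain
    return score, qpos - left_len, qpos + wordsize + right_len, spos - left_len


if __name__ == "__main__":
    query, subject, w = "GATTACAGATT", "CCGATTACAGATTCC", 4
    for qpos, spos in find_seeds(index_words(query, w), subject, w):
        print((qpos, spos), extend_seed(query, subject, qpos, spos, w))
```

Several seeds usually extend to the same alignment, so a real program also has to merge or discard duplicates; that step is left out here to keep the sketch short.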

Course and Website


This BLAST Quickstart chapter illustrates the use of the principal BLAST programs to solve problems that arise in the analysis of protein and nucleotide sequences. Each section provides a succinct description of a protocol with two problems that serve as practical examples. Relevant theory is given when it affects the selection of a search strategy or search parameter; however, the emphasis is on the procedure itself. The sections follow closely the structure of the BLAST QuickStart Mini-Course found here. The BLAST QuickStart is one of ten 2-hour Mini-Courses offered by NCBI on campus at the National Institutes of Health and at locations around the country to over 4000 students a year. The courses use a paired-problems approach in which the first of two similar problems or problem sets is solved by the instructor during the first hour on a computer linked to a projection system, while the students watch; in the second hour, the students tackle the second problem, or set of problems, at their own computers. These courses have been effective as practical introductions to bioinformatics procedures. To get the most from the sections that follow, it will be necessary to navigate to the URL previously listed and click on the “BLAST Quickstart” link to reach the online exercises, although the liberal collection of screenshots will allow the reader to follow along for the most part without web access.


Step-by-step procedure

1. Open the NCBI website.
2. Click on BLAST.
3. Copy and paste your sequence into the given field.
4. Select the desired parameters for your search.
5. Click the BLAST button.
6. After 10-15 seconds, the results are displayed.