Basics of BLAST
BLAST is an acronym for Basic Local Alignment Search Tool and refers to a suite of programs used to generate alignments between a nucleotide or protein sequence, referred to as a “query,” and nucleotide or protein sequences within a database, referred to as “subject” sequences. The original BLAST program used a protein “query” sequence to scan a protein sequence database. A version operating on nucleotide “query” sequences and a nucleotide sequence database soon followed. The introduction of an intermediate layer, in which nucleotide sequences are translated into their corresponding protein sequences according to a specified genetic code, allows cross-comparisons between nucleotide and protein sequences. Specialized variants of BLAST allow fast searches of nucleotide databases with very large query sequences, or the generation of alignments between a single pair of sequences. Both the standalone and web versions of BLAST are available from the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov). The web version provides searches of the complete genome of Homo sapiens as well as those of many model organisms, including mouse, rat, fruit fly, and Arabidopsis thaliana, allowing BLAST alignments to be seen in a full genomic context.
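The translation layer mentioned above can be illustrated with a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1); the names CODON_TABLE and translate are hypothetical helpers chosen for this example and are not part of BLAST itself.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons enumerated
# in T, C, A, G order. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string in one forward reading frame (0, 1, or 2).

    Codons containing characters other than T, C, A, G become "X".
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate("ATGGCCTAA"))
print(translate("ATGGCCTAA", frame=1))
```

Translating the query or the database sequences in each reading frame is what lets a protein sequence be compared against nucleotide data. A full implementation would also cover the three frames of the reverse complement.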
Query and Database Sequence Formats
BLAST “query” sequences are given as character strings of single-letter nucleotide or amino acid codes, preceded by a definition line that begins with a “>” symbol and contains identifiers and descriptive information. This format is known as FASTA. BLAST databases are constructed from concatenated FASTA-formatted sequences using a program called “formatdb” (replaced by “makeblastdb” in the newer BLAST+ suite) that produces a mixture of binary- and ASCII-encoded files containing the sequences and indexing information used during the BLAST search.
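As an illustration of the FASTA layout described above, here is a minimal Python sketch of a parser. The function name parse_fasta and the two example records are made up for this example; real pipelines usually rely on an established library instead.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (definition_line, sequence) pairs.

    The definition line is returned without its leading ">" symbol, and
    sequence lines belonging to one record are joined into a single string.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Hypothetical example input: two concatenated FASTA records.
example = """>seq1 hypothetical example protein
MKTAYIAKQR
QISFVKSHFS
>seq2 hypothetical example nucleotide
ATGCGTACGT
"""

records = parse_fasta(example)
for definition, sequence in records:
    print(definition.split()[0], len(sequence))
```

The first token of the definition line is conventionally the sequence identifier, and the rest is free-text description.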
Overview of the Algorithm
BLAST begins a search by indexing all character strings of a certain length within the “query” by their starting position in the query. The length of the string to index, called the “wordsize,” is configurable by the user. The allowable range for the “wordsize” varies according to the BLAST program used; typical values are 3 for protein-to-protein sequence searches and 11 for nucleotide-to-nucleotide searches. BLAST then scans the database looking for matches between the “words” indexed in the “query” and strings found within the database sequences. For nucleotide-to-nucleotide searches, these matches must be exact; for protein-to-protein searches, the score of the match, as determined using a substitution matrix, must exceed a specified threshold. When a word match is found (or, in the case of protein searches, two nearby word matches), BLAST attempts to extend both forward and backward from the match to produce an alignment. BLAST continues this extension as long as the alignment score keeps increasing, and stops once the score has dropped by a critical amount owing to the negative scores given by mismatches. This critical amount is known as the “dropoff.” The methods BLAST uses to initiate and refine alignments
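The seed-and-extend idea described above can be sketched in a few lines of Python. This is a simplified illustration, not BLAST's actual implementation: it handles only exact word matches (the nucleotide case), performs ungapped extension, and uses arbitrary scores (match +1, mismatch -3) with a dropoff of 5. The function names are hypothetical, and the word size of 4 is chosen only to keep the example short.

```python
from collections import defaultdict


def index_words(query, wordsize):
    """Map every word of length `wordsize` in the query to its start positions."""
    index = defaultdict(list)
    for i in range(len(query) - wordsize + 1):
        index[query[i:i + wordsize]].append(i)
    return index


def find_seeds(index, subject, wordsize):
    """Scan the subject and collect (query_pos, subject_pos) exact word matches."""
    seeds = []
    for j in range(len(subject) - wordsize + 1):
        for i in index.get(subject[j:j + wordsize], ()):
            seeds.append((i, j))
    return seeds


def extend_one_way(query, subject, qpos, spos, step, best, match, mismatch, dropoff):
    """Extend in one direction; return (best_score, number_of_positions_kept)."""
    score, kept, k = best, 0, 0
    while 0 <= qpos < len(query) and 0 <= spos < len(subject):
        score += match if query[qpos] == subject[spos] else mismatch
        k += 1
        if score > best:
            best, kept = score, k
        elif best - score >= dropoff:
            break  # score fell by the dropoff amount below the best seen
        qpos += step
        spos += step
    return best, kept


def extend(query, subject, qpos, spos, wordsize, match=1, mismatch=-3, dropoff=5):
    """Ungapped extension of a seed; returns (score, query_start, subject_start, length)."""
    best = wordsize * match
    best, right = extend_one_way(query, subject, qpos + wordsize, spos + wordsize,
                                 1, best, match, mismatch, dropoff)
    best, left = extend_one_way(query, subject, qpos - 1, spos - 1,
                                -1, best, match, mismatch, dropoff)
    return best, qpos - left, spos - left, left + wordsize + right


query = "TTGACCGATAGG"
subject = "AAGACCGATACC"
wordsize = 4
seeds = find_seeds(index_words(query, wordsize), subject, wordsize)
score, q_start, s_start, length = extend(query, subject, *seeds[0], wordsize)
print(seeds)
print(score, query[q_start:q_start + length])
```

In this toy example each of the five seeds extends to the same shared region, so a real implementation would avoid re-extending seeds that fall inside an alignment already found. Protein searches would additionally score words with a substitution matrix instead of requiring exact matches.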
Course and Website
This BLAST QuickStart chapter illustrates the use of the principal BLAST programs to solve problems that arise in the analysis of protein and nucleotide sequences. Each section provides a succinct description of a protocol with two problems that serve as practical examples. Relevant theory is given when it affects the selection of a search strategy or search parameter; however, the emphasis is on the procedure itself. The sections closely follow the structure of the BLAST QuickStart Mini-Course found at Here. The BLAST QuickStart is one of 10 two-hour Mini-Courses offered by NCBI on campus at the National Institutes of Health and at locations around the country to over 4000 students a year. The courses use a paired-problems approach: the instructor solves the first of two similar problems or problem sets during the first hour on a computer linked to a projection system while the students watch, and in the second hour the students tackle the second problem or set of problems at their own computers. These courses have been effective as practical introductions to bioinformatics procedures. To get the most from the sections that follow, navigate to the URL previously listed and click on the “BLAST QuickStart” link to reach the online exercises, although the liberal collection of screenshots will allow the reader to follow along for the most part without web access.
Step-by-Step Procedure
1. Open the NCBI website (www.ncbi.nlm.nih.gov).
2. Click on BLAST.
3. Copy and paste your sequence into the query field.
4. Select the desired parameters for your search.
5. Click the BLAST button.
6. After about 10-15 seconds, the results are displayed.