Bacterial Sequence Identity Activity
When studying the microbiome of any environment, the data often start as a set of nucleotide
sequences. For bacterial communities, the 16S rRNA gene is the predominant gene used to
identify the bacterial species present. Initial community sequencing efforts relied on 16S rRNA
gene clone libraries; these have since given way to next-generation sequencing techniques, which
are completely independent of culturing bacteria and more reliably capture all the bacteria in a
sample. Either way, the same gene is used, and its sequence is matched against a reference
database to provide an identity or the most closely related organism.
A standard format for gene sequences is called FASTA. The first line starts with ‘>’ followed by
a description of the sequence; the remaining lines are the sequence itself, typically reported as
the letters ‘A, T, C, & G’. An ‘N’ marks a position where the base could not be determined.
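To make the FASTA layout concrete, here is a minimal Python sketch that splits FASTA text into description and sequence pairs. The function name `parse_fasta` and the short example records are made up for illustration; they are not part of the activity.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (description, sequence) tuples."""
    records = []
    description = None  # description of the record currently being read
    chunks = []         # sequence lines collected for that record
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A '>' line starts a new record; store the previous one first.
            if description is not None:
                records.append((description, "".join(chunks)))
            description = line[1:]
            chunks = []
        elif description is not None:
            # Every other line belongs to the current record's sequence.
            chunks.append(line)
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


# Hypothetical two-record example (not real 16S rRNA data).
example = """>example-sequence-1
ACGTAC
GTNN
>example-sequence-2
TTGACA
"""

for description, sequence in parse_fasta(example):
    print(description, len(sequence), sequence)
```

The sequence lines of each record are joined into one string, so the line breaks inside a FASTA sequence carry no meaning.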
Below are two 16S rRNA gene FASTA sequences with the description removed. The goal is to
use an online database, GenBank, to identify the bacteria they come from. The Basic Local
Alignment Search Tool (BLAST) is used to match an unknown FASTA sequence against the
GenBank nucleotide database.
Follow the instructions below to use the BLAST tool to identify the unknown FASTA sequences
with the GenBank database.
1) Copy (or download) the FASTA sequence
2) Go to the BLAST tool located on the NCBI website.
(http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome)
3) For each sequence, paste the sequence (or upload the FASTA file) into the website as the
“Query Sequence”.
4) Select the BLAST button at the bottom of the page.
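The activity uses the web form, but the same submission can be sketched in code. The Python example below only builds the request body for NCBI's BLAST URL API and does not send anything. The parameter names `CMD`, `PROGRAM`, `DATABASE` and `QUERY` come from that API; the choice of `nt` as the database, the helper name `build_blast_submission` and the short query are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Endpoint of the NCBI BLAST web service (same host as the web form).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_submission(fasta_text, program="blastn", database="nt"):
    """Return the URL-encoded form body for a BLAST submit ('Put') request.

    Assumption: 'nt' is used as the nucleotide database name.
    Nothing is sent over the network here.
    """
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,    # blastn = nucleotide query vs nucleotide database
        "DATABASE": database,
        "QUERY": fasta_text,   # the pasted FASTA text
    }
    return urlencode(params)


# Hypothetical short query, standing in for one of the unknown sequences.
body = build_blast_submission(">query1\nACGTACGT\n")
print(BLAST_URL)
print(body)
```

Sending this body as a POST request would return a request ID that is used later to fetch the results. For this activity, the web form is the intended route.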
Once the search is complete, the results are displayed. The header repeats the search conditions.
The Graphic Summary displays all the results and how well each matched the query sequence;
the length and color of each bar indicate where and how well the sequences align. Following the
Graphic Summary is a table of the GenBank entries that matched the unknown query sequence,
listed from best to worst match. The columns in the table help organize the matched sequences.
Key columns are Description, Ident (abbreviation for Identity, the percent match to the query
sequence), and Accession (the accession number, which is how sequences are cataloged in GenBank).
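To show what a percent identity value means, the Python sketch below counts matching positions between two sequences that are assumed to be already aligned, with ‘-’ marking a gap. BLAST builds the alignment itself; this example covers only the final counting step, and the function name `percent_identity` and the two short sequences are made up for illustration.

```python
def percent_identity(aligned_query, aligned_subject):
    """Percent of alignment columns where both sequences have the same base.

    Assumes the two strings are already aligned (equal length, '-' for gaps).
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_query:
        raise ValueError("aligned sequences must not be empty")
    # A column counts as a match only if the bases are equal and not a gap.
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Hypothetical alignment: one mismatch (A vs T) and one gap in ten columns.
query = "ACGTACGTAC"
subject = "ACGTTCG-AC"
print(percent_identity(query, subject))  # prints 80.0
```

Eight of the ten columns match, so the identity is 80%. A top hit for a 16S rRNA gene sequence would normally be much closer to 100%.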
5) For each sequence's results, scroll to the table of GenBank matches. Read through the top
10-20 matches.
a. What is the identity of the bacterium that best matches the unknown
sequence? (Give the accession number, the last column listed in the table.)
6) Google each bacterium to find its most common environmental requirements.
a. Which bacterium is more human associated, and which is more environmentally associated?
>FASTAsequence1-UnknownBacteria-1
NNAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCTTAACACATGCAAGTCGAGCGG
TAGCACANGGGAGCTTGCTCCCTGGGTGACGAGCGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGA
TGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTT
CGGGCCTCTTGCCATCAGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAATGGCTCACCTAGGCGA
CGATCCCTAGCTNGTCTGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGA
GGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCNGNGTGTGTGAAGAAGGCC
TTCGGGTTGTAAAGCACTTTCAGCGAGGAGGAAGGTGGTGAGCTTAATACGCTCATCAATTGACGTTACT
CGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGNGTGCAAGCGTTAATCGGA
ATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGN
ACTGCATTTGAAACTGGCAAGCTAGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGC
GTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCNNAAG
CGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGATGTCGATTTGGAGGTTGTGC
CCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAATCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAA
CTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAAC
CTTACCTACTCTTGACATCCAGAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTCTGAGACAGGT
GCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATC
CTTTGTTGCNAGCNNTTCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGAT
GACGTCAAGTCATCATGGCCCTTACGAGTAGGGCTACACACGTGCTACAATGGCATATACAAAGAGAAGC
GACCTCGCGAGAGCAAGCGGACCTCATAAAGTATGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCA
TGAAGTCGGAATCGCTAGTAATCGTAGATCAGAATGCTACGGTGAATACGTTCCCGGGCCTTGTACACAC
CGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCAC
TTTGTGATTCATGACNGGGGNNNNNNNGTAACAAGGTAACCGNNNNNGAACCTGNNNNNNGATCACCTCC
TTA
>FASTAsequence2-UnknownBacteria-2
GATCCTGGCTCAGGACGAACGCTGACGGCGTGCTTAACACATGCAAGTCGAACGCTGAAGCCTGGCTTTG
TGTTGGGTGGATGAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCTCTTCTTCGGGATAACGGT
CTGAAAGGGCTGCTAATACCGGGTATTCACTGGTCCTCGCATGGGGGTTGGTGGAAAGGTTTTTTCTGGT
GGGGGATGGGCTCGCGGCCTATCAGCTTGTTGGTGGGGTGATGGCCTACCAAGGCTTTGACGGGTAGCCG
GCCTGAGAGGGTGACCGGTCACATTGGGACTGAGATACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGG
AATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGGGATGGAGGCCTTCGGGTTGTGAA
CCTCTTTCGCCCGTGGTCAAGCCGCAACTGTGGGTTGTGGTGAGGGTAGTGGGTAAAGAAGCGCCGGCTA
ACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCGAGCGTTGTCCGGAATTATTGGGCGTAAAGGGCT
TGTAGGCGGCTGGTCGCGTCTGCCGTGAAAATCCTCTGGCTCAACTGGGGGCGTGCGGTGGGTACGGGCT
GGCTTGAGTGCGGTAGGGGAGGCTGGAACTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAAGAACA
CCGGTGGCGAAGGCGGGTCTCTGGGCCGTTACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGAT
TAGATACCCTGGTAGTCCACGCTGTAAACGTTGGGCACTAGGTGTGGGGGCCACCCGTGGTTTCCGCGCC
GTAGCTAACGCTTTAAGTGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGG
GGCCCGCACAAGCGGCGGAGCATGCGGATTAATTCGATGCAACGCGAAGAACCTTACCAAGGCTTGACAT
GCGCCCCGGGCGCGCGGAGACGCGCGCGCATTTGGTTGGGGGTGTGCAGGTGGTGCATGGTTGTCGTCAG
CTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGCCCTATGTTGCCAGCGCGTTA
TGGCGGGGACTCGTGGGGGACTGCCGGGGTTAACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGC
CCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCTGGTACAGAGGGTTGCGATGCCGTGAGGCGGGGC
GAATCCCTTAAAGCCGGTCTCAGTTCGGATTGGGGTCTGCAACTCGACCCCATGAAGGTGGAGTCGCTAG
TAATCGCAGATCAGCAACGCTGCGGTGAATACGTCCTCGGGCCTTGTACACACCGCCCGTCACGTCACGA
AAGTTGGTAACGCCCGAAGCCC