Bacterial Sequence Identity Activity

When studying the microbiome of any environment, the data often start as a set of nucleotide sequences. For bacterial communities, the 16S rRNA gene is the predominant gene used to identify the bacterial species present. Initial community sequencing efforts used 16S rRNA gene clone libraries; these have since given way to next-generation sequencing techniques, which are completely independent of culturing bacteria and more reliably sequence all the bacteria in a sample. Regardless, the same gene is used, and the gene sequence is matched to a reference database to provide an identity or the most closely related organism.

A standard format for gene sequences is called FASTA. The first line starts with ‘>’ followed by the description of the sequence; the remaining lines are the sequence itself, typically reported as the letters A, T, C, and G. Below are two 16S rRNA gene FASTA sequences with the description removed. The goal is to use an online database, GenBank, to identify the bacteria they are from.

To match an unknown FASTA sequence against the GenBank nucleotide database, the Basic Local Alignment Search Tool (BLAST) is used. Follow the instructions below to use the BLAST tool to identify the unknown FASTA sequences with the GenBank database.

1) Copy (or download) the FASTA sequence.
2) Go to the BLAST tool located on the NCBI website (http://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearc& LINK_LOC=blasthome).
3) For each sequence, paste the sequence (or upload the FASTA file) into the website as the “Query Sequence”.
4) Select the BLAST button at the bottom of the page.

Once the search is complete, the results are displayed. The header repeats the search conditions. The Graphic Summary displays all the results and how well each one matched the query sequence; the length as well as the color of each result indicate where and how the sequences align.
Following the Graphic Summary are the entries in GenBank that matched the unknown query sequence, listed from best to worst match. The columns in the table help organize the matched sequences; the key columns are Description, Ident (abbreviation for Identity, the percent match to the query sequence), and Accession (the accession number, which is how sequences are cataloged in GenBank).

5) For each sequence's results, scroll to the table of GenBank matches. Read through the top 10-20 matches.
   a. What is the identity of the bacterium that best matches the unknown sequence? (Give the accession number, the last column listed in the table.)
6) Google the bacteria to find their most common environmental requirements.
   a. Which bacterium is more human-associated, and which is more environmentally associated?

>FASTAsequence1-UnknownBacteria-1
NNAATTGAAGAGTTTGATCATGGCTCAGATTGAACGCTGGCGGCAGGCTTAACACATGCAAGTCGAGCGG
TAGCACANGGGAGCTTGCTCCCTGGGTGACGAGCGGCGGACGGGTGAGTAATGTCTGGGAAACTGCCTGA
TGGAGGGGGATAACTACTGGAAACGGTAGCTAATACCGCATAACGTCGCAAGACCAAAGAGGGGGACCTT
CGGGCCTCTTGCCATCAGATGTGCCCAGATGGGATTAGCTAGTAGGTGGGGTAATGGCTCACCTAGGCGA
CGATCCCTAGCTNGTCTGAGAGGATGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGA
GGCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCNGNGTGTGTGAAGAAGGCC
TTCGGGTTGTAAAGCACTTTCAGCGAGGAGGAAGGTGGTGAGCTTAATACGCTCATCAATTGACGTTACT
CGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGNGTGCAAGCGTTAATCGGA
ATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGN
ACTGCATTTGAAACTGGCAAGCTAGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGC
GTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCNNAAG
CGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCTGTAAACGATGTCGATTTGGAGGTTGTGC
CCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAATCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAA
CTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAAC
CTTACCTACTCTTGACATCCAGAGAACTTTCCAGAGATGGATTGGTGCCTTCGGGAACTCTGAGACAGGT
GCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATC
CTTTGTTGCNAGCNNTTCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGAT
GACGTCAAGTCATCATGGCCCTTACGAGTAGGGCTACACACGTGCTACAATGGCATATACAAAGAGAAGC
GACCTCGCGAGAGCAAGCGGACCTCATAAAGTATGTCGTAGTCCGGATTGGAGTCTGCAACTCGACTCCA
TGAAGTCGGAATCGCTAGTAATCGTAGATCAGAATGCTACGGTGAATACGTTCCCGGGCCTTGTACACAC
CGCCCGTCACACCATGGGAGTGGGTTGCAAAAGAAGTAGGTAGCTTAACCTTCGGGAGGGCGCTTACCAC
TTTGTGATTCATGACNGGGGNNNNNNNGTAACAAGGTAACCGNNNNNGAACCTGNNNNNNGATCACCTCC
TTA

>FASTAsequence2-UnknownBacteria-2
GATCCTGGCTCAGGACGAACGCTGACGGCGTGCTTAACACATGCAAGTCGAACGCTGAAGCCTGGCTTTG
TGTTGGGTGGATGAGTGGCGAACGGGTGAGTAACACGTGAGTAACCTGCCCTCTTCTTCGGGATAACGGT
CTGAAAGGGCTGCTAATACCGGGTATTCACTGGTCCTCGCATGGGGGTTGGTGGAAAGGTTTTTTCTGGT
GGGGGATGGGCTCGCGGCCTATCAGCTTGTTGGTGGGGTGATGGCCTACCAAGGCTTTGACGGGTAGCCG
GCCTGAGAGGGTGACCGGTCACATTGGGACTGAGATACGGCCCAGACTCCTACGGGAGGCAGCAGTGGGG
AATATTGCACAATGGGCGAAAGCCTGATGCAGCGACGCCGCGTGAGGGATGGAGGCCTTCGGGTTGTGAA
CCTCTTTCGCCCGTGGTCAAGCCGCAACTGTGGGTTGTGGTGAGGGTAGTGGGTAAAGAAGCGCCGGCTA
ACTACGTGCCAGCAGCCGCGGTAATACGTAGGGCGCGAGCGTTGTCCGGAATTATTGGGCGTAAAGGGCT
TGTAGGCGGCTGGTCGCGTCTGCCGTGAAAATCCTCTGGCTCAACTGGGGGCGTGCGGTGGGTACGGGCT
GGCTTGAGTGCGGTAGGGGAGGCTGGAACTCCTGGTGTAGCGGTGGAATGCGCAGATATCAGGAAGAACA
CCGGTGGCGAAGGCGGGTCTCTGGGCCGTTACTGACGCTGAGGAGCGAAAGCGTGGGGAGCGAACAGGAT
TAGATACCCTGGTAGTCCACGCTGTAAACGTTGGGCACTAGGTGTGGGGGCCACCCGTGGTTTCCGCGCC
GTAGCTAACGCTTTAAGTGCCCCGCCTGGGGAGTACGGCCGCAAGGCTAAAACTCAAAGGAATTGACGGG
GGCCCGCACAAGCGGCGGAGCATGCGGATTAATTCGATGCAACGCGAAGAACCTTACCAAGGCTTGACAT
GCGCCCCGGGCGCGCGGAGACGCGCGCGCATTTGGTTGGGGGTGTGCAGGTGGTGCATGGTTGTCGTCAG
CTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGCCCTATGTTGCCAGCGCGTTA
TGGCGGGGACTCGTGGGGGACTGCCGGGGTTAACTCGGAGGAAGGTGGGGATGACGTCAAATCATCATGC
CCCTTATGTCTTGGGCTTCACGCATGCTACAATGGCTGGTACAGAGGGTTGCGATGCCGTGAGGCGGGGC
GAATCCCTTAAAGCCGGTCTCAGTTCGGATTGGGGTCTGCAACTCGACCCCATGAAGGTGGAGTCGCTAG
TAATCGCAGATCAGCAACGCTGCGGTGAATACGTCCTCGGGCCTTGTACACACCGCCCGTCACGTCACGA
AAGTTGGTAACGCCCGAAGCCC
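
Optional: the FASTA layout described in this activity (a ‘>’ description line followed by sequence lines) is simple enough to read with a short script. The Python sketch below is one possible way to do that before pasting a sequence into BLAST; it is not part of the BLAST workflow itself. The function names (parse_fasta, count_ambiguous, percent_identity) and the toy records are illustrative assumptions, not anything supplied by NCBI. The percent_identity helper is a simplified picture of what an identity percentage measures (matching columns divided by alignment length) and assumes the two strings have already been aligned; it does not reproduce the alignment BLAST performs.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {description: sequence} for FASTA-formatted lines."""
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A '>' line starts a new record; the rest is the description.
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines are joined into one continuous string.
            records[header] += line.upper()
    return records


def count_ambiguous(sequence: str) -> int:
    """Count positions that are not A, T, C, or G (for example N)."""
    return sum(1 for base in sequence if base not in "ATCG")


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of matching columns in two pre-aligned, equal-length strings.

    Gap characters ('-') never count as matches. Simplified illustration
    only; the alignment step is assumed to have happened elsewhere.
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy records (hypothetical names; short fragments for illustration only).
example = """>toy-record-1
NNAATTGAAG
AGTTTGATCA
>toy-record-2
GATCCTGGCT
""".splitlines()

records = parse_fasta(example)
for name, seq in records.items():
    print(name, "length:", len(seq), "ambiguous bases:", count_ambiguous(seq))
```

To use the same functions on a saved file, pass an open file handle instead of the toy list, for example parse_fasta(open("unknown.fasta")), where unknown.fasta is a hypothetical file name. A high count of ambiguous bases such as N is worth noting, since those positions cannot contribute exact matches.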