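Jiří Vondrášek
Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic (Ústav organické chemie a biochemie AV ČR)
Bioinformatics
Autumn School of Computational Chemistry, Prague 2006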
Bioinformatics
Applications: theory, biotechnology, pharmacy, medicine, genetic engineering.
Informatics applied to biological molecules (data). Bioinformatics extracts a molecular information system for molecular biology. Bioinformatics is conceptualized molecular biology (in the physico-chemical sense) to which informatics (derived from mathematical informatics and statistics) is applied.
Bioinformatics
experimental data -> computer analysis -> structured data (databases), hypotheses
sequences, genes, contigs, function, metabolism (everything), structure
Genome sizes
Organism                    Chromosomes   Genome size
Mycoplasma genitalium                     0.58 Mbp
Escherichia coli                          4.6 Mbp
Saccharomyces cerevisiae    16            11.2 Mbp
Arabidopsis thaliana        5             Mbp
Drosophila melanogaster     5             ~137.0 Mbp
Homo sapiens                24            ~3.3 Gbp
Central dogma of molecular genetics
DNA -> RNA -> protein
DNA -> DNA: replication
DNA -> RNA: transcription
RNA -> protein: translation
RNA -> DNA: reverse transcription
information -> function
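The flow of information from DNA through RNA to protein can be illustrated with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names are illustrative, the input is assumed to be the coding strand, and the standard genetic code is assumed throughout.

```python
# Standard genetic code in compact form: amino acids listed for codons
# enumerated in T, C, A, G order at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """Transcription: coding-strand DNA to mRNA (T becomes U)."""
    return dna.upper().replace("T", "U")


def reverse_transcribe(rna):
    """Reverse transcription: RNA back to DNA (U becomes T)."""
    return rna.upper().replace("U", "T")


def translate(mrna):
    """Translation: read codons from position 0 until the first stop codon."""
    dna = reverse_transcribe(mrna)  # the lookup table is keyed by DNA codons
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")
    print(mrna, translate(mrna))
```

The sketch ignores everything a cell actually does around these steps (promoters, splicing, alternative genetic codes); it only shows the mapping of symbols.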
DNA; genes; evolutionary relationships between genes and organisms; function; proteins; structure
Sequence
>jana (4797 nt) GAATTCGCCGCGGGGCTGCGCATCACCGATGCCGCCACCATCGAGATCGTCGAGATGGTACTGGCCGGCTCGATCAACAAGCAGCTCGTCGGCTACATCA ACGAAGCGGGCGGCAAGGCCGTCGGCCTGTGCGGCAAGGACGGCAACATGGTGTCCGCCACCAAGGCGACGCGCACCATGGTCGATCCGGATTCGCGGAT CGAAGAGGTGATCGACCTCGGTTTCGTCGGCGAGCCGGAGAAGGTCGACCTCACCCTGCTCAACCAGCTGATCGGCCACGAGTTGATCCCGGTGCTGGCG CCGCTGGCGACCTCCGCGTCGGGCCAGACCTTCAACGTCAATGCCGACACCTTTGCAGGTGCGGTTGCCGGTGCGCTGCGGGCCAAGCGCCTGCTGCTGC TGACCGACGTGCCGGGCGTGCTCGACCAGAACAAGAAGCTGATCCCCGAACTGTCGATCAAGGATGCCCGCAAGCTGATCGCAGACGGCACCATCTCGGG CGGCATGATCCCCAAGGTCGAGACCTGCATCTACGCGCTCGAACAGGGCGTCGAAGGCGTCGTCATCCTCGACGGCAAGGTCCCGCACGCAGTGCTGCTC GAATTGTTCACCAACCAGGGCACCGGCACGCTGATCCACAAGTGATGCGAGGCTGCGGCGACAACATCCGTCATGGCCGGGCTCGTCCCGGCCATCCACG TCTTTCCGGCGGTTTTCTCAGCAAGACGTGGATGCCCGGCACAAGGCCGGGCATGACGGGGTGGAGATCGCGCGCCCTCGCCGCCATTGTCACCACCCTC GCCCTCACCTCCGCCGCCCACGCCGACCTCAAGCTCTGCAACCGCATGAGCTACGTGGTCGAGACGGCGATCGGGGTCGATTCCAACGGCACCACCGCCT CGCGCGGATGGCTGCGGATTGATCCGGCGCAATGCCGGGTCGTGGTGCAAGGCGCGCTCAACGCCGACCGCATCATGCTGAATGCCCGCGCGCTGGCGGT GTACGGCGTCTCGCCGCTGCCGCAGAACGGCACTGACCGGCTGTGCATTGCCGAAGACAATTTCGTCATCGCCGCCGCGCGGCAATGCCGCGGCGGCCAA ACGCTCGCCGCCTTCACCGAGATCAAGCCCACCGACACCGAGGACGGCAACAAGATCGCTTATCTGGCGGAAGACTCCGGCTACGACGACGAACAGGCCA AACTCGCCGCGATCCAGCGGCTGCTGGTGATCGCCGGTTACGACGCCTCGCCGATCGACGGCGTCGACGGCCCGAAGACGCAGGCCGCGCTGTCCGCCTT CCTCAAGAGCCGAGGCCTGAAGCCCGAGATCGTCGATGCGCCGGATTTCTTCGACGTGATGATCAAGGCAGTGCAGCAGCCGTCCGGCAGCGGGCTGACC TGGTGCAACGACACCAAGTACAAGATCATGGCGGCCGTCGGCGAAGACGACGGCAAGACTGTCACCAGCCGCGGCTGGTACGGTGTTGCGCCCGGCCAAT GCCTGCGCCCCGACCTCGGCGCACAGCCGAAGCGGGTGTTCAGCTTCGCCGAAGCGGTCGACGGCAGCGGCAGGCCGGTGACCATCAAGGGCCGTGCGCT GAACTGGGGCGGCGGCGTGACGCTGTGCACGCGTGACAGCAAGTTCGAGATCGGCGAGCAAGGCGATTGCGCGGCGCGCGGCCTCGCCGCCACCGGCTTC GCCGCCGTCGATCTCAGTAGCGGCAAGACATTGAGGTTGTCCGCCCCATGATGCAGCTCGGCAAACGCGGCTTCGATCACGTCGAGACCTGGGTGTTCGA TCTCGACAACACGCTGTACCCGCATCACCTCAACCTATGGCAGCAGGTCGATGCGCGGATCCGCGACTTCGTCGCCGACTGGCTGAAGGTTTCGCCGGAA 
GAAGCCTTCCGTATCCAGAAGGATTACTACAAGCGCTACGGCACCACGATGCGCGGGATGATGACCGAGCACGGCGTTCACGCCGACGACTACCTGGCTT ATGTCCACGCCATCGACCATTCGCCGCTGCAGCCGAATCCGGCGATGGGCGATGCGATCGAGCGACTGCCGGGCCGCAAGCTGATCCTGACCAACGGCTC GACCGCCCATGCGGGCAAGGTGCTGGAGCGGCTCGGCATCGGCCATCATTTCGAGGCGGTGTTCGACATCATTGCGGCCGACCTCGAGCCGAAGCCGGCG CCGCAGACCTACCGCCGTTTTCTCGATCGCCATGGTGTCGACCCGGCCCGCGCCGCGATGTTCGAAGACCTCGCCCGCAACCTCACCGTGCCGCACCAGC TCGGCATGACCACCGTGCTGGTGGTGCCTGACGATAGCCAGGACGTGGTCCGCGAAGATTGGGAGCTTGAAGGCCGCGACGCCGCCCACGTCGATCACGT GACTGATGATTTGACAGGGTTCTTGGGGAAGCTGAGTTCGCTGTAGGCCGGGGACGCCTCCCAAGCGTCAATCGTCATCGCCGCCGGATGCAAGGCGGCT AGGTATTGCGGAGCGCTCGCGATCTTCCGTCCAATGCCCTGGGATACTGGATCGCCCGGACGAGCCGGGCGACGACGTTGAAGAGAGATGACGTGGCGTC ACCACATCCCCCGCCGTCATCGCCCGCGCAGGCGGGCGATGACTTGGCGGACGGGGCGGCGCCTTGACTCCGACCCGGCGAATCCGGACAACACTCCGCA AAACTCTCCCTGAAATCAGCCTCCCAAGGACCCGTCGATGCCGCTCACCGCCCTGGAATCTACCATCAACGCCGCTTTCGACGCGCGCGACACCGTTACC GCGGCGACGCAGGGCGAGATTCGTCAGGCCGTCGAGGATGCGCTCGATCTGCTCGACCAGGGCAAGGTGCGGGTGGCGCGGCGCGACGACTCCGGCGCCT GGACGGTCAATCAGTGGCTGAAGAAAGCAGTGCTGCTGTCGTTCCGGCTCAACGACATGGGCGTGATCGCCGGCGGCCCGGGCGGCGCCAACTGGTGGGA CAAGGTGCCGTCGAAGTTCGAGGGCTGGGGTGAGAACCGCTTCCGCGAGGCCGGCTTCCGCGCCGTGCCGGGCCGATCGTCGCGCGTCGGCCTTTATCGC CAAGACGCGGTACTGATCCGTCCTTCGTCAATCTCGGCGCTTACGTCGATGAAAGCACCATGGTCGAACACCTGGGCGACCGTCGGCTCCTGCGCCCAGA TCGGCAAGCGCGTGCACATCTCCGGCGGTGCCGGCATCGGCGGCGTGCTCGAGCCGCTGCAGGCCGGCCCGGTGATCATCGAGGACGACTGCTTCATCGG CGCCCGCTCCGAAGTCGCCGAAGGCGTGATCGTGCGCAAGGGTGCGGTGCTGGCGATGGGCGTTTTCCTCGGCGCCTCGACCAAGATCGTCGACCGCGAG ACCGGCGAAATCTTCGTCGGCGAAGTGCCGGAATATGCCGTGCTGGTGCCCGGCACCCTGCCCGGCAAGCCGATGAAGAACGGCGCCCCCGGCCCAGCCA CCGCCTGCGCGGTGATCGTCAAGCGCGTCGACGAGCGCACCCGTTCCAAGACCTCGATCAACGAATTGCTGCGGGACTGACACCTGTAGGAGGCGCGAAT GGACTGGACCACGCTGTTCTTCAGCTTTCGAGGTCGGATCAATCGCGCCAAATACTGGCTGGTCGGACTGATCTACGTCGCCGCCTGGATGG …. sekvence
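A record such as the one above (a `>` header followed by nucleotides) is commonly stored in FASTA format. The following is a minimal sketch of reading such a record and computing its length and GC content; the function names are illustrative, and it assumes the header sits on its own line and that blanks inside the sequence are only layout.

```python
def parse_fasta(text):
    """Return {header: sequence}; whitespace inside sequences is dropped."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = []
        elif header is not None:
            # Join blocks such as "GAATTCGCCG CGGGGCTGCG" into one string.
            records[header].append("".join(line.split()).upper())
    return {name: "".join(parts) for name, parts in records.items()}


def gc_content(seq):
    """Fraction of G and C among all symbols of the sequence."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Only the first 20 nucleotides of the record are used here.
    demo = ">jana (4797 nt)\nGAATTCGCCG CGGGGCTGCG\n"
    for name, seq in parse_fasta(demo).items():
        print(name, len(seq), gc_content(seq))
```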
What can be found in DNA?
structural and organizational elements
evolutionary relationships
genes
promoters and other regulatory elements
"foreign" DNA
general analysis
Genes
How can genes be found?
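One simple first answer is to scan for open reading frames (ORFs): stretches that begin with a start codon and run in frame to a stop codon. The sketch below is an illustration under stated assumptions, not the method used in the lecture: it only accepts ATG as a start codon, uses the three standard stop codons, and the names `find_orfs` and `min_codons` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return (start, end, strand) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based, end-exclusive, and relative to the strand
    that was scanned. min_codons counts codons before the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((start, i + 3, strand))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCC" * 5 + "TAA" + "G"
    print(find_orfs(toy, min_codons=3))
```

An ORF scan produces many false positives in GC-rich genomes, where stop codons are rare by chance, so further evidence such as codon usage is normally needed.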
Genes
Leucine codon usage
             Rhodobacter capsulatus    Escherichia coli
codon        count     %               %
CUA          3         <1
CUC
CUG
CUU
UUA          0         0
UUG          27        3
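Counts and percentages of the six leucine codons, as tabulated above, can be obtained from any in-frame coding sequence. The following is a minimal sketch; the function name is illustrative, and it assumes the percentage is taken relative to all leucine codons in the sequence, which the slide does not state.

```python
from collections import Counter

LEUCINE_CODONS = ("CUA", "CUC", "CUG", "CUU", "UUA", "UUG")


def leucine_codon_usage(coding_dna):
    """Return {codon: (count, percent)} for the six leucine codons.

    The input is read in frame from position 0. Percentages are relative
    to the total number of leucine codons (an assumption of this sketch).
    """
    rna = coding_dna.upper().replace("T", "U")
    codons = [rna[i:i + 3] for i in range(0, len(rna) - 2, 3)]
    counts = Counter(c for c in codons if c in LEUCINE_CODONS)
    total = sum(counts.values())
    return {
        codon: (counts[codon], 100.0 * counts[codon] / total if total else 0.0)
        for codon in LEUCINE_CODONS
    }


if __name__ == "__main__":
    for codon, (count, percent) in leucine_codon_usage("ATGCTGCTGCTCTTGTAA").items():
        print(codon, count, round(percent, 1))
```

A strongly skewed profile, such as the near absence of UUA and CUA, is the kind of signal that helps distinguish real genes from chance open reading frames.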
Alignment
Which proteins do the genes encode?
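A common way to approach this question is to compare the translated gene with known proteins by local alignment. The sketch below computes a Smith-Waterman local alignment score; the scoring values (match 2, mismatch -1, linear gap -2) are arbitrary assumptions for illustration, whereas practical protein searches use substitution matrices and affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b (Smith-Waterman, linear gaps)."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("MKVLAAGI", "KVLA"))
```

Only the score is returned; recovering the aligned residues would require keeping the full matrix and tracing back from the best cell.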
Alignment
Comparison types: 1:1, 1:n, n:n
Tools: Dot plot, SSEARCH, BLITZ, FASTA, BLAST, PSI-BLAST, HMMER, ClustalW, MultAlign
Dot plot
SSEARCH ftp://ftp.virginia.edu/pub/fasta
BLITZ...
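The dot plot is the simplest of these comparisons: one sequence runs along each axis and a mark is placed wherever the two agree. The following is a minimal text-only sketch; the function name and the `window` parameter are illustrative, and exact window matches are assumed where real tools usually apply a scoring threshold.

```python
def dot_plot(seq_a, seq_b, window=1):
    """Return the plot as a list of strings, one row per position in seq_a.

    '*' marks positions where the windows of length `window` starting in
    both sequences are identical; '.' marks everything else.
    """
    rows = []
    for i in range(len(seq_a) - window + 1):
        row = ""
        for j in range(len(seq_b) - window + 1):
            same = seq_a[i:i + window] == seq_b[j:j + window]
            row += "*" if same else "."
        rows.append(row)
    return rows


if __name__ == "__main__":
    for line in dot_plot("GATTACA", "GATCACA", window=2):
        print(line)
```

Similar regions show up as diagonal runs of marks; a larger window suppresses the background of chance single-letter matches.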
Alignment
FASTA http://
BLAST http://ncbi.nlm.nih.gov/blast
Alignment
PSI-BLAST http://ncbi.nlm.nih.gov
HMMER
ClustalW
MultAlign