proteome: October 2008

Sunday, 26 October 2008 (B.E. 2551)

Sources for assignment 1

Most of the content is taken from Dr. Marc Wilkins (Australia).
It can be viewed directly at
http://210.212.212.8/s-star/schedule.html as lecture 11.

assignment 2 phylogenetic tree

ASSIGNMENT 2
Phylogenetic tree

Who are the ancestors of the dinosaurs?

Science 1994 Nov 18;266(5188):1229-1232

Compare a dinosaur's mitochondrial cytochrome b sequence with those of 12 other species: Human, Dog, Rabbit, Rhinoceros, Dugong, Mouse, Whale, Bovine, Sicklebill, Chicken, Magpie, and Frog. (ClustalX)

* Create a phylogenetic tree to analyze their relationship. (Phylip3.5c)

The dinosaur's mitochondrial cytochrome b sequence:

cccttctattattcattctcattctattcgttattcttgtactccacacatccaaacaacaaagcataatattccacccattgagtccattccta
tcctgattcttagtccccgaaccttttacactcacatg
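Step 1 of the procedure saves this sequence and the 12 NCBI sequences in FASTA format for ClustalX. The following is a minimal Python sketch of writing and reading FASTA text. It assumes one record per `>` header line and uses the first word of the header as the record name; the name "Dinosaur" is illustrative. Short names are convenient because PHYLIP format keeps only 10 characters per name.

```python
def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapping sequences at `width`."""
    lines = []
    for name, seq in records:
        lines.append(">" + name)
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into {name: sequence}; the name is the header's first word."""
    records, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


# The dinosaur sequence quoted above, upper-cased for consistency.
dino = ("cccttctattattcattctcattctattcgttattcttgtactccacacatccaaacaacaaagcataatattccacccattgagtccattccta"
        "tcctgattcttagtccccgaaccttttacactcacatg").upper()
fasta_text = to_fasta([("Dinosaur", dino)])
print(fasta_text)
```

The 12 NCBI records would be appended to the same list of (name, sequence) pairs before writing the .txt file.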

The procedure is as follows:

1. Retrieve the mitochondrial cytochrome b sequences of all 12 species from NCBI and save them in FASTA format in a .txt file.

2. Align the sequences with ClustalX, which produces two output files: a .dnd file and a PHYLIP (.phy) file.

3. Rename the .phy file to infile for use with PHYLIP 3.5c.

4. Open seqboot.exe, which reads the file named infile. seqboot creates resampled replicate data sets (bootstrapping) that are used to build the phylogenetic tree.

5. After seqboot has created the replicate data sets, its output is written to outfile.

6. Once outfile has been produced, rename the original infile so that outfile can in turn be renamed to infile. (This new infile is larger than the original because bootstrapping has added the replicate data sets.)

7. In seqboot, change the following settings before running the analysis:

D=Molecular sequence

O=ANSI

R=1000

Random seed =111

Answer Yes, and seqboot writes a file named outfile. Rename the old infile, and create a new folder to store files that have already been used so that the folder containing the .exe programs does not accumulate too many files. Rename outfile to infile for the next step, and keep renaming files in this way throughout.

8. Next, open dnadist.exe and set the following:

8.1) Change the distance model from F84 to Kimura 2-parameter by pressing D, then Enter.

8.2) Change the matrix form from Square to Lower-triangular by pressing L, then Enter.

8.3) Enable multiple data sets by pressing M, then Enter. The program asks whether to use multiple data sets or multiple weights; press D, then Enter, to choose multiple data sets. The program then asks for the number of data sets; enter 1000, the number of replicates chosen in step 7, then press Enter.
8.4) Press Y, then Enter, to start the program. The result is again written to outfile.

9. Rename outfile to infile, then run Neighbor to join the sequences into trees according to the genetic distances calculated by dnadist. Change the following settings before running the analysis:

L = Yes, M = select type D, Data sets = 100, Random seed = 111

When the analysis finishes, two files are produced: outfile and outtree. Rename outfile to infile and outtree to intree.

10. Run consense to build the consensus phylogenetic tree, changing this setting:
T = ANSI

This produces two files: outfile and outtree.

11. Copy the outtree file and open it in the TreeView program.

Once the file is opened, TreeView displays the phylogenetic tree.
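The resampling performed by seqboot (step 7) and the Kimura 2-parameter distance selected in dnadist (step 8.1) can be sketched in Python as shown below. This is a simplified illustration and not PHYLIP's code. Gaps and ambiguous bases are skipped, saturated pairs receive an infinite distance, and the three short sequences are made up. With P as the proportion of transitions (A<->G, C<->T) and Q as the proportion of transversions, the K2P distance is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). The seed 111 mirrors the value used above, but Python's random generator will not reproduce PHYLIP's replicates.

```python
import math
import random

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes (simplifying assumption)
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        return math.inf  # too divergent: the formula is undefined
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def bootstrap_alignment(alignment, rng):
    """One bootstrap replicate: resample alignment columns with replacement."""
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def bootstrap_distances(alignment, replicates, seed):
    """Lower-triangular K2P distance matrix (as a dict of pairs) per replicate."""
    rng = random.Random(seed)
    names = list(alignment)
    matrices = []
    for _ in range(replicates):
        rep = bootstrap_alignment(alignment, rng)
        matrices.append({(x, y): k2p_distance(rep[x], rep[y])
                         for i, x in enumerate(names) for y in names[:i]})
    return matrices


# Made-up 20-site alignment, for illustration only.
toy = {
    "Dinosaur": "ACGTACGTACGTACGTACGT",
    "Chicken": "ACGTACGTACGTACGTACGC",
    "Human": "ACGTACGAACGTACGTACTT",
}
matrices = bootstrap_distances(toy, replicates=3, seed=111)
print(matrices[0])
```

Each dictionary corresponds to one of the data sets that dnadist writes. Neighbor joining (step 9) and the consensus step (step 10) are not reproduced here.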

assignment 3 haploview

Assignment 3

Here is part of chromosome X: SNP genotyping results from the HapMap project.

1. How many haplotype blocks are there in this region of chromosome X?
2. Can you find the tagging SNPs in each haplotype block?

Step by step

1. Open Haploview 4.0, which can be downloaded from http://www.broad.mit.edu/mpg/haploview/download.php. The program requires Java to run.

2. When Haploview starts, the Welcome to HaploView window appears.

3. In the Welcome to HaploView window, choose HapMap Download.
3.1 Set Release to 21.
3.2 Select the chromosome to study, here chromosome X.
3.3 Set Analysis Panel, which selects the population to study. The panels are:
YRI: Yoruba in Ibadan, Nigeria
JPT: Japanese in Tokyo, Japan
CHB: Han Chinese in Beijing, China
CEU: CEPH (Utah residents with ancestry from northern and western Europe)
3.4 Select the population to study, here CHB+JPT, then click OK to start.

4. After the program runs, the Check Markers tab is displayed.

5. Then open the Haplotypes tab or the LD Plot tab, which shows the results as in the figure.

1. How many haplotype blocks are there in this region of chromosome X?

Answer: There are 3 blocks in total, covering markers ID 8-9 (rs90805 - rs979848), ID 13-17 (rs1548474 - rs1877923), and ID 24-29 (rs1795683 - rs1634656).

2. Can you find the tagging SNPs in each haplotype block?

After the results appear in the Haplotypes tab, display the tagging SNPs in each haplotype block by choosing Display, then Show tags in blocks. The result is shown in the figure.

Answer: The tagging SNPs for haplotype block 1 are GAG and ATT at markers 8 and 9; for haplotype block 2, GTT and TTG at markers 13 and 15; and for haplotype block 3, CCG and GAG at markers 24 and 27.
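Both answers rest on pairwise linkage disequilibrium (LD). Haploview's default block definition uses confidence intervals on D' (Gabriel et al.), and its Tagger can select tags using pairwise r². The Python sketch below computes D' and r² from phased haplotypes and then picks tags with a simple greedy rule at an assumed r² threshold of 0.8. It is a simplified illustration, not Haploview's implementation, and the haplotypes are made up.

```python
def ld_stats(haplotypes, i, j):
    """Return (D', r^2) between SNPs i and j, from phased haplotype strings."""
    n = len(haplotypes)
    a = haplotypes[0][i]  # allele at SNP i counted as "A" (arbitrary choice)
    b = haplotypes[0][j]  # allele at SNP j counted as "B"
    p_a = sum(h[i] == a for h in haplotypes) / n
    p_b = sum(h[j] == b for h in haplotypes) / n
    p_ab = sum(h[i] == a and h[j] == b for h in haplotypes) / n
    if p_a == 1.0 or p_b == 1.0:
        raise ValueError("both SNPs must be polymorphic")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return abs(d) / d_max, r2


def greedy_tags(haplotypes, threshold=0.8):
    """Pick tag SNPs so every SNP has r^2 >= threshold with at least one tag."""
    n_snps = len(haplotypes[0])
    r2 = [[ld_stats(haplotypes, i, j)[1] for j in range(n_snps)] for i in range(n_snps)]
    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # choose the SNP that captures the most still-untagged SNPs (lowest index on ties)
        best = max(range(n_snps), key=lambda s: sum(r2[s][t] >= threshold for t in untagged))
        tags.append(best)
        untagged -= {t for t in untagged if r2[best][t] >= threshold}
    return tags


# Hypothetical phased haplotypes over three SNPs.
haplotypes = ["GAG", "GAC", "TCG", "TCC"]
print(ld_stats(haplotypes, 0, 1))  # (1.0, 1.0): SNPs 0 and 1 always co-occur
print(greedy_tags(haplotypes))     # [0, 2]
```

SNP 1 needs no tag of its own because it is perfectly correlated with SNP 0, which is the same redundancy that lets a block be described by only a few tagging SNPs.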

Prepared by

1. Ms. ศิริรัตน์ เฉลยสรรพ

2. Mr. พรชัย หมีแหม่ว

3. Ms. วชิรญาย์ อธิมัง

Saturday, 25 October 2008 (B.E. 2551)

assignment 1 proteome

ASSIGNMENT 1

Genomics is the study of an organism's entire genome. The field includes intensive efforts to determine the entire DNA sequence of organisms and fine-scale genetic mapping. It also includes studies of intragenomic phenomena such as heterosis, epistasis, pleiotropy and other interactions between loci and alleles within the genome. In contrast, the investigation of the functions and roles of single genes is the traditional focus of genetics and molecular biology. The term "genomics" now encompasses a broader scope of scientific inquiry and associated technologies than when genomics was first conceived. A genome is the sum total of all of an individual organism's genes, so genomics is the study of all the genes of a cell or tissue at the DNA (genotype), mRNA (transcriptome), or protein (proteome) level.

A major branch of genomics is still concerned with sequencing the genomes of various organisms, but knowledge of full genomes has made possible the field of functional genomics, which is mainly concerned with patterns of gene expression under various conditions. The most important tools here are microarrays and bioinformatics. The study of the full set of proteins in a cell type or tissue, and of how that set changes under various conditions, is called proteomics. A genome sequence alone does not tell you how an organism works, so the study of proteins, the functional molecules, is essential.

Genomic studies investigate all the genes expressed at a given time in the cell or tissue of interest simultaneously, rather than following the traditional gene-by-gene approach. To study the expression of all genes in a cell or tissue sample, it is necessary to identify the mRNAs present. Although there are about 26,000 human genes, the proteome is larger than the genome, in the sense that there are more proteins than genes. This is due to alternative splicing of genes and to post-translational modifications such as glycosylation and phosphorylation.
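A toy calculation makes the last point concrete. The model is assumed purely for illustration: each gene yields some number of splice isoforms, and each isoform carries independent PTM sites that are either modified or not. Under this model, a gene contributes isoforms × 2^sites possible protein forms, and the total is the sum over genes. The gene values below are hypothetical.

```python
def proteoform_count(genes):
    """Upper bound on distinct protein forms under a toy model.

    genes: list of (splice_isoforms, ptm_sites) pairs. Each isoform may carry any
    on/off combination of its independent PTM sites, i.e. isoforms * 2**sites forms.
    """
    return sum(isoforms * 2 ** sites for isoforms, sites in genes)


# Three hypothetical genes: no splicing or PTMs, 3 isoforms x 2 sites, 2 isoforms x 1 site.
genes = [(1, 0), (3, 2), (2, 1)]
print(len(genes), "genes ->", proteoform_count(genes), "protein forms")
```

Three genes already give 1 + 12 + 4 = 17 possible forms, which is why a proteome can be much larger than the genome that encodes it.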

Post-translational modifications
- A post-translational modification (PTM) is a chemical modification of a protein after its translation.
- Translation is the process of synthesizing the peptide chain of amino acids specified by the nucleotide sequence of the mRNA.
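Translation can be sketched as reading the mRNA three bases at a time and looking up each codon in the genetic code. This Python sketch uses the standard (nuclear) genetic code. It assumes that the input contains only A, C, G and U, that it starts at the first codon, and that translation stops at the first stop codon. Vertebrate mitochondria, where cytochrome b is encoded, use a slightly different code; for example, UGA encodes tryptophan there.

```python
BASES = "UCAG"
# Standard genetic code, listed in UCAG order for the first, second and third base.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(mrna):
    """Translate an mRNA reading frame into a one-letter protein sequence."""
    mrna = mrna.upper()
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":  # stop codon ends the chain
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCUAA"))  # MA
```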

Types of post-translational modifications
Several types of PTMs have been characterized:
- Proteolytic cleavage
- Glycosylation
- Methylation
- Phosphorylation
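In mass spectrometry, many PTMs are detected as characteristic mass shifts. The small Python sketch below uses the monoisotopic shifts for phosphorylation, methylation and a single HexNAc sugar, which is one simple form of glycosylation. Proteolytic cleavage has no fixed shift because it removes a variable stretch of residues. The 799.36 Da peptide mass is a hypothetical example.

```python
# Monoisotopic mass shifts (Da) for a few common PTMs.
PTM_DELTA = {
    "Phospho": 79.966331,   # + HPO3, usually on Ser, Thr or Tyr
    "Methyl": 14.015650,    # + CH2, e.g. on Lys or Arg
    "HexNAc": 203.079373,   # one N-acetylhexosamine (e.g. O-GlcNAc), a simple glycosylation
}


def modified_mass(unmodified_mass, modifications):
    """Add the mass shift of each listed modification to an unmodified mass."""
    return unmodified_mass + sum(PTM_DELTA[name] for name in modifications)


base = 799.36  # hypothetical unmodified peptide mass
print(round(modified_mass(base, ["Phospho"]), 4))
print(round(modified_mass(base, ["Methyl", "Methyl"]), 4))
```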

Genomics vs. Proteomics

An organism keeps the same genome but can have different proteomes

The figure shows different stages of the life cycle of an insect. Although its genes stay the same, the insect has different morphologies at different stages of its life cycle. These morphological differences, and the accompanying differences in cellular and organelle function, are directly determined by the variety of its proteome, not by the genome, whose products can be modified in various ways to give different proteomes.

Defining the proteome
The term “proteome” was first coined in 1994 by an Australian postdoctoral fellow named Marc Wilkins. In his definition, the proteome refers to the total set of proteins expressed in a given cell at a given time, the study of which is termed “proteomics.” Wilkins’s word soon took on a life of its own, encompassing everything from protein characterization techniques, such as mass spectrometry and two-dimensional gel electrophoresis, to anything remotely related to the quantitative measurement of protein expression. To make matters worse, a diverse range of biotechnology companies have adopted the word in the hope of attracting some of the field’s current cachet with investors. As Chris Ashton, Marketing Director for Oxford Glycosciences (Oxford, UK) points out, “Any lab with any kind of two-dimensional gels and mass spec calls themselves a proteomics company.” And if this wasn’t confusing enough, the area itself has propagated an entire taxonomy of its own, with a rapid

Proteomics overview

Proteomics involves the systematic study of proteins in order to provide a comprehensive view of the structure, function and regulation of biological systems. Advances in instrumentation and methodologies have expanded the scope of biological studies from simple biochemical analysis of single proteins to measurements of complex protein mixtures. Proteomics is rapidly becoming an essential component of biological research. Coupled with advances in bioinformatics, this approach to comprehensively describing biological systems will undoubtedly have a major impact on our understanding of the phenotypes of both normal and diseased cells.

• Initially, proteomics focused on generating protein maps using two-dimensional polyacrylamide gel electrophoresis. The field has since expanded to include not only protein expression profiling but also the analysis of post-translational modifications and protein-protein interactions. Protein expression, the quantitative measurement of global protein levels, may still be done with two-dimensional gels; however, mass spectrometry has been incorporated to increase sensitivity and specificity and to provide results in a high-throughput format. A variety of platforms are available for protein expression studies.

• The study of protein-protein interactions has been revolutionized by the development of protein microarrays. Analogous to DNA microarrays, these biochips are printed with antibodies or proteins and probed with a complex protein mixture. The intensity or identity of the resulting protein-protein interactions may be detected by fluorescence imaging or mass spectrometry. Other protein capture methods may be used in place of arrays, including the yeast two-hybrid system or the isolation of proteins and protein complexes by affinity chromatography or other separation techniques.

Proteomics analysis can be classified into three main categories:
- Expression proteomics
- Bioinformatics analysis
- Functional proteomics

Proteomics aims to:
- Separate, identify and characterize proteins on a large scale
- Define levels of proteins in cells/tissues and how these change
- Investigate protein complexes
- Elucidate protein function, pathways, and interrelationships

Laboratories are divided into 3 stations where participants receive hands-on experience in:

Station I: Sample Preparation: protein digestion and sample cleanup, phosphopeptide enrichment, peptide labeling, 2-dimensional electrophoresis

Station II: Instrumentation: includes high-performance liquid chromatography (HPLC) and 3 mass spectrometers: ion trap, quadrupole time-of-flight, and electrospray time-of-flight. Students use multiple instruments to analyze their own peptide digests, generated during the sample preparation laboratories.

Station III: Practical Proteomics: introduction to database searching, off-line fractionation techniques, and spectral interpretation.

Database Searching: The proteomics module culminates with a half-day database searching laboratory where students analyze data they acquired on the mass spectrometers during the course. It includes basic MS and MS/MS searching using MASCOT and Spectrum Mill, and spectral review and validation.
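Database searching with tools such as MASCOT compares measured masses with masses predicted from sequence databases. The Python sketch below is a much simplified peptide mass fingerprint. Each protein is digested in silico with trypsin, cleaving after K or R unless the next residue is P and allowing no missed cleavages, and is then scored by how many observed peptide masses fall within a tolerance of its predicted peptide masses. MASCOT's actual scoring is probability-based. The protein names, sequences and the 0.02 Da tolerance here are assumptions.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565


def trypsin_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_count(observed, protein, tolerance=0.02):
    """Number of observed masses within `tolerance` Da of a predicted tryptic peptide."""
    theoretical = [peptide_mass(p) for p in trypsin_digest(protein)]
    return sum(any(abs(m - t) <= tolerance for t in theoretical) for m in observed)


def rank_proteins(observed, database, tolerance=0.02):
    """Sort database entries by match count, best first."""
    return sorted(database, key=lambda name: match_count(observed, database[name], tolerance),
                  reverse=True)


# Hypothetical two-entry "database" and observed masses.
database = {"PROT_A": "AKPLRGK", "PROT_B": "MWWW"}
observed = [peptide_mass("GK"), peptide_mass("AKPLR")]
print(rank_proteins(observed, database))
```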

Two-dimensional gel electrophoresis

Two-dimensional gel electrophoresis, abbreviated 2-DE, is a form of gel electrophoresis commonly used to analyze proteins. Mixtures of proteins are separated by two properties in two dimensions on 2-D gels. The properties used for the two dimensions can be isoelectric point, protein complex mass in the native state, and protein mass.
In the first dimension, the proteins are separated by isoelectric point, which resolves proteins on the basis of charge.
In the second dimension, proteins are separated by molecular weight using SDS-PAGE. Separating proteins by isoelectric point is called isoelectric focusing (IEF). In IEF, a pH gradient is established in a gel and an electric potential is applied across the gel, making one end more positive than the other. At any pH other than its isoelectric point, a protein carries a net charge. A positively charged protein is pulled towards the more negative end of the gel, and a negatively charged protein towards the more positive end. The proteins applied in the first dimension therefore move along the gel and accumulate at their isoelectric point, the point at which the overall charge on the protein is 0 (a neutral charge).
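Where a protein stops during IEF can be estimated from its sequence. The Python sketch below computes the net charge at a given pH with the Henderson-Hasselbalch relation and finds the pH at which the charge crosses zero by bisection, since net charge falls as pH rises. The pKa values are approximate and assumed, because published scales differ. Real proteins also deviate from such estimates because of folding and PTMs.

```python
# Approximate side-chain and terminal pKa values (assumed; published scales differ).
PKA_BASIC = {"K": 10.5, "R": 12.4, "H": 6.0}
PKA_ACIDIC = {"D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}
PKA_N_TERM = 9.69
PKA_C_TERM = 2.34


def net_charge(seq, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    charge = 1 / (1 + 10 ** (ph - PKA_N_TERM))   # free amino terminus (+)
    charge -= 1 / (1 + 10 ** (PKA_C_TERM - ph))  # free carboxyl terminus (-)
    for aa in seq.upper():
        if aa in PKA_BASIC:
            charge += 1 / (1 + 10 ** (ph - PKA_BASIC[aa]))
        elif aa in PKA_ACIDIC:
            charge -= 1 / (1 + 10 ** (PKA_ACIDIC[aa] - ph))
    return charge


def isoelectric_point(seq, tolerance=1e-4):
    """Bisection on pH between 0 and 14 for the point where net charge is 0."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: the pI is at a higher pH
        else:
            high = mid
    return (low + high) / 2


for peptide in ("DDDD", "PEPTIDE", "KKKK"):
    print(peptide, round(isoelectric_point(peptide), 2))
```

Acidic sequences such as DDDD focus near the acidic end of the gradient and basic ones such as KKKK near the basic end, which is the separation the first dimension exploits.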

Identification of eluted protein spots (automated spot cutting)

Mass spectrometry

- MALDI/TOF-MS
- ESI/Quadrupoles

Comparison between normal and abnormal tissues

Mass spectrometry

Mass spectrometry is an analytical technique that identifies the chemical composition of a compound or sample based on the mass-to-charge ratio of charged particles.[1] The sample is ionized, often with fragmentation, forming charged particles (ions). The mass-to-charge ratio of the ions is then determined by passing them through electric and magnetic fields in a mass spectrometer.
The design of a mass spectrometer has three essential modules: an ion source, which transforms the molecules in a sample into ionized fragments; a mass analyzer, which sorts the ions by their masses by applying electric and magnetic fields; and a detector, which measures the value of some indicator quantity and thus provides data for calculating the abundances of each ion fragment present. The technique has both qualitative and quantitative uses, such as identifying unknown compounds, determining the isotopic composition of elements in a compound, determining the structure of a compound by observing its fragmentation, quantifying the amount of a compound in a sample, studying the fundamentals of gas phase ion chemistry (the chemistry of ions and neutrals in a vacuum), and determining other physical, chemical, or biological properties of compounds.
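The quantity the mass analyzer sorts on is m/z. For peptides, a common case is the protonated ion [M+zH]^z+, whose m/z is (M + z × 1.007276)/z, where M is the neutral monoisotopic mass and 1.007276 Da is the proton mass. The short Python sketch below assumes an unmodified peptide and standard monoisotopic residue masses; "PEPTIDE" is only a placeholder sequence. MALDI typically yields singly charged ions (z = 1), whereas ESI often yields multiply charged ones.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565   # added once for the free N- and C-termini
PROTON = 1.007276


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER


def mz(neutral_mass, charge):
    """m/z of the [M + zH]^z+ ion formed by adding z protons."""
    return (neutral_mass + charge * PROTON) / charge


mass = peptide_mass("PEPTIDE")
for z in (1, 2, 3):
    print(f"z={z}: m/z = {mz(mass, z):.4f}")
```

The same neutral mass appears at several m/z values depending on charge state, which is why ESI spectra of one peptide can show a series of peaks.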

Main steps of measuring with a mass spectrometer

Protein Sequence database

In the field of bioinformatics, a sequence database is a large collection of DNA, protein, or other sequences stored on a computer. A database can include sequences from only one organism, as in databases containing all the proteins of Saccharomyces cerevisiae, or it can include sequences from all organisms whose DNA has been sequenced.

UniProt is the universal protein resource, a central repository of protein data created by combining Swiss-Prot, TrEMBL and PIR. This makes it the world's most comprehensive resource on protein information.
SWISS-2DPAGE contains data on proteins identified on various 2-D PAGE and SDS-PAGE reference maps. Users can locate these proteins on the 2-D PAGE maps or display the region of a 2-D PAGE map where a protein from UniProtKB/Swiss-Prot would be expected to appear.

Discovery of Differentially Expressed Proteins

Differential display can reveal:
- changes in protein expression
- altered processing or modification of proteins
These changes may be responsible for a phenotype. Co-regulation or co-modification of proteins can provide clues to pathways and function.
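Differential display compares the same protein spot across conditions. The minimal Python sketch below assumes normalized spot volumes from replicate gels; the numbers are hypothetical. For each spot it reports the log2 fold change and Welch's t statistic, and it flags spots with at least a 2-fold change. A real analysis would also compute p-values and correct for testing many spots at once.

```python
import math
from statistics import mean, stdev


def log2_fold_change(disease, normal):
    """log2 of the ratio of mean spot volumes (disease / normal)."""
    return math.log2(mean(disease) / mean(normal))


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))


def flag_spots(spots, min_abs_log2fc=1.0):
    """Names of spots changed at least 2-fold (|log2 FC| >= 1) in either direction."""
    return [name for name, (disease, normal) in spots.items()
            if abs(log2_fold_change(disease, normal)) >= min_abs_log2fc]


# Hypothetical normalized spot volumes: (disease replicates, normal replicates).
spots = {
    "spot_101": ([2.1, 2.4, 2.2], [1.0, 1.1, 0.9]),
    "spot_102": ([1.0, 1.2, 0.9], [1.1, 1.0, 1.0]),
}
for name, (disease, normal) in spots.items():
    print(name, round(log2_fold_change(disease, normal), 2), round(welch_t(disease, normal), 2))
print(flag_spots(spots))
```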
Why Spend Millions on Proteomics?
Proteomics will:
- define the proteome of a cell or tissue
- provide a means of comparing proteomes to explain phenotypes (e.g. disease vs. normal states)
- provide clues to protein function by defining co-stimulated and co-regulated proteins
- be powerful in combination with other technologies such as two-hybrid functional assays and gene knockouts

Proteomics will not:
-replace genome sequencing
-be as easy as genome sequencing
