
MHC Protocols PDF

327 Pages·2003·3.199 MB·English


Methods in Molecular Biology™
VOLUME 210

MHC Protocols

Edited by Stephen H. Powis and Robert W. Vaughan

HUMANA PRESS

September 2002 (ISBN 1-59259-291-0)

1. HLA Informatics: Accessing HLA Sequences from Sequence Databases
   Robinson, James; Marsh, Steven G. E.
2. Accessing HLA Sequencing Data Through the 6ace Database
   Horton, Roger; Beck, Stephan
3. HLA Typing by Restriction Fragment Length Polymorphism Analysis
   Vaughan, Robert W.
4. PCR-Restriction Fragment Length Polymorphism Typing of Class I and II Alleles
   Vaughan, Robert W.
5. PCR-Sequence-Specific Oligonucleotide Probe Typing for HLA-A, -B, and -DR
   Middleton, Derek; Williams, F.
6. HLA-DPA1 and -DPB1 Typing Using the PCR and Nonradioactive Sequence-Specific Oligonucleotide Probes
   Steiner, Lori L.; Moonsamy, Priscilla V.; Bugawan, Teodorica L.; Begovich, Ann B.
7. PCR-Sequence-Specific Primer Typing of HLA Class I and Class II Alleles
   Bunce, Mike
8. HLA Typing With Reference Strand-Mediated Conformation Analysis
   Argüello, J. Rafael; Pérez-Rodríguez, Martha; Pay, Andrea; Fisher, Gaby; McWhinnie, Alasdair; Madrigal, J. Alejandro
9. Sequencing Protocols for Detection of HLA Class I Polymorphism
   Dunn, Paul P. J.; Cox, Steven T.; Little, Ann-Margaret
10. HLA-E and HLA-G Typing
   Martinez-Laso, Jorge; Gomez-Casado, Eduardo; Arnaiz-Villena, Antonio
11. Typing Alleles of HLA-DM
   Teisserenc, Hélène
12. Typing Alleles of TAP1 and TAP2
   Powis, Stephen H.
13. Determining Alleles of the C2 Gene by Southern Blotting
   Zhu, Zeng-Bian; Volanakis, John E.
14. Complement C4 Protein and DNA Typing Methods
   Schneider, Peter M.; Mauff, Gottfried
15. Typing of Tumor Necrosis Factor Alleles
   Wilson, Anthony Gerard
16. Molecular Typing of the MHC Class I Chain-Related Gene Locus
   Collins, R. W. M.; Stephens, Henry A. F.; Vaughan, Robert W.
17. HLA Microsatellite Analysis
   Carrington, Mary

1

HLA Informatics

Accessing HLA Sequences from Sequence Databases

James Robinson and Steven G. E. Marsh

From: Methods in Molecular Biology, vol. 210: MHC Protocols
Edited by: S. H. Powis and Robert W. Vaughan © Humana Press Inc., Totowa, NJ

1. Introduction

Scientists working in the human leukocyte antigen (HLA) field have access to a number of different informatics resources for the analysis and interpretation of HLA sequences. Recent advances in bioinformatics have increased the number of tools and facilities available for sequence analysis. Researchers can now use these tools to analyze the genomic information within their own field; in addition, a number of groups are developing software specifically for the HLA field. In this chapter, we discuss both the more general tools and the specialist HLA informatics tools available.

There are a number of different sequence databases available, and the type of database can influence the information retrieved. Databases range from the large international data repositories like the European Molecular Biology Laboratory Nucleotide Sequence Database (EMBL), GenBank®, and the DNA Databank of Japan (DDBJ) through to specialist systems like the ImMunoGeneTics project IMGT/HLA Database. It is therefore imperative that the user understands the merits and flaws of the various systems. This chapter will provide information on the various options available to the user, to permit an informed decision when selecting a database.
The main databases discussed in this chapter are the large public sequence databases, protein structure databases, and databases specializing in the human major histocompatibility complex (MHC). All of the databases mentioned in this chapter are accessed via the Internet. The advent of the Internet has revolutionized the dissemination of sequence information, allowing scientists from all over the world to log on and access the many databases that relate to gene sequences using the World Wide Web (WWW). The uniform resource locators (URLs) of all the databases discussed are included in Section 6 of this chapter.

2. Generalist Nucleotide Sequence Databases

2.1. EMBL/GenBank/DDBJ

The generalist databanks are not HLA-specific, but rather large international data repositories for all organisms. The three main general nucleotide sequence databases are the EMBL (1), GenBank (2), and the DDBJ (3). These three databases form an international collaboration and exchange sequences daily, so that each contains identical data. Most published sequences can be found in these databases. HLA sequences can be retrieved from these systems with a number of tools. The EMBL database will be used as the model system for all examples in this chapter.

A number of different methods can be used to retrieve HLA sequences. A conceptually simple approach would be to copy the entire database to the user's system and then remove all nonrelevant entries, leaving a smaller HLA-related database. However, this method would actually take longer than most search tools, as only a small subset (around 0.001%) of the database represents HLA data. A more pragmatic approach is to use the search tools provided, although these may be time-consuming if large numbers of sequences are being retrieved. Search tools allow you to perform complex queries and download the retrieved entries in a specified format.
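The "copy and filter" approach described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: it assumes an EMBL-style flat file in which each entry ends with a `//` line and each line starts with a two-letter code such as `AC`, `DE`, or `KW`; the two sample records are invented for illustration and are not real database entries; and the function names (`read_entries`, `mentions_hla`, `hla_accessions`) are hypothetical, not part of any EBI tool.

```python
import re
from typing import Dict, Iterator, List

# Invented sample records in an EMBL-like layout (not real database entries).
SAMPLE = """\
ID   X00001
AC   X00001;
DE   Human HLA-A gene, partial cds
KW   HLA; major histocompatibility complex.
//
ID   X00002
AC   X00002;
DE   Chlamydia trachomatis outer membrane protein gene
KW   membrane protein.
//
"""


def read_entries(text: str) -> Iterator[Dict[str, List[str]]]:
    """Yield each entry as a mapping from two-letter line code to its lines."""
    entry: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if line.startswith("//"):  # end-of-entry marker
            if entry:
                yield entry
            entry = {}
        elif len(line) > 2:
            code, value = line[:2], line[2:].strip()
            entry.setdefault(code, []).append(value)
    if entry:  # tolerate a missing final "//"
        yield entry


def mentions_hla(entry: Dict[str, List[str]]) -> bool:
    """True if the description or keyword lines contain HLA as a whole word.

    The word boundary matters: a plain substring test would also match
    unrelated words such as "Chlamydia".
    """
    text = " ".join(entry.get("DE", []) + entry.get("KW", []))
    return re.search(r"\bHLA\b", text, flags=re.IGNORECASE) is not None


def hla_accessions(text: str) -> List[str]:
    """Return the first accession number of every HLA-related entry."""
    return [
        entry["AC"][0].rstrip(";")
        for entry in read_entries(text)
        if "AC" in entry and mentions_hla(entry)
    ]


print(hla_accessions(SAMPLE))
```

Even on this toy input the filter has to read every entry to keep one, which is the reason the text gives for preferring the search tools when only a tiny fraction of the database is HLA data.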
The search engines for EMBL are maintained by the European Bioinformatics Institute (EBI). The search engines are also mirrored at other sites on the Internet, allowing the user to access a local server. The EMBL database is a very large database containing over 17 million nucleotide sequences, and other databases such as SWISS-PROT and TREMBL (translations of EMBL) (4) are linked to it. All these databases are available and can be searched at the EBI Web site. In this chapter, we will only discuss the retrieval of nucleotide sequences, although the same methods apply for the retrieval of protein sequences from the appropriate database.

The EMBL database receives around 14,000 sequences (12 megabases of sequence) per day, and these data are processed and made available through flat files. These files include sequence features, which provide a large amount of additional data allowing for advanced queries. Sequence features are best described as a number of different motifs, references, and data particular to the DNA sequence. The most common sequence features are the source of the sequence (species, cell type, etc.), coding details, exon coordinates, and protein translation. Additional sequence features include promoter sequences and their coordinates, and complementary sequences.

There are two main approaches to sequence searching. One is to search on particular keywords or features contained within the sequence documentation, and the other is to search for the actual nucleotide or protein sequence.

2.2. Sequence Similarity Searches

Sequence similarity searches look for matches to the actual sequence in larger databases. These matches are based on a number of similarity measures and, in general, retrieve identical or highly similar sequences. The most recognized sequence similarity search tool is the Basic Local Alignment Search Tool (BLAST) algorithm (5).
This widely distributed algorithm has been refined over many years (6) and is now the premier sequence similarity search tool. There are a number of tools within the BLAST family: BLASTN for nucleotides, BLASTP for proteins, and BLASTX for translated sequences. BLAST works by searching for areas of local similarity between sequences (see Fig. 1). These regions are then linked to form a score for a particular sequence. The higher the score, the closer the match.

The BLAST algorithm does suffer from two problems when it is used to identify HLA sequences. The first, general problem is that very short sequences cannot be used; the minimum sequence length is 20 bp. The second problem is specific to HLA sequences and is due to the high similarity between HLA alleles. BLAST can accurately retrieve HLA entries from a database that contains a huge range of sequences, e.g., EMBL. However, when it is used on a more specialist subset of data, care is needed when analyzing the results. BLAST has its own scoring system and uses this rather than sequence identity. In this system, a 546-bp sequence of approximately 95% identity may score higher than a shorter (e.g., 270 bp) sequence of 100% identity. Users should therefore study the large output files carefully, as identical matches may not appear first in the output.

2.3. Sequence Search Tools

The EBI provides one of the most advanced flat file search tools, the Sequence Retrieval System (SRS) (7). This tool allows the user to search on any of the sequence features, accession numbers, keywords, or sequence description and is probably the best method of retrieving HLA sequences from a general nucleotide database. The disadvantage of the SRS system is that it does require some familiarity with the flat file format (see Fig. 2). Once users are accustomed to the way data is presented, they can quickly build up very complex queries. An advantage of the SRS tool is that it can also be used to launch other applications, e.g., BLAST. The SRS tool also allows users to customize the output of searches, so that they can quickly see how relevant entries are to the search criteria. SRS can be found at the EBI Web site and can be used to search a number of different databases.

The GenBank search engine, Entrez (8), also works on accession numbers, but provides several advanced options. These include bulk retrievals of entries in a preformatted manner. This is very useful for retrieving sequences once a list of known accession numbers is available. The search engine for DDBJ is restricted to searching via accession numbers.

2.4. Other Generalist Sequence Databases

There are other databases that are not included in the EMBL/GenBank/DDBJ collaboration. These systems also have their own search engines and retrieval facilities, and similar problems with regard to data integrity and sequence retrieval also apply, which will be discussed later (see Subheading 2.5.). The Genome Sequence DataBase (GSDB) (9) is run by the National Center for Genome Resources, USA. The GSDB is included here because a number of HLA sequences are unique to this database, and GSDB does not automatically forward these to the other main databases. The GSDB search engine retrieves entries and sequence features, but only by accession number. This means that an accession number must be known before a sequence can be retrieved.

Fig. 1. SRS search tool showing flat file output. The figure shows the standard output of a flat file, which is a text-based file that identifies different sections by the line headings, e.g., AC, accession number; DE, description; KW, keywords; DR, cross references to other databases; and SQ, sequence. Lines also begin with O, phylogeny; R, references; and F, features. The output file shown is taken from the EMBL Nucleotide Sequence Database.

Fig. 2. Example BLAST outputs. Selected sections of a standard BLAST output for a search using the A*01011 sequence against the EMBL database. The first part of the figure shows the sequences and descriptions retrieved by the EMBL accession numbers. The variation in Keywords and Description can be seen here; also note the lack of official nomenclature. Note the inclusion of a pygmy chimpanzee (Pan paniscus) sequence, which scores highly because of its high sequence similarity. The second part of the figure shows how BLAST displays the sequence identities. Identity is shown by a pipe (|) between the bases, and mismatches have no join.
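The scoring caveat from Subheading 2.2. (a longer, imperfect match can outrank a shorter, identical one) can be made concrete with a toy calculation. The Python sketch below is only an illustration under stated assumptions: it assumes an ungapped alignment scored with +1 per matching base and -3 per mismatch, which are illustrative values chosen here and not necessarily those used by any BLAST release, and it ignores the statistics BLAST reports alongside its scores. The function name `raw_score` is hypothetical.

```python
def raw_score(length: int, identity: float,
              match: int = 1, mismatch: int = -3) -> int:
    """Toy ungapped alignment score for a hit of given length and identity.

    The match reward and mismatch penalty are assumed, illustrative values.
    """
    matches = round(length * identity)
    mismatches = length - matches
    return matches * match + mismatches * mismatch


# The two hits described in the text: a long hit at about 95% identity
# and a short hit at 100% identity.
hits = {
    "546 bp, 95% identity": raw_score(546, 0.95),
    "270 bp, 100% identity": raw_score(270, 1.00),
}

# Rank hits the way a score-sorted output would list them.
for label, score in sorted(hits.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: score {score}")
```

Under these assumed values the 546-bp hit scores 438 and the 270-bp hit scores 270, so the identical match is listed second. This is why the output should be checked for identity and not read only from the top.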
