Bioinformatics Biocomputing and Perl
An Introduction to Bioinformatics Computing Skills and Practice

Michael Moorhouse
Post-Doctoral Worker from Erasmus MC, The Netherlands

Paul Barry
Department of Computing and Networking, Institute of Technology, Carlow, Ireland

Copyright © 2004 John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England
Telephone (+44) 1243 779777
Visit our Home Page on www.wileyeurope.com or www.wiley.com

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except under the terms of the Copyright, Designs and Patents Act 1988 or under the terms of a licence issued by the Copyright Licensing Agency Ltd, 90 Tottenham Court Road, London W1T 4LP, UK, without the permission in writing of the Publisher. Requests to the Publisher should be addressed to the Permissions Department, John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex PO19 8SQ, England, or faxed to (+44) 1243 770620.

This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the Publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Other Wiley Editorial Offices
John Wiley & Sons Inc., 111 River Street, Hoboken, NJ 07030, USA
Jossey-Bass, 989 Market Street, San Francisco, CA 94103-1741, USA
Wiley-VCH Verlag GmbH, Boschstr. 12, D-69469 Weinheim, Germany
John Wiley & Sons Australia Ltd, 33 Park Road, Milton, Queensland 4064, Australia
John Wiley & Sons (Asia) Pte Ltd, 2 Clementi Loop #02-01, Jin Xing Distripark, Singapore 129809
John Wiley & Sons Canada Ltd, 22 Worcester Road, Etobicoke, Ontario, Canada M9W 1L1

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

ISBN 0-470-85331-X

Typeset in 9.5/12.5pt Lucida Bright by Laserwords Private Limited, Chennai, India
Printed and bound in Great Britain by Antony Rowe Ltd, Chippenham, Wiltshire
This book is printed on acid-free paper responsibly manufactured from sustainable forestry in which at least two trees are planted for each one used for paper production.

For my parents, who taught me the value of knowledge – MJM
For three great kids: Joseph, Aaron and Aideen – PJB

Contents

Preface

1 Setting the Biological Scene
1.1 Introducing Biological Sequence Analysis
1.2 Protein and Polypeptides
1.3 Generalised Models and their Use
1.4 The Central Dogma of Molecular Biology
1.4.1 Transcription
1.4.2 Translation
1.5 Genome Sequencing
1.5.1 Sequence assembly
1.6 The Example DNA-gene-protein system we will use
Where to from Here

2 Setting the Technological Scene
2.1 The Layers of Technology
2.1.1 From passive user to active developer
2.2 Finding perl
2.2.1 Checking for perl
Where to from Here

I Working with Perl

3 The Basics
3.1 Let’s Get Started!
3.1.1 Running Perl programs
3.1.2 Syntax and semantics
3.1.3 Program: run thyself!
3.2 Iteration
3.2.1 Using the Perl while construct
3.3 More Iterations
3.3.1 Introducing variable containers
3.3.2 Variable containers and loops