Instruction Set
Alex Aravind
February 12, 2009

1 Introduction
The CPU, or processor, is the brain of the computer. How hard is it to design this component? Typically, processor design is carried out as shown in Figure 1.
[Figure 1: Processor Design and Implementation Process. Stages: New Machine Project, Performance Objectives, Instruction Set Definition, Implementation, Fabrication & Testing, Tuning & Bug fixing, Sales & Usage.]
While designing a new processor, many factors are considered, such as performance, cost, applications, ease of use, efficiency, and extensibility. These factors influence the nature of the instruction set, the datapath, the size of the register set, the size of the bus (for example, 8086: 16-bit; 8088: 8-bit, to make it cheaper, but slower; 286: 16-bit; 386: 32-bit; ...), etc. Over the next few classes, we will focus on the instruction set and its importance.

2 Instructions to Computer
Instructions are the language we use to speak to the computer. We basically ask the computer to do some task. We write how a task must be accomplished by the computer as a sequence of instructions and call it a computer program. The instructions that the machine can understand are very primitive, or low level (ultimately, electrical signals), and they must be very precise. On the other hand, the instructions we are comfortable with are very high level and often ambiguous. So there is a big gap between users and the computer. In the beginning this was the situation, and therefore only experts could talk to the computer. Universities and industry made huge efforts to bridge this gap. Bridging this gap was undoubtedly not an easy task in the beginning; it was a complex process. One approach that helped to handle this complexity is popularly known as layered abstraction, and it is shown in Figure 2.
[Figure 2: Six Level Computer. Levels from top to bottom:
    High Level Language Level (High Level Language Programming)
    Assembly Language Level (Assembly Language Programming)
    Operating System Level (System and Utility Calls)
    Instruction Set Level (Machine Language Programming)
    Microarchitecture Level (Microprogramming)
    Digital Logic Level (Gates)]
In this approach, a computer can be viewed at many levels. The higher layers can be viewed as virtual machines with differing levels of sophistication. At each level, either a machine can be built or a translator can be designed to translate the program into a lower level.

2.1 Machine Instructions
Literally speaking, nowadays no one speaks directly to the physical machine or processor. There are many layers of software agents; they talk to each other on behalf of users, and eventually the bottommost software talks to the processor. The language in which software tells the processor what to do is called the machine instruction set, or simply the instruction set. Hereafter, in this context, instruction refers to a machine instruction or its one-to-one corresponding assembly instruction.
Each instruction basically consists of an opcode (what to do) and information about the operands (input and output, or control transfer). As we have seen, addressing modes are basically the ways of specifying the operands, and they influence the size of the instructions (and hence memory usage) and the complexity of the control circuitry.
Let us start with the question of what factors influence the design of an instruction set for a computer.
[Figure 3: Things Influencing the Instruction Set: Addressing Modes, Data path, Memory, I/O, Register set, Control Unit.]
• Addressing: The ways of specifying the operands.
• Data path: The path along which data flows during the execution of an instruction. Basically, the datapath includes the registers involved and the processing unit.
• Control: Tells the data path, memory, and I/O what to do according to the instructions of the program.
• Registers: Hold data close to the CPU for fast access.
• Memory: A large storage area used to hold programs and their data.
• I/O: Interacts with the outside world.
Next we will look at various types of instructions.

3 Instruction Types
Broadly, based on their functions, instructions can be classified into three categories:
• Data Movement Instructions: Move data back and forth between the CPU and memory or the I/O system.
  Movement of a physical object takes the object from one place, say A, to another place, say B. After the object is moved from A to B, it disappears at A and appears at B. In the case of data movement, a copy is moved: after the movement, locations A and B hold identical copies of the data. So the proper term would be data copy movement.
  Some systems treat I/O addresses as part of the memory address space. These are called memory-mapped I/O systems. In that case, I/O registers may be considered part of memory, and the same memory access instructions are used to access the I/O registers. This model simplifies the instruction set and the programming effort.
  In general, there are four possible data movements:
  - register to register
  - register to memory
  - memory to register
  - memory to memory
  Four instructions, one for each, can be used. Some systems use LOAD for memory to register, STORE for register to memory, and MOV for register to register. Memory to memory is not very common. In some systems the I/O system is treated separately; in that case, IN (I/O register to CPU register) and OUT (CPU register to I/O register), or their equivalents, are used.
• Data Processing Instructions: These perform arithmetic, logical, and relational operations on the data.
  - Arithmetic Operations: Addition, subtraction, multiplication, and division are mostly supported.
    For example, ADD for simple addition and ADC for addition with the carry from the previous operation are usually supported. Similarly, SUB for subtraction and SBC for subtraction with carry (the carry is also subtracted) are usually supported.
    Addition and subtraction are normally done by the same adder: subtraction is performed by adding the two's complement of the subtrahend. Multiplication and division are more complicated; we will see them in detail later.
  [Figure 4: Data Movement. Registers and memory exchange data via LOAD and STORE, registers via MOV, and registers and I/O via IN and OUT, or via memory-mapped I/O.]
  - Logical Instructions: These operations allow us to manipulate individual bits. We can change or extract a particular bit or group of bits. Bits can be shifted or rotated right or left. Any use? Multiplication or division by a power of 2 can be done by shifting:
    * A left shift multiplies by 2.
    * A right shift divides by 2 (for unsigned values).
    Shifting is efficient. For example, to compute 16n + 2n you do 5 shifts and one add (16n takes 4 shifts, 2n takes 1 shift, and then ADD).
    To extract, a constant bit string called a mask is ANDed with the original string. For example, given
        S = 10101110 01100111 11000110 11101101,
    to extract the second byte from the left, the following mask is used:
        M = 00000000 11111111 00000000 00000000
        S AND M = 00000000 01100111 00000000 00000000
    A shift operation may also be used to bring the extracted byte into the required position.
    To change, first the complementary part is extracted with a suitable mask, and then it is ORed with the change value. Example: change the second byte from the right to 11000111.
        M = 11111111 11111111 00000000 11111111
        S AND M = 10101110 01100111 00000000 11101101
        Change value C = 00000000 00000000 11000111 00000000
        (S AND M) OR C = 10101110 01100111 11000111 11101101
    Other logical operations used are NOT and EOR (exclusive OR).
  - Compare Instructions: Operations used to test a condition or verify a relation. This operation basically sets some condition flags. The operation is CMP. Based on the condition, a decision is usually made about what action to take next.
• Control Instructions: Used to control the flow of execution; they are also called branch instructions. A branch is often combined with a condition (relation) test, giving a so-called conditional branch. Basically, there are two types of branch instructions: conditional and unconditional. We need to specify where to jump, usually by a label.
  Sometimes the same set of instructions must be executed many times while some data values change. This is called iteration, or a loop. A conditional jump can be used to implement a loop.
  In large programs, some tasks are often performed many times. Such a task can be written once as a procedure, function, method, etc., and used many times, for logical clarity and brevity. During program execution, when necessary, control jumps to that task, and afterwards control must return to the original place. So a call usually involves a pair of instructions: one to jump (CALL) and the other to return (RET). A procedure calling itself is called recursion, a very effective concept whose power theoreticians appreciate. Also, using functions often requires passing parameters.

Based on the number of operands, instructions can be classified into three types: zero-operand, one-operand (monadic), and two-operand (dyadic) instructions.
• Zero-Operand Instructions: The instruction either requires no operand (NOOP) or its operand locations are known implicitly (sys).
• Monadic Instructions: Instructions requiring one operand, such as shift, rotate, and jump.
• Dyadic Instructions: Instructions requiring two operands, such as ADD and SUB.
Note: Some operands may be specified implicitly:
    ADD A, B, C    // A+B to C
    ADD A, B       // A+B to A
    ADD A          // add A to the accumulator
    ADD            // add the top two items of the stack and put the result back on the stack
With this generic introduction to instructions, let us now look at the instruction sets used in practice.
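The copy semantics of LOAD, STORE, and MOV, and the idea of memory-mapped I/O from section 3, can be sketched with a toy machine in Python. Everything here is hypothetical and chosen only for illustration: the class name ToyMachine, the two registers R0 and R1, and the assumption that addresses from 0xFF00 upward belong to an output device.

```python
from typing import Dict, List


class ToyMachine:
    """Hypothetical load/store machine with memory-mapped I/O (illustration only)."""

    IO_BASE = 0xFF00  # assumption: addresses >= IO_BASE belong to an output device

    def __init__(self, mem_size: int = 0x10000) -> None:
        self.mem: List[int] = [0] * mem_size
        self.regs: Dict[str, int] = {"R0": 0, "R1": 0}
        self.device_output: List[int] = []  # values seen by the hypothetical device

    def _write(self, addr: int, value: int) -> None:
        # The same store path serves memory and I/O; only the address differs.
        if addr >= self.IO_BASE:
            self.device_output.append(value)
        self.mem[addr] = value

    def load(self, reg: str, addr: int) -> None:
        """LOAD: copy memory[addr] into a register; memory keeps its value."""
        self.regs[reg] = self.mem[addr]

    def store(self, addr: int, reg: str) -> None:
        """STORE: copy a register into memory (or a memory-mapped device)."""
        self._write(addr, self.regs[reg])

    def mov(self, dst: str, src: str) -> None:
        """MOV: copy one register into another; the source keeps its value."""
        self.regs[dst] = self.regs[src]


m = ToyMachine()
m.mem[100] = 7
m.load("R0", 100)         # R0 = 7 and mem[100] is still 7: a copy, not a move
m.mov("R1", "R0")         # R1 = 7, R0 unchanged
m.store(m.IO_BASE, "R1")  # an ordinary STORE that reaches the device
print(m.regs, m.mem[100], m.device_output)  # {'R0': 7, 'R1': 7} 7 [7]
```

With separate I/O instead, the device would be reached only through dedicated IN and OUT instructions, and the STORE path would not need the address check.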
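The shift and mask examples from the Logical Instructions item can be checked with Python integers standing in for 32-bit words. This is a sketch: the helper names fmt, times_18, and sub_via_add are hypothetical, results are masked to 32 bits to imitate a register, and Python's << shifts by several positions at once, whereas the notes count single-bit shifts.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit word


def fmt(word: int) -> str:
    """Show a 32-bit word as four space-separated bytes, as in the notes."""
    bits = format(word & MASK32, "032b")
    return " ".join(bits[i:i + 8] for i in range(0, 32, 8))


def times_18(n: int) -> int:
    """16n + 2n using shifts: n << 4 is 16n, n << 1 is 2n, then one add."""
    return (n << 4) + (n << 1)


def sub_via_add(a: int, b: int) -> int:
    """a - b as a + (NOT b) + 1, i.e. adding the two's complement of b."""
    return (a + (~b & MASK32) + 1) & MASK32


S = 0b10101110_01100111_11000110_11101101

# Extract the second byte from the left: AND with a mask, then shift it down.
M_extract = 0b00000000_11111111_00000000_00000000
extracted = S & M_extract
second_byte = extracted >> 16

# Change the second byte from the right to 11000111: clear it, then OR in C.
M_change = 0b11111111_11111111_00000000_11111111
C = 0b11000111 << 8
changed = (S & M_change) | C

print(fmt(extracted))                  # 00000000 01100111 00000000 00000000
print(fmt(changed))                    # 10101110 01100111 11000111 11101101
print(times_18(5), sub_via_add(7, 3))  # 90 4
```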
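The four ADD forms in the note on implicit operands differ only in where the sources and the result are implied. A minimal Python sketch models each form; the function and class names (add_3, add_2, AccumulatorMachine, StackMachine, and the load helper) are hypothetical, and memory is simply a dictionary keyed by the operand names A, B, and C.

```python
from typing import Dict, List


def add_3(mem: Dict[str, int], a: str, b: str, c: str) -> None:
    """ADD A, B, C: store A + B in C; A and B are unchanged."""
    mem[c] = mem[a] + mem[b]


def add_2(mem: Dict[str, int], a: str, b: str) -> None:
    """ADD A, B: A is both a source and the destination, so it is overwritten."""
    mem[a] = mem[a] + mem[b]


class AccumulatorMachine:
    """ADD A: the accumulator is the implicit second source and the destination."""

    def __init__(self, mem: Dict[str, int]) -> None:
        self.mem = mem
        self.acc = 0

    def load(self, a: str) -> None:  # hypothetical helper that sets the accumulator
        self.acc = self.mem[a]

    def add(self, a: str) -> None:
        self.acc += self.mem[a]


class StackMachine:
    """ADD with no operands: pop the top two items and push their sum."""

    def __init__(self) -> None:
        self.stack: List[int] = []

    def push(self, value: int) -> None:
        self.stack.append(value)

    def add(self) -> None:
        self.push(self.stack.pop() + self.stack.pop())


mem = {"A": 2, "B": 3, "C": 0}
add_3(mem, "A", "B", "C")

acc = AccumulatorMachine(mem)
acc.load("A")
acc.add("B")

stk = StackMachine()
stk.push(mem["A"])
stk.push(mem["B"])
stk.add()

add_2(mem, "A", "B")
print(mem, acc.acc, stk.stack)  # {'A': 5, 'B': 3, 'C': 5} 5 [5]
```

The fewer operands an instruction names, the shorter it can be encoded, at the cost of extra instructions (such as the load) to set up the implicit operand.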
4 Instruction Set
An instruction set defines the characteristics of the machine that is intended to execute it. For this reason, understanding the instruction set and programming with it is the most effective way of understanding the architecture of the machine itself. For the programmer, it does not matter how the circuits are organized and connected, as long as the machine can be instructed using the primitive instructions it supports. Realizing this, IBM coined the term instruction set architecture (ISA) and promoted the idea of understanding computer architecture in terms of its instruction set. This approach alleviates the difficulty of dealing with connections and circuits at the physical level, which is cumbersome and adds very little insight into the architecture of a computer. This is one of the main reasons that most computer architecture courses today are accompanied by assembly language programming for a real or simulated processor (as we do with a simulated 8088 processor).
Broadly, computers can be classified as reduced instruction set computers (RISC) and complex instruction set computers (CISC). Although there are some guiding principles for designing a RISC, there is no unified set of guidelines followed for CISC design, so CISCs are simply non-RISCs. Historically, CISC precedes RISC, so let us start with CISC.

4.1 CISC
Initially, there was no consistent procedure or policy for designing instruction sets; design was mostly based on heuristics and intuition. Convenience was the prime factor in designing an instruction set, and convenience can be obtained through flexible addressing and higher level operations. The arguments in favor of richer instruction sets were:
• A richer instruction set would simplify compilers.
• A richer instruction set would alleviate the software crisis.[1]
• A richer instruction set would improve architecture quality.
More addressing modes obviously allow the possibility of more and more complex instructions. More complex instructions complicate the circuitry, and complex circuitry may be slower overall. This is shown in Figure 5.
[Figure 5: Addressing Modes. Convenience leads to flexible addressing and higher level instructions; these lead to more and complex instructions, hence complex circuitry, which brings more cost, energy, and heat, and slower speed.]
So there is a problem. How was it handled? Initially, with a concept called microprogramming.

4.1.1 Microinstruction Based Approach
The concept of microinstructions was used to avoid building circuits that execute complex instructions directly. The approach was to design a machine with a limited number of primitive instructions and then build the other, complex instructions from these primitive instructions. That is, each complex instruction was implemented as a sequence of primitive instructions. The code in the primitive instructions was called microcode. The microcode was kept in a special fast ROM called control memory for execution. So every instruction from memory was executed as a sequence of microinstructions[2] from control memory. This was the idea behind IBM System/360. Adding a new instruction was then just a matter of adding the microcode corresponding to that instruction. This was the approach in the 70s.
[1] The situation in which software development could not catch up with hardware development was called the software crisis, and it still exists.

4.1.2 CISC Example
    MULT (2), (20)
multiplies the contents of memory locations 2 and 20 and stores the result in location 2. This is a complex instruction. The RISC approach:
    LOAD A, (2)
    LOAD B, (20)
    PROD A, B
    STORE (2), A
PROD multiplies A and B and stores the result in A.

4.1.3 Size of Instruction Set
Examples:
• IBM 370/168 had 208 instructions.
• VAX-11/780 had 303 instructions.

4.2 RISC
RISC, although it might look like a very natural approach, is a revolutionary approach to instruction set design. First let us look at the impact of RISC.
The microprocessor started the computer revolution in 1971. In the first 15 years, performance improved by 35% per year. Since RISC was commercialized, around the mid-1980s, the rate increased to 55% per year, which corresponds to performance doubling roughly every 18 months.[3] So, due to RISC, we are already using the computers of the future.
[2] Each word of control memory is called a microinstruction.
[3] Moore's Law.

4.2.1 Brief History
RISC has its roots in three research projects:
• The IBM 801: John Cocke, around 1974. Cocke received the Turing Award and the National Medal of Technology for this innovation.
• The Berkeley RISC processor: David Patterson (University of California, Berkeley), in 1980, came up with the term RISC (reduced instruction set computer) and designed a chip named RISC I, and then RISC II. It was a DARPA project and played a key role in promoting RISC. Sun SPARC was a derivative of Berkeley RISC II.
• The Stanford MIPS processor: John Hennessy, in 1981, designed and fabricated a chip called MIPS. It was also a DARPA project.
• What are RISC's architectural principles?
  - Compiler technology should be used to simplify instructions rather than to generate complex instructions.
  - Simple decoding and pipelined execution are more important than program size.
  - Since cache memory is available, microinstructions should not be faster than simple instructions (simple instructions fetched from the cache can run about as fast as microinstructions).
• How do we go about determining the instruction set for a RISC? Start with the applications and do a quantitative analysis rather than an intuitive one:
  - Analyze the applications to determine which operations are used most frequently.
  - Optimize the data path design to execute these instructions as quickly as possible.
  - Include other instructions only if they
    * fit into the previously developed data path,
    * are relatively frequent, and
    * will not slow the execution of the more frequent instructions.
  - Apply a similar strategy to the design of other processor resources.
  - Push as much complexity as reasonable from run-time hardware into compile-time software (basically, do the work ahead of time).
• What are the specific guidelines for designing a RISC?
  1. Single-cycle execution of most instructions, using more primitive instructions (execute one instruction per cycle; Patterson 1981).
  2. Only load and store instructions access memory (Patterson 1981). Data is moved between registers and memory explicitly by simple load/store instructions, not as part of other instructions using complex addressing modes.
  3. All instructions must be executed directly by the hardware (that is, hardwired instruction decoding); no microprogramming.
  4. Instructions should be easy to decode: simple instructions and simple addressing.
  5. Relatively few instructions and addressing modes.
  6. A fixed instruction format for simple decoding (all instructions are the same size and should not cross word boundaries; Patterson 1981).
  7. Maximize the rate at which instructions are issued by pipelining: a highly pipelined data path for as much concurrency as possible.
  8. Support for high-level languages. Procedure call/return is the most time-consuming operation in typical high-level language programs, and because of program structuring, modern programs perform a considerable number of calls and returns. These must be as fast as possible. The operations involved in calls and returns are:
       - save register values
       - pass parameters
       - receive the result
       - restore register values
     To make these operations faster, use register windows to handle calls and returns.
  9. Multiple register sets (a large register set); that is, provide plenty of registers. Organize and use them as register windows: many registers are available, but at any time only a fixed number of them are visible to a program. Register windows have many uses. For example, a stack can be emulated, with each window acting as a stack frame.
     (80% of the scalars were local variables, and over 90% of the arrays or structures were global variables; Patterson 1981.)
  10. Use many levels of memory (a memory hierarchy).
• Do all processors follow these? Not all, but the main characteristics, such as
  - cache,
  - a pipelined data path, and
  - register windows,
  are found in most RISC processors.
• What is the size of the instruction set of a typical RISC?
  - IBM 801: 120 instructions
  - MIPS: 55 instructions
  - RISC I: 39 instructions
• Does RISC increase code size? Yes. But overall, thanks to its other advantages, RISC performs better. Reducing memory access time, which is generally a bottleneck, is key to RISC design.
• How does it perform better?
  - A simple instruction set allows easier optimization and design of the circuitry.
  - The area saved by simplifying the circuitry, that is, by eliminating seldom-used instructions, can be used to accelerate the more commonly used instructions.
  - A simpler instruction set simplifies the translation process and allows better optimization.
• Then why could RISC not wipe out CISC processors?
  - A huge investment had already been made in CISC processors.
  - Compatibility kept the market for these processors going.
  However, the trend has changed. Currently, almost all modern processors implement the core ideas of RISC.
• What about the Intel series?
  - Starting with the 486, Intel CPUs contain a RISC core that executes the simplest (and typically most common) instructions in a single data path cycle, while interpreting the more complex instructions in the usual way. It is a hybrid approach.
  - The popular Advanced RISC Machine (ARM), used in most embedded systems, is a RISC. Intel has also held a license to produce its own ARM processors.
• As the name suggests, RISC means reduced instruction set. How far can an instruction set be reduced? To process anything we need some instruction, so zero instructions is out of the question.
  What about one instruction? Yes. This is called the ultimate RISC (URISC), although the architecture does not satisfy other required RISC properties such as pipelining, a large register set, etc. Let us see what that instruction is and how it is sufficient to solve all computing problems.
  - Instruction Structure: Since there is only one instruction, no opcode is needed. The instruction essentially holds addresses for the operands and the jump location. In its generic form it has four fields:
        L A B J
    where A and B are operand addresses, L is the instruction label, and J is the address of the jump location. The instruction means:
        A B J: subtract A from B and store the result in B; if B < 0, jump to J.
    Assume that the program always starts execution at location 1 and stops when it jumps to location 0. The jump address can be absolute or relative to the current location: if J is a label, it is absolute, and if J is a constant integer, it is relative. For a relative jump, we simply write how many locations to jump from the current location. Consider the following example:
        T S -4
        T P +3
        T P j1
    Here -4 indicates a jump 4 locations backward, +3 a jump 3 locations forward, and j1 an absolute address. For an absolute jump, the target instruction carries the instruction label, that is, something like
        j1 R S    or    j1 R S j7
    Omitting J, or writing +1, means continuing with the next instruction. Also, at the assembly language level, .word is used to initialize memory with a given value. That is,
        T .word 4
    means T is initialized with the constant 4. Now we will see how this one instruction can be used to build all other instructions.
  - Power or completeness of the instruction: It is sufficient to show how the following operations can be implemented using this one instruction.
    * Data manipulation
      · Addition
      · Subtraction
    * Data movement
      · moving data from one memory location M1 to another location M2
      · swapping M1 and M2
    * Flow control
      · Jump to L if M1 < M2.
      · Jump to L if M1 >= M2.
      · Jump to L if M1 == M2.
    Let us see how we can accomplish these. We will use T variables as temporaries, to avoid modifying the original values when needed.
    * Subtraction: Given directly by the instruction.
    * Addition: Assume that a is stored in A, b is stored in B, and the sum must be stored in S; we use the temporary variable T.
          S S       // initialize S to 0 (S - S).
          T T       // initialize T to 0.
          A T +1    // put 0-a in T.
          B T +1    // put -a-b in T.
          T S       // put 0-(-a-b), i.e., a+b, in S.
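To check the addition program, the one-instruction machine can be simulated. Below is a minimal Python sketch that follows the semantics given in these notes (subtract A from B, store the result in B, jump to J if B < 0; start at location 1; halt on a jump to location 0). The tuple encoding of instructions, the name run_urisc, and the step limit are hypothetical choices for this sketch, and so is the rule that leaving the program in any other way (for example, running past the last instruction) also halts.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical encoding of one URISC instruction "L A B J" as (L, A, B, J):
#   L: optional label, used as the target of absolute jumps
#   J: None (no jump field) or +1 -> next instruction;
#      any other int -> relative jump; str -> absolute jump to that label
Jump = Union[None, int, str]
Instr = Tuple[Optional[str], str, str, Jump]


def run_urisc(program: List[Instr], mem: Dict[str, int],
              max_steps: int = 10_000) -> Dict[str, int]:
    """Run 'A B J' instructions: B = B - A, then jump to J if B < 0."""
    labels = {lab: i for i, (lab, _, _, _) in enumerate(program, start=1) if lab}
    pc = 1  # execution starts at location 1
    for _ in range(max_steps):
        # Location 0 means halt; leaving the program in any other way also
        # halts here (an assumption of this sketch).
        if not 1 <= pc <= len(program):
            return mem
        _, a, b, j = program[pc - 1]
        mem[b] = mem[b] - mem[a]
        if mem[b] < 0 and j is not None:
            pc = labels[j] if isinstance(j, str) else pc + j
        else:
            pc += 1
    raise RuntimeError("step limit reached; the program may not terminate")


# The addition program from the notes: S = A + B using the temporary T.
ADD_PROGRAM: List[Instr] = [
    (None, "S", "S", None),  # S = S - S = 0
    (None, "T", "T", None),  # T = T - T = 0
    (None, "A", "T", +1),    # T = 0 - a
    (None, "B", "T", +1),    # T = -a - b
    (None, "T", "S", None),  # S = 0 - (-a - b) = a + b
]

result = run_urisc(ADD_PROGRAM, {"A": 7, "B": 5, "S": 99, "T": 42})
print(result["S"])  # 12
```

Because the jump is taken only when the result is negative, the +1 fields in the addition program behave exactly like a missing jump field; both continue with the next instruction.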