The processor (often called the CPU) is the brain of our PC. As its name suggests, a processor is a device that processes something. That something is data, and this data is made up of 0s and 1s (the zeroes and ones of digital electronics).
To understand a processor we need some knowledge of digital systems and how they function. All of the work inside a PC is carried out by means of voltage, or more accurately, two different voltages.
Digital systems use only two voltage levels: a low voltage for the off state, represented as 0, and a high voltage for the on state, represented as 1.
These 0s and 1s are called bits. A single letter (such as A or B) is stored in 8 bits.
8 bits (for example 10010010) = 1 byte
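As a small illustration of bits and bytes, the following Python sketch prints the 8-bit pattern of a letter and the numeric value of the example byte. It assumes ASCII encoding, which the text does not specify, and the function names are illustrative only.

```python
def to_bits(ch: str) -> str:
    """Return the 8-bit binary pattern of a single ASCII character."""
    return format(ord(ch), "08b")


def byte_value(bits: str) -> int:
    """Interpret a string of 0s and 1s as an unsigned binary number."""
    return int(bits, 2)


if __name__ == "__main__":
    print(to_bits("A"))            # 01000001
    print(byte_value("10010010"))  # 146
```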
Processor Architecture
Figure 1
A processor (or CPU) processes bits (binary digits) of data. In its simplest form, the processor retrieves some data as input, performs some process on that data using the ALU, the CU (control unit) and memory, and then stores the result either in its own internal memory (cache) or in the system's memory.
This is called the output.
Figure 2
According to Figure 2, computer architecture is the way we talk to the machine. More precisely, computer architecture describes how high-level components fit together and work together to deliver performance.
The Brief History of Computer Architecture
3.1 First Generation (1940-1950) – Vacuum Tube
Figure 3
ENIAC – 1945: Designed by Mauchly & Eckert and built for the US Army to calculate trajectories for ballistic shells during WWII. It used 18,000 vacuum tubes and 1,500 relays and was programmed by manually setting switches.
UNIVAC – 1951: the first commercial computer
John von Neumann architecture: Goldstine and von Neumann took the idea of ENIAC and developed the concept of storing a program in memory. This is known as the “von Neumann” architecture and has been the basis for virtually every machine designed since.
Features:
Electron emitting devices
Data and programs are stored in a single read-write memory
Memory contents are addressable by location, regardless of the content itself
Machine language/Assembly language
Sequential execution
Use of drum memory or magnetic core memory, programs and data are loaded using paper tape or punch cards
2 Kb memory, 10 KIPS
Two types of models for a computing machine:
Harvard architecture – This has physically separate storage and signal pathways for its instructions and data. (The term comes from the Harvard Mark I, a relay-based computer system that used punched tape to store instructions and relay latches to store data.)
Von Neumann architecture – a single storage structure holds both the set of instructions and the data. Such machines are also known as stored-program computers.
Von Neumann bottleneck – the very limited bandwidth, or data transfer rate, between the CPU and memory.
3.2 Second Generation (1950-1964) – Transistors
Figure 4
William Shockley, John Bardeen, and Walter Brattain invented the transistor, which reduced the size of computers and improved their reliability.
First operating systems: handled one program at a time
On-off switches controlled by electricity
High level languages
Floating point arithmetic
Reduced the computational time from milliseconds to microseconds
1959 – IBM's 7000 series mainframes were the company's first transistorized computers
3.3 Third Generation (1964-1974) – Integrated Circuits (IC)
Figure 5
Microprocessor chips combine thousands of transistors, with an entire circuit on one computer chip
Semiconductor memory
Multiple computer models with different performance characteristics
Smaller computers that did not need a specialized room
2 Mb memory, 5 MIPS
Use of cache memory
IBM’s System 360 – the first family of computers making a clear distinction between architecture and implementation
3.4 Fourth Generation (1974-present) – Very Large-Scale Integration (VLSI)/Ultra Large-Scale Integration (ULSI)
Combines millions of transistors
Single-chip processor and the single-board computer emerged
Creation of the Personal Computer (PC)
Widespread use of data communications
Artificial intelligence: Functions & logic predicates
Object-Oriented programming: Objects & operations on objects
Massively parallel machine
Smallest in size because of the high component density
1971 – The 4004
Figure 6
The 4004 was the world’s first universal microprocessor, invented by Federico Faggin, Ted Hoff, and Stan Mazor. With just over 2,300 MOS transistors in an area of only 3 by 4 millimeters, it had as much power as the ENIAC.
Features:
4-bit CPU
1K data memory and 4K program memory
clock rate: 740kHz
Just a few years later, the word size of the 4004 was doubled to form the 8008.
Intel 8080 Motorola 68000 Intel 386 Alpha 21264
Figure 7
Comparison of CPUs (Intel, Motorola, Alpha)
Intel 8080 (1974): 8-bit data; 16-bit address; 6 μm NMOS; 6K transistors; 2 MHz
Motorola 68000 (1979): 32-bit architecture internally but 16-bit data bus; sixteen 32-bit registers (8 data and 8 address registers); 2-stage pipeline; no virtual memory support; the 68020 was fully 32-bit externally
Intel 386 (1985): 32-bit data; improved addressing; security modes (kernel, system services, application services, applications)
Alpha 21264: 64-bit address/data; adaptive branch prediction; superscalar; 15.2M transistors; out-of-order execution; 0.35 μm CMOS process; 256 TLB entries; 128KB cache; 600 MHz
Table 1 – Comparison of CPUs
History of Computer Invention
Year | Inventors/Inventions | Description of Event
1936 | Konrad Zuse – Z1 Computer | First freely programmable computer.
1942 | John Atanasoff & Clifford Berry – ABC Computer | Who was first in the computing biz is not always as easy as ABC.
1944 | Howard Aiken & Grace Hopper – Harvard Mark I Computer | The Harvard Mark 1 computer.
1946 | John Presper Eckert & John W. Mauchly – ENIAC 1 Computer | 20,000 vacuum tubes later…
1948 | Frederic Williams & Tom Kilburn – Manchester Baby Computer & The Williams Tube | Baby and the Williams Tube turn on the memories.
1947/48 | John Bardeen, Walter Brattain & William Shockley – The Transistor | No, a transistor is not a computer, but this invention greatly affected the history of computers.
1951 | John Presper Eckert & John W. Mauchly – UNIVAC Computer | First commercial computer & able to pick presidential winners.
1953 | International Business Machines – IBM 701 EDPM Computer | IBM enters into ‘The History of Computers’.
1954 | John Backus & IBM – FORTRAN Computer Programming Language | The first successful high level programming language.
1955 (In Use 1959) | Stanford Research Institute, Bank of America, and General Electric – ERMA and MICR | The first bank industry computer – also MICR (magnetic ink character recognition) for reading checks.
1958 | Jack Kilby & Robert Noyce – The Integrated Circuit | Otherwise known as ‘The Chip’
1962 | Steve Russell & MIT – Spacewar Computer Game | The first computer game invented.
1964 | Douglas Engelbart – Computer Mouse & Windows | Nicknamed the mouse because the tail came out the end.
1969 | ARPAnet | The original Internet.
1970 | Intel 1103 Computer Memory | The world’s first available dynamic RAM chip.
1971 | Faggin, Hoff & Mazor – Intel 4004 Computer Microprocessor | The first microprocessor.
1971 | Alan Shugart & IBM – The “Floppy” Disk | Nicknamed the “Floppy” for its flexibility.
1973 | Robert Metcalfe & Xerox – The Ethernet Computer Networking | Networking.
1974/75 | Scelbi & Mark-8 Altair & IBM 5100 Computers | The first consumer computers.
1976/77 | Apple I, II & TRS-80 & Commodore Pet Computers | More first consumer computers.
1978 | Dan Bricklin & Bob Frankston – VisiCalc Spreadsheet Software | Any product that pays for itself in two weeks is a surefire winner.
1979 | Seymour Rubenstein & Rob Barnaby – WordStar Software | Word Processors.
1981 | IBM – The IBM PC – Home Computer | From an “Acorn” grows a personal computer revolution
1981 | Microsoft – MS-DOS Computer Operating System | From “Quick And Dirty” comes the operating system of the century.
1983 | Apple Lisa Computer | The first home computer with a GUI, graphical user interface.
1984 | Apple Macintosh Computer | The more affordable home computer with a GUI.
1985 | Microsoft Windows | Microsoft begins the friendly war with Apple.
More information on each invention was available at http://inventors.about.com/library/blcoindex.htm, where clicking a year gave the full details.
Analysis of the Pentium 4 32-bit Microprocessor Architecture
Produced: from 2000 to 2008
Manufacturer: Intel
Max. CPU clock rate: 3.6 GHz
FSB speeds: 400 MHz
Feature size: 180 nm to 65 nm
Instruction set: x86 (i386), MMX, SSE2
Features: Rapid Execution Engine, Hyper-Pipelined Technology, Advanced Dynamic Execution, a new cache subsystem
Microarchitecture: NetBurst
Socket(s): Socket 478
Core name(s): Willamette, Northwood, Prescott, Cedar Mill
Details of any computer chip are available at http://happytrees.org/chips?page=manufacturer&manufacturer=Intel&family=4004
The Meaning of a 32-bit Microprocessor
32-bit refers to the number of bits that can be processed or transmitted in parallel, that is, at the same time. A single 32-bit element in a data format consists of four octets (four bytes) and is also called a double word.
The term ’32-bit’ is also applied in the following contexts:
A 32-bit microprocessor can process data and memory addresses that are 32 bits wide in its registers.
A 32-bit data bus has 32 physical wires, which can transmit 32 bits in parallel.
In a graphics device such as a scanner or digital camera, the 32 bits are divided into two parts: 24 bits represent each pixel (true colour) and the remaining 8 bits are used for control information.
Within an operating system, it is the number of bits used for memory addresses.
In computer architecture, memory addresses and data units are 32 bits wide (4 bytes or octets).
32 bits can store an unsigned integer from 0 to 4,294,967,295, or a signed integer from −2,147,483,648 to 2,147,483,647 using two’s complement encoding.
Four gigabytes (4,294,967,296 bytes) of addressable memory are available for reading and writing data if a processor has 32 memory address lines.
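The integer ranges and the addressable memory quoted above follow directly from powers of two. A minimal Python sketch of that arithmetic (the function names are illustrative, not from any library):

```python
def unsigned_range(bits: int) -> tuple[int, int]:
    """Smallest and largest unsigned integer that fits in `bits` bits."""
    return 0, 2**bits - 1


def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest two's complement integer that fits in `bits` bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with `address_bits` address lines."""
    return 2**address_bits


if __name__ == "__main__":
    print(unsigned_range(32))     # (0, 4294967295)
    print(signed_range(32))       # (-2147483648, 2147483647)
    print(addressable_bytes(32))  # 4294967296
```

Note that 2 to the power 32 is 4,294,967,296 bytes, which is exactly 4 GiB and slightly more than 4,000,000,000 bytes.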
32-bit has been one of the most important designs in computing over roughly the last 20 years, because the registers, address buses or data buses of a 32-bit CPU and ALU are of that size. Such processors have become known simply as 32-bit processors.
The external address and data buses are often wider than 32 bits, but a 32-bit processor stores and manipulates data internally as 32-bit quantities.
Pentium IV 3.6GHz Bus Architecture
Figure 8
The North Bridge, or Memory Controller Hub, forms the main communication pathway. The bus between the CPU and the motherboard chipset is called the processor bus (front-side bus, FSB). In modern systems this bus runs at 66 MHz to 800 MHz, depending on the motherboard design and its chipset.
Here are the basic differences between the Pentium 4 architecture and other CPU architectures:
The Pentium 4 can transfer four data items in each clock cycle. This is called QDR (Quad Data Rate). The local bus can therefore transfer data four times faster than its actual clock rate (see Table 2 below). When the real clock rate is 100 MHz, the local bus performs as a 400 MHz bus and the data transfer rate on the system interface is 3.2 GB/s.
Real Clock | Performance | Transfer Rate
100 MHz | 400 MHz | 3.2 GB/s
133 MHz | 533 MHz | 4.2 GB/s
200 MHz | 800 MHz | 6.4 GB/s
266 MHz | 1,066 MHz | 8.5 GB/s
Table 2
The data path between the L2 cache and the L1 data cache is 256 bits wide. It was 64 bits wide in earlier Intel processors. (See Figure 8: L2 is “L2 cache/control” and L1 is “L1 D-Cache/D-TLB”.) This path can therefore move data four times faster than in earlier processors running at the same clock. However, in both earlier and current processors the data path between the L2 cache and the pre-fetch unit is 64 bits wide. (The pre-fetch unit is “BTB and I-TLB” in Figure 8.)
The L1 instruction cache was relocated and given a new name, “Trace Cache”. It now sits after the decode unit, so it holds microinstructions that have already been decoded. (Some people are misled by the new name and position. The L1 instruction cache is not missing in the Pentium 4; it just has a new name and a different place.) The trace cache can hold about 12 K microinstructions, and its size is about 150 KB: if each microinstruction is 100 bits wide, 12 K microinstructions occupy 12 K × 100 / 8 = 150 KB.
Earlier Intel processors had 40 internal registers, but the Pentium 4 has 128. They are managed by the register renaming unit and its Register Alias Table, shown in Figure 8 (Rename/Alloc and RAT).
The Pentium 4 is built with five execution units working in parallel. Two of these units load and store data to and from RAM.
The processor bus normally consists of 50 to 100 physical lines (circuit paths). It is divided into three sub-buses:
The address bus (memory bus) is unidirectional (signals travel one way only) and transports the memory addresses that the processor wants to access in order to read or write data.
The data bus is bidirectional (signals travel both ways) and transfers instructions and data to and from the processor.
The control bus (command bus) is also bidirectional. It carries orders and synchronization signals from the control unit to all other hardware components, and carries their response signals back.
For example, suppose we have a Pentium 4 (3.6 GHz) processor with an 800 MHz bus, whose real clock rate is 200 MHz. Using the formula below we can calculate its maximum instantaneous transfer rate.
800 MHz x 8 bytes (64 bits) = 6,400 MB/s
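The Quad Data Rate figures in Table 2 and the formula above can be reproduced with a short Python sketch. It assumes four transfers per real clock and a 64-bit (8-byte) data bus, as the text states; the function name is illustrative only. Nominal clocks such as 133 MHz are rounded values, so the computed results differ slightly from the 533 MHz and 4.2 GB/s shown in the table.

```python
def fsb_transfer_rate(real_clock_mhz: int,
                      transfers_per_clock: int = 4,
                      bus_width_bytes: int = 8) -> tuple[int, int]:
    """Return (effective bus speed in MHz, peak transfer rate in MB/s)."""
    effective_mhz = real_clock_mhz * transfers_per_clock
    peak_mb_per_s = effective_mhz * bus_width_bytes
    return effective_mhz, peak_mb_per_s


if __name__ == "__main__":
    for clock in (100, 133, 200, 266):
        effective, peak = fsb_transfer_rate(clock)
        print(f"{clock} MHz real clock -> {effective} MHz effective, {peak} MB/s peak")
```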
Bus Architecture – three buses:
Address:
If I/O, a value between 0000H and FFFFH is issued.
If memory, it depends on the architecture:
20-bits (8086/8088)
24-bits (80286/80386SX)
25-bits (80386SL/SLC/EX)
32-bits (80386DX/80486/Pentium)
36-bits (Pentium Pro/II/III)
Data:
8-bits (8088)
16-bits (8086/80286/80386SX/SL/SLC/EX)
32-bits (80386DX/80486/Pentium)
64-bits (Pentium/Pro/II/III)
Control:
Most systems have at least 4 control bus connections (active low).
MRDC (Memory ReaD Control), MWTC (Memory WriTe Control), IORC (I/O Read Control), IOWC (I/O Write Control).
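The address bus widths listed above determine how much memory each processor can address: n address lines give 2 to the power n distinct byte addresses. A minimal Python sketch, using a few of the widths from the list (the dictionary and function names are illustrative only):

```python
# Address bus widths in bits, copied from the list above.
ADDRESS_BUS_BITS = {
    "8086/8088": 20,
    "80286/80386SX": 24,
    "80386DX/80486/Pentium": 32,
    "Pentium Pro/II/III": 36,
}


def addressable_memory(address_bits: int) -> int:
    """Number of byte addresses reachable with the given number of address lines."""
    return 2**address_bits


def human_readable(num_bytes: int) -> str:
    """Format a power-of-two byte count using binary units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = num_bytes
    for unit in units:
        if value < 1024 or unit == units[-1]:
            return f"{value} {unit}"
        value //= 1024


if __name__ == "__main__":
    for cpu, bits in ADDRESS_BUS_BITS.items():
        print(f"{cpu}: {bits} bits -> {human_readable(addressable_memory(bits))}")
    # The I/O address range 0000H to FFFFH covers this many ports:
    print(0xFFFF - 0x0000 + 1)  # 65536
```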
Bus Standards:
ISA (Industry Standard Architecture): 8 MHz
8-bit (8086/8088)
16-bit (80286-Pentium)
EISA: 8 MHz
32-bit (older 386 and 486 machines)
PCI (Peripheral Component Interconnect): 33 MHz
32-bit or 64-bit (Pentiums)
New: PCI Express and PCI-X at 533 MT/s
VESA (Video Electronics Standards Association): runs at processor speed
32-bit or 64-bit (Pentiums)
Only disk and video. Competes with the PCI but is not popular
USB (Universal Serial Bus): 1.5 Mbps, 12 Mbps and now 480 Mbps
Newest systems
Serial connection to microprocessor
For keyboards, the mouse, modems and sound cards
To reduce system cost through fewer wires
AGP (Accelerated Graphics Port): 66 MHz
Newest systems
Fast parallel connection: Across 64-bits for 533MB/sec
For video cards
To accommodate the new DVD (Digital Versatile Disk) players
Latest AGP 3.0 with peak bandwidth of 2.1GB/s
ALU (Arithmetic Logic Unit)
The ALU (Arithmetic Logic Unit) is one of the most important parts of the CPU and handles integer operations. It is a very small unit inside the CPU (see Figure 9). In the Pentium 4, Intel runs this unit at twice the processor clock: if the CPU works at 1.8 GHz, the ALU works at 3.6 GHz. However, doubling the ALU speed does not make other operations, such as floating-point, SSE or MMX operations, any faster.
Figure 9
ALUs execute simple integer instructions, so the new CPU should do very well in integer operations. However, the doubled ALU working frequency has no effect on Pentium 4 performance in floating-point, SSE or MMX operations.
The ALU latencies (logic, add, subtract, multiply, divide, shift) of a 1.4 GHz Pentium 4 can be compared with those of a 1 GHz Pentium III. The 1.4 GHz Pentium 4 spends about 0.35 ns executing an add operation, whereas the Pentium III takes 1 ns for the same instruction. Even so, although the ALU frequency was doubled, overall there is no big difference in operation execution time.
Further details about the ALU, SSE and MMX are available at http://www.xbitlabs.com/articles/cpu/display/pentium4-1400-1.html
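The 0.35 ns and 1 ns add latencies quoted above follow from the clock rates if an add is assumed to complete in a single ALU clock, with the Pentium 4 ALU running at twice the core frequency. The one-cycle figure is an assumption of this Python sketch, as are the function and parameter names.

```python
def alu_latency_ns(core_clock_ghz: float,
                   alu_multiplier: float = 1.0,
                   alu_cycles: float = 1.0) -> float:
    """Latency in nanoseconds of an operation taking `alu_cycles` ALU clocks.

    The ALU clock is the core clock times `alu_multiplier`
    (2.0 for the double-speed Pentium 4 ALU described in the text).
    """
    return alu_cycles / (core_clock_ghz * alu_multiplier)


if __name__ == "__main__":
    p4 = alu_latency_ns(1.4, alu_multiplier=2.0)  # about 0.357 ns
    p3 = alu_latency_ns(1.0)                      # 1.0 ns
    print(f"Pentium 4 1.4 GHz add: {p4:.3f} ns")
    print(f"Pentium III 1 GHz add: {p3:.3f} ns")
```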
MEMORY MANAGEMENT UNIT (MMU)
Figure 10
Memory management is the management of computer memory. In the simplest terms, it provides portions of memory to programs when they request them, and frees that memory when it is not in use or no longer needed.
The MMU/IOMMU (see note 1) is responsible for managing the computer’s memory system. This component sits between the CPU and system memory. The MMU translates CPU-visible virtual addresses to physical addresses; the IOMMU maps device-visible virtual addresses (device addresses or I/O addresses) to physical addresses. Both can provide memory protection against misbehaving devices. The MMU can be a separate chip, but it is usually integrated with the CPU.
Memory management covers three areas:
Hardware memory management
Operating system memory management
Application memory management
Hardware memory management covers random access memory (RAM) and memory caches. RAM is the physical storage located on the system board. It is the main storage that the computer reads data from and writes data to. Memory caches help the CPU speed up processing by holding copies of some of the data in main memory.
1. An example of an IOMMU is the Graphics Address Remapping Table (GART) used by AGP and PCI Express graphics cards.
Operating system memory management uses allocated hard disk space as memory when physical memory runs out of space. This hard disk space is called virtual memory (Figure 10). The computer does this automatically when a program requests it. The allocation is carried out by the MMU under the direction of the operating system and other applications. The virtual address space seen by the CPU is a range of addresses divided into pages, which allows the operating system to allocate space on the hard disk in equal-sized units (Figure 12).
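The idea of dividing the virtual address space into equal-sized pages can be sketched in a few lines of Python. This is a simplified model, not how any particular MMU is implemented: the 4 KB page size, the dictionary used as a page table and the function names are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (4 KB)


def split_address(virtual_address: int) -> tuple[int, int]:
    """Split a virtual address into (page number, offset within the page)."""
    return virtual_address // PAGE_SIZE, virtual_address % PAGE_SIZE


def translate(virtual_address: int, page_table: dict[int, int]) -> int:
    """Translate a virtual address to a physical address.

    `page_table` maps page numbers to physical frame numbers. A missing
    entry models a page that currently lives on disk (a page fault).
    """
    page, offset = split_address(virtual_address)
    frame = page_table.get(page)
    if frame is None:
        raise LookupError(f"page fault: page {page} is not in RAM")
    return frame * PAGE_SIZE + offset


if __name__ == "__main__":
    table = {0: 5, 1: 2}           # hypothetical mapping: page -> frame
    print(translate(4100, table))  # page 1, offset 4 -> 2 * 4096 + 4 = 8196
```

When the lookup fails, a real operating system would load the page from disk into a free frame and retry, rather than raise an error.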
Figure 11
Application memory management is the process of allocating the memory a program needs in order to run. In larger operating systems there may be many copies of one program running. The memory management unit assigns each program the memory addresses that best fit its run, and copies of the same program can be assigned the same addresses. Memory resources are also distributed to programs according to their needs (garbage collection, see the note below). Finally, the memory is recycled for further use when the operation is done.
1. Garbage collection is the automated allocation and removal of computer memory resources for a program.
Figure 12
IA-32
Intel introduced the 80386 microprocessor in 1985. It extended the 8086 to 32 bits and established the IA-32 architecture. This architecture has been supported by the P5 (Pentium), P6 (Pentium Pro, II, III), P7 (Pentium 4) and Pentium M processor families over a long time while maintaining full software compatibility with OS code, even though the MMU is extremely complex, with many different possible operating modes.
Table 3 – Summary of Intel IA-32 in major processors
Examples of other well-known memory management architectures
VAX
ARM
IBM System/370 and successors
DEC Alpha
MIPS
Sun 1
PowerPC
IA-32 and x86-64 (an extension of IA-32)
Unisys MCP Systems (Burroughs B5000)
Pentium 4 Pipeline
Pipelining is a technique used in CPUs and other digital electronic devices to increase their processing speed by overlapping the fetch, decode and execute steps of successive instructions. The Intel Pentium III uses an 11-stage pipeline, but the Pentium 4 has 20 stages, so the Pentium 4 can run at a higher clock rate than a Pentium III. The 90 nm Pentium 4 generation (Prescott) has an even deeper, 31-stage pipeline, which allows still higher clock rates.
Pipelining is used either to reduce the processing time of an instruction or to increase the clock rate of the processor. With more stages, each individual stage does less work and is built from fewer transistors, which helps to achieve higher clock rates. A Pentium 4 is faster than a Pentium III because it can work at a higher clock rate. At the same clock rate, however, a Pentium III would be faster than a Pentium 4, because of the Pentium 4's longer pipeline.
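The trade-off described above can be illustrated with a toy timing model in Python. Every number here is hypothetical: the model assumes a fixed amount of logic delay divided evenly across the stages, a small fixed overhead per stage, and a full pipeline refill on each mispredicted branch. It is a sketch of the reasoning, not a model of any real processor.

```python
def clock_period_ns(logic_delay_ns: float, stages: int,
                    latch_overhead_ns: float = 0.05) -> float:
    """Clock period when the total logic delay is split evenly over the stages."""
    return logic_delay_ns / stages + latch_overhead_ns


def run_time_ns(instructions: int, stages: int, period_ns: float,
                flushes: int = 0) -> float:
    """Time to run `instructions` through an ideal pipeline.

    The first instruction needs `stages` cycles, each later one completes one
    cycle after the previous, and every flush (for example a mispredicted
    branch) costs `stages - 1` extra cycles to refill the pipeline.
    """
    cycles = stages + instructions - 1 + flushes * (stages - 1)
    return cycles * period_ns


if __name__ == "__main__":
    # More stages allow a shorter clock period (a higher clock rate).
    print(clock_period_ns(10.0, 11), clock_period_ns(10.0, 20))
    # At the same clock period, the deeper pipeline loses more time per flush.
    print(run_time_ns(1000, 11, 1.0, flushes=50))
    print(run_time_ns(1000, 20, 1.0, flushes=50))
```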
For this reason Intel announced that it would not use the NetBurst (Pentium 4) architecture for its 8th generation processors. It planned instead to use the Pentium M architecture, which is based on the Pentium III architecture, that is, on Intel’s 6th generation architecture.
Figure 13 shows the Pentium 4 20-stage pipeline.
Figure 13
Pentium 4 pipeline.
How a given instruction is processed by the Pentium 4 in each stage (see Figure 13):
TC Nxt IP: Trace cache next instruction pointer. This stage looks in the branch target buffer (BTB) for the next microinstruction to be executed. This step uses two stages.
TC Fetch: Trace cache fetch. The microinstruction is fetched from the trace cache. This step uses two stages.
Drive: Sends the microinstruction to be processed to the resource allocator and register renaming circuit.
Alloc: Allocate. Checks which CPU resources the microinstruction will need, for example the memory load and store buffers.
Rename: If the program uses one of the x86 registers, it is renamed to one of the 128 internal registers. This step uses two stages.
Que: Queue. Microinstructions are placed in queues according to their type (for example integer or floating point) and are held there until a slot of the same type opens in the scheduler.
Sch: Schedule. Microinstructions are scheduled for execution according to their type. They arrive at this stage in order, and the scheduler then re-orders them to keep all execution units full. This step uses three stages.
Disp: Dispatch. Sends the microinstructions to their execution engines. This step uses two stages.
RF: Register file. The internal registers used by the instructions are read. This step uses two stages.
Ex: Execute. The microinstructions are executed.
Flgs: Flags. The microprocessor flags are updated.
Br Ck: Branch check. Checks whether the branch taken by the program is the one predicted by the branch prediction circuit.
Drive: Sends the result of this check to the branch target buffer (BTB) at the processor’s entrance.
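The stage counts given in the list above add up to the 20 stages of the pipeline. A quick Python check, with the counts copied from the list and the assumption that any step without a stated count occupies one stage:

```python
# (step name, number of pipeline stages); steps with no count stated in the
# list are assumed to take a single stage.
PENTIUM4_PIPELINE = [
    ("TC Nxt IP", 2),
    ("TC Fetch", 2),
    ("Drive", 1),
    ("Alloc", 1),
    ("Rename", 2),
    ("Que", 1),
    ("Sch", 3),
    ("Disp", 2),
    ("RF", 2),
    ("Ex", 1),
    ("Flgs", 1),
    ("Br Ck", 1),
    ("Drive", 1),
]


def total_stages(pipeline: list[tuple[str, int]]) -> int:
    """Sum the stage counts of all steps in the pipeline."""
    return sum(count for _, count in pipeline)


if __name__ == "__main__":
    print(total_stages(PENTIUM4_PIPELINE))  # 20
```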
Figure 14
F: instruction fetch
D: decode
E: execute
M: memory access
W: register write-back
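The five stages listed above overlap in time: while one instruction is being decoded, the next is already being fetched. A minimal Python sketch that prints the classic pipeline chart for a few instructions, assuming an ideal pipeline with no stalls (the function name is illustrative):

```python
STAGES = ["F", "D", "E", "M", "W"]


def pipeline_chart(num_instructions: int) -> list[str]:
    """Return one row per instruction showing which stage it is in each cycle.

    Instruction i enters the pipeline in cycle i, so n instructions finish
    in n + len(STAGES) - 1 cycles. A dot marks a cycle in which the
    instruction is not in the pipeline.
    """
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        cells = ["."] * total_cycles
        for s, name in enumerate(STAGES):
            cells[i + s] = name
        rows.append(" ".join(cells))
    return rows


if __name__ == "__main__":
    for row in pipeline_chart(3):
        print(row)
    # F D E M W . .
    # . F D E M W .
    # . . F D E M W
```

Three instructions complete in 7 cycles here instead of the 15 they would need without overlap.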
More details are available at http://www.hardwaresecrets.com/article/Inside-Pentium-4-Architecture/235/3
The Future of Microprocessor Architecture
Sandy Bridge is the new microarchitecture used by Intel CPUs from 2011. It evolved from the Nehalem microarchitecture, which was used in the first Core i7, Core i3 and Core i5 processors.
Intel’s 7th generation microarchitecture, NetBurst, used in the Pentium 4, was not carried into the 8th generation. Intel decided instead to go back to its 6th generation microarchitecture, dubbed P6 and used in the Pentium Pro, Pentium II and Pentium III, which proved to be more efficient. Intel developed the Core architecture from the Pentium M CPU (a 6th generation CPU) and used it for the Core 2 processor series (Core 2 Duo, Core 2 Quad). Intel then developed this a little further by adding an integrated memory controller and released it as the Nehalem microarchitecture, used in the first Core i3, Core i5 and Core i7 processors. All new generation Core i3, Core i5 and Core i7 processors to be released in 2011 and 2012 use the Sandy Bridge microarchitecture.
The main features of the Sandy Bridge micro-architecture
The north bridge (memory controller, graphics controller and PCI Express controller) is integrated on the same chip as the rest of the CPU.
32-nm manufacturing process
Ring architecture
New decoded microinstruction cache (L0 cache) that stores 1,536 microinstructions, which translates to more or less 6 kB
32 kB L1 instruction and 32 kB L1 data cache (same as Nehalem)
L2 was renamed to “mid-level cache” (MLC) with 256 kB
L3 memory cache, called LLC (Last Level Cache), which is shared by the CPU cores and the graphics engine
Next generation Turbo Boost technology
New AVX (Advanced Vector Extensions) instruction set
Improved graphics controller
Redesigned DDR3 memory controller supporting memories up to DDR3-1333
Integrated PCI Express controller (one x16 lane or two x8 lanes, same as Nehalem)
Socket with 1155 pins
More details are available at http://www.hardwaresecrets.com/article/Inside-the-Intel-Sandy-Bridge-Microarchitecture/1161/1
Conclusions
We cannot draw a definite conclusion about Pentium 4 performance. The architecture has a number of advantages that will allow Intel to increase the processor's working frequency easily. However, the Pentium 4 falls behind the Athlon processor because of its very deep 20-stage pipeline and small L1 data cache, while in some applications its performance is about the same. That is why, in the near future, the Pentium 4 will not be able to beat the Athlon CPU, which can reach higher working frequencies using the new Palomino core and DDR SDRAM support.
On the other hand, the Pentium 4 has some serious drawbacks. The first is the cost of the CPU compared with AMD's, along with the price of RDRAM and of Pentium 4 mainboards. The second is that applications run with no big difference on an Athlon CPU, which is comparable to the Pentium III.
However, Intel has now shifted to a new architecture called Sandy Bridge, with a new chipset and new memory, although we still have no idea about its prices. Intel will also shift to 0.13 micron manufacturing technology and to new chipsets that support cheaper memory than today’s RDRAM. Then Intel should win the market, including high-end workstations.