Dual-core versus dual-processor

A little history

When you look back at computer evolution over the past 15 years, it becomes clear that the trend has always been toward multi-core. This has come about for several reasons, but a couple jump out at us. First, chipmakers have long known that they would someday hit a theoretical limit, imposed by the laws of physics, prohibiting them from going any further. They also knew that before that theoretical limit arrived they would reach a practical, real-world stopping point: the cost-effective implementation of the physics required to produce their silicon-based products. For that reason we have seen rollout schedules, timelines, and product upgrades that allow a certain amount of money to be made throughout the R&D effort, spreading out the cost of going further.

Second, multi-core chips are a straightforward, logical progression of what chipmakers have sought for so long: to go faster.

A little reality

Semiconductor gates operate much more quickly than chips do as a whole. In fact, an actual CPU doing real work (such as adding two numbers together or fetching data from memory) is so slow compared to the internal operations of its logic units that you could easily compare it to a car. For a car to travel 55 mph, its engine must turn over exceedingly fast inside. It's the same with CPUs: for all the spreadsheet processing, web browsing, e-mailing, and gaming they do, the circuitry inside is running amazingly fast by comparison.

Still, with all of this speed, there is only so much that can be done in the real world. Individual semiconductor circuits are so primitive that to get any tangible use out of them, you must put an extremely large number of them together in sequence. Only once they are "affixed" in this way do the real workloads that actually mean something begin to emerge. Because of this requirement, there is an absolute maximum speed at which microprocessors can cost-effectively be made in volume production.

Looking ahead

Chipmakers have known this forever. From the first kHz-level microprocessors, the semiconductor crafters looked to the future to see what they could see… and I'd bet that many of them did not think we could get to where we are today. Yet here we are.

The human mind continues to push the boundaries of physics, with physics always pushing back, and so far we have found ways to work with those laws to make things happen. But we're told that is fast coming to a close. Within a decade, the pace of advancement may be stifled to the point where it is no longer cost-effective to pursue "smaller and faster."

We are actually seeing the side effects of this already. It is becoming more expensive (and impractical) to go faster, so chipmakers have turned to wider implementations, i.e., multi-core. This allows computing power to increase (nearly double) without making anything smaller, faster, or much more difficult to manufacture.

Dual-core vs. dual-processor

Now, returning to our subject focus for a moment. It has long been obvious that dual-processing is of great benefit to users. In 2000 I bought a dual-processor server motherboard and filled it with Pentium II Xeons (and later Pentium III Xeons). That machine still, to this day, serves me adequately for nearly everything I do, despite having only a 550MHz clock speed. The reason? The two processors can work separately on isolated tasks: while one CPU is busy compiling something I'm working on, the other is free to handle regular OS requests, GUI updates, my surfing, e-mailing, etc. The only time I see any notable slowdown is when I run Java apps, but I also see a slowdown running them on a high-speed single-core machine.
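The division of labor described above is easy to sketch in code. The snippet below is an illustrative toy, not a benchmark: a stand-in "compile" runs in a second process (which an SMP-aware OS can schedule on the other CPU) while the foreground loop keeps servicing lightweight work instead of waiting.

```python
# Toy illustration (not a benchmark) of the responsiveness benefit of a
# second CPU: a stand-in "compile" runs in another process while the
# foreground loop keeps doing lightweight "GUI/OS" work instead of waiting.
from multiprocessing import Process, Queue

def compile_job(results):
    """Stand-in for a long CPU-bound task, such as a compile."""
    results.put(sum(i * i for i in range(2_000_000)))

if __name__ == "__main__":
    results = Queue()
    worker = Process(target=compile_job, args=(results,))
    worker.start()            # an SMP OS can schedule this on the other CPU

    handled = 0
    while results.empty():    # foreground stays free in the meantime
        handled += 1          # stand-in for servicing OS/GUI requests
    worker.join()

    print("background result:", results.get())
    print("requests handled while the other CPU compiled:", handled)
```

On a single CPU the two tasks would time-slice against each other; on a dual system the foreground loop genuinely runs in parallel with the background job, which is exactly the "smooth while compiling" effect described above.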

Dual-processing is nice, but it does not compare to dual-core. For two disparate, separated, and isolated components to work cooperatively, they need to "talk" to each other, and the same holds true for CPUs. When a workload is physically spread across two (or more) processors, a form of coherency must be maintained between them, because it cannot be known in advance which part of memory or which I/O port either processor might access at any given time. Since by definition either processor could access any piece of memory at any time, the likelihood that one processor needs something the other has just used becomes a real consideration.

For this reason, multi-processing systems implement a "snooping" protocol that essentially asks the other processor(s) whether any of the required memory locations happen to reside in that chip's cache rather than in main memory. The most common implementations are MESI (Modified, Exclusive, Shared, Invalid) and MOESI (Modified, Owned, Exclusive, Shared, Invalid). These protocols define a sequence of electrically signaled "commands" that communicate cache-line states. A chip must issue a request and wait for the responses before acting: if no other CPU is using that memory, it's good to go; if another CPU is using it, the chip must wait for the line to become available.
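As a rough sketch of how such a protocol behaves (a simplified toy, not any vendor's actual logic, and with event names invented here for illustration), the four MESI states of a single cache line can be modeled as a small state machine:

```python
# Toy MESI state machine for one cache line, seen from one CPU's cache.
# States: M(odified), E(xclusive), S(hared), I(nvalid).
# Event names are illustrative, not real bus commands.

def next_state(state, event):
    """Return the new MESI state after a local access or a snooped bus event."""
    transitions = {
        # (current state, event) -> next state
        ("I", "local_read_miss_no_sharers"): "E",  # we are the only holder
        ("I", "local_read_miss_sharers"):    "S",  # another cache answered
        ("I", "local_write"):                "M",  # read-for-ownership
        ("E", "local_write"):                "M",  # silent upgrade, no bus traffic
        ("E", "snoop_read"):                 "S",  # another CPU wants a copy
        ("S", "local_write"):                "M",  # must invalidate other copies
        ("S", "snoop_write"):                "I",
        ("M", "snoop_read"):                 "S",  # dirty data written back first
        ("M", "snoop_write"):                "I",  # write back, then invalidate
    }
    # Events not listed leave the state unchanged (e.g. a local read hit).
    return transitions.get((state, event), state)

# One line's life: CPU0 reads it alone, writes it, then CPU1 snoops a read.
line = "I"
for event in ("local_read_miss_no_sharers", "local_write", "snoop_read"):
    line = next_state(line, event)
    print(event, "->", line)
# local_read_miss_no_sharers -> E
# local_write -> M
# snoop_read -> S
```

The key cost shows up in the last transition: a snooped read that hits a Modified line forces the owning CPU to surrender its dirty data over the bus, which is exactly the coherency traffic discussed next.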

This coherency traffic occurs on the main system bus, and it is the primary reason Intel's performance suffers when scaling (Intel uses a shared-bus architecture). AMD also has issues scaling to eight processors, because of the limited number of direct HyperTransport (HT) links.

In a two- or four-way system, each CPU can find out from every other processor in only one "hop": every CPU has a direct line of sight, on this cache-coherency roadmap, to every other processor. Jump to eight-way, and a request must make two hops out and two hops back to reach the processor furthest away, which greatly increases the wait time before proceeding. NUMA (non-uniform memory access) can improve on that, but only if implemented properly.
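The hop-count argument can be made concrete with a small breadth-first search over link graphs. The topologies below are simplified illustrations, not actual board layouts: a fully connected 4-way system versus an 8-way "twisted ladder" where each CPU has only three links.

```python
# Illustrative sketch: worst-case snoop distance (in hops) for small
# point-to-point topologies, found by breadth-first search.
from collections import deque

def max_hops(links):
    """Longest shortest-path, in hops, between any two nodes."""
    nodes = {n for a, b in links for n in (a, b)}
    adj = {n: set() for n in nodes}
    for a, b in links:
        adj[a].add(b)
        adj[b].add(a)
    worst = 0
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            n = queue.popleft()
            for m in adj[n]:
                if m not in dist:
                    dist[m] = dist[n] + 1
                    queue.append(m)
        worst = max(worst, max(dist.values()))
    return worst

# 4-way, fully connected: every CPU is one hop from every other.
four_way = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
# 8-way with only 3 links per CPU (a "twisted ladder"): the furthest
# CPU is two hops away, so a snoop costs two hops out and two back.
eight_way = [(0, 1), (2, 3), (4, 5), (6, 7),   # rungs
             (0, 2), (2, 4), (4, 6),           # one rail
             (1, 3), (3, 5), (5, 7),           # other rail
             (0, 7), (1, 6)]                   # crossed end links
print(max_hops(four_way))   # 1
print(max_hops(eight_way))  # 2
```

This is why topology choice matters as socket counts grow: with a fixed number of links per chip, the diameter of the graph, and therefore the worst-case snoop latency, creeps upward.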

In a dual-processor configuration, there are typically inches between processors. Even traveling at the speed of electricity through circuits, clocked on a high-speed interface, those transactions take time.

Switching to dual-core, things become a great deal simpler. It is my understanding that today's implementations are little more than "bolt-on" dual-core designs, meaning no real specialty circuitry has been created to greatly enhance communication between the two cores inside the physical processor package. They still operate over HT links; those links just happen to be very close to one another (which speeds things up a bit). Were chipmakers to implement a shared L2 or L3 cache, or run the internal HT links at much higher frequencies (double their current implementation) for inter-core communication, we would see dual-core performance rise notably. But that kind of development costs money, and chipmakers are in this to make money, not to give us the best solution. Can't really fault them for that, however. :)

The conclusion

There really are no two ways about it: if you have any intention of getting a new system, you're going to want to go dual. Now, whether you choose a single-socket, dual-core system or a dual-socket, single-core system is up to you. With a dual-socket, single-core system you have the potential of upgrading to dual-core chips in the future, turning your two-way system into a four-way system. That may be of interest to some folks and could, potentially, nearly double your system's performance in certain apps.

Benchmarks have shown us that dual-core is better than dual-processor. Economics has shown us that dual-core is cheaper than dual-processor. And user experience has shown us that even a somewhat slower-clocked (and cheaper) dual-core greatly exceeds even a fast single-core system.

Multi-core is the way of the future. We've heard Intel talk about "hundreds of cores" on a single CPU. We have Cell technology allowing many, many specialized cores to work in cooperation to process data. In short, we have hints of what's to come, but the ultimate choice will be one that evolves practically, through economics and usability. As wonderful as some products are from a design point of view, they don't always have great utility in the real world (consider Itanium and its original roadmaps).

I hope you have enjoyed this article. Please feel free to post comments below.

user comments 38 comment(s)

right on(4:48pm est fri jul 07 2006)first post!

good article, and i share the same views. i'm upgrading to an amd64 x2 later this year, but i wish it was a dual opty personally. maybe in about four years, when opterons and boards are at $100 or so. til then – by cp

thanks, sir rick(4:52pm est fri jul 07 2006)a good write-up, again.

the only thing i would add is this: there are a lot of things that don't benefit in any way from dual core or dual chip.

i know that looks like a substantial list, and that one might conclude, “then what's dualie so good at then?”

but the truth is just as sir rick alludes: the availability of multiple cores seems to literally 'break the bottleneck' that causes jerky response from single-threaded cpu systems. you might not get your query done in the absolute fastest time, but you can still use the computer for other things effectively. most of the time, my old 600 mhz p3 dualie has both cpus going pretty good, averaging about 50% per cpu. i can toss extra stuff on the load queue, and it does great. not so with my amd fx/57 (with great memory and a huge raid-striped hard drive). while it certainly gets the analysis programs done in 1/6th the time, it also becomes a dog of flanders.

in the meantime, dual core does fine – the fact that the operating system, be it linux or windows 2000/xp, can on its own take advantage of more than one processor leads to smoother response in general.

like goatguy, i support a piii 550 dual server – it runs well most of the time. i find math-intensive operations slow (self-extracting/unzip/zip files, for example).

i have found dual core does help games, as the os will have services and processes running in the background in any case whilst playing (it's too time-consuming to keep shutting down ones not needed whilst playing – and would you want to shut down antivirus/firewall whilst connected to the net…).

p6dge(6:00pm est fri jul 07 2006)i think that's what my mb is. i used to run dual p3 500's and upgraded them to 850's, and you bet it screams compared to my lowly p4. but it looks like intel at least hasn't really advanced much with their p4 at all. geek published a chart showing, for current die sizes, how many cores you could fit in a current p4's transistor count… or something to that effect. in theory i could probably be running 50 p3 850's at maybe 1.3ghz or something like that. then i could crank up f.e.a.r. – by klay

hell!(7:35pm est fri jul 07 2006)throwing an overclocked opteron 165 @ 2.6ghz with 2gb of ram at vista beta2 x86-64 edition feels like xp on a 486 (only less stable) [ok, so that's a tad harsh; still, thanks to m$ for the beta none-the-less].

back on subject: for the real tasks that most regular people do on their pcs, dual core is a real bonus. i mean, one core for the spyware & malware, another for the av scanner, and maybe in between some spare time to run the os? gotta be the way forward! – by anqe

in all cases, snooping each other's l2 cache carries a severe performance penalty. imagine having to transfer data between the caches – the delays associated with this will go through the roof! that's why a unified shared cache would be more practical.

in the future, the move to unified shared cache will become more significant as more cores are implemented on a single die.

and intel is expert at this large (and shared) cache thingy… their 4mb 8-way cache on 65nm is much smaller than amd's 2x1mb 2-way cache on 90nm. it's probably also why amd hired ex-intel itanium engineers (the itanium features a shared l3 cache).

as die shrinks continue and better fab processes (45nm, 32nm and beyond) are introduced, the significance of shared cache will become apparent, and we will be seeing more cores per die. the days of multi-processor systems are numbered… we may see "super-computer"-like processors on our very desktops in the near future… – by future shock

great article(12:32am est sat jul 08 2006)i liked the explanation. i like tech, but i'm not a super geek who knows all the details behind how things work. enlightening. – by grateful

core 2 duo(1:38am est sat jul 08 2006)“but that kind of development costs money, and chipmakers are in this to make money, not give us the best solution. can't really fault them for that, however. :)”

core 2 duo has a shared l2-cache.– by intelmakeschips

2x256k athlon x2 is the new celeron(2:20am est sat jul 08 2006)the consumer-end has never benefited enough to justify the price increase of multiple cpu sockets.

volumes of x2, pentium d and core duo have been large enough to get the software developers to start developing multi-threaded versions of new apps.

my opinion is that the new 2x256k athlon x2 cpu is going to effectively become the new “celeron-class” cpu. this means that single-threaded apps that could benefit from multi-threads will be at a disadvantage. who wants to develop apps that are already at an architectural disadvantage for the majority of newly purchased computer owners?

availability and proven performance in multi-threaded software will be the real traction for multi-socket and multi-core… and it could not have come sooner than the arrival of the dual-core celeron-class cpu. – by chipace

dualies rool(2:26am est sat jul 08 2006)i have a dual pii 333mhz workstation, and for everything except gaming it's perfectly adequate. about the only non-gaming tasks that suffer are java (you need a quad system to cope with that steaming pile of bloat) and ripping cds to mp3 which takes an age.

but the basic point you're making that two is better than one is 110% spot on.

my dual pii has had longevity far beyond anything i could have expected when i bought the system with its original single pii 266.

once you go dual, you never go back….– by highlandcynic

dualies…(3:43am est sat jul 08 2006)most average users would never know the difference; they have too hard a time just attaching a pic to their email… – by lifetime gamer

fdsdv(3:49am est sat jul 08 2006)so the cell is the future… is that what the article is getting at? so intel and amd are moving towards what ibm/sony/toshiba have done – by dfd

ubuntu(9:58am est sat jul 08 2006)the many kinds of kernels:

386 – the default kernel in ubuntu (though i hear it's really a 486 kernel). it's the most compatible kernel because it supports the oldest tech.

686 – the kernel recommended for use with any intel processor more recent than a pentium pro, so even old pentium 2s and 3s can get in on the act. installing this kernel may improve performance.

k7 – recommended for a computer with an athlon or newer amd cpu. new 64-bit amd cpus can use it as well if you have the 32-bit version of ubuntu installed. installing this kernel may improve performance.

smp – needed to use both cpus in a multi-cpu setting. this kernel is also required to "enable" hyperthreading on modern pentium 4 cpus, and is needed for dual-core processors (the letters stand for symmetric multiprocessing). very important to install this, as it will improve performance.

how to install new kernels

you need to install the file below as indicated:

for a modern pentium 2+ (686) kernel:

quote:sudo apt-get install linux-686

for a modern pentium 2+ kernel for dual processors, dual cores, or hyperthreading:

quote:sudo apt-get install linux-686-smp

it is amd that has offered all the advancements… intel just wants to milk your testicles…

who is this new moderator trying to act smart and neutral.

– by rick geek's fluffer

dual cores, the beginning of the end?(12:41pm est sat jul 08 2006)will dual core processors finally give desktops a boost that will satisfy businesses longer? will the upgrade cycle slow down? most desktops will have plenty of processing power and responsiveness to slow the need for faster and faster computers. – by noworkia

re: rick geek's fluffer(12:49pm est sat jul 08 2006)“dual-core

bottom line amd invented it and intel copied it.”

ya, amd invented dual core, just 4 short years after ibm began shipping dual core processors in the power4.

i am not a fan boy(9:22pm est sat jul 08 2006)i report facts in the most unbiased way! i always add incite and provide new prospective on technology…

i am not a fanboy or biased to amd – by rick geek

too funny(9:59pm est sat jul 08 2006)rick geek wrote:

“i always add incite…”

ok, i like ya rick, i like your articles, opinions, and all, but… "incite" instead of "insight". could this be the proverbial "freudian slip" we sometimes hear about? anyway, i got a chuckle out of it, ya know? take care, and keep tellin' it like it is, man. – by athlon goodness

hmmm(12:10am est sun jul 09 2006)since there are shared and unshared l2 caches, is it possible to have just one l2 cache for 2 processors? have it as large as it needs to be, but available to both… or is that the problem with cache snooping?

i think future shock said much the same above?

anyways. dual cores rock. no doubt about it.

– by headley

headley(2:48am est sun jul 09 2006)what you describe is essentially the arrangement for core 2 duo, for example, as already said in the thread, although rick doesn't mention it (what a surprise). if you mean at the system level rather than the die level (i.e., processors on different dies rather than cores on the same die), an extra level of cache (e.g., level 3 or 4) is also sometimes used, but that is rather outside the scope of the discussion on cache coherency. – by intelmakeschips

re:tux(10:09am est sun jul 09 2006)i installed 686 – the kernel recommended for use with any intel processor more recent than a pentium pro, so even old pentium 2s and 3s can get in on the act. installing this kernel may improve performance == now my computer is very fast. thanks. – by newbie

hmmm(6:08pm est sun jul 09 2006)hey intelmakeschips, that is a good point :)

i was referring to there being just one large 4mb cache though, that was available to both cores, that isn't split by the bus… if you know what i mean?

intels way is good though, no doubt about it

– by headley

not to start a flame war, but…(9:51pm est sun jul 09 2006)switch already above made his normal request to have people drop what they know and buy an apple machine. my question is, has apple's market share gone up over the last year? i am pc to the bone, but apple does make a nice looking machine. has this translated into more sales? – by bneals

nice stuff(10:34pm est sun jul 09 2006)yah duellies are nice, maybe one day i will upgrade to them, but for right now my 2 pentium 3 1 gigs and amd athlon 3000+ do everything i need them to do, and that is gaming, gaming gaming. :-) – by slick

omg, stop the noise!(11:54pm est sun jul 09 2006)damn, no hyperlinks…..even to this site lmao. well, i would like to say there are some short memories in this forum and by mr. rick. do a search here on “intel diamond” and this discussion is done. g'day!!! bitches……. – by godzulu

headley(2:27am est mon jul 10 2006)“i was referring to there being just one large 4mb cache though, that was available to both cores, that isn't split by the bus… if you know what i mean?”

yes, i know what you mean and that is the core 2 duo arrangement. – by intelmakeschips

variations on parallelism(2:27am est mon jul 10 2006)i was a bit disappointed with the layman's level of the article. i appreciate the restraint not to name amd vs. intel, but especially here amd, ibm, etc. would have some data to offer. the dual 500mhz pentium iii mentioned may be good enough for compiling and development work (or a simple web server), but it ain't good enough any more for most games, or even running your google desktop imho. and yes, ultimately performance is not about mhz, it is about total system operations per unit time.

parallelism is the name of the game: of course, parallel implementations can vary from physically distinct units (such as math coprocessors, multiple processors, and dsps (read: graphics or audio chips)) to simd/mimd, or pipelining.

if rick will look hard enough, he will indeed find some classification schemes out there that make different architectures comparable. i would appreciate if his article would put out a bit less guess work and some more hard facts. this is a place for geeks.

the evolution of parallelism is constrained optimization, whatever is easiest and promises the best payoff is done first. in fact risc is all about development speed. factor also in the difficulties of rewriting compilers that map to the new hardware. vliw is all about parallelism.

time for multi-core and numa is now, but integration of gpus, multiple gpus, etc., will be next.

the problem is to clean out the garbage of development every so often, for example to gain power efficiencies by shutting off pieces of the chip that are not used…

rick: can you give us a bit more technical context on multi-core, multiprocessor, blades/clusters, etc.? in fact, a modern blade server may offer high efficiencies in hardware and administration, while the software to use such clusters (in comparison to single computers) is still primitive. what are the tradeoffs really? and no, just the lower latency of multicore vs. board-level is not sufficient.

bring it on …

thanks. – by thinkgeek

rick there is a flaw in your argument(8:03am est tue jul 11 2006)the flaw is, do you want each core to grow smaller with each new generation? i think you will find, no, instead of cores moving toward single transistor regime, you have cores growing larger to handle more bits at the very least. as a result, i expect that the multi-core approach will not be that cost-effective or high-performance. a larger core has its own distance latency to deal with as well as that of the larger cache. and a larger core, not to mention more of them in a single chip, increases the area on the wafer, making them more expensive and difficult to yield. – by analyst

laptops(2:54pm est sat jul 22 2006)i was wondering if anyone had any suggestions to what kind of laptop i should get… i was looking at the duo core's, but i read all of your blogs and was wondering if amd x2 laptop would be better for gaming and general computing? – by questionneer

dear questioneer(11:18am est fri jul 28 2006)that answer would be general computing – by the man

rickgeeks…(8:25pm est fri sep 29 2006)fantastic analysis… i too had a dual 500 machine that i built in 1999 with 10k rpm scsi drives. if it had not malfunctioned, i do believe i could still be using it for most things except for vmware. and i believe that disk speed is still more important in many instances than cpu speed. – by paul.

*nix os – multithreaded(7:35pm est sun may 20 2007)i was reading around the net regarding the performance issues that linux has with multi-core and/or multi-processor systems and the general consensus seems to be that linux for the most part has little to no gains from having multiple processors / cores. then a friend directed me to a whole new linux-like os they are building specifically to make use of multiple processors / cores. built multithreaded from the ground up! the kernel itself is written specifically to make best use of multiple processors / cores. it is called lynux. you can download it here: