
Rambling on Multi-Core

Suddenly, like a spring breeze arriving overnight, amid carpet-bomb advertising and a media blitz, PCs and PC-based servers seem to have entered the multi-core era. Dual-core computers and quad-core servers have found their way into ordinary homes.

Intel and AMD have spent enormous effort promoting multi-core — market interests are certainly at play. Why multi-core? You can give plenty of reasons: improved efficiency, reduced power consumption, semiconductor technology reaching its limits, and even “the world is inherently parallel.” I won’t go into all that here. This article mainly discusses what I think multi-core might mean for programmers.

First, let’s talk about the relationship between multi-core and Moore’s Law. Moore’s formulation, as he revised it in 1975: “the number of transistors on a chip doubles about every two years.” So Moore’s Law was never about processor speed, just the number of transistors. In later years, as clock speeds increased, it was popularly reinterpreted as “CPU clock speed doubles every 18 months.” But now? Clock speeds have plateaued. My 2001 laptop had a 1GHz processor; if that version of the law had held, my current laptop would run at 8GHz or more. That never happened. Just when everyone was questioning how far Moore’s Law could go, multi-core appeared. Supporters rejoiced, because Moore’s Law could be reinterpreted yet again: the number of cores doubles every 18 or 24 months. A core isn’t a transistor, but it’s close enough. Predictably, 8-core, 16-core, 32-core, 64-core, and 128-core processors will soon hit the market. Long live Moore’s Law.

Multi-core is mainly an architecture change. What does it mean for software developers?

Multi-core brings increased processing power, mainly parallel processing. Parallel computing isn’t new in computer science. Speedup ratio, Amdahl’s Law (I can never spell his name right) are all classic parallel computing concepts. Many multi-core problems can be reduced to parallel computing problems. So multi-core hasn’t revolutionized theory — its main contribution is in engineering.
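Since Amdahl’s Law comes up here, a small sketch of what it says may help. The function name `amdahl_speedup` and its parameters are my own for illustration; the formula is the standard one, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the running time that can be parallelized and n is the number of cores.

```cpp
#include <cassert>

// Amdahl's Law: the speedup on `cores` cores when a fraction
// `parallel_fraction` of the serial running time can be spread evenly
// across them. The remaining serial part runs at the original speed.
double amdahl_speedup(double parallel_fraction, unsigned cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores >= 1);
    double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / cores);
}
```

Under the assumption that 95% of a program parallelizes, this gives roughly 3.5x on 4 cores and still a little under 20x on 4096 cores. The serial 5% ends up dominating, which is why piling on cores is not enough by itself.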

That said, talking about parallel computing and SMP is easy. Actually optimizing code for multi-core has many aspects.

I believe the first thing for programmers, as many official documents like to say, is to “raise ideological awareness, emancipate the mind, and achieve a fundamental shift in thinking.” This isn’t empty talk: after so many years of single-core serial thinking, many deeply ingrained notions need to go.

Example: you write a program with two threads, one high priority, one low. The OS strictly schedules by priority. On single-core, the high-priority thread executes first. But on multi-core, the two threads run simultaneously on two cores, regardless of priority.

There are many such cases, cache usage among them. I won’t go into detail, since reference material is plentiful. In short: always keep “parallel” in mind during design and coding.

Continuing the official document analogy: after laying out the guiding ideology, we come to the main task. The main task of multi-core optimization is: better CPU burning. It might be too early to discuss this, but better to prepare.

Look at this picture: a screenshot of our college’s quad-core server at a particular moment. One core is at nearly 100% CPU utilization, and the other three are near zero. What software was running? A Windows serial number calculator. I strongly oppose calculating serial numbers, but I’m using it as a negative example. Such “knockoff software” naturally isn’t well-designed. It only burns 1/4 of the CPU. If someone made a parallel version, then on a 4-core server computing a valid serial number would take slightly more than 1/4 of the time.

<img id="img20070724104219.jpeg" alt="cpu" src="http://images.blogcn.com/2007/8/3/9/omale,20070803173115.jpeg" align="baseline" border="0">

What does this teach us? We need to make complex algorithms and time-consuming parts of our programs multi-threaded and parallelizable, to better burn the processor. Think about code you’ve written — Java, .NET, or C++ — would it have this problem on a multi-core machine?
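As a rough idea of what “making the time-consuming part parallel” can look like by hand, here is a minimal sketch using C++ `std::thread`. Everything in it is hypothetical: `is_valid` is only a cheap stand-in for some expensive, independent check on one candidate, and `count_valid_parallel` is a name I made up. It splits a range of candidates into one chunk per worker thread and adds up the per-thread results.

```cpp
#include <algorithm>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for an expensive, independent check on one candidate.
bool is_valid(std::uint64_t candidate) {
    return candidate % 9973 == 0;
}

// Count valid candidates in [0, limit) using `workers` threads.
std::uint64_t count_valid_parallel(std::uint64_t limit, unsigned workers) {
    workers = std::max(1u, workers);  // treat 0 as "one worker"
    std::vector<std::uint64_t> partial(workers, 0);
    std::vector<std::thread> pool;
    std::uint64_t chunk = (limit + workers - 1) / workers;  // ceiling division

    for (unsigned w = 0; w < workers; ++w) {
        std::uint64_t begin = std::min(limit, w * chunk);
        std::uint64_t end = std::min(limit, begin + chunk);
        pool.emplace_back([&partial, w, begin, end] {
            std::uint64_t local = 0;
            for (std::uint64_t c = begin; c < end; ++c) {
                if (is_valid(c)) ++local;
            }
            partial[w] = local;  // each thread writes only its own slot
        });
    }
    for (auto& t : pool) t.join();  // wait for every worker to finish

    std::uint64_t total = 0;
    for (std::uint64_t p : partial) total += p;
    return total;
}
```

A caller would typically pass `std::thread::hardware_concurrency()` as the worker count; that call may return 0 when the core count is unknown, which is why the sketch clamps the value to at least 1. Because the candidates are independent and each thread writes only its own slot, no locking is needed here. Real workloads are rarely this tidy.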

Back to the question: why “preparation”? Look at the lower left of the picture: thread count: 762. Even if my program isn’t optimized for parallelism and burns only one core, the other three aren’t idle, because the other 761 threads in the system can use them. In current OSes, parallelism is achieved through multi-threading. So the relationship between thread count and core count is interesting.

Imagine a scenario: if the core-count version of Moore’s Law holds for n years, and Intel releases a 4096-core processor, you buy it excitedly, boot up Windows XXX (who knows what it’ll be called then). If the system still has only 762 threads, even if each thread hogs a core, you’d have over 3000 idle cores. Not efficient.

So with dual-core and quad-core today, not parallel-optimizing your code might not matter — other programs create threads too. But when the number of cores exceeds the number of threads, not optimizing isn’t just an efficiency problem — it’s a waste of Earth’s resources. Remember: the main task of multi-core optimization: multi-threading, parallel, “burn cores.”

Having talked about emancipating the mind, maybe we should “take bigger steps.” Reflect on whether “using multi-threading for parallel optimization” itself is the right approach.

Anyone with multi-threading experience knows the relationship between thread count and program complexity isn’t linear. If your program has 20+ threads sharing data, waiting, synchronizing, locking — if you don’t pass out, you must be drinking brain tonic. Scale that up: running a complex algorithm on 4096 cores, using current methods you’d need at least 4096 threads. Can you imagine calling CreateThread 4096 times in your program and managing all those threads?

Maybe we should ask: “Is multi-threading really the right approach to parallel optimization?”

I think using threads to manage parallelism is primitive. Like assembly before high-level languages — when high-level languages appeared, programmers no longer needed to memorize complex instructions or manage processor details. Software could scale. For multi-core, the real breakthrough will be a revolution in programming languages or compilers. When some language, compiler, or OS mechanism emerges that lets programmers focus on business logic while system software handles the low-level core-burning — that’s when the multi-core era truly arrives. In a word: we need implicit parallelism, not explicit management of 4000+ cores.

OpenMP gives us a glimpse. Adding a directive such as #pragma omp parallel for (plus optional clauses) above a for loop asks the compiler to split the loop’s iterations across threads, provided the iterations are independent of each other. For programmers used to Windows Threads or pthreads, this provides a different way of thinking, and that way might be the future of parallel computing and multi-core.
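To make the OpenMP idea concrete, here is a minimal sketch. The function `sum_of_squares` is a made-up example, not taken from any real program. It uses the combined `parallel for` directive with a `reduction` clause, so that each thread keeps a private running total and OpenMP combines them at the end.

```cpp
#include <cstddef>
#include <vector>

// Sum of the squares of the input values.
// When compiled with OpenMP enabled (for example -fopenmp on GCC or Clang),
// the loop iterations are divided among threads. Without OpenMP the pragma
// is ignored and the loop simply runs serially, with the same result.
long long sum_of_squares(const std::vector<int>& values) {
    long long total = 0;
    const long long n = static_cast<long long>(values.size());
    #pragma omp parallel for reduction(+ : total)
    for (long long i = 0; i < n; ++i) {
        long long v = values[static_cast<std::size_t>(i)];
        total += v * v;  // each thread adds into its own private copy
    }
    return total;
}
```

Note that nothing here creates, joins, or counts threads. The same source runs on one core or many, and that is the sense in which this style is closer to implicit parallelism than calling CreateThread by hand.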

Conclusion: My dual-core laptop running Windows Vista is great — running multiple Ajax pages without slowdown. I hope next semester I can add some multi-core content to a few college courses. Stay tuned.

This post is licensed under CC BY 4.0 by the author.