
100-core general purpose processor

GoFlyKiteNow

Alfrescian
Loyal
Tilera unveils Tile GX100, the 100-core general purpose processor

If you thought Intel's plans to embed eight cores in its high-end processors were a bit too out there, you'll find that the latest processor developed by semiconductor start-up Tilera is even more extreme.

Packing 100 cores running at 1.25GHz to 1.5GHz on a single chip, the Gx100 takes parallel processing to the extreme thanks to a new architecture that minimizes the bus bottleneck found in today's multi-core processors.

Chip makers are constantly pushing for faster processors, but clock speeds can only be pushed so far. As a result, the semiconductor industry has opted to pack multiple processors on a single chip and divide the workload into equal parts whenever possible: that's the gist of parallel computing.
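
The split-and-combine idea above can be sketched in a few lines of Python (a toy illustration, unrelated to Tilera's toolchain; note that CPython threads won't actually speed up CPU-bound work because of the GIL, so this shows only the decomposition pattern):

```python
# Divide a workload into roughly equal chunks, process them in parallel,
# then combine the partial results. Names here are illustrative only.
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, workers=4):
    """Split `data` into `workers` roughly equal chunks and sum them in parallel."""
    chunk = (len(data) + workers - 1) // workers          # ceiling division
    parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(sum, parts)                   # one chunk per worker
    return sum(partials)                                  # combine partial results

print(parallel_sum(list(range(1000))))  # same answer as sum(range(1000)): 499500
```

Swapping in a process pool (or a language without a global lock) is what would turn the same pattern into a real speedup on a multi-core chip.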

Graphics processing units are a common example of a "multi-core" chip that can process hundreds of independent data streams in parallel: each stream is processed separately and the result is then output on the screen. Programmers are starting to harness the parallel data processing capabilities of GPUs, and by doing so they can often speed up their data crunching by tens or even hundreds of times.

However GPUs, especially older models, have limited flexibility and are built for speed, not precision (if a single pixel is slightly off-color, people usually won't notice). Even though GPU makers are developing better architectures for using them as data crunchers, they remain far from general-purpose processors and can speed up only certain tasks.

This is where the Tile Gx100 comes in. The Gx100, as Tilera chief technical officer Anant Agarwal explained, is a chip that can run off-the-shelf programs almost unmodified, offering at least four times the compute performance of an Intel Nehalem-EX while burning a third of the power. In other words, it makes GPU-like massively parallel processing available on a general-purpose chip.

Because the chip is general-purpose, programmers can recompile and run applications designed for Intel's x86 architecture on Tilera's processor without the need for further adaptation.

The key idea behind the design of the chip was a simplified architecture that eliminates the on-chip bus interconnect through which information must flow in most multi-core CPUs. This was replaced with a mesh network architecture, which involves placing a communication switch on each processor and arranging them in a grid-like fashion.
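
The grid-of-switches idea can be illustrated with a little arithmetic (XY/Manhattan routing is an assumption here; the article does not specify Tilera's actual routing algorithm):

```python
# Sketch of the mesh idea: each core sits at (x, y) on a grid with its own
# switch, and a message travels hop by hop through neighboring switches.
GRID = 10  # a 10x10 grid of cores, as in the Gx100

def mesh_hops(src, dst):
    """Number of switch-to-switch hops between two cores under XY routing."""
    (sx, sy), (dx, dy) = src, dst
    return abs(dx - sx) + abs(dy - sy)   # Manhattan distance on the grid

# Worst case: opposite corners of the 10x10 grid.
print(mesh_hops((0, 0), (9, 9)))  # 18 hops
```

The point of the mesh is that many such transfers can be in flight on different links at once, whereas a shared bus serializes them all through one contended resource.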

The bus architecture was a performance bottleneck that limited the amount of data that could travel between the various cores and forced engineers to cap the number of cores on each chip. Thanks to this new architecture, Tilera says it can cram as many as 100 cores onto a processor without running into the bus-bandwidth limit.

The 100-core processor, fabricated using 40-nanometer technology, is expected to be available early next year, but won't be supported by Windows 7. For that, consumers will have to wait for Intel's 80-core version, which the IT giant has promised to deliver sometime in the next five years.
 

TeeKee

Alfrescian
Loyal
No worries!!

I will slow you down!

[Image: Windows_8_Professional_Edition_by_mufflerexoz.jpg]
 

fox_hound_33

Alfrescian
Loyal
Chips with many cores are well on their way. But a key issue is how programmers are going to exploit this massive parallelism. At present most parallelization is done by a parallelizing compiler, just as the article says. But that will never expose the raw parallel power of these chips to the application being run. For that to happen, programs need to be redesigned and rewritten to explicitly tap into the massive parallelism.
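
One standard way to quantify this point (not mentioned in the post) is Amdahl's law: if only part of a program is parallelized, by a compiler or otherwise, total speedup is capped no matter how many cores you add. A minimal sketch:

```python
# Amdahl's law: if a fraction `p` of a program's runtime is parallelizable,
# the best possible speedup on `n` cores is 1 / ((1 - p) + p / n).
# This is why auto-parallelizing only parts of a program cannot expose
# the full power of a 100-core chip.
def amdahl_speedup(p, n):
    """Upper bound on speedup with parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the runtime parallelized, 100 cores give well under 20x:
print(round(amdahl_speedup(0.95, 100), 1))  # ~16.8x
```

Hence the emphasis on rewriting programs: pushing the parallel fraction toward 100% matters more than adding cores.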

I don't really see many programmers being trained for parallel computing. Some key companies like IBM have actually acknowledged this shortage. Our local unis don't have any parallel programming course that I know of.
 

uncleyap

Alfrescian
Loyal
Quite crazy when I read your post. I thought, while inter-processor communication is one bottleneck, the memory-processor connection is yet another. When you have 100 units of 64-bit CPUs on a die, it is very hungry for memory bandwidth. HOW TO FEED THIS MONSTER?

http://www.tilera.com/products/TILE-Gx.php

Then I went to that page, and oh, I see: they are telling users that it isn't really for memory-hungry applications, e.g. firewall & VPN, etc. Basically a pipeline processor. It isn't built with any L1/L2/L3 cache on die. Wow! That won't be used as a general data-processing CPU then; otherwise most of the 100 CPU cores would be sitting idle, waiting for their limited and shared memory bandwidth.
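
A rough back-of-envelope illustrates the feeding problem; every number below is an assumption for illustration, not a Tilera specification:

```python
# Back-of-envelope estimate of the chip's memory appetite.
cores = 100
freq_hz = 1.25e9            # low end of the quoted 1.25-1.5 GHz range
bytes_per_access = 8        # one 64-bit word
accesses_per_cycle = 0.1    # assume one memory access every 10 cycles per core

demand_gb_s = cores * freq_hz * accesses_per_cycle * bytes_per_access / 1e9
print(f"{demand_gb_s:.0f} GB/s")  # 100 GB/s of demand under these assumptions
```

Even under these conservative assumptions the aggregate demand is on the order of 100 GB/s, roughly an order of magnitude above what a single commodity DRAM channel of the era could deliver, which is the poster's point about balance.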

For any computing resource, it is a matter of balance. If you push up one aspect without proportionally increasing the others to keep the balance, there isn't much real merit. The number of CPU cores is just one factor; memory bandwidth is another, inter-processor communication yet another, and power consumption & thermal cooling are additional factors. Only if all these factors come into a healthy balance do we get performance and reliability. Purely pushing up the number of processors alone is not very helpful.

If you look at the layout of the block diagram, with memory controllers of 2 banks on each side feeding a cascaded array of 10x10 CPU cores, it is an architecture for pipeline-type processing. It must have been purpose-built for data encryption / firewall kinds of jobs, not as a general PC CPU for general users. It basically sucks data in from one bank of memory and pumps it out to the other bank like a 10-gang pipeline, each line cascaded through 10 CPUs, so that each CPU can be a single stage of the processing.
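
The cascaded-stages pattern described above can be sketched generically in Python (toy stage functions, nothing to do with Tilera's actual software):

```python
# Sketch of the pipeline pattern: data streams in one side, each stage
# (standing in for one core in a row) transforms it, and results stream
# out the other side.
def pipeline(stages, stream):
    """Push each item through every stage in order, like a row of cores."""
    for item in stream:
        for stage in stages:
            item = stage(item)
        yield item

# A 3-stage toy pipeline: each lambda stands in for one core's stage of work.
stages = [
    lambda x: x + 1,      # stage 1
    lambda x: x * 2,      # stage 2
    lambda x: x - 3,      # stage 3
]
print(list(pipeline(stages, [1, 2, 3])))  # [1, 3, 5]
```

On real pipelined hardware all stages run concurrently on different items, so throughput is set by the slowest stage rather than the sum of all stages.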

If you compare this against Intel i7 or AMD Phenom block diagrams, the configurations are clearly different. The AMD & Intel chips have lots of cache on die, the north bridge on die, etc.

The clear problem with the 100-core layout is that the CPU cores in the center of the array don't have very direct access to memory, except via other CPUs. So unless the task and the program are structured so that each CPU does a balanced amount of work and then passes it on to the next, like a pipeline from one end (memory bank) to the other, there will be lots of memory access bottlenecks. It is inefficient for some CPUs to be kept busy passing data back and forth for other CPUs, in criss-crossing directions all over the place. And the CPU cores towards the center of the array can have no data to process if the other CPUs don't pass data over to them. So on...

In those common Intel and AMD CPUs, each CPU core has INDEPENDENT access to the memory controllers, and is not so dependent on other CPUs to get access to RAM, although they can still pass data among themselves.

https://wwwsecure.amd.com/us-en/ass...ram_for_Socket_F1207White_Background_375W.jpg
 

GoFlyKiteNow

Alfrescian
Loyal

You are right. There is an acute shortage of skilled personnel to exploit the power of parallel processing. But then again, these skills will evolve as demand for such intense and powerful applications is called for by the marketplace.

Computer simulations of global weather patterns, nuclear reactor test simulations and critical analysis of fast breeder reactors, astronomy and deep-space image processing are all ideal candidates for massive number-crunching parallel processing.

But these applications are hardly the diet of the present IT industry, and consequently such huge processing power potential may remain underutilized for some time to come.
 

Char_Azn

Alfrescian (Inf)
Asset
Actually the software developers can't keep up. We moved to multi-core CPUs a number of years ago, but the vast majority of software still only makes use of a single core. It more or less only comes in useful if you are doing a lot of things at the same time. The same goes for 64-bit vs 32-bit processing. Most software is still stuck on 32-bit. Even if you have a 64-bit CPU and OS, it doesn't really benefit much if the application you are running is designed for 32-bit machines.
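
The 32-bit ceiling is simple arithmetic: a 32-bit pointer can name at most 2^32 distinct byte addresses, however much RAM is installed:

```python
# Why 32-bit applications hit a wall: a 32-bit pointer can address
# at most 2**32 bytes, regardless of how much physical RAM the machine has.
addr_32 = 2**32            # bytes addressable with 32-bit pointers
print(addr_32 // 2**30, "GiB")  # 4 GiB ceiling for a 32-bit process
```

A 64-bit pointer lifts that ceiling by many orders of magnitude, which is why the gain only shows up once the application itself is built for 64-bit.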
 