The Formic board from Crete, for building FPGA prototypes of Manycore Processors
03 August 2011
Foundation for Research and Technology - Hellas (FORTH), Institute of Computer Science
A new computer hardware prototyping board, designed in Crete, has been successfully tested. Sixty-four copies of it are currently being built and will then be interconnected to form a prototype of a future architecture: a parallel "manycore" computer with about five hundred processors. The board, called Formic, is of general interest: it is a low-cost prototyping platform that features eight (8) high-speed communication links, considerably more than other boards of similar cost provide.
Modern computer architecture research is preparing the manycore chips of the future, in which hundreds of processors cooperate at high speed while the latency and cost of their communication and synchronization are kept to a minimum. This is necessary for harnessing the performance potential of future computer chips, which will only be reachable through parallel processing with minimized overheads. Prototyping is essential for developing such complicated architectures.
Field-programmable gate arrays (FPGAs) have long been used to prototype new and exciting architectures. However, commercially available FPGA boards provide very limited communication connectivity: just a few high-speed serial links at a price of around 1,000 Euro. To get a few tens of links, one has to pay several thousand Euros and use bulky boards and bulky coaxial connectors.
To address this problem, we designed our own low-cost, high-connectivity FPGA board in Crete, Greece. Here, at the Foundation for Research & Technology - Hellas (FORTH), Institute of Computer Science (ICS), in Heraklion, we call this board "Formic". Alpha-stage testing, performed in May, showed that the Formic design is fully functional and correct in its first version; we are now building 64 copies of it.
As shown in the photograph, Formic is a small board, just 10 cm on each side. It has eight (8) high-speed serial links available for external connections; each of them delivers 2.5 Gbit/s per direction, through convenient and inexpensive SATA connectors and cables. At the center of the board, under a passive cooler, is a large but low-cost Xilinx Spartan-6 LX150T FPGA. Around it, there is one DRAM chip (128 MBytes, DDR2, 400 MHz) and three (3) SRAM chips (1 MByte, ZBT, 167 MHz each). The board also contains power converters/regulators, crystal oscillators, a Xilinx Flash memory for FPGA configuration, a JTAG chain, and an RS-232 port for debugging. At peak memory and serial link activity, the board consumes 8 Watts. The PCB has 10 layers of half-ounce copper, separated by FR-4; the smallest holes are 0.3 mm in diameter, and the smallest tracks are 5 mils (0.12 mm) wide. It was manufactured by Pan Technical and assembled by Prisma, two Greek companies. The cost is below one thousand Euro per board.
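To put the link and memory figures above in perspective, the following is a minimal back-of-the-envelope sketch in Python. It only multiplies out the numbers quoted in the text; the constant and function names are ours, chosen for illustration, and the bandwidth is the raw line rate, not the usable payload rate, which the text does not state.

```python
# Per-board figures taken from the text above.
LINKS_PER_BOARD = 8          # high-speed serial links
LINK_RATE_GBPS = 2.5         # raw Gbit/s per link, per direction
DRAM_MBYTES = 128            # one DDR2 chip
SRAM_CHIPS = 3               # ZBT SRAM chips
SRAM_MBYTES_PER_CHIP = 1


def board_link_bandwidth_gbps(links=LINKS_PER_BOARD, rate=LINK_RATE_GBPS):
    """Return (per-direction, both-directions) raw link bandwidth in Gbit/s."""
    per_direction = links * rate
    return per_direction, 2 * per_direction


def board_memory_mbytes():
    """Return total off-chip memory on one board, in MBytes."""
    return DRAM_MBYTES + SRAM_CHIPS * SRAM_MBYTES_PER_CHIP


if __name__ == "__main__":
    one_way, both_ways = board_link_bandwidth_gbps()
    print(f"Raw link bandwidth: {one_way} Gbit/s per direction, "
          f"{both_ways} Gbit/s counting both directions")
    print(f"Off-chip memory: {board_memory_mbytes()} MBytes")
```

With all eight links active, one board can thus move up to 20 Gbit/s of raw traffic in each direction.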
We configure the FPGA on each board to contain eight (8) MicroBlaze processors with their private L1 caches. The tags of the private L2 caches are also in the FPGA, while their data are in the SRAM chips. The FPGA further contains per-cache DMA/prefetch engines, the 5-channel DRAM controller, and the 8 link interfaces, all connected via a 22-port crossbar. The hardware provides no coherence among the caches; instead, the runtime software will enforce coherence at task boundaries.
Our next step, in the Computer Architecture and VLSI Systems (CARV) Laboratory of FORTH-ICS, is to have 64 Formic boards built for us, to interconnect them in a 3D mesh using their serial links, and to connect them to the eight Cortex-A9 processors in two ARM chips. The resulting heterogeneous manycore prototype will contain 520 processors: 512 "worker" cores, which will execute application tasks, and 8 "control & scheduler" cores, which will execute the runtime system.
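The processor count and the mesh wiring can be checked with a short Python sketch. The 4 x 4 x 4 arrangement is our assumption (the text only says "3D mesh" and 64 boards), as is the absence of wraparound links; all names are illustrative.

```python
from itertools import product

DIMS = (4, 4, 4)        # ASSUMPTION: 64 boards arranged as a 4 x 4 x 4 mesh
CORES_PER_BOARD = 8     # MicroBlaze worker cores per Formic board (from the text)
ARM_CORES = 8           # control & scheduler cores in the two ARM chips
LINKS_PER_BOARD = 8     # serial links available on each board


def neighbors(coord, dims=DIMS):
    """Return the mesh neighbors of a board (no wraparound links assumed)."""
    result = []
    for axis in range(len(dims)):
        for step in (-1, 1):
            candidate = list(coord)
            candidate[axis] += step
            if 0 <= candidate[axis] < dims[axis]:
                result.append(tuple(candidate))
    return result


def mesh_summary(dims=DIMS):
    """Count boards, cables, per-board link usage, and cores for the mesh."""
    boards = list(product(*(range(d) for d in dims)))
    degrees = [len(neighbors(b, dims)) for b in boards]
    workers = len(boards) * CORES_PER_BOARD
    return {
        "boards": len(boards),
        "cables": sum(degrees) // 2,      # each cable joins two boards
        "max_links_used": max(degrees),   # interior boards
        "min_links_used": min(degrees),   # corner boards
        "worker_cores": workers,
        "total_cores": workers + ARM_CORES,
    }


if __name__ == "__main__":
    for key, value in mesh_summary().items():
        print(f"{key}: {value}")
```

Under these assumptions, a board uses at most six of its eight links for mesh neighbors, and the totals match the 512 worker cores and 520 processors quoted above.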
The first use of this 520-processor prototype will be to execute applications written in the OmpSs programming model, where the programmer identifies tasks and all their I/O arguments, while the runtime software undertakes their parallelization. This work is performed in the context of the ENCORE project (http://www.encore-project.eu), funded by the European Union (FP7 STREP). In addition to this funding, ARM has provided the Cortex-A9 development systems, and Xilinx has donated the 64 FPGA chips. For more information about our research in Scalable Multicore Systems, please visit: