Design and Evaluation of a Distributed Memory based Array Accelerator

関 賀 (1151211)

Power has become a first-class constraint in microprocessor design [20]. To address this challenge, a Linear Array Pipeline Processor (LAPP) was previously designed and implemented to boost performance under a given power budget. Specifically, LAPP extends the execution and memory access stages of a VLIW pipeline processor into a functional unit (FU) array, onto which the data-flow graph (DFG) of a loop kernel without inter-iteration dependences can be mapped. After the mapping, the FU array can execute loop iterations simultaneously, greatly speeding up loop execution. A previous study indicates that LAPP achieves 9x the power efficiency of a many-core architecture with the same chip area. However, LAPP also suffers from several limitations imposed by its VLIW pipeline architecture, its unified L1 cache, and its compatibility with a conventional VLIW instruction set architecture (ISA). This master's thesis introduces a new accelerator, named Energy-Aware Multi-mode Accelerator (EMAX), to address these problems in LAPP. Specifically, EMAX uses a distributed memory architecture to support several flexible memory access patterns, which in turn help achieve an optimal mapping. The synthesis results show that EMAX occupies about 43% less area than LAPP when mapping the same set of programs. In addition, the building block of EMAX has been carefully designed to reduce both the number of buses and the design complexity. A rough estimate indicates that EMAX requires about one-seventh the amount of wiring that LAPP does.