Because power has become a first-class constraint in microprocessor
design [20], the Linear Array Pipeline Processor (LAPP) was previously
designed and implemented to boost performance under a given power budget.
Specifically, LAPP extends the
execution and memory access stages of a VLIW pipeline processor into a
functional unit (FU) array, onto which the data-flow graph (DFG) of a loop
kernel without loop-carried dependences can be mapped. After the mapping, the
FU array can execute loop iterations simultaneously, resulting in a large
speed-up of loop execution. A previous study indicates that LAPP achieves 9x
the power efficiency of a many-core architecture with the same chip area.
However, LAPP also suffers from several limitations imposed by its VLIW
pipeline architecture, its unified L1 cache, and its compatibility with the
conventional VLIW instruction set architecture (ISA). In this master's
thesis, a new accelerator named the Energy-Aware Multi-mode Accelerator (EMAX)
is introduced to address these limitations of LAPP. Specifically, EMAX uses a
distributed memory architecture to support several flexible memory access
patterns, which helps to achieve an optimal mapping of loop kernels. The synthesis
results show that EMAX occupies about 43\% less area than LAPP when mapping the
same set of programs. In addition, the building block of EMAX is designed to
reduce both the number of buses and the design complexity. A rough estimate
indicates that EMAX requires about 1/7 of the wiring of LAPP.