Reconfigurable heterogeneous DSP/FPGA based embedded architectures for numerically intensive computing workloads
Brogioli, Michael C.
Cavallaro, Joseph R.
Doctor of Philosophy
Telecommunications and multimedia form a vast segment of the embedded systems market. Variations in standards, coupled with the desire for software programmability, often result in software-based implementations executing on DSP cores. With the advent of data-intensive media and communications workloads, the computational demands placed on the DSP are ever increasing. Despite increases in clock rates, the computational demands of many wireless and multimedia video kernels far exceed the pipeline arithmetic and logic unit (ALU) resources available on today's DSP devices. This thesis presents a hardware/software co-design methodology for partitioning real-time embedded multimedia applications between software-programmable DSPs and hardware-based FPGA coprocessors. Using a strict set of guidelines, each input application is partitioned between software executing on a programmable DSP and a hardware-based FPGA implementation. This methodology is applied to channel estimation firmware in 3.5G wireless receivers, as well as to software-based H.263 video decoders. These heterogeneous systems are prototyped using a custom simulation environment, created for these studies, that models bit-true, cycle-accurate heterogeneous embedded architectures. Moving performance-critical kernels from software on the DSP to loosely coupled FPGA-based coprocessors is shown to yield significant performance gains beyond what modern DSP architectures can achieve. This thesis also investigates the instruction- and data-level parallelism in modern digital signal processing and multimedia workloads, and presents a retargetable compiler infrastructure for multi-clustered VLIW-style digital signal processor architectures. By recompiling existing workloads, the thesis compares the performance of aggressive hardware/software partitioning between modern DSP cores and loosely coupled FPGA-based coprocessors against the performance of massively multi-clustered VLIW-style architectures.
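The gain from moving a kernel off the DSP is bounded by how much of the total run time that kernel accounts for, and is eroded by the cost of moving data to and from the coprocessor. The sketch below is not the thesis's partitioning guidelines; it is a simple Amdahl's-law estimate, with a hypothetical `offload_speedup` function and an assumed overhead term standing in for DSP-to-coprocessor transfer and synchronization costs.

```python
def offload_speedup(kernel_fraction: float, kernel_speedup: float,
                    comm_overhead_fraction: float = 0.0) -> float:
    """Estimate end-to-end speedup from moving one kernel to a coprocessor.

    Hypothetical model (Amdahl's law plus a fixed overhead term):
      kernel_fraction: share of the original DSP run time spent in the kernel (0..1).
      kernel_speedup: how many times faster the coprocessor executes that kernel.
      comm_overhead_fraction: extra transfer/synchronization time, expressed as a
        fraction of the original total run time (an assumption, not a measured value).
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    if comm_overhead_fraction < 0:
        raise ValueError("comm_overhead_fraction must be non-negative")

    # Normalized run time after partitioning: untouched software portion,
    # accelerated kernel portion, and the added communication overhead.
    new_time = ((1.0 - kernel_fraction)
                + kernel_fraction / kernel_speedup
                + comm_overhead_fraction)
    return 1.0 / new_time


if __name__ == "__main__":
    # A kernel taking 80% of DSP cycles, accelerated 10x on the coprocessor.
    print(offload_speedup(0.8, 10.0))        # no communication overhead
    print(offload_speedup(0.8, 10.0, 0.05))  # with 5% added overhead
```

Even an arbitrarily fast coprocessor cannot push the overall speedup past 1 / (1 - kernel_fraction) in this model, which is why only kernels that dominate the cycle count, and whose data transfers are cheap relative to their computation, are attractive candidates for offloading.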
The compiler infrastructure allows existing DSP kernels to be retargeted to user-defined machine definitions. In doing so, the thesis shows that increased hardware parallelism within the DSP core can yield significant performance gains, and identifies the amount of hardware necessary to compete with FPGA-based performance. In conclusion, the thesis advocates application-specific DSP design with increased hardware parallelism for modern signal processing and multimedia workloads, as well as loosely coupled hardware-based coprocessors for truly high-performance computing in these domains.
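One back-of-envelope way to reason about how much parallel hardware a DSP core needs to match a coprocessor is the classic scheduling lower bound: a kernel cannot finish in fewer cycles than its operation count divided by the number of ALUs, nor in fewer cycles than its longest dependence chain. This sketch assumes single-cycle operations and free inter-cluster communication; the function names, parameters, and cluster model are hypothetical and are not taken from the thesis's compiler infrastructure.

```python
import math
from typing import Optional


def cycles_lower_bound(num_ops: int, critical_path: int,
                       clusters: int, alus_per_cluster: int) -> int:
    """Lower bound on schedule length for one kernel on a clustered VLIW machine.

    Hypothetical, optimistic model: every operation takes one cycle on any ALU,
    and copies between clusters cost nothing.
    """
    # Resource bound: total ALU slots per cycle limit throughput.
    resource_bound = math.ceil(num_ops / (clusters * alus_per_cluster))
    # Latency bound: the dependence chain cannot be shortened by adding ALUs.
    return max(resource_bound, critical_path)


def min_clusters_to_meet(num_ops: int, critical_path: int, alus_per_cluster: int,
                         target_cycles: int, max_clusters: int = 64) -> Optional[int]:
    """Smallest cluster count whose lower bound meets target_cycles, or None."""
    if critical_path > target_cycles:
        return None  # the dependence chain alone exceeds the target
    for clusters in range(1, max_clusters + 1):
        if cycles_lower_bound(num_ops, critical_path,
                              clusters, alus_per_cluster) <= target_cycles:
            return clusters
    return None


if __name__ == "__main__":
    # 1024 operations, 10-cycle critical path, 2 ALUs per cluster, 64-cycle target.
    print(min_clusters_to_meet(1024, 10, 2, 64))
```

For example, with 1024 operations, a 10-cycle critical path, and two ALUs per cluster, meeting a 64-cycle target requires at least eight clusters under this model. Once the critical path dominates, additional clusters bring no further gain, so a real retargetable compiler's schedule (with inter-cluster copies and multi-cycle operations) can only be slower than this bound.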