Hardware design is evolving toward manycore processors that will be used in large clusters to achieve exascale computing, and at the rack level to achieve petascale computing. Harnessing the full power of these architectures, however, is a challenge that software must tackle to fully realize extreme-scale computing, and it is prompting the exploration of new approaches to programming and execution systems. In this thesis, we argue that a two-level execution model is a relevant answer to the problem of extreme-scale computing. We propose an execution model that decomposes the specification of an application into two parts: a high-level definition of what the application does, coupled with a low-level implementation of how that is done. In our model, applications are designed as a set of sequential computational units whose connections are defined in a high-level, easy-to-write dataflow graph. First, we propose a dataflow graph representation, DFGR, that specifies what the computation is rather than its implementation details, and is designed for ease of programmability. Second, we propose the use of work-stealing runtimes for coarse-grained dataflow parallelism and doacross runtimes for fine-grained dataflow parallelism, both of which can be expressed uniformly in DFGR. We justify this approach by demonstrating the performance of DFGR on two different runtime systems: Habanero C with coarse-grained task synchronization, and the newly proposed OpenMP 4.2 specification with doacross synchronization. Finally, we introduce a novel primitive for combining SPMD parallelism and task parallelism: the elastic task, which allows internal SPMD parallelism within a computational unit. We present a new scheduling algorithm for elastic tasks, prove strong theoretical guarantees for it in both work-sharing and work-stealing environments, and demonstrate that it also offers performance benefits in practice due to locality and runtime adaptability.