Seismic imaging is a geophysical technique assisting in the understanding of subsurface structure on a regional and global scale. With the development of computer technology, computationally intensive seismic algorithms have begun to gain attention in both academia and industry. These algorithms typically produce high-quality subsurface images or models, but require intensive computations for solving wave equations. Achieving high-fidelity wave simulations is challenging: first, numerical wave solutions may suffer from dispersion and dissipation errors in long-distance propagations; second, the efficiency of wave simulators is crucial for many seismic applications. High-order methods have advantages of decreasing numerical errors efficiently and hence are ideal for wave modelings in seismic problems. Various high order wave solvers have been studied for seismic imaging. One of the most popular solvers is the finite difference time domain (FDTD) methods. The strengths of finite difference methods are the computational efficiency and ease of implementation, but the drawback of FDTD is the lack of geometric flexibility. It has been shown that standard finite difference methods suffer from first order numerical errors at sharp media interfaces. In contrast to finite difference methods, discontinuous Galerkin (DG) methods, a class of high-order numerical methods built on unstructured meshes, enjoy geometric flexibility and smaller interface errors. Additionally, DG methods are highly parallelizable and have explicit semi-discrete form, which makes DG suitable for large-scale wave simulations. In this dissertation, the discontinuous Galerkin methods on hybrid meshes are developed and applied to two seismic algorithms---reverse time migration (RTM) and full waveform inversion (FWI). This thesis describes in depth the steps taken to develop a forward DG solver for the framework that efficiently exploits the element specific structure of hexahedral, tetrahedral, prismatic and pyramidal elements. In particular, we describe how to exploit the tensor-product property of hexahedral elements, and propose the use of hex-dominant meshes to speed up the computation. The computational efficiency is further realized through a combination of graphics processing unit (GPU) acceleration and multi-rate time stepping. As DG methods are highly parallelizable, we build the DG solver on multiple GPUs with element-specific kernels. Implementation details of memory loading, workload assignment and latency hiding are discussed in the thesis. In addition, we employ a multi-rate time stepping scheme which allows different elements to take different time steps. This thesis applies DG schemes to RTM and FWI to highlight the strengths of the DG methods. For DG-RTM, we adopt the boundary value saving strategy to avoid data movement on GPUs and utilize the memory load in the temporal updating procedure to produce images of higher qualities without a significant extra cost. For DG-FWI, a derivation of the DG-specific adjoint-state method is presented for the fully discretized DG system. Finally, sharp media interfaces are inverted by specifying perturbations of element faces, edges and vertices.