The stochastic gradient (SG) method can minimize an objective function composed of a large number of differentiable functions or solve a stochastic optimization problem, very quickly to a moderate accuracy. The block coordinate descent/update (BCD) method, on the other hand, handles problems with multiple blocks of variables by updating them one at a time; when the blocks of variables are (much) easier to update individually than together, BCD has a (much) lower per-iteration cost. This paper introduces a method that combines the great features of SG and BCD for problems with many components in the objective and with multiple (blocks of) variables. Specifocally, a block stochastic gradient (BSG) method is proposed for both convex and nonconvex programs. At each iteration, BSG approximates the gradient of the differentiable part of the objective by randomly sampling a small set of data or sampling a few functions in the objective, and then, using the approximate gradient, it updates all the blocks of variables in either a deterministic or a randomly shuffed order. Its convergence for convex and nonconvex cases is established in different senses. In the convex case, the proposed method has the same order of convergence rate as the SG method. In the nonconvex case, its convergence is established in terms of the expected violation of a first-order optimality condition. The proposed method was numerically tested on problems including stochastic least squares and logistic regression, which are convex, as well as low-rank tensor recovery and bilinear logistic regression, which are nonconvex. On the convex problems, it performed as well as, and often significantly better, than the SG method. On the nonconvex problems, the proposed method BSG significantly outperformed the deterministic BCD method because the latter tends to slow down or stagnate near bad local minimizers. Overall, BSG inherits the benefits of both stochastic gradient approximation and block-coordinate updates.