## Sparse Factor Analysis for Learning and Content Analytics

##### Author

Lan, Shiting

##### Date

2014-04-23

##### Advisor

Baraniuk, Richard G.

##### Degree

Master of Science

##### Abstract

We develop a new model and algorithms for machine learning-based learning analytics,
which estimate a learner’s knowledge of the concepts underlying a domain, and
content analytics, which estimate the relationships among a collection of questions
and those concepts. Our model represents the probability that a learner provides
the correct response to a question in terms of three factors: their understanding of
a set of underlying concepts, the concepts involved in each question, and each question’s
intrinsic difficulty. We estimate these factors given the graded responses to
a collection of questions. The underlying estimation problem is ill-posed in general,
especially when only a subset of the questions is answered. The key observation that
enables a well-posed solution is the fact that typical educational domains of interest
involve only a small number of key concepts. Leveraging this observation, we develop
a bi-convex maximum-likelihood solution to the resulting SPARse Factor Analysis
(SPARFA) problem. We also incorporate instructor-defined tags on questions and
question text to facilitate the interpretability of the estimated factors. Experiments
with synthetic and real-world data demonstrate the efficacy of our approach.

##### Keyword

Factor analysis; Sparse probit regression; Sparse logistic regression; Bayesian
