A paper that trained a GPT-2-like model on a synthetic grade-school math dataset: "We use a synthetic setting to demonstrate that language models can learn to solve grade-school math problems through true generalization, rather than relying on data contamination or template memorization."
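
To make the setup concrete, here is a minimal, hypothetical sketch of what a synthetic grade-school math generator might look like: randomized entities and quantities yield problems the model cannot solve by template memorization alone. This is an illustrative assumption, not the paper's actual data pipeline; all names and the problem structure are invented for this sketch.

```python
# Hypothetical sketch of a synthetic grade-school math generator.
# NOT the paper's actual pipeline; structure and names are assumptions.
import random

NAMES = ["Ava", "Ben", "Cora", "Dan"]
ITEMS = ["apples", "pens", "books", "coins"]

def make_problem(rng: random.Random) -> tuple[str, str]:
    """Sample a two-step add/subtract word problem and its answer."""
    name = rng.choice(NAMES)
    item = rng.choice(ITEMS)
    start = rng.randint(10, 99)
    gained = rng.randint(1, 20)
    lost = rng.randint(1, 9)
    question = (
        f"{name} has {start} {item}. {name} gets {gained} more, "
        f"then gives away {lost}. How many {item} does {name} have now?"
    )
    answer = str(start + gained - lost)
    return question, answer

if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        q, a = make_problem(rng)
        print(q, "->", a)
```

Because quantities and entities are sampled rather than drawn from web text, every training example is novel by construction, which is what lets a synthetic setting rule out data contamination as the source of performance.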