Artificial Intelligence / Machine Learning
The stunning success of modern AI systems is among the most important scientific and technological developments in human history. I am interested in the science of AI: how do these systems work? Specifically, how do gradient-based optimization, the structure of data, and neural network architectures conspire to enable intelligence, reasoning, and creativity? What are their fundamental capabilities, limitations, and resource requirements?
I am also increasingly interested in the extent to which AI can be capable of automated scientific discovery, and how that might be achieved.
Recent publications in these areas:
M. Barkeshli, A. Alfarano, A. Gromov, On the origin of neural scaling laws: from random graphs to natural language, arXiv:2601.10684
T. Tao and M. Barkeshli, Learning Pseudorandom Numbers with Transformers: Permuted Congruential Generators, Curricula, and Interpretability, arXiv:2510.26792, accepted to ICLR 2026
D. S. Kalra, J. Kirchenbauer, M. Barkeshli, T. Goldstein, When Can You Get Away with Low Memory Adam?, arXiv:2503.01843
T. Tao, D. Doshi, D. S. Kalra, T. He, M. Barkeshli, (How) Can Transformers Predict Pseudo-Random Numbers?, ICML 2025
D. S. Kalra and M. Barkeshli, Why Warmup the Learning Rate? Underlying Mechanisms and Improvements, NeurIPS 2024
D. S. Kalra, T. He, M. Barkeshli, Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos, ICLR 2025
D. S. Kalra, M. Barkeshli, Phase diagram of early training dynamics in deep neural networks: effect of the learning rate, depth, and width, NeurIPS 2023