Hi, I'm Avi! Currently, I'm a DPhil student in the AIMS CDT, funded by the Oxford Martin AI Governance Initiative to work on technical AI governance. My interests include AI safety, dataset verification, and understanding why deep learning works.
My motivation for doing research in these areas is to help bring about outcomes where AI benefits everyone. Some subquestions that interest me are:
- How could we have justified confidence that AI systems will behave as intended if we don't understand how they function internally?
- Can interpretability techniques be useful in practice?
- What deployment safeguards are effective?
- Under what conditions can we rely on chain-of-thought to monitor models?
- How can we expand affordances for AI governance?
- How can the dataset used in a training run be verified?
- What is the correct characterisation of the inductive bias of deep learning setups?
- Does the optimiser matter, or is the implicit bias equivalent to uniform sampling from low-training-loss parameters?
- How can knowledge of implicit bias and training dynamics be leveraged for safety?
Implicit Regularization of AdaDelta
With Matthias Englert and Ranko Lazic (equal contribution). Transactions on Machine Learning Research, 2024.
We consider the AdaDelta adaptive optimization algorithm on locally Lipschitz, positively homogeneous, and o-minimally definable neural networks, with either the exponential or the logistic loss. We prove that, after achieving perfect training accuracy, the resulting adaptive gradient flows converge in direction to a Karush-Kuhn-Tucker point of the margin maximization problem, i.e. perform the same implicit regularization as the plain gradient flows. We also prove that the loss decreases to zero and the Euclidean norm of the parameters increases to infinity at the same rates as for the plain gradient flows. Moreover, we consider generalizations of AdaDelta where the exponential decay coefficients may vary with time and the numerical stability terms may differ across the parameters, and we obtain the same results provided the former do not approach 1 too quickly and the latter have isotropic quotients. Finally, we corroborate our theoretical results by numerical experiments on convolutional networks with the MNIST and CIFAR-10 datasets.
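To give a feel for the setting, here is a minimal numerical sketch (my illustration here, not the paper's experiments): a bias-free ReLU network is positively homogeneous, so the theory predicts that Adadelta, like plain gradient descent, implicitly maximizes the normalized margin once the data is fitted.

```python
# Sketch: watch the normalized margin under Adadelta on a homogeneous net.
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(64, 10)
y = (X[:, 0] > 0).float() * 2 - 1  # separable +-1 labels

# No biases, so f(a * theta; x) = a^2 * f(theta; x): homogeneous of degree 2.
model = nn.Sequential(nn.Linear(10, 32, bias=False), nn.ReLU(),
                      nn.Linear(32, 1, bias=False))
opt = torch.optim.Adadelta(model.parameters())
L = 2  # homogeneity degree

for step in range(5001):
    margins = y * model(X).squeeze(-1)
    loss = torch.exp(-margins).mean()  # exponential loss
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        norm = torch.sqrt(sum((p ** 2).sum() for p in model.parameters()))
        # The normalized margin should (slowly) increase once the loss is small.
        print(step, loss.item(), (margins.min() / norm ** L).item())
```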
Image Restoration for Training Data Reconstructed from Trained Neural Networks
With Matthias Englert and Ranko Lazic. Withdrawn conference submission.
We investigate techniques to improve the quality of training images reconstructed from model parameters, addressing privacy concerns in deep learning. Haim et al. [NeurIPS 2022] propose a method to reconstruct training data from trained neural networks, with impressive results. While their reconstructed images resemble the original training images, most of them also contain a considerable amount of noise and artifacts. This is especially true when the network was trained on more than just a few dozen images. To address this, we view the problem as a specific image restoration task. Since the noise and artifacts differ from other types of degradation (Gaussian noise, compression artifacts, blurring, or impulse noise from digital cameras), we create a new dataset specifically for the restoration of images produced by the reconstruction process of Haim et al. We use this dataset, consisting of about 60 million noisy reconstructions of CIFAR-10 images, to train a diffusion model on the restoration task. Using this method, we obtain reconstructions that are significantly closer to the original training images, as measured by SSIM and HaarPSI scores.
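The shape of the supervised setup is simple, even though the paper uses a diffusion model. Here is a minimal stand-in baseline (a sketch with dummy data, not our pipeline): a small residual CNN trained to map noisy reconstructions back towards the original images.

```python
# Sketch: a tiny residual CNN as a stand-in restoration model.
import torch
import torch.nn as nn

class TinyRestorer(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1),
        )

    def forward(self, x):
        return x + self.net(x)  # predict a residual correction to the input

model = TinyRestorer()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Dummy stand-ins for (noisy reconstruction, original image) CIFAR-10 pairs,
# shape (batch, 3, 32, 32) in [0, 1]; the real dataset has ~60M such pairs.
clean = torch.rand(16, 3, 32, 32)
noisy = (clean + 0.1 * torch.randn_like(clean)).clamp(0, 1)

for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(noisy), clean)
    loss.backward()
    opt.step()
```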
Evaluating and Interpreting Transformers for Sequence Alignment (MSc dissertation)
2025.
This dissertation evaluates and interprets an encoder-decoder transformer model for pairwise DNA sequence alignment, using synthetic data generated from a pair hidden Markov model (pair HMM) as ground truth, in order to better understand the black-box nature of deep learning approaches to alignment. We assess the transformer's ability to replicate the optimal alignments and probability distributions defined by the pair HMM, employing constrained beam search and comparing output distributions. Mechanistic interpretability techniques, including linear probing and LogitLens, are used to analyse internal computations, including the inference of evolutionary parameters. Results indicate that the transformer effectively approximates pair-HMM likelihoods and distributions on synthetic data, with linear probes confirming its capacity to implicitly learn evolutionary parameters almost as accurately as is theoretically possible. This interpretability analysis offers insight into the model's computational mechanism, enhancing confidence in the model's probability distribution over alignments but casting doubt on the efficacy of beam search.
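As a taste of the probing methodology, here is a minimal linear-probe sketch (hypothetical names and synthetic stand-in data throughout): fit a linear map from the transformer's hidden states to the evolutionary parameter used to generate each sequence pair, and measure how much of it is linearly decodable.

```python
# Sketch: a linear probe for an evolutionary parameter in hidden states.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Stand-ins: hidden_states would come from the encoder, shape (n_pairs, d_model);
# params would be e.g. the substitution rate used to generate each pair.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(2000, 512))
params = hidden_states @ rng.normal(size=512) * 0.01 + rng.normal(size=2000) * 0.1

X_tr, X_te, y_tr, y_te = train_test_split(hidden_states, params, random_state=0)
probe = Ridge(alpha=1.0).fit(X_tr, y_tr)
print("probe R^2:", probe.score(X_te, y_te))  # high R^2 = linearly decodable
```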
- Using AI to detect emotions from images of eyes: I evaluated CLIP embeddings and multi-modal language models for this task.
- Agent-based simulation of congested systems such as transport, originally developed in 2023 as part of the University of Warwick Undergraduate Research Support Scheme and later vectorised with PyTorch (a toy vectorised traffic model in this spirit is sketched after this list).
- A simulator of the Enigma machine: a Python implementation of the behaviour of the military 3-wheel Enigma machine (the rotor-stepping logic is sketched after this list).
- I love learning languages, especially ones that are related and let you see how they developed from a common ancestor. English is still the only language I'd say I speak. For an introduction to historical linguistics, I'd recommend this article.
- My Erdős number is 4.
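On the traffic simulation: the project's code isn't shown here, but a classic model in the same spirit is the Nagel-Schreckenberg cellular automaton, which vectorises neatly, with every car updated in parallel as tensor operations. A minimal sketch:

```python
# Sketch: Nagel-Schreckenberg traffic on a ring road, vectorised in PyTorch.
import torch

torch.manual_seed(0)
ROAD, N_CARS, V_MAX, P_SLOW = 200, 40, 5, 0.3

pos = torch.sort(torch.randperm(ROAD)[:N_CARS]).values  # distinct cells, in driving order
vel = torch.zeros(N_CARS, dtype=torch.long)

for t in range(100):
    # Gap to the car ahead; cars never overtake, so cyclic order is preserved.
    gap = (pos.roll(-1) - pos - 1) % ROAD
    vel = (vel + 1).clamp(max=V_MAX)         # accelerate towards the speed limit
    vel = torch.minimum(vel, gap)            # brake to avoid the car ahead
    slow = (torch.rand(N_CARS) < P_SLOW) & (vel > 0)
    vel = torch.where(slow, vel - 1, vel)    # random slowdowns seed jams
    pos = (pos + vel) % ROAD                 # move every car at once

print("mean speed:", vel.float().mean().item())
```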
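And on the Enigma simulator: the heart of the machine is the rotor stepping and the reciprocal encryption path. Below is a minimal sketch of that logic; the wirings are random placeholders, not the historical rotor tables (swap those in for a faithful machine), and ring settings and the plugboard are omitted.

```python
# Sketch: 3-rotor Enigma core with placeholder wirings (not historical ones).
import random
import string

ALPHA = string.ascii_uppercase
rng = random.Random(0)

def random_wiring():
    letters = list(ALPHA)
    rng.shuffle(letters)
    return "".join(letters)

def random_reflector():
    # Pair letters up so the reflector is its own inverse, like the real one.
    letters = list(ALPHA)
    rng.shuffle(letters)
    table = [""] * 26
    for a, b in zip(letters[:13], letters[13:]):
        table[ALPHA.index(a)], table[ALPHA.index(b)] = b, a
    return "".join(table)

def invert(wiring):
    # Invert a 26-letter substitution given as a string.
    out = [""] * 26
    for i, c in enumerate(wiring):
        out[ALPHA.index(c)] = ALPHA[i]
    return "".join(out)

class Rotor:
    def __init__(self, wiring, notch):
        self.wiring, self.inverse = wiring, invert(wiring)
        self.notch = ALPHA.index(notch)
        self.pos = 0

    def _map(self, c, table):
        i = (ALPHA.index(c) + self.pos) % 26                    # shift in by offset
        return ALPHA[(ALPHA.index(table[i]) - self.pos) % 26]   # shift back out

    def forward(self, c):
        return self._map(c, self.wiring)

    def backward(self, c):
        return self._map(c, self.inverse)

    def at_notch(self):
        return self.pos == self.notch

    def step(self):
        self.pos = (self.pos + 1) % 26

class Enigma:
    def __init__(self, rotors, reflector):
        self.left, self.middle, self.right = rotors
        self.reflector = reflector

    def _step(self):
        # The famous double-stepping: the middle rotor steps when the right
        # rotor is at its notch, and steps itself (with the left) at its own.
        if self.middle.at_notch():
            self.middle.step()
            self.left.step()
        elif self.right.at_notch():
            self.middle.step()
        self.right.step()

    def press(self, c):
        self._step()  # rotors move before the electrical signal passes
        for rotor in (self.right, self.middle, self.left):
            c = rotor.forward(c)
        c = self.reflector[ALPHA.index(c)]
        for rotor in (self.left, self.middle, self.right):
            c = rotor.backward(c)
        return c

    def encrypt(self, text):
        return "".join(self.press(c) for c in text.upper() if c in ALPHA)

W = [random_wiring() for _ in range(3)]
REFL = random_reflector()

def machine():
    return Enigma([Rotor(w, n) for w, n in zip(W, "QEV")], REFL)

ct = machine().encrypt("ATTACKATDAWN")
assert machine().encrypt(ct) == "ATTACKATDAWN"  # Enigma is self-reciprocal
print(ct)
```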
In the past, I studied maths and computer science at Warwick, helped run Warwick's effective altruism society, organised AI safety reading groups, and climbed a mountain or two.
I occasionally write about topics in AI and linguistics that interest me.