How to (and not to) interpret the scaling laws
LoRA doesn’t approximate the solution found by full finetuning; it solves a different (albeit similar) optimization problem
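A rough way to state that distinction, using the notation of the original LoRA paper (Hu et al., 2021) rather than anything from this post: full finetuning searches over the entire weight update $\Delta W$, while LoRA freezes the pretrained weights $W_0$ and optimizes only a low-rank factorization of the update,

$$
W = W_0 + \Delta W \approx W_0 + BA, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k),
$$

so the feasible set is the set of rank-$r$ updates rather than all of $\mathbb{R}^{d \times k}$: a genuinely different optimization problem, not an approximation scheme for the original one.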
We look at language models parametrized by neural networks, and how they’re capable of near transfer: generalizing to sequences similar to (but not exactly the same as) those in their training sets.
jax.pjit
With proper sharding, LLMs can scale far beyond the memory capacity of a single GPU/TPU. We explore the math underpinning this from the ground up, and then build a fully working implementation with JAX/Flax.
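As a back-of-the-envelope illustration of that memory math (the numbers here are chosen for concreteness and are not taken from the post): a 70B-parameter model stored in bf16 needs

$$
70 \times 10^{9}\ \text{params} \times 2\ \text{bytes/param} = 140\ \text{GB}
$$

for the weights alone, more than the 80 GB of HBM on a typical single accelerator; sharded evenly across $N = 8$ devices, that drops to $140 / 8 = 17.5$ GB per device, before counting activations, gradients, and optimizer state.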
Spherical Harmonics are a core building block of Equivariant Neural Networks. This post breaks them down by analyzing them as 3D extensions of the Fourier Series.
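One compact way to state that analogy (standard notation, not necessarily the post's): a periodic function on the circle expands in the Fourier basis, and a square-integrable function on the sphere expands in the spherical-harmonic basis,

$$
f(\theta) = \sum_{n=-\infty}^{\infty} c_n\, e^{i n \theta}
\quad\longleftrightarrow\quad
f(\theta, \varphi) = \sum_{\ell=0}^{\infty} \sum_{m=-\ell}^{\ell} a_{\ell m}\, Y_{\ell}^{m}(\theta, \varphi),
$$

with the $Y_{\ell}^{m}$ playing the role on $S^2$ that the complex exponentials play on $S^1$.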
Everyday human reasoning breaks down as the scale of time and space at play increases. Complex systems thinking gives us a new set of tools to better understand the chains of consequences involved, and make better decisions.
CUDA has a hierarchical programming model, requiring thought at the level of Grids, Blocks, and Threads. We explore this directly, using our understanding to write a simple GPU-accelerated addition kernel from scratch.
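A minimal sketch of such a kernel (generic CUDA C++ vector addition, written here for illustration rather than copied from the post): each thread derives one global index from its grid/block/thread coordinates and adds a single pair of elements.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles one element: its global index is derived from
// the grid/block/thread hierarchy.
__global__ void add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);  // unified memory, visible to CPU and GPU
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    const int threadsPerBlock = 256;
    const int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
    add<<<blocksPerGrid, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %.1f\n", c[0]);  // expect 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```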