Swayam’s Scripts

A collection of my original writings

Categories

All (8)

Book (2)

C++ (1)

Compilers (1)

Information-Theory (1)

LLM (1)

LLVM (1)

ML (2)

ML-Theory (1)

NLP (1)

NumPy (1)

Open-Source (1)

Theory (1)

Transformers (2)

[WIP] A Hitchhiker’s Guide to LLVM

Just Another Day in the Life of an SSA Variable

Compilers

LLVM

This guide is being written around the release of LLVM version 21.0.0.

Gradient Flow and Variance Propagation Analysis of Dynamic Tanh Layer

A Mathematical Investigation into DyT’s Potential to Mitigate the Curse of Depth in Pre-LN Transformers

ML-Theory

Transformers

The “Curse of Depth” in Pre-LN transformers, identified by Sun et al. (2025), reveals that deeper layers often function as near-identity mappings, contributing minimally to…

[WIP] The Transparent Algorithm

A bridge between formal mathematical proofs and intuitive understanding, guiding readers through the theoretical foundations of machine learning with clarity and precision.

ML

Theory

Book

Concurrent C++: A Practical Guide

A concise and practical guide to mastering C++ concurrency, covering threads, synchronization, parallelism, and debugging—all in one place.

C++

Book

Information Theory in Machine Learning: A Fun Approach

Teaching Neural Networks to Handle Their Trust Issues

ML

Information-Theory

Have you ever noticed how the best stories start with uncertainty? Like that moment when you’re watching a thriller and think you know who the culprit is, but you’re not…

Understanding Perplexity

A New Perspective on Model Uncertainty

LLM

Recently, I was reading Chapter 5 (Pretraining) of the book “Build a Large Language Model (From Scratch)” by Sebastian Raschka. I stumbled upon an intriguing…

Numpy QuadDType: Quadruple Precision for Everyone

Quad Precision for All: Simplifying High-Accuracy Computing with numpy_quaddtype

NumPy

Open-Source

Self-Attention Mimicking Gradient Descent

NLP

Transformers

This section of the paper “Uncovering mesa-optimization algorithms in Transformers” presents a theoretical construction where a linear self-attention layer in a Transformer…

© 2025 Swayam Singh. All rights reserved.