Swayam’s Scripts
A collection of my original writings
Categories
All (7)
Book (2)
C++ (1)
Information-Theory (1)
LLM (1)
ML (2)
ML-Theory (1)
NLP (1)
NumPy (1)
Open-Source (1)
Theory (1)
Transformers (2)
Gradient Flow and Variance Propagation Analysis of Dynamic Tanh Layer
A Mathematical Investigation into DyT’s Potential to Mitigate the Curse of Depth in Pre-LN Transformers
ML-Theory
Transformers
The “Curse of Depth” in Pre-LN transformers, identified by Sun et al. (2025), reveals that deeper layers often function as near-identity mappings, contributing minimally to…
Mar 21, 2025
Swayam Singh
9 min
1,625 words
[WIP] The Transparent Algorithm
A bridge between formal mathematical proofs and intuitive understanding, guiding readers through the theoretical foundations of machine learning with clarity and precision.
ML
Theory
Book
Mar 15, 2025
Swayam Singh
1 min
35 words
Concurrent C++: A Practical Guide
A concise and practical guide to mastering C++ concurrency, covering threads, synchronization, parallelism, and debugging—all in one place.
C++
Book
Feb 9, 2025
Swayam Singh
1 min
5 words
Information Theory in Machine Learning: A Fun Approach
Teaching Neural Networks to Handle Their Trust Issues
ML
Information-Theory
Have you ever noticed how the best stories start with uncertainty? Like that moment when you’re watching a thriller and think you know who the culprit is, but you’re not…
Jan 9, 2025
Swayam Singh
14 min
2,650 words
Understanding Perplexity
A New Perspective on Model Uncertainty
LLM
Recently, I was reading Chapter 5 (Pretraining) of the book “Build a Large Language Model (From Scratch)” by Sebastian Raschka. I stumbled upon an intriguing…
Oct 10, 2024
Swayam Singh
3 min
595 words
NumPy QuadDType: Quadruple Precision for Everyone
Quad Precision for All: Simplifying High-Accuracy Computing with numpy_quaddtype
NumPy
Open-Source
Sep 30, 2024
Swayam Singh
1 min
5 words
Self-Attention Mimicking Gradient Descent
NLP
Transformers
This section of the paper “Uncovering mesa-optimization algorithms in Transformers” presents a theoretical construction where a linear self-attention layer in a Transformer…
Oct 14, 2023
Swayam Singh
9 min
1,671 words