Deep Learning

Building a Masked Autoencoder (MAE) from Scratch in PyTorch

Learn how a masked autoencoder (MAE) works and how to implement one in PyTorch.

Computational Drug Discovery Part 3 (Subpart 2/3): AlphaFold's Evoformer Block Disassembled, A Matrix-Level Deep Dive into AlphaFold2's Core

A detailed mathematical breakdown of AlphaFold2’s Evoformer block, explaining each operation with concrete matrix algebra and dimensions - Subpart 2/3

Computational Drug Discovery Part 3 (Subpart 3/3): From Abstract Features to Atoms, AlphaFold2's Structure Module and Training

AlphaFold2’s Structure Module uses Invariant Point Attention to convert abstract Evoformer predictions into 3D atomic coordinates through iterative refinement, while multi-objective loss functions guide training - Subpart 3/3

Mask R-CNN: Extending Object Detection to Instance Segmentation

Mask R-CNN elegantly extends Faster R-CNN by adding a mask prediction branch, achieving state-of-the-art instance segmentation through simple yet effective architectural choices.

ViTDet: Plain Vision Transformer Backbones for Object Detection

ViTDet demonstrates that plain, non-hierarchical Vision Transformers can compete with hierarchical backbones for object detection through simple adaptations.

The Image is a Sequence: Dissecting the Vision Transformer (ViT)

An in-depth look at ‘An Image is Worth 16x16 Words,’ the paper that introduced the pure Vision Transformer, its architecture, novelty, limitations, and how modern models like Swin Transformer evolved from it.

The Segment Anything Model Version 1 Overview (Part 1/3)

Meta’s Segment Anything Model (SAM 1) delivers a wide variety of predictions, detections, and segmentations with remarkable accuracy. Part 1 of 3.

The Segment Anything Model Version 1 Overview (Part 2/3)

Meta’s Segment Anything Model (SAM 1) delivers a wide variety of predictions, detections, and segmentations with remarkable accuracy. Part 2 of 3.

State-of-the-Art Camouflaged Object Detection: A Brief Analysis of 2024-2025 Methods

A brief technical comparison of the five most advanced camouflaged object detection methods in 2025, including ZoomNeXt, HGINet, RAG-SEG, MoQT, and SPEGNet, with detailed analysis of their architectures.

Swin Transformer: Shifting Windows to Build Hierarchical Vision Models

This post provides a minimal PyTorch implementation of Swin Transformer for simple image classification.

Computational Drug Discovery Part 5 (Subpart 2/3): Generative Models for De Novo Drug Design - Diffusion Models

From prediction to creation (Subpart 2/3): Understanding diffusion models for molecular generation, with detailed implementation of torsional diffusion for 3D conformation generation.

Computational Drug Discovery Part 5 (Subpart 3/3): Generative Models for De Novo Drug Design - Transformers

From prediction to creation (Subpart 3/3): How AI generates novel drug molecules optimized for multiple objectives using autoregressive transformer architectures.

Computational Drug Discovery Part 5 (Subpart 1/3): Generative Models for De Novo Drug Design - VAE and GAN

From prediction to creation (Subpart 1/3): A quick intro to how AI generates novel drug molecules optimized for multiple objectives using VAE and GAN model architectures.

Computational Drug Discovery Part 4: Graph Neural Networks for Molecular Property Prediction

A technical deep-dive into Graph Neural Networks (GNNs) for predicting molecular properties. Learn how to construct molecular graphs, implement message passing architectures, and apply attention mechanisms to drug discovery tasks.

Computational Drug Discovery Part 3 (Subpart 1/3): AlphaFold Overview

How DeepMind’s AlphaFold2 solved the 50-year grand challenge in biology, the protein folding problem, using transformers, evolutionary information, and geometric reasoning, and what it means for drug discovery - Subpart 1/3

The What, When, and Why of Mixture-of-Experts (MoE)

Unpacking Mixture-of-Experts (MoE) in LLMs: A Foundational Dive

This blog post demystifies Mixture-of-Experts (MoE) layers, a key innovation for scaling Large Language Models efficiently. We’ll trace its origins, delve into the mathematical underpinnings, and build a foundational MoE block in PyTorch, mirroring the architecture from its initial conception.

Multi-Task Learning for Automated Bone Age Assessment: Heatmap-Based Vertebral Landmark Detection Using U-Net with Pretrained EfficientNet-B2 and Auxiliary Self-Supervised Tasks

This post details the machine learning strategy (multi-task learning, transfer learning, and heatmap-based landmark detection) used to build an AI system that automates bone age assessment from X-ray images, achieving high accuracy with limited medical data.

KV-Caching in LLMs: The Optimization That Makes Inference Practical

Learn how KV-caching makes ChatGPT respond in seconds instead of minutes. This comprehensive guide explains the quadratic complexity problem in transformers, how caching Keys and Values solves it with 10-100x speedups, and the memory trade-offs - complete with full PyTorch implementations, benchmarks, and interactive visualizations.


SwiGLU: The Activation Function Powering Modern LLMs

Discover why SwiGLU has replaced ReLU and GELU in modern transformers. This post explains the mathematical foundations, the evolution from sigmoid gates to Swish gates, and why this innovation delivers 5-8% performance gains - complete with Python implementations and interactive visualizations.

End-to-end sensor fusion and classification of atrial fibrillation using deep neural networks and smartphone mechanocardiography

This paper presents a deep learning framework for detecting atrial fibrillation (AFib) by analyzing the heart’s mechanical function using smartphone mechanocardiography. The model achieves high accuracy in classifying recordings as sinus rhythm, AFib, or noise.