Computer Vision

Learn and apply state-of-the-art approaches to computer vision.

Miniseries / Collections

Segmentation Models

Understanding the state-of-the-art segmentation techniques and tools in computer vision.

Camouflaged Object Detection

Understanding the state-of-the-art approaches for camouflaged object detection.

Medical Image Landmark Detection

Understanding the state-of-the-art approaches for detecting landmarks in medical images.

Standalone Posts

Building a Masked Autoencoder (MAE) from Scratch in PyTorch

Learn how a masked autoencoder (MAE) works and how to implement it in PyTorch.

ViTDet: Plain Vision Transformer Backbones for Object Detection

ViTDet demonstrates that plain, non-hierarchical Vision Transformers can compete with hierarchical backbones for object detection through simple adaptations.

The Image is a Sequence: Dissecting the Vision Transformer (ViT)

An in-depth look at ‘An Image is Worth 16x16 Words,’ the paper that introduced the pure Vision Transformer, its architecture, novelty, limitations, and how modern models like Swin Transformer evolved from it.

Swin Transformer: Shifting Windows to Build Hierarchical Vision Models

This post provides a minimal PyTorch implementation of Swin Transformer for a simple image classification task.