Mask R-CNN: Extending Object Detection to Instance Segmentation

Mask R-CNN extends Faster R-CNN with a mask prediction branch that runs in parallel with the existing box-recognition branch, achieving state-of-the-art instance segmentation at its release through simple yet effective architectural choices.

October 2025 · Saeed Mehrang

Taming the Transformer: How Perceiver IO and PaCa-ViT Conquer Quadratic Complexity

A deep dive into two architectures, Perceiver IO and PaCa-ViT, that avoid the O(N^2) cost of full self-attention over N input tokens, enabling Transformers to process massive inputs efficiently.

October 2025 · Saeed Mehrang

ViTDet: Plain Vision Transformer Backbones for Object Detection

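ViTDet demonstrates that plain, non-hierarchical Vision Transformers can compete with hierarchical backbones for object detection through simple adaptations, such as building a feature pyramid from a single-scale feature map and using window attention aided by a few cross-window propagation blocks.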

October 2025 · Saeed Mehrang

The Image is a Sequence: Dissecting the Vision Transformer (ViT)

An in-depth look at ‘An Image is Worth 16x16 Words,’ the paper that introduced the pure Vision Transformer, covering its architecture, novelty, and limitations, and how modern models like Swin Transformer evolved from it.

October 2025 · Saeed Mehrang

The Segment Anything Model Version 1 Overview (Part 1/3)

Meta’s Segment Anything Model (SAM 1) segments a wide variety of objects from simple prompts, such as points and boxes, with remarkable accuracy. Part 1 of 3.

October 2025 · Saeed Mehrang

The Segment Anything Model Version 1 Overview (Part 2/3)

Meta’s Segment Anything Model (SAM 1) segments a wide variety of objects from simple prompts, such as points and boxes, with remarkable accuracy. Part 2 of 3.

October 2025 · Saeed Mehrang

State-of-the-Art Camouflaged Object Detection: A Brief Analysis of 2024-2025 Methods

A brief technical comparison of five leading camouflaged object detection methods from 2024-2025 (ZoomNeXt, HGINet, RAG-SEG, MoQT, and SPEGNet), with an analysis of their architectures.

October 2025 · Saeed Mehrang

Swin Transformer: Shifting Windows to Build Hierarchical Vision Models

This post provides a minimal PyTorch implementation of Swin Transformer for a simple image classification task.

October 2025 · Saeed Mehrang

Multi-Task Learning for Automated Bone Age Assessment: Heatmap-Based Vertebral Landmark Detection Using U-Net with Pretrained EfficientNet-B2 and Auxiliary Self-Supervised Tasks

This post details the machine learning strategy—including multi-task learning, transfer learning, and heatmap-based landmark detection—used to build an AI system that automates bone age assessment from X-ray images, achieving high accuracy with limited medical data.

October 2025 · Saeed Mehrang