Pokémon Object Detection with Mask R-CNN

Python, TensorFlow, Mask R-CNN, OpenCV Computer Vision 2023
Pokémon Object Detection with Mask R-CNN

Project Overview

This personal project, developed during my time at the Université Paris-Est Créteil, explores the application of Mask R-CNN for custom object detection. I implemented transfer learning techniques with a pre-trained Mask R-CNN model (originally trained on the COCO dataset) to detect and segment custom objects - specifically Pikachu from Pokémon - in images. This work demonstrates how deep learning can be adapted to specialized detection tasks without requiring extensive training data.

Implementation Details

I utilized a pre-trained Mask R-CNN architecture as the foundation and fine-tuned it to recognize Pikachu from various angles and contexts. The project involved:

  • Adapting the Mask R-CNN architecture originally pre-trained on COCO dataset
  • Creating a custom dataset with labeled Pikachu images
  • Implementing transfer learning to recognize this character not present in the original dataset
  • Optimizing detection parameters for higher accuracy

[Mask R-CNN detection result showing bounding boxes and segmentation] Mask R-CNN Detection Bounding boxes Mask R-CNN Detection Segmentation

Parameter Optimization

A significant part of this project involved experimenting with various parameters to improve detection performance:

1. IoU Thresholds for Region Proposal Network (RPN)

  • Adjusted upper threshold (positive samples) and lower threshold (negative samples)
  • Found optimal performance with lower bound at 0.3 and upper bound at 0.7
  • Values between thresholds were ignored during training to improve robustness

2. ROI Box Head Architecture

  • Experimented with varying numbers of convolution filters (optimal: 4 filters)
  • Tested different configurations of fully connected layers (settled on 2 FC layers)
  • Balanced complexity against potential overfitting

3. Non-Maximum Suppression (NMS) Optimization

  • Fine-tuned pre-NMS and post-NMS proposal numbers
  • Default values: 4000 pre-NMS proposals, 1000 post-NMS proposals during training
  • NMS suppresses overlapping bounding boxes to retain only the highest confidence detections

Technical Challenges

The project presented several interesting challenges:

  • Annotation Efficiency: Creating efficient workflows for annotating custom objects
  • Transfer Learning Optimization: Finding the right balance between frozen and trainable layers
  • Hyperparameter Tuning: Systematic experimentation with detection parameters
  • Small Dataset Handling: Implementing data augmentation to maximize learning from limited examples

Results and Insights

The fine-tuned model successfully detected Pikachu in various contexts with high precision. Key findings include:

  • Transfer learning significantly reduced the amount of training data needed
  • Parameter tuning had substantial impact on detection quality
  • NMS threshold values greatly influenced the final detection results
  • The model generalized well to different poses and backgrounds of the character

Technologies Used

  • Python for implementation
  • TensorFlow and Keras for deep learning framework
  • Mask R-CNN architecture for instance segmentation
  • OpenCV for image processing and visualization
  • Data augmentation techniques for training set expansion

Future Directions

  • Expand detection to multiple Pokémon characters
  • Implement real-time detection capabilities
  • Explore instance segmentation for more detailed character analysis
  • Compare performance against newer architectures like YOLO and EfficientDet

Phone

Address

Leoben, Austria