Repository Structure and Components

BaseballCV is organized into a modular structure that promotes maintainability and ease of use. Understanding this organization helps developers effectively utilize and extend the framework’s capabilities.

Core Directory Structure

BaseballCV/
├── baseballcv/
│   ├── model/
│   │   ├── od/
│   │   │   ├── detr/
│   │   │   ├── yolo/
│   │   │   └── rfdetr/
│   │   ├── vlm/
│   │   │   ├── florence2/
│   │   │   └── paligemma2/
│   │   └── utils/
│   │       ├── model_function_utils.py
│   │       └── model_visualization_tools.py
│   ├── datasets/
│   │   ├── formats/
│   │   │   ├── datasets_coco_detection.py
│   │   │   └── datasets_jsonl_detection.py
│   │   └── processing/
│   │       └── datasets_processor.py
│   ├── functions/
│   │   ├── dataset_tools.py
│   │   ├── load_tools.py
│   │   ├── savant_scraper.py
│   │   ├── baseball_tools.py
│   │   └── utils/
│   │       ├── baseball_utils/
│   │       │   ├── distance_to_zone.py
│   │       │   └── glove_tracker.py
│   │       └── savant_utils/
│   │           ├── crawler.py
│   │           └── gameday.py
│   └── utilities/
│       ├── logger/
│       │   ├── baseballcv_logger.py
│       │   └── baseballcv_prog_bar.py
│       └── dependencies/
│           └── git_dependency_installer.py
├── datasets/
│   ├── yolo/
│   ├── COCO/
│   └── raw_photos/
├── models/
│   ├── od/
│   │   ├── YOLO/
│   │   ├── DETR/
│   │   └── RFDETR/
│   └── vlm/
│       ├── Florence2/
│       └── PaliGemma2/
├── notebooks/
├── tests/
├── docs/
├── README.md
└── LICENSE
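
If the `baseballcv/` package follows standard Python packaging, each file in the tree corresponds to a dotted module path. The helper below, `module_path`, is a hypothetical illustration of that mapping and is not part of BaseballCV. Whether the package also re-exports these names at a higher level is not stated here.

```python
from pathlib import PurePosixPath


def module_path(file_path: str) -> str:
    """Convert a package-relative file path into a dotted module path."""
    path = PurePosixPath(file_path)
    if path.suffix != ".py":
        raise ValueError(f"not a Python source file: {file_path}")
    # Drop the ".py" suffix and join the remaining path parts with dots.
    return ".".join(path.with_suffix("").parts)


# Paths taken from the directory tree above.
print(module_path("baseballcv/functions/load_tools.py"))
# -> baseballcv.functions.load_tools
print(module_path("baseballcv/utilities/logger/baseballcv_logger.py"))
# -> baseballcv.utilities.logger.baseballcv_logger
```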

Key Components

The repository is built around four core components that work together: models, functions, utilities, and datasets. Each is described below.

Models (model/)

BaseballCV implements several state-of-the-art computer vision models, organized by type:

Object Detection (od/)

DETR implementation for precise player, ball and equipment detection
YOLOv9 for real-time object detection with improved performance
RFDETR (RF-DETR, Roboflow's real-time detection transformer) for enhanced detection capabilities
All three are tuned for baseball-specific scenarios
Training and inference pipelines are supported for each

Vision Language Models (vlm/)

Florence2 for multi-modal understanding and queries
PaliGemma2 for enhanced contextual analysis
Support for natural language queries about baseball scenes

Model Utilities (utils/)

Function utilities for model operations
Visualization tools for detection results
Common model operations and helper functions

Functions (functions/)

Core utility functions that power BaseballCV’s capabilities:

BaseballTools

Distance to zone calculation for pitch analysis
Glove tracking and movement analysis
Comprehensive analysis of catcher positioning
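
To make the distance-to-zone idea concrete, here is a minimal geometric sketch, assuming the ball position and the strike zone are both available in pixel coordinates. The `Box` type and `distance_to_box` function are hypothetical and do not reproduce the BaseballTools implementation, which may work in different units or use additional detections.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def distance_to_box(px: float, py: float, box: Box) -> float:
    """Shortest Euclidean distance from a point to a box; 0.0 if inside."""
    # Overshoot beyond the box edges on each axis (0 when within range).
    dx = max(box.x_min - px, 0.0, px - box.x_max)
    dy = max(box.y_min - py, 0.0, py - box.y_max)
    return math.hypot(dx, dy)


# Illustrative values only.
zone = Box(x_min=100, y_min=200, x_max=150, y_max=260)
print(distance_to_box(120, 230, zone))  # inside the zone -> 0.0
print(distance_to_box(153, 264, zone))  # 3 px right, 4 px below -> 5.0
```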

DataTools

Dataset generation from videos
Automated annotation with pre-trained models
Dataset conversion between formats
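
As an example of what a format conversion involves, the sketch below converts one bounding box from the COCO convention (`[x_min, y_min, width, height]` in pixels) to the YOLO convention (center coordinates and size, normalized by the image dimensions). The function name `coco_to_yolo` is an assumption for illustration and is not the DataTools API.

```python
def coco_to_yolo(bbox, image_width, image_height):
    """Convert a COCO box [x_min, y_min, width, height] in pixels
    to a YOLO box (x_center, y_center, width, height) normalized to [0, 1]."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    x_min, y_min, width, height = bbox
    # YOLO stores the box center rather than the top-left corner.
    x_center = (x_min + width / 2) / image_width
    y_center = (y_min + height / 2) / image_height
    return (x_center, y_center, width / image_width, height / image_height)


print(coco_to_yolo([100, 50, 200, 100], image_width=400, image_height=200))
# -> (0.5, 0.5, 0.5, 0.5)
```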

LoadTools

Model loading and management
Dataset downloading and preparation
Resource handling

BaseballSavVideoScraper

Video acquisition from Baseball Savant
Pitch-level metadata retrieval
Filtering by team, pitcher, and pitch type
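
The filtering behavior can be pictured as applying optional criteria to pitch-level records. The sketch below is a hypothetical illustration in plain Python: the `Pitch` record, its field names, and `filter_pitches` are assumptions, and it does not show the BaseballSavVideoScraper interface or contact Baseball Savant.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Pitch:
    """Hypothetical pitch-level record; field names are illustrative only."""
    play_id: str
    team: str
    pitcher: str
    pitch_type: str


def filter_pitches(
    pitches: Iterable[Pitch],
    team: Optional[str] = None,
    pitcher: Optional[str] = None,
    pitch_type: Optional[str] = None,
) -> Iterator[Pitch]:
    """Yield pitches that match every filter that is set (None = no filter)."""
    for pitch in pitches:
        if team is not None and pitch.team != team:
            continue
        if pitcher is not None and pitch.pitcher != pitcher:
            continue
        if pitch_type is not None and pitch.pitch_type != pitch_type:
            continue
        yield pitch


# Made-up records for illustration; not real Baseball Savant data.
sample = [
    Pitch("play-001", "NYY", "pitcher-a", "FF"),
    Pitch("play-002", "NYY", "pitcher-a", "SL"),
    Pitch("play-003", "BOS", "pitcher-b", "FF"),
]

fastballs = list(filter_pitches(sample, team="NYY", pitch_type="FF"))
print([p.play_id for p in fastballs])  # -> ['play-001']
```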

Utilities (utilities/)

Shared utility components:

Logger

Comprehensive logging system
Progress tracking via custom progress bars
Structured output for debugging and monitoring

Dependencies

Automatic dependency management
Git-based package installation
Compatibility verification

Datasets (datasets/)

Dataset handling and processing:

Formats

COCO format support for object detection
JSONL format for vision-language model training
YOLO format for compatibility with YOLO models
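
JSONL stores one JSON object per line, which makes large annotation files easy to stream. The sketch below writes and reads such records with the standard library only. The record keys (`image`, `prefix`, `suffix`) are an assumed layout for illustration; the schema BaseballCV expects may differ.

```python
import io
import json


def write_jsonl(records, stream):
    """Write each record as one JSON object per line."""
    for record in records:
        stream.write(json.dumps(record) + "\n")


def read_jsonl(stream):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]


# Illustrative records; the keys and values are assumptions.
records = [
    {"image": "frame_0001.jpg", "prefix": "detect baseball", "suffix": "baseball"},
    {"image": "frame_0002.jpg", "prefix": "detect glove", "suffix": "glove"},
]

buffer = io.StringIO()  # stands in for a file opened in text mode
write_jsonl(records, buffer)
buffer.seek(0)
print(read_jsonl(buffer) == records)  # -> True
```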

Processing

Data processing and transformation
Format conversion utilities
Augmentation strategies

Component Interaction

BaseballCV’s components are designed to work together seamlessly. A typical workflow might involve:

1. Using BaseballSavVideoScraper to obtain game footage
2. Processing videos with BaseballTools for analysis
3. Generating datasets with DataTools for model training
4. Training models using the appropriate model implementation
5. Visualizing results with the visualization tools

This modular design lets users customize or extend individual components without breaking the integration between them.