System Architecture¶

This document provides a comprehensive overview of the CulicidaeLab Server architecture, including system components, data flow, and technical design decisions.

Overview¶

CulicidaeLab Server is a full-stack web application built with modern Python technologies, designed for mosquito research, surveillance, and data analysis. The system follows a microservices-inspired architecture with clear separation between frontend and backend components.

High-Level Architecture¶

graph TB
    subgraph "Frontend Layer"
        UI[Solara Web UI]
        COMP[React Components]
        STATE[State Management]
    end

    subgraph "API Layer"
        FASTAPI[FastAPI Server]
        ROUTES[API Routes]
        MIDDLEWARE[CORS/Auth Middleware]
    end

    subgraph "Service Layer"
        PRED[Prediction Service]
        GEO[Geo Service]
        SPECIES[Species Service]
        DISEASE[Disease Service]
        OBS[Observation Service]
        CACHE[Cache Service]
    end

    subgraph "Data Layer"
        LANCE[LanceDB Vector DB]
        MODELS[AI Models]
        STATIC[Static Files]
    end

    subgraph "External Dependencies"
        CULICIDAE[culicidaelab Library]
        FASTAI[FastAI Models]
        HUGGINGFACE[HuggingFace Models]
    end

    UI --> FASTAPI
    COMP --> STATE
    STATE --> FASTAPI

    FASTAPI --> ROUTES
    ROUTES --> MIDDLEWARE
    MIDDLEWARE --> PRED
    MIDDLEWARE --> GEO
    MIDDLEWARE --> SPECIES
    MIDDLEWARE --> DISEASE
    MIDDLEWARE --> OBS

    PRED --> CACHE
    GEO --> CACHE
    SPECIES --> CACHE
    DISEASE --> CACHE
    OBS --> CACHE

    CACHE --> LANCE
    PRED --> MODELS
    SPECIES --> LANCE
    DISEASE --> LANCE
    OBS --> LANCE
    GEO --> LANCE

    PRED --> CULICIDAE
    MODELS --> FASTAI
    MODELS --> HUGGINGFACE

Component Architecture¶

Frontend Architecture (Solara)¶

The frontend is built using Solara, a pure Python reactive web framework that compiles to modern web technologies.

Key Components:¶

Main Application (frontend/main.py): Entry point and routing configuration
Pages (frontend/pages/): Individual page components for different features
Components (frontend/components/): Reusable UI components organized by feature
State Management (frontend/state.py): Global application state using Solara reactives
Configuration (frontend/config.py): Frontend settings, themes, and API endpoints

Frontend Data Flow:¶

sequenceDiagram
    participant User
    participant UI as Solara UI
    participant State as State Manager
    participant API as Backend API
    participant DB as LanceDB

    User->>UI: Interact with component
    UI->>State: Update reactive state
    State->>API: HTTP request
    API->>DB: Query data
    DB-->>API: Return results
    API-->>State: JSON response
    State-->>UI: Update component
    UI-->>User: Render changes

Backend Architecture (FastAPI)¶

The backend follows a layered architecture pattern with clear separation of concerns.

Layer Structure:¶

API Layer (backend/routers/): FastAPI route handlers
Service Layer (backend/services/): Business logic and data processing
Schema Layer (backend/schemas/): Pydantic models for request/response validation
Data Layer (backend/database_utils/): Database connection and query utilities

Backend Components:¶

Main Application (backend/main.py): FastAPI app initialization and configuration
Configuration (backend/config.py): Settings management with Pydantic
Dependencies (backend/dependencies.py): Dependency injection for common services
Routers: Feature-specific API endpoints
species.py: Species information and search
diseases.py: Disease data and vector relationships
prediction.py: AI-powered species prediction
geo.py: Geographic data and observations
observation.py: User observation management
filters.py: Filter options for UI components

Data Architecture¶

LanceDB Vector Database¶

The system uses LanceDB as the primary data store, providing:

Vector similarity search for species identification
Geospatial indexing for location-based queries
Schema evolution support for data model changes
High-performance queries with automatic indexing

Data Models:¶

# Core data structures
Species {
    id: str
    scientific_name: str
    common_names: dict[str, str]
    description: dict[str, str]
    images: list[str]
    distribution: dict
    vector_embedding: list[float]
}

Observation {
    id: str
    species_id: str
    location: Point
    timestamp: datetime
    confidence: float
    user_id: str
    image_path: str
}

Disease {
    id: str
    name: dict[str, str]
    description: dict[str, str]
    symptoms: dict[str, str]
    vectors: list[str]
    prevention: dict[str, str]
}

AI/ML Integration¶

Model Architecture¶

The system integrates with the culicidaelab library for AI-powered species identification:

graph LR
    subgraph "AI Pipeline"
        INPUT[Image Input]
        PREPROCESS[Preprocessing]
        CLASSIFY[Classification Model]
        DETECT[Detection Model]
        SEGMENT[Segmentation Model]
        POSTPROCESS[Post-processing]
        OUTPUT[Prediction Results]
    end

    INPUT --> PREPROCESS
    PREPROCESS --> CLASSIFY
    PREPROCESS --> DETECT
    PREPROCESS --> SEGMENT
    CLASSIFY --> POSTPROCESS
    DETECT --> POSTPROCESS
    SEGMENT --> POSTPROCESS
    POSTPROCESS --> OUTPUT

Model Types:¶

Classification Models: Species identification from images
Detection Models: Mosquito detection and localization
Segmentation Models: Precise mosquito boundary detection

Performance Considerations:¶

GPU Acceleration: CUDA support for faster inference
Model Caching: Pre-loaded models for reduced latency
Batch Processing: Efficient handling of multiple predictions
Fallback Mechanisms: CPU processing when GPU unavailable

Security Architecture¶

Authentication & Authorization¶

Currently implements basic security measures:

CORS Configuration: Controlled cross-origin access
Input Validation: Pydantic schema validation
File Upload Security: Image type and size validation
Rate Limiting: (Planned) API request throttling

Data Security¶

Input Sanitization: All user inputs validated
SQL Injection Prevention: Parameterized queries via LanceDB
File System Security: Controlled file access patterns
Environment Variables: Sensitive configuration externalized

Deployment Architecture¶

Development Environment¶

graph TB
    subgraph "Development Setup"
        DEV_FE[Frontend Dev Server<br/>localhost:8765]
        DEV_BE[Backend Dev Server<br/>localhost:8000]
        DEV_DB[Local LanceDB]
    end

    DEV_FE --> DEV_BE
    DEV_BE --> DEV_DB

Production Environment¶

graph TB
    subgraph "Production Deployment"
        LB[Load Balancer]
        FE[Frontend Server]
        BE[Backend Server]
        DB[LanceDB Storage]
        STATIC[Static File Server]
    end

    LB --> FE
    LB --> BE
    BE --> DB
    FE --> STATIC

Performance Considerations¶

Frontend Performance¶

Component Lazy Loading: On-demand component loading
State Optimization: Efficient reactive state updates
Asset Optimization: Compressed images and static files
Caching Strategy: Browser and CDN caching

Backend Performance¶

Database Indexing: Optimized LanceDB indexes
Connection Pooling: Efficient database connections
Async Processing: Non-blocking I/O operations
Response Caching: Cached API responses for static data

Monitoring & Observability¶

Application Logs: Structured logging throughout the system
Performance Metrics: Response time and throughput monitoring
Error Tracking: Comprehensive error logging and alerting
Health Checks: System health monitoring endpoints

Technology Stack Summary¶

Frontend Stack¶

Solara: Pure Python web framework
ipyleaflet: Interactive mapping components
Plotly: Data visualization
Material Design: UI component library

Backend Stack¶

FastAPI: High-performance web framework
Uvicorn: ASGI server
Pydantic: Data validation and settings
LanceDB: Vector database
culicidaelab: AI/ML library

Development Tools¶

uv: Python package management
pytest: Testing framework
Black: Code formatting
mypy: Type checking
pre-commit: Git hooks for code quality