Using the Serve Module for Production Inference
This tutorial demonstrates how to use the serve function from the CulicidaeLab
library for high-performance, production-ready inference. The serve function
is designed to be a lightweight, fast, and safe way to run predictions.
This guide will cover:
- Speed and Safety: How the serve function uses the ONNX backend for fast inference.
- Single Image Prediction: How to use serve for classification.
- Caching: How the in-memory cache of predictor instances works.
- Clearing the Cache: How to clear the cache when needed.
Install the culicidaelab library if not already installed
1. Initialization and Setup
We will initialize the DatasetsManager to get some sample data.
The serve function doesn't require manual initialization of predictors.
Import necessary libraries
import matplotlib.pyplot as plt
# Import the required classes from the CulicidaeLab library
from culicidaelab import (
    DatasetsManager,
    get_settings,
    serve,
    clear_serve_cache,
)
# Get the default library settings instance
settings = get_settings()
# Initialize the services needed to manage and download data
manager = DatasetsManager(settings)
2. Loading the Test Dataset
We will use a built-in test dataset to get an image for our predictions.
print("--- Loading the 'classification' dataset's 'test' split ---")
classification_test_data = manager.load_dataset("classification", split="test")
print("Test dataset loaded successfully!")
print(f"Number of samples in the test dataset: {len(classification_test_data)}")
# Let's select one sample to work with.
classification_test_data = classification_test_data.shuffle(seed=42)
sample = classification_test_data[0]
image = sample["image"]
ground_truth_label = sample["label"]
print(f"\nSelected sample's ground truth label: '{ground_truth_label}'")
# Display the input image
plt.figure(figsize=(6, 6))
plt.imshow(image)
plt.title(f"Input Image\n(Ground Truth: {ground_truth_label})")
plt.axis("off")
plt.show()

Out:
--- Loading the 'classification' dataset's 'test' split ---
Cache hit for split config: test C:\Users\lenova\AppData\Local\culicidaelab\culicidaelab\datasets\mosquito_species_classification\4d967a30111bf29f
Test dataset loaded successfully!
Number of samples in the test dataset: 328
Selected sample's ground truth label: 'aedes_triseriatus'
3. Using serve for Classification
The serve function automatically initializes the predictor with the ONNX backend
on the first call and caches it for subsequent requests.
Run classification using the serve function
print("--- Running classification for the first time (will initialize predictor) ---")
classification_result = serve(image)
# Print the top 5 predictions
print("\n--- Top 5 Classification Predictions ---")
for p in classification_result.predictions[:5]:
    print(f"{p.species_name}: {p.confidence:.2%}")
Out:
--- Running classification for the first time (will initialize predictor) ---
Initializing 'classifier' predictor for serving...
--- Top 5 Classification Predictions ---
Aedes triseriatus: 99.81%
Aedes geniculatus: 0.19%
Culiseta longiareolata: 0.00%
Aedes canadensis: 0.00%
Culex pipiens: 0.00%
4. Caching in Action
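If you run the same request again, it completes faster because the predictor is already initialized and held in memory, so only inference runs. The call below passes predictor_type="classifier" explicitly; the log output of the first call shows that this is the predictor used when no type is given.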
Run classification again to see the caching effect
print("\n--- Running classification for the second time (should be faster) ---")
classification_result_cached = serve(image, predictor_type="classifier")
# Print the top 5 predictions again
print("\n--- Top 5 Classification Predictions (from cache) ---")
for p in classification_result_cached.predictions[:5]:
    print(f"{p.species_name}: {p.confidence:.2%}")
Out:
--- Running classification for the second time (should be faster) ---
--- Top 5 Classification Predictions (from cache) ---
Aedes triseriatus: 99.81%
Aedes geniculatus: 0.19%
Culiseta longiareolata: 0.00%
Aedes canadensis: 0.00%
Culex pipiens: 0.00%
5. Clearing the Cache
If you need to free up memory or reload the predictors, you can use the
clear_serve_cache function.
Clear the cache
print("\n--- Clearing the predictor cache ---")
clear_serve_cache()
# Run classification again; this re-initializes the predictor
print("\n--- Running classification again after clearing cache (will re-initialize) ---")
classification_result_after_clear = serve(image)
print("\n--- Top 5 Classification Predictions (after cache clear) ---")
for p in classification_result_after_clear.predictions[:5]:
    print(f"{p.species_name}: {p.confidence:.2%}")
Out:
--- Clearing the predictor cache ---
--- Running classification again after clearing cache (will re-initialize) ---
Initializing 'classifier' predictor for serving...
--- Top 5 Classification Predictions (after cache clear) ---
Aedes triseriatus: 99.81%
Aedes geniculatus: 0.19%
Culiseta longiareolata: 0.00%
Aedes canadensis: 0.00%
Culex pipiens: 0.00%
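The clear-then-reload cycle shown above has a close analog in Python's standard library: functools.lru_cache exposes cache_clear() and cache_info() on the decorated function. The sketch below uses a hypothetical load_model function to show that clearing the cache forces the next call to rebuild the object. It illustrates the general idea only and makes no claim about how clear_serve_cache is implemented.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_model(name: str) -> dict:
    """Hypothetical loader; the dict stands in for a loaded model."""
    print(f"Loading '{name}'...")
    return {"name": name}


load_model("classifier")        # miss: the model is loaded
load_model("classifier")        # hit: the cached object is reused
print(load_model.cache_info())  # hits=1, misses=1

load_model.cache_clear()        # drop cached entries and reset the counters
load_model("classifier")        # miss again: the model is reloaded
print(load_model.cache_info())  # hits=0, misses=1
```

As in the tutorial output, the loading message is printed again after the cache is cleared.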