Plankton are small drifting organisms found throughout the world's oceans. One component of the plankton community is the zooplankton, which includes gelatinous animals and crustaceans (e.g., shrimp), as well as the early life stages (i.e., eggs and larvae) of many commercially important fishes. Being able to monitor zooplankton abundance accurately and understand how populations change in relation to ocean conditions is invaluable to marine science research, with potential applications to the marine seafood industry. While new imaging technologies generate massive amounts of video data of zooplankton, analyzing these data with standard computer vision tools developed for general objects turns out to be highly challenging. In this work, we present ZooplanktonBench, a rich dataset of images and videos of zooplankton in various water ecosystems, together with benchmark tasks to detect, classify, and track them in challenging settings, including highly cluttered environments, living vs. non-living classification, objects with similar shapes, and relatively small objects. Our dataset presents unique challenges and opportunities for state-of-the-art computer vision systems to evolve and improve visual understanding in a dynamic environment with large variations.
For YOLOv8, we train the model separately on each of the following datasets.
On the 10-meter dataset, we have 721 images for training and 298 images for testing.
On the 25-meter dataset, we have 647 images for training and 333 images for testing.
On the 35-meter dataset, we have 959 images for training and 403 images for testing.
On the all-mix dataset (containing images from all three depths), we have 2337 images for training and 1034 images for testing.
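For reference, the split sizes listed above can be tabulated programmatically; the numbers below are copied verbatim from the list (the dataset names are illustrative, not actual directory names):

```python
# Per-depth (train, test) image counts, as stated above.
splits = {
    "10m":     (721,  298),
    "25m":     (647,  333),
    "35m":     (959,  403),
    "all_mix": (2337, 1034),  # images from all three depths
}

for name, (n_train, n_test) in splits.items():
    print(f"{name:>7}: {n_train} train / {n_test} test ({n_train + n_test} total)")
```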
For GroundingDINO, we perform zero-shot object detection on the same datasets as YOLOv8.
We provide the processed and labeled data. Note that there are two label folders, labels_classification and labels_living_detection; use each for its corresponding task and do not mix them. The labels are in YOLO format, so if you want to use any other format, please convert them as needed. Check Label files usage for more details.
In general, we need to create datasets following the structures below: