Plankton are small drifting organisms found throughout the world's oceans. One component of the plankton community is the zooplankton, which includes gelatinous animals and crustaceans (e.g., shrimp), as well as the early life stages (i.e., eggs and larvae) of many commercially important fishes. Being able to monitor zooplankton abundance accurately and understand how populations change in relation to ocean conditions is invaluable to marine science research, with potential applications to the marine seafood industry. While new imaging technologies generate massive amounts of video data of zooplankton, analyzing these data with standard computer vision tools developed for general objects turns out to be highly challenging. In this work, we present ZooplanktonBench, a rich dataset of images and videos of zooplankton in various water ecosystems, together with benchmark tasks to detect, classify, and track them in challenging settings, including highly cluttered environments, living vs. non-living classification, objects with similar shapes, and relatively small objects. Our dataset presents unique challenges and opportunities for state-of-the-art computer vision systems to evolve and improve visual understanding in a dynamic environment with large variations.
For YOLOv8, we train the model separately on each of the following datasets.
On the 10-meter dataset, we have 721 images for training and 298 images for testing.
On the 25-meter dataset, we have 647 images for training and 333 images for testing.
On the 35-meter dataset, we have 959 images for training and 403 images for testing.
On the all-mix dataset (containing images from all three depths), we have 2337 images for training and 1034 images for testing.
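For reference, the split sizes listed above can be tabulated programmatically; the numbers below are copied verbatim from the list (the dataset names are illustrative, not actual directory names):

```python
# Per-depth (train, test) image counts, as stated above.
splits = {
    "10m":     (721,  298),
    "25m":     (647,  333),
    "35m":     (959,  403),
    "all_mix": (2337, 1034),  # images from all three depths
}

for name, (n_train, n_test) in splits.items():
    print(f"{name:>7}: {n_train} train / {n_test} test ({n_train + n_test} total)")
```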
For GroundingDINO, we perform zero-shot object detection on the same datasets as YOLOv8.
We provide the processed and labeled data. Note that there are two label folders, labels_classification and labels_living_detection; use each for its corresponding task and do not mix them. The labels are in YOLO format, so if you want to use any other format, please convert them as needed. Check Label files usage for more details.
In general, we need to create datasets following the structures below: