From 39db6fcc75ea8eb192f9577d4a57f6afc908b966 Mon Sep 17 00:00:00 2001
From: Matt Turk
Date: Fri, 19 Dec 2025 14:07:55 -0500
Subject: [PATCH] Migrate detectron2_training.ipynb to use HuggingFace Hub
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Replace S3 URL for labels.pkl with HuggingFace Hub download:
- Add huggingface_hub import to imports cell
- Replace wget S3 command with hf_hub_download() call
- Use Cleanlab/object-detection-tutorial dataset with repo_type="dataset"

This ensures consistency with the main tutorials and eliminates dependency on S3 storage.

🤖 Generated with Claude Code

Co-Authored-By: Claude Sonnet 4.5
---
 object_detection/detectron2_training.ipynb | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/object_detection/detectron2_training.ipynb b/object_detection/detectron2_training.ipynb
index ffab137..2007151 100644
--- a/object_detection/detectron2_training.ipynb
+++ b/object_detection/detectron2_training.ipynb
@@ -6,7 +6,7 @@
    "source": [
     "# Training an Object Detection model using Detectron2\n",
     "\n",
-    "This notebook demonstrates how to train a [Detectron2](https://github.com/facebookresearch/detectron2/) model on object detection datasets and produce predictions required to run cleanlab's tutorial on detecting label errors in object detection data. Note that this notebook fits the model to an entire training set and produces predictions on a held-out validation set. Thus these predictions are only *out-of-sample* for the validation data, and should ideally *only* be used to find mislabeled images amongst the validation set. To instead find mislabeled images amongst an entire dataset, see the analogous notebook in this folder which uses K-fold cross-validation to produce out-of-sample predictions for every image in the dataset.\n",
+    "This notebook demonstrates how to train a [Detectron2](https://github.com/facebookresearch/detectron2/) model on object detection datasets and produce predictions required to run cleanlab's tutorial on detecting label errors in object detection\u00a0data. Note that this notebook fits the model to an entire training set and produces predictions on a held-out validation set. Thus these predictions are only *out-of-sample* for the validation data, and should ideally *only* be used to find mislabeled images amongst the validation set. To instead find mislabeled images amongst an entire dataset, see the analogous notebook in this folder which uses K-fold cross-validation to produce out-of-sample predictions for every image in the dataset.\n",
     "\n",
     "In object detection data, each image is annotated with multiple bounding boxes. Each bounding box surrounds a physical object within an image scene, and is annotated with a given class label. Using this labeled data, we train a model to predict the locations and classes of objects in an image. The trained model can subsequently be used to identify mislabeled images, which when corrected, allow you to train an even better model without changing your training code! \n",
     "\n",
@@ -26,6 +26,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "from huggingface_hub import hf_hub_download\n",
     "from detectron2.engine import DefaultTrainer\n",
     "from detectron2.config import get_cfg\n",
     "import pickle\n",
@@ -51,7 +52,10 @@
    "source": [
     "!wget -nc \"https://cleanlab-public.s3.amazonaws.com/ObjectDetectionBenchmarking/DATASET_annotations/instances_val2017_5labels.json\"\n",
     "!wget -nc \"https://cleanlab-public.s3.amazonaws.com/ObjectDetectionBenchmarking/DATASET_annotations/instances_train2017_5labels.json\"\n",
-    "!wget -nc \"https://cleanlab-public.s3.amazonaws.com/ObjectDetectionBenchmarking/tutorial_obj/labels.pkl\""
+    "\n",
+    "# Download labels.pkl from HuggingFace Hub\n",
+    "labels_path = hf_hub_download(\"Cleanlab/object-detection-tutorial\", \"labels.pkl\", repo_type=\"dataset\")\n",
+    "!cp {labels_path} labels.pkl\n"
    ]
   },
   {
@@ -240,4 +244,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}
\ No newline at end of file