
LepiSense Inferences

This repository contains code to perform inferences on images to:

  • Detect and isolate objects
  • Track objects
  • Classify objects as moth or non-moth
  • Identify the order
  • Predict the species

It is intended to be built into a Docker image and run on Amazon Web Services (AWS) Elastic Container Service (ECS). The images are expected to be located in the AWS Simple Storage Service (S3). The container should run on an Elastic Compute Cloud (EC2) instance with GPU hardware.

This has been forked from https://github.com/AMI-system/amber-inferences.

Docker Build

The container is intended to run on a machine with NVIDIA GPUs and includes PyTorch 2.6.0, Python 3.12 and CUDA 12.4.

You can build the Docker image using

docker build -t lepisense-inferences .

The build copies the local code files, incorporating any changes you may have made, which is convenient for development. Make sure to push changes to the repository if they are intended for production.

Startup Options

The Dockerfile contains commented options for how it should start up.

Manual

To run the scripts manually, uncomment the line

CMD ["tail", "-f", "/dev/null"]

The container will start but do nothing. You can then open a shell in the container (see Access to EC2 Instance below) and execute commands as you wish.

Jupyter

To start the Jupyter server so that you can run the tutorial notebook, uncomment the lines

EXPOSE 80
CMD ["/.venv/bin/jupyter", "notebook", "--no-browser", "--allow-root", "--ip=0.0.0.0", "--port=80",  "/lepisense-inferences/examples/tutorial.ipynb"]

Automatic

To automatically run the inferencing on all the pending data, uncomment the line

CMD ["/bin/bash", "-c", " source .venv/bin/activate && python -m amber_inferences.cli.auto_inference"]

Once the work is complete the container will terminate and the AWS infrastructure should scale down to nothing.

Deployment to Amazon ECS

Before deploying this container to AWS, a whole pile of infrastructure needs to be set up. The configuration of the infrastructure is captured in the lepisense-cdk repository. Follow the deployment instructions in that repo.

Push Image to Registry

Once the AWS infrastructure is built you may want to update and redeploy the code in this repo.

To deploy the image to ECS we first push it to the Amazon Elastic Container Registry (ECR)

Before pushing the container you need to authenticate with the image registry using an AWS account that has permission. You can do this using the AWS Command Line Interface (CLI).

First, sign in to your AWS account. If it is the first time, run

aws configure sso

otherwise run

aws sso login --profile <your-profile-name>

You can confirm the destination repository already exists and obtain the repositoryUri using the command

aws ecr describe-repositories \
    --repository-names lepisense/inferences \
    --region eu-west-2 \
    --profile <your-profile-name> 
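
If you want just the URI, the AWS CLI's global --query and --output options can extract it directly:

aws ecr describe-repositories \
    --repository-names lepisense/inferences \
    --region eu-west-2 \
    --profile <your-profile-name> \
    --query 'repositories[0].repositoryUri' \
    --output text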

Authenticate with the registry using the following command

aws ecr get-login-password \
  --region eu-west-2 \
  --profile <your-profile-name> \
  | \
docker login \
  --username AWS \
  --password-stdin <repositoryUri>

Now add a tag to the image we created earlier and then push using the tag name:

docker tag lepisense-inferences <repositoryUri>:<stage>
docker push <repositoryUri>:<stage>

Replace <stage> with dev, test or a version number like v<major>.<minor>.<patch> for production. This might need refining in future.
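
For example, with a hypothetical account ID of 123456789012 and the dev stage, the repositoryUri follows the standard ECR format and the commands would be:

docker tag lepisense-inferences 123456789012.dkr.ecr.eu-west-2.amazonaws.com/lepisense/inferences:dev
docker push 123456789012.dkr.ecr.eu-west-2.amazonaws.com/lepisense/inferences:dev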

Upload Models

Because the models can be large and may be subject to their own version control they are not stored in the code repository. Copy them to the S3 bucket called lepisense-models-<stage> where <stage> is one of [dev|test|prod].

At the time of writing, upload the models as follows:

  • localisation/flat_bug_M.pt
  • binary/moth-nonmoth-effv2b3_20220506_061527_30.pth
  • order/thresholdsTestTrain.csv
  • order/dhc_best_128.pth
  • species/01_japan_data_category_map.json
  • species/turing-japan_v01_resnet50_2024-11-22-17-22_state.pt

These will be downloaded to the container when needed.
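
For example, one model could be copied up with the AWS CLI like this (the local path and the dev stage are illustrative):

aws s3 cp ./flat_bug_M.pt \
  s3://lepisense-models-dev/localisation/flat_bug_M.pt \
  --profile <your-profile-name>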

Access to EC2 Instance (and ECS container)

For maintenance, you can access a command prompt on the EC2 Instance and thence the ECS container.

EC2 Instance Shell

  1. Go to the Systems Manager console and select Explore Nodes from the left-hand menu.
  2. Select the relevant node and click the Connect button to start a new terminal session.

ECS Container Shell

At the EC2 prompt, run

docker container ls

to get the ID of the container started by ECS. Then enter the container with

docker exec -it <ContainerId> /bin/bash

Test S3 connectivity

To test network connectivity, you can docker exec into the container and execute

curl -v s3.eu-west-2.amazonaws.com

which should give a positive reply.

If networking is successful, you should be able to list images in the S3 bucket using the AWS CLI:

aws s3 ls s3://lepisense-images-<stage>/

Test GPU availability

Docker exec into the container and execute

nvidia-smi

If this works, it will list the GPU driver and CUDA version.

Now confirm that torch has been successfully configured.

python -c "import torch; print(torch.cuda.is_available())"

This should return True.
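
You can also print the name of the GPU that torch sees, using torch's standard CUDA API:

python -c "import torch; print(torch.cuda.get_device_name(0))"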

Running

As described above under Startup Options, there are three ways you can configure the inference code to start: manual, Jupyter, and automatic.

Manual Operation

In this mode, you open a shell on your EC2 instance and docker exec into the container, as described above in the section on accessing the ECS container, whereupon you can execute commands.

You must activate the virtual environment before running these commands using

source .venv/bin/activate

Printing the deployments available

This allows you to check what deployments exist.

python -m amber_inferences.cli.deployments

Filter to:

  • an organisation with --organisation_name <value>
  • a country with --country_code <value>
  • a network with --network_name <value>
  • a deployment with --deployment_name <value>
  • a device type with --devicetype_name <value>

where <value> is replaced by the value to filter with. Several filters can be combined, as in the example below.

Additionally,

  • --no-active lists inactive deployments rather than active ones.
  • --deleted lists deleted deployments.
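
For example, a hypothetical invocation listing the inactive deployments for one country (the country code is illustrative):

python -m amber_inferences.cli.deployments \
  --country_code GBR \
  --no-active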

Printing the inference jobs available

This allows you to find which deployments have files waiting to be processed.

python -m amber_inferences.cli.inference_jobs

The filter options are the same as for printing deployments.

Additionally,

  • --completed lists completed inference jobs rather than pending.
  • --deleted lists deleted deployments.
  • --limit <value> limits the number of rows returned.
  • --verbose prints verbose statements.
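
For example, a hypothetical invocation listing up to ten completed jobs:

python -m amber_inferences.cli.inference_jobs \
  --completed \
  --limit 10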

Generating keys for inference

This generates a list of files to process for a device on a date.

python -m amber_inferences.cli.generate_keys \
  --inference_id <value>

A value for inference_id, obtained from the list of jobs, is required.

An optional parameter is

  • --output_file <value> Output file of S3 keys. Default is /tmp/lepisense/s3_keys.txt.
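
For example, using a hypothetical inference_id of 42 and writing the keys to the default location:

python -m amber_inferences.cli.generate_keys \
  --inference_id 42 \
  --output_file /tmp/lepisense/s3_keys.txt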

Performing the inferences

This processes a list of image files, identifying moths to species level. It outputs a results file which lists each detection in each image.

python -m amber_inferences.cli.perform_inferences \
  --inference_id <value>

A value for inference_id, obtained from the list of jobs, is required. The list of S3 keys to process should have been created by the generate_keys script.

Optional parameters include the following; an example invocation is given after the list.

  • --json_file <value> Input file of S3 keys. Default is '/tmp/lepisense/s3_keys.txt'
  • --output_dir <value> Default is '/tmp/lepisense/'
  • --result_file <value> Output file of results. Default is '/tmp/lepisense/results.csv'
  • --remove_image Default is false
  • --save_crops Default is false
  • --localisation_model_name <value> Default is 'flat_bug_M.pt'
  • --box_threshold <value> Default is 0.00
  • --binary_model <value> Default is moth-nonmoth-effv2b3_20220506_061527_30.pth
  • --order_model <value> Default is dhc_best_128.pth
  • --order_thresholds <value> Default is thresholdsTestTrain.csv
  • --species_model <value> Default is turing-uk_v03_resnet50_2024-05-13-10-03_state.pt
  • --species_labels <value> Default is 03_uk_data_category_map.json
  • --top_n_species <value> Default is 5
  • --skip_processed If re-running a job that was interrupted, whether to skip over files that had already been processed. Default is false.
  • --verbose Whether to print extra information about progress. Default is false.
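
For example, a hypothetical run for job 42 that saves the cropped detections and skips files already processed by an interrupted run:

python -m amber_inferences.cli.perform_inferences \
  --inference_id 42 \
  --save_crops \
  --skip_processed \
  --verbose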

Adding tracking information to results

This attempts to connect detections in consecutive images.

python -m amber_inferences.cli.get_tracks

Optional parameters include

  • --tracking_threshold <value> Threshold for the track cost. Default is 1
  • --result_file <value> The file to process and append tracking to. Default is '/tmp/lepisense/results.csv'
  • --verbose Whether to print extra information about progress. Default is false.
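
For example, a hypothetical invocation with an explicit threshold, operating on the default results file:

python -m amber_inferences.cli.get_tracks \
  --tracking_threshold 1 \
  --result_file /tmp/lepisense/results.csv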

Preserving the inference results

The results of inferencing are stored locally on the EC2 instance and will disappear with it, so save them back to S3 using the following command.

python -m amber_inferences.cli.save_results \
  --inference_id <value>

An optional parameter is

  • --result_file <value> The results file to save. Default is /tmp/lepisense/results.csv.
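
For example, saving the default results file for a hypothetical job 42:

python -m amber_inferences.cli.save_results \
  --inference_id 42 \
  --result_file /tmp/lepisense/results.csv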

Process all outstanding jobs

You can also process all outstanding inference jobs with one command. In automatic mode, this command is simply executed on a schedule.

python -m amber_inferences.cli.auto_inference

Optional parameters include

  • --verbose Whether to print extra information about progress. Default is false.

Following the Jupyter Notebook Example

With everything deployed and running, you can use your browser to access the tutorial notebook. The URL you need can be found from the load balancer information. Go to the EC2 console and select Load Balancers from the navigation menu. Select the relevant load balancer and copy the DNS name. Paste it into your browser, prefixed with http://, and go.

You should arrive at a login page requesting a token. To obtain this, SSH to the relevant instance. If the instance has just started, list the Docker containers and then run docker logs <container_id>. The log will display lines like the following:

To access the server, open this file in a browser:
    file:///root/.local/share/jupyter/runtime/jpserver-1-open.html
Or copy and paste one of these URLs:
    http://ip-172-31-27-174.eu-west-2.compute.internal:80/tree?token=4e33a3b89

The token you require is in the url shown in the log.

If your instance has been up for some time, the log will be full of cruft. In that case, docker exec into the container and execute the command

jupyter server list

You can copy the token from the output and paste it into the browser.

Automatic Operation

To automate the inferencing, we create a schedule to start the ECS task, which runs until there are no more images to process. When the task is complete, the number of containers should scale to zero, which in turn causes the EC2 instances to scale to zero. The lepisense-cdk deployment will set this up.
