Skip to content

Conversation

@srikartondapu
Copy link

Description

This PR adds support for the Detections API v1/text/contents protocol, enabling NeMo Guardrails to communicate with external detector services that implement this standardized interface (e.g., TrustyAI guardrails-detectors, FMS Guardrails Orchestrator detectors).

Key Changes:

  1. Base Interface Pattern: Introduced BaseDetectorClient abstract class that eliminates code duplication when supporting multiple detector API protocols. Common logic (HTTP communication, session management, authentication, error handling) is shared, while API-specific logic (request/response formats) is isolated in subclass implementations.

  2. Detections API Client: Implemented DetectionsAPIClient that handles:

    • Request format: {"contents": [text], "detector_params": {}}
    • Response parsing: Nested array structure [[{detection1}, detection2}]]
    • Multiple detections per text with threshold-based filtering
    • Rich metadata extraction (spans, categories, confidence scores)
  3. Configuration Support: Added DetectionsAPIConfig to RailsConfigData enabling ConfigMap-driven detector management without code changes.

  4. Action Functions: Implemented detections_api_check_all_detectors() and detections_api_check_detector() for NeMo rails.co integration with parallel execution and proper error separation (system errors vs content violations).

  5. Comprehensive Documentation: Added deployment guide with Granite Guardian HAP example, testing instructions, and guide for adding new detectors.

Design Benefits:

  • Extensible: Add new API protocols by implementing build_request() and parse_response() methods only
  • No code duplication: Shared orchestration, HTTP, and error handling across all detector types
  • Configuration-driven: Add/remove detectors via ConfigMap updates

Testing Performed:

  • Deployed Granite Guardian HAP detector using TrustyAI guardrails-detectors
  • Verified safe content passes through to LLM
  • Verified harmful content blocked by detector (jailbreak, harm, unethical_behavior detection)

Related Issue(s)

Addresses the need for standardized detector API integration to support multiple detector service protocols (Detections API, KServe V1, future protocols) through a unified, extensible architecture.

Checklist

  • I've read the CONTRIBUTING guidelines.
  • I've updated the documentation if applicable.
  • I've added tests if applicable.
  • @m-misiura for review

- Implement base interface pattern for extensible detector clients
- Add DetectionsAPIClient for v1/text/contents protocol
- Support configuration-driven detector management via ConfigMap
- Add comprehensive documentation and deployment guide
@srikartondapu srikartondapu force-pushed the feature/detections-api-integration branch from 3c0323a to 548e85a Compare December 1, 2025 16:53
Copy link
Collaborator

@m-misiura m-misiura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should run pre-commit to make this code adhere to the NeMo style

import logging
from typing import Any, Dict, List

from .base import BaseDetectorClient, DetectorResult
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

module imports are hanled inconsistenly, e.g. here they are relative but in actions.py, they are absolute, e.g.

from nemoguardrails.library.detector_clients.base import DetectorResult
from nemoguardrails.library.detector_clients.detections_api import DetectionsAPIClient

Returns:
DetectorResult with parsed detection outcome
"""
if http_status != 200:
Copy link
Collaborator

@m-misiura m-misiura Dec 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this distinguish between e.g. 404: Detector not found and 422: Validation error (invalid request)?

if http_status != 200:
            return DetectorResult(
                allowed=False,
                score=0.0,
                reason=f"HTTP {http_status} error",
                label="ERROR",
                detector=self.detector_name,
                metadata={"http_status": http_status}
            )

see Detector API spec: https://foundation-model-stack.github.io/fms-guardrails-orchestrator/docs/api/openapi_detector_api.yaml

scores = [d.get("score", 0.0) for d in detections]
return max(scores) if scores else 0.0

def _calculate_average_score(self, detections: List[Dict[str, Any]]) -> float:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how informative it is to calculate average score across detectors; it might be best to remove _calculate_average_score

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if the defined Colang flow definition is correct or if there is something wrong with the implementation

Re-running the same message gives me inconsistent outputs, e.g. sometimes

  • variant 1
{"messages":[{"role":"assistant","content":"I'm sorry, but I couldn't process your request due to the following reason: Blocked by 2 Detections API detector(s): toxic-prompt-roberta-detector, ibm-hap-38m-detector. Please feel free to ask something else or try rephrasing your question."}]
  • variant 2:
{"messages":[{"role":"assistant","content":"Sorry, but I'm unable to assist with that request."}]}%  
  • variant 3
{"messages":[{"role":"assistant","content":"This prompt is blocked by 2 Detections API detector(s): toxic-prompt-roberta-detector, ibm-hap-38m-detector"}]}%       

and so on; please investigate

}
)

def _extract_detections_from_response(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there could be opportunities to potentially simplify _extract_detections_from_response since I am not sure if the API can return a flat array, but please check


return response

def _calculate_highest_score(self, detections: List[Dict[str, Any]]) -> float:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this really necessary or is it possible to use e.g. in-built max function instead of the custom one?

"average_score": average_score,
"individual_scores": individual_scores,
"highest_detection": highest_detection,
"detections": filtered_detections
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L176 seems inconsistent with L149?

if metadata is needed, would it be better to have a consistent format where you always display all detectors and then just say pass / fail per detector?

result = await client.detect(text)
return result

except Exception as e:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this dead code as detections_api.py already has try/except that always returns DetectorResult?

if isinstance(user_message, dict):
user_message = user_message.get("content", "")

detections_api_detectors = getattr(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a safer pattern or proper check might be worthwhile to implement

f"{list(detections_api_detectors.keys())}"
)

tasks_with_names = [
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why store tuples then extract with task[1] and tasks_with_names[i][0]?

if not config:
return {"allowed": False, "reason": "No configuration"}

user_message = context.get("user_message", "")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this mean I can only set this up as in input guardrail?

what if I would like to also configure this fr other message types?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems to me that only input guardrails have been implemented; it would be good to also have this working as output guardrails and work on more than just user_message; perhaps check out how other providers handle this

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a HTTP session leak in this file? please investigate

Copy link
Collaborator

@m-misiura m-misiura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please go through all the comments; most importantly:

  1. colang flow in the user guide does not appear to be quite correct (there is stochasticity in the outputs received when sending a request with the same input text
  2. I think the current implementation only works on inputs, consider how this could be extended by look at other providers
  3. there are some redundancies and unnecessary code; consider removing any dead code
  4. metadata -- I am not sure if all fields are needed especially things like average score across detectors
  5. pre commit should be run on all files
  6. consider adding some unit tests

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants