diff --git a/docs/user-guides/detections-api-integration.md b/docs/user-guides/detections-api-integration.md
new file mode 100644
index 000000000..c12798540
--- /dev/null
+++ b/docs/user-guides/detections-api-integration.md
@@ -0,0 +1,1137 @@
# Detections API Integration for NeMo Guardrails

## Overview

This integration enables NeMo Guardrails to communicate with external detector services that implement the Detections API v1/text/contents protocol, providing a standardized interface for content safety checking without requiring detector logic within NeMo.

**Key Features:**
- **Protocol-agnostic architecture**: Base interface pattern supports multiple detector API protocols (Detections API, KServe V1, future APIs)
- **Configuration-driven**: Add/remove detectors via ConfigMap updates only
- **Service-based detection**: Detectors run as independent microservices with rich metadata
- **Extensible design**: Add support for new API protocols by implementing two methods (request builder and response parser)
- **No code duplication**: Common HTTP, error handling, and orchestration logic shared across all detector types

## Architecture

    User Input → NeMo Guardrails → Detections API Detector Services → vLLM (if safe) → Response

**Components:**
- **NeMo Guardrails** (CPU) - Orchestration and flow control
- **Detections API Detectors** (CPU/GPU) - Content safety services implementing the v1/text/contents protocol (this guide demonstrates Granite Guardian HAP as an example)
- **vLLM** (GPU) - LLM inference

### Design: Base Interface Pattern

This integration introduces a base interface architecture that eliminates code duplication when supporting multiple detector API protocols.

**File Structure:**
```
nemoguardrails/library/detector_clients/
├── base.py              # BaseDetectorClient interface (shared logic)
├── detections_api.py    # Detections API v1/text/contents client
├── actions.py           # NeMo action functions
└── __init__.py          # Python package marker
```

**Why This Design:**

A traditional approach would duplicate HTTP logic, error handling, and orchestration for each new API protocol. The base interface isolates what varies (request/response formats) from what stays constant (HTTP communication, error handling).

**What's Shared (in base.py):**
- HTTP session management with connection pooling
- Authentication header handling (per-detector and global fallback)
- Timeout and error handling
- Standard `DetectorResult` model

**What's API-Specific (in detections_api.py):**
- Request format: `{"contents": [text], "detector_params": {}}`
- Response parsing: Nested array structure `[[{detection1}, {detection2}]]`
- Detection aggregation logic (multiple detections per text)
- Threshold and filtering logic

**Benefits:**
- Adding a new API protocol = implementing 2 methods (`build_request`, `parse_response`)
- No code changes to add detectors (ConfigMap only)
- Same orchestration logic for all detector types
- Extensible for future protocols (OpenAI Moderation, Perspective API, etc.)

## Prerequisites

- OpenShift cluster with KServe installed
- Access to Quay.io or a container registry for pulling images
- vLLM deployment for LLM inference (or an alternative OpenAI-compatible endpoint)

## Requirements

**This integration communicates with external services implementing the Detections API v1/text/contents protocol.**

The Detections API provides structured detection results with rich metadata (spans, categories, confidence scores) rather than raw model outputs.
Services must implement the standardized request/response format described below.

### API Contract

This integration uses the **Detections API v1/text/contents protocol**.

**Protocol:** REST API with detector-specific routing via headers

**Requirements:**
- Endpoint path: `/api/v1/text/contents`
- Request header: `detector-id` specifying which detector to invoke
- Request body: `{"contents": ["text"], "detector_params": {}}`
- Response: Nested array of detection objects `[[{detection1}, {detection2}, ...]]`

**Request Format:**
```http
POST /api/v1/text/contents
detector-id: granite-guardian-hap
Content-Type: application/json

{
  "contents": ["text to analyze"],
  "detector_params": {}
}
```

**Response Format:**
```json
[[
  {
    "start": 0,
    "end": 20,
    "detection_type": "pii",
    "detection": "EmailAddress",
    "score": 0.95,
    "text": "matching text span",
    "evidence": {},
    "metadata": {}
  }
]]
```

Each detection includes:
- `start`, `end`: Character span indices in the input text
- `detection_type`: Broad category (pii, toxicity, etc.)
- `detection`: Specific detection class
- `score`: Confidence score (0.0-1.0)
- `text`: Detected text span

## How It Works

### Detection Flow

1. User sends a message to NeMo Guardrails via HTTP POST to `/v1/chat/completions`
2. NeMo loads configuration from the ConfigMap and triggers the input safety flow defined in `rails.co`
3. The `detections_api_check_all_detectors()` action executes, running all configured detectors in parallel
4. For each detector:
   - `DetectionsAPIClient` builds the request: `{"contents": [text], "detector_params": {}}`
   - An HTTP POST is sent to the detector service with the `detector-id` header
   - The detector service processes the text and returns structured detections
   - The parser extracts detections from the nested array response `[[...]]`
5. Each detection is evaluated:
   - If `detection.score >= threshold`: the detection triggers blocking
   - Multiple detections per text are supported
   - The highest scoring detection determines the overall score
6. Results aggregation:
   - System errors (timeouts, connection failures): request blocked, tracked in `unavailable_detectors`
   - Content violations: request blocked, tracked in `blocking_detectors` with full metadata
   - All pass: request proceeds to vLLM for generation
7. Response returned to the user (blocked message or LLM-generated response)

### Base Interface Pattern

The integration uses object-oriented design to eliminate code duplication across different detector API protocols.

**BaseDetectorClient (Abstract Class):**
```python
class BaseDetectorClient(ABC):
    @abstractmethod
    async def detect(self, text: str) -> DetectorResult: ...

    @abstractmethod
    def build_request(self, text: str) -> dict: ...

    @abstractmethod
    def parse_response(self, response: dict, http_status: int) -> DetectorResult: ...

    # Shared implementations:
    #   async def _call_endpoint(...)   # HTTP communication
    #   def _handle_error(...)          # Error handling
```

**DetectionsAPIClient (Implementation):**
```python
class DetectionsAPIClient(BaseDetectorClient):
    def build_request(self, text: str) -> dict:
        # Detections API specific format
        return {"contents": [text], "detector_params": {}}

    def parse_response(self, response: dict, http_status: int) -> DetectorResult:
        # Parse [[{detection1}, {detection2}]]
        # Apply threshold filtering
        # Return a standardized DetectorResult
        ...
```

**Adding New API Protocol:**

To support a new detector API (e.g., OpenAI Moderation, Perspective API), as sketched below:
1. Create a new client class inheriting from `BaseDetectorClient`
2. Implement `build_request()` for the API's request format
3. Implement `parse_response()` for the API's response format
4. Add `@action()` decorated functions in `actions.py` that use the new client
5. Reuse all HTTP, auth, and error handling from the base class
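
For illustration, here is a minimal sketch of those steps for a hypothetical moderation API that returns a flat `{"flagged": bool, "score": float}` object. The class name, request shape, and response shape are assumptions for the example, not a real protocol:

```python
from typing import Any, Dict

from nemoguardrails.library.detector_clients.base import (
    BaseDetectorClient,
    DetectorResult,
)


class FlatScoreAPIClient(BaseDetectorClient):
    """Hypothetical client for an API returning {"flagged": bool, "score": float}."""

    async def detect(self, text: str) -> DetectorResult:
        try:
            payload = self.build_request(text)
            response, status = await self._call_endpoint(
                endpoint=self.endpoint, payload=payload, timeout=self.timeout
            )
            return self.parse_response(response, status)
        except Exception as e:
            # Shared error handling inherited from BaseDetectorClient
            return self._handle_error(e, self.detector_name)

    def build_request(self, text: str) -> Dict[str, Any]:
        # Assumed request shape: a single "input" string instead of a "contents" list
        return {"input": text}

    def parse_response(self, response: Any, http_status: int) -> DetectorResult:
        score = float(response.get("score", 0.0))
        flagged = bool(response.get("flagged", False))
        return DetectorResult(
            allowed=not flagged,
            score=score,
            reason="Flagged by moderation API" if flagged else "No detections",
            label="moderation" if flagged else "NONE",
            detector=self.detector_name,
        )
```

Everything else (session pooling, bearer-token authentication, timeouts) comes from the base class unchanged.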

### Detection Logic

**Multiple Detections Handling:**

Detections API services can return multiple detections for a single text (e.g., two email addresses and one SSN). The parser:
1. Extracts all detections from the nested array structure
2. Filters detections by threshold: `score >= threshold`
3. Blocks if **any** detection meets the threshold (fail-safe approach)
4. Returns the highest score as the primary score
5. Includes all detection details in metadata for auditing

**Example:**
```
Input: "Email me at test@example.com or call 555-1234"

Response: [[
  {detection: "EmailAddress", score: 0.99},
  {detection: "PhoneNumber", score: 0.85}
]]

With threshold=0.5:
- Both detections >= 0.5
- Content blocked
- Primary score: 0.99 (highest)
- Label: "pii:EmailAddress" (highest scoring detection)
- Metadata includes both detections
```

**Score Aggregation:**
- `score`: Highest individual detection score
- `metadata.average_score`: Average of all filtered detections
- `metadata.detection_count`: Number of detections above threshold
- `metadata.individual_scores`: All scores for analysis

### Error Handling

The system distinguishes between infrastructure errors and content violations.

**System Errors:**
- HTTP errors (404, 500, 503)
- Network timeouts
- Invalid response formats
- Result: `allowed=False`, `label="ERROR"` or `"TIMEOUT"`
- Tracked in the `unavailable_detectors` list
- User message: "Service temporarily unavailable"

**Content Violations:**
- Successful detection with score >= threshold
- Result: `allowed=False`, `label="{type}:{detection}"`
- Tracked in the `blocking_detectors` list with full metadata
- User message: details which detectors blocked and their scores

**Multiple Detector Failures:**

When running multiple detectors, the system provides comprehensive feedback showing all blocking detectors and any unavailable services, enabling both user communication and operational monitoring.

## Deployment Guide

### Prerequisites

- OpenShift cluster with KServe installed
- Namespace: `<namespace>` (this guide uses `<namespace>` as a placeholder in examples)
- Access to Quay.io for pulling images
- vLLM or another OpenAI-compatible LLM endpoint for generation

**This integration requires external Detections API services to be deployed.**

Services must implement the v1/text/contents protocol with the request/response format described in the Requirements section.

### Deployment Options

**Option A: Using TrustyAI Guardrails Detectors (Recommended)**

Deploy detectors from the [guardrails-detectors repository](https://github.com/trustyai-explainability/guardrails-detectors), which provides production-ready HuggingFace-based detectors implementing the Detections API protocol.

**Option B: Deploy Your Own Detections API Service**

Implement a service that exposes an `/api/v1/text/contents` endpoint following the API contract. Refer to the guardrails-detectors repository for reference implementations.

This guide demonstrates Option A with the Granite Guardian HAP detector.

### Step 1: Deploy Granite Guardian HAP Detector

Granite Guardian requires model storage via MinIO (S3-compatible object storage running in-cluster) and uses a PVC-based approach to download and serve the model.

**Why MinIO:** KServe expects S3-compatible storage for models. MinIO provides this locally without external dependencies, enabling disconnected cluster deployments.

#### Deploy Model Storage and MinIO

Create `granite-guardian-storage.yaml`:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: minio-guardrails-guardian
spec:
  ports:
    - name: minio-client-port
      port: 9000
      protocol: TCP
      targetPort: 9000
  selector:
    app: minio-guardrails-guardian
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: guardrails-models-claim-guardian
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 100Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: guardrails-container-deployment-guardian
  labels:
    app: minio-guardrails-guardian
spec:
  replicas: 1
  selector:
    matchLabels:
      app: minio-guardrails-guardian
  template:
    metadata:
      labels:
        app: minio-guardrails-guardian
        maistra.io/expose-route: 'true'
      name: minio-guardrails-guardian
    spec:
      securityContext:
        fsGroup: 1001
      volumes:
        - name: model-volume
          persistentVolumeClaim:
            claimName: guardrails-models-claim-guardian
      initContainers:
        - name: download-model
          image: quay.io/rgeada/llm_downloader:latest
          command:
            - bash
            - -c
            - |
              model="ibm-granite/granite-guardian-3.0-2b"
              echo "Starting download of ${model}"
              /tmp/venv/bin/huggingface-cli download $model --local-dir /mnt/models/huggingface/$(basename $model)
              echo "Download complete!"
          resources:
            limits:
              memory: "2Gi"
              cpu: "2"
          volumeMounts:
            - mountPath: "/mnt/models/"
              name: model-volume
      containers:
        - args:
            - server
            - /models
          env:
            - name: MINIO_ACCESS_KEY
              value: THEACCESSKEY
            - name: MINIO_SECRET_KEY
              value: THESECRETKEY
          image: quay.io/trustyai/modelmesh-minio-examples:latest
          name: minio
          securityContext:
            allowPrivilegeEscalation: false
            capabilities:
              drop:
                - ALL
            seccompProfile:
              type: RuntimeDefault
          volumeMounts:
            - mountPath: "/models/"
              name: model-volume
---
apiVersion: v1
kind: Secret
metadata:
  name: aws-connection-minio-data-connection-guardrails-guardian
  labels:
    opendatahub.io/dashboard: 'true'
    opendatahub.io/managed: 'true'
  annotations:
    opendatahub.io/connection-type: s3
    openshift.io/display-name: Minio Data Connection
data:
  AWS_ACCESS_KEY_ID: VEhFQUNDRVNTS0VZ
  AWS_DEFAULT_REGION: dXMtc291dGg=
  AWS_S3_BUCKET: aHVnZ2luZ2ZhY2U=
  AWS_S3_ENDPOINT: aHR0cDovL21pbmlvLWd1YXJkcmFpbHMtZ3VhcmRpYW46OTAwMA==
  AWS_SECRET_ACCESS_KEY: VEhFU0VDUkVUS0VZ
type: Opaque
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: user-one
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: user-one-view
subjects:
  - kind: ServiceAccount
    name: user-one
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
```

Deploy:
```bash
oc apply -f granite-guardian-storage.yaml -n <namespace>
```

Monitor the model download (takes 5-10 minutes for the ~5GB model):
```bash
oc logs -f deployment/guardrails-container-deployment-guardian -n <namespace> -c download-model
```

Wait for the "Download complete!" message.
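
Optionally, you can spot-check that the model files landed on the shared volume. This assumes the paths from the manifests above (the MinIO container mounts the PVC at `/models/`, and the init container downloads into `huggingface/granite-guardian-3.0-2b`); substitute your namespace for `<namespace>`:

```bash
# Optional spot check: list the downloaded model files inside the MinIO container
oc exec -n <namespace> deployment/guardrails-container-deployment-guardian -c minio -- \
  ls -lh /models/huggingface/granite-guardian-3.0-2b
```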

Verify MinIO is running:
```bash
oc get pods -n <namespace> | grep guardrails-container
```

Expected: Pod shows `2/2 Running` (init container completed, MinIO running)

#### Deploy ServingRuntime for Granite Guardian

Create `granite-guardian-runtime.yaml`:
```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: guardrails-detector-runtime-guardian
  annotations:
    openshift.io/display-name: Guardrails Detector ServingRuntime for KServe
    opendatahub.io/recommended-accelerators: '["nvidia.com/gpu"]'
  labels:
    opendatahub.io/dashboard: 'true'
spec:
  annotations:
    prometheus.io/port: '8000'
    prometheus.io/path: '/metrics'
  multiModel: false
  supportedModelFormats:
    - autoSelect: true
      name: guardrails-detector-huggingface
  containers:
    - name: kserve-container
      image: quay.io/rh-ee-mmisiura/guardrails-detector-huggingface:3d51741
      command:
        - uvicorn
        - app:app
      args:
        - "--workers"
        - "1"
        - "--host"
        - "0.0.0.0"
        - "--port"
        - "8000"
        - "--log-config"
        - "/common/log_conf.yaml"
      env:
        - name: MODEL_DIR
          value: /mnt/models
        - name: HF_HOME
          value: /tmp/hf_home
        - name: DETECTOR_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      ports:
        - containerPort: 8000
          protocol: TCP
      resources:
        requests:
          memory: "18Gi"
          cpu: "1"
        limits:
          memory: "20Gi"
          cpu: "2"
```

Deploy:
```bash
oc apply -f granite-guardian-runtime.yaml -n <namespace>
```

Verify:
```bash
oc get servingruntime -n <namespace> | grep guardian
```

Expected: `guardrails-detector-runtime-guardian` appears in the list

#### Deploy Granite Guardian InferenceService

Create `granite-guardian-isvc.yaml`:
```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: guardrails-detector-ibm-guardian
  labels:
    opendatahub.io/dashboard: 'true'
  annotations:
    openshift.io/display-name: guardrails-detector-ibm-guardian
    security.opendatahub.io/enable-auth: 'true'
    serving.knative.openshift.io/enablePassthrough: 'true'
    sidecar.istio.io/inject: 'true'
    sidecar.istio.io/rewriteAppHTTPProbers: 'true'
    serving.kserve.io/deploymentMode: RawDeployment
spec:
  predictor:
    maxReplicas: 1
    minReplicas: 1
    model:
      modelFormat:
        name: guardrails-detector-huggingface
      name: ''
      runtime: guardrails-detector-runtime-guardian
      storage:
        key: aws-connection-minio-data-connection-guardrails-guardian
        path: granite-guardian-3.0-2b
```

Deploy:
```bash
oc apply -f granite-guardian-isvc.yaml -n <namespace>
```

Wait for the predictor pod to start and load the model (3-5 minutes):
```bash
oc get pods -n <namespace> | grep guardrails-detector-ibm-guardian

# Watch the logs
oc logs -f -n <namespace> $(oc get pods -n <namespace> -l serving.kserve.io/inferenceservice=guardrails-detector-ibm-guardian -o name | head -1) -c kserve-container
```

Expected log output:
```
Model type detected: causal_lm
Application startup complete.
Uvicorn running on http://0.0.0.0:8000
```

Verify the InferenceService is ready:
```bash
oc get inferenceservice guardrails-detector-ibm-guardian -n <namespace>
```

Expected: `READY = True`

**Note:** Granite Guardian runs on CPU by default, and inference takes 30-120 seconds per request. For production, consider deploying on GPU nodes or increasing the timeout configuration.

### Step 2: Deploy vLLM Inference Service

vLLM uses a PVC-based approach to pre-download the Phi-3-mini model. This avoids runtime dependencies on HuggingFace and uses Red Hat's official AI Inference Server image.

Create `vllm-inferenceservice.yml`:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: phi3-model-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: phi3-model-downloader
spec:
  replicas: 1
  selector:
    matchLabels:
      app: phi3-downloader
  template:
    metadata:
      labels:
        app: phi3-downloader
    spec:
      initContainers:
        - name: download-model
          image: quay.io/rgeada/llm_downloader:latest
          command:
            - bash
            - -c
            - |
              echo "Downloading Phi-3-mini"
              /tmp/venv/bin/huggingface-cli download microsoft/Phi-3-mini-4k-instruct --local-dir /mnt/models/phi3-mini
              echo "Download complete!"
          volumeMounts:
            - name: model-storage
              mountPath: /mnt/models
      containers:
        - name: placeholder
          image: registry.access.redhat.com/ubi9/ubi-minimal:latest
          command: ["sleep", "infinity"]
      volumes:
        - name: model-storage
          persistentVolumeClaim:
            claimName: phi3-model-pvc
---
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: vllm-phi3
spec:
  predictor:
    containers:
      - name: kserve-container
        image: registry.redhat.io/rhaiis/vllm-cuda-rhel9:3
        args:
          - --model=/mnt/models/phi3-mini
          - --host=0.0.0.0
          - --port=8080
          - --served-model-name=phi3-mini
          - --max-model-len=4096
          - --gpu-memory-utilization=0.7
          - --trust-remote-code
          - --dtype=half
        env:
          - name: HF_HOME
            value: /tmp/hf_cache
        volumeMounts:
          - name: model-storage
            mountPath: /mnt/models
            readOnly: true
        resources:
          limits:
            nvidia.com/gpu: 1
            cpu: "6"
            memory: "24Gi"
          requests:
            nvidia.com/gpu: 1
            cpu: "2"
            memory: "8Gi"
    volumes:
      - name: model-storage
        persistentVolumeClaim:
          claimName: phi3-model-pvc
```

Deploy:

```bash
oc apply -f vllm-inferenceservice.yml -n <namespace>
```

Monitor model download progress:

```bash
oc logs -n <namespace> -l app=phi3-downloader -c download-model -f
```

Wait for the "Download complete!" message. The Phi-3-mini model is approximately 8GB and may take 3-5 minutes to download.

Verify vLLM is running:

```bash
oc get inferenceservice vllm-phi3 -n <namespace>
oc get pods -n <namespace> | grep vllm-phi3
```

Expected: the `vllm-phi3` InferenceService shows `READY = True` and the pod shows `1/1 Running`.

### Step 3: Deploy NeMo Guardrails ConfigMap

The ConfigMap contains detector configurations and flow definitions. Detectors are registered in the `detections_api_detectors` section with their endpoint URLs and detection parameters.

Create `nemo-detections-configmap.yaml`:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nemo-detections-configmap
data:
  config.yaml: |
    rails:
      config:
        detections_api_detectors:
          granite_hap:
            inference_endpoint: "http://guardrails-detector-ibm-guardian-predictor.<namespace>.svc.cluster.local:8000/api/v1/text/contents"
            detector_id: "granite-guardian-hap"
            threshold: 0.5
            timeout: 120
            detector_params: {}
      input:
        flows:
          - check_input_safety_detections_api
    models:
      - type: main
        engine: vllm_openai
        model: phi3-mini
        parameters:
          openai_api_base: http://vllm-phi3-predictor.<namespace>.svc.cluster.local:8080/v1
          openai_api_key: sk-dummy-key
    instructions:
      - type: general
        content: |
          You are a helpful AI assistant.
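
  # The Colang flow below (rails.co) wires the rail together: it runs the
  # detections_api_check_all_detectors action and refuses the request when
  # any detector blocks the content or is unavailable.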
  rails.co: |
    define flow check_input_safety_detections_api
      $input_result = execute detections_api_check_all_detectors

      if $input_result.unavailable_detectors
        bot refuse with message $input_result.reason
        stop

      if not $input_result.allowed
        bot refuse with message $input_result.reason
        stop

    define bot refuse with message $msg
      $msg
```

**Configuration Fields:**
- `inference_endpoint`: Full URL to the detector's `/api/v1/text/contents` endpoint
- `detector_id`: Identifier sent in the `detector-id` header (detector-specific)
- `threshold`: Minimum score to trigger blocking (0.0-1.0)
- `timeout`: Request timeout in seconds (increase for CPU-based detectors)
- `detector_params`: Optional detector-specific parameters (sent in the request body)

**Important:**
- Timeout should be 120+ seconds for CPU-based detectors like Granite Guardian
- Replace `<namespace>` with your actual namespace
- `detector_id` must match what the detector service expects

Deploy:
```bash
oc apply -f nemo-detections-configmap.yaml -n <namespace>
```

Verify:
```bash
oc get configmap nemo-detections-configmap -n <namespace>
```

### Step 4: Deploy NeMo Guardrails Server

Create `nemo-deployment.yaml`:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nemo-guardrails-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nemo-guardrails
  template:
    metadata:
      labels:
        app: nemo-guardrails
    spec:
      containers:
        - name: nemo-guardrails
          image: quay.io/rh-ee-stondapu/trustyai-nemo:latest
          imagePullPolicy: Always
          env:
            - name: CONFIG_ID
              value: production
            - name: OPENAI_API_KEY
              value: sk-dummy-key
            - name: DETECTIONS_API_KEY
              value: "your-global-token"
          ports:
            - containerPort: 8000
          volumeMounts:
            - name: config-volume
              mountPath: /app/config/production
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
      volumes:
        - name: config-volume
          configMap:
            name: nemo-detections-configmap
---
apiVersion: v1
kind: Service
metadata:
  name: nemo-guardrails-server
spec:
  selector:
    app: nemo-guardrails
  ports:
    - port: 8000
      targetPort: 8000
  type: ClusterIP
---
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: nemo-guardrails-server
spec:
  port:
    targetPort: 8000
  tls:
    termination: edge
    insecureEdgeTerminationPolicy: Allow
  to:
    kind: Service
    name: nemo-guardrails-server
```

Deploy:
```bash
oc apply -f nemo-deployment.yaml -n <namespace>
```

Get the external route URL:
```bash
YOUR_ROUTE="http://$(oc get route nemo-guardrails-server -n <namespace> -o jsonpath='{.spec.host}')"
echo "NeMo Guardrails URL: $YOUR_ROUTE"
```

Verify NeMo is running:
```bash
oc get pods -n <namespace> | grep nemo-guardrails-server
```

Expected: Pod shows `1/1 Running`

Check the logs to confirm the detector loaded:
```bash
oc logs -n <namespace> $(oc get pods -n <namespace> -l app=nemo-guardrails -o name | head -1)
```

Expected log output:
```
Configuration validated. Starting server...
Application startup complete.
Uvicorn running on http://0.0.0.0:8000
```

No "Failed to register" errors should appear.

## Testing

Extract the NeMo route for testing:
```bash
YOUR_ROUTE="http://$(oc get route nemo-guardrails-server -n <namespace> -o jsonpath='{.spec.host}')"
echo "Testing against: $YOUR_ROUTE"
```

### Test 1: Safe Content (Should Pass)

What this tests: verifies that the detectors allow safe content and the LLM generates a response.
```bash
curl -X POST $YOUR_ROUTE/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "config_id": "production",
    "messages": [
      {"role": "user", "content": "Hello, how are you today?"}
    ]
  }'
```

**Expected Output:**
```json
{
  "messages": [
    {
      "role": "assistant",
      "content": "I'm doing well, thank you for asking! How can I assist you today?"
    }
  ]
}
```

The content passes all detectors and the LLM generates a helpful response.

### Test 2: Jailbreak Detection (Should Block)

What this tests: verifies that Granite Guardian detects and blocks jailbreak attempts.
```bash
curl -X POST $YOUR_ROUTE/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "config_id": "production",
    "messages": [
      {"role": "user", "content": "How to hack into a system"}
    ]
  }'
```

**Expected Output:**
```json
{
  "messages": [
    {
      "role": "assistant",
      "content": "Blocked by 3 detections across 1 type(s) (highest score=1.00)"
    }
  ]
}
```

**Note:** This request takes 30-120 seconds because Granite Guardian runs on CPU. The detector identified multiple risk categories (harm, jailbreak, unethical_behavior) with high confidence scores.

### Test 3: Verify Detector Communication

What this tests: confirms that NeMo successfully communicates with the Detections API endpoint.

Check the NeMo logs during Test 2:
```bash
oc logs -n <namespace> $(oc get pods -n <namespace> -l app=nemo-guardrails -o name | head -1) --tail=100 | grep "detections_api_check_all_detectors"
```

**Expected Log Output:**
```
execute detections_api_check_all_detectors
# The result was {'allowed': False, 'reason': 'Blocked by 1 Detections API detector(s): granite_hap', ...}
```

The log shows the detector was called, returned structured detections, and the content was blocked.

### Understanding Response Formats

**Detections API Response Structure:**
```json
[[
  {
    "start": 0,
    "end": 25,
    "detection_type": "causal_lm",
    "detection": "causal_lm",
    "score": 0.9985,
    "sequence_classification": "jailbreak",
    "text": "How to hack into a system"
  },
  {
    "start": 0,
    "end": 25,
    "detection_type": "causal_lm",
    "detection": "causal_lm",
    "score": 0.9978,
    "sequence_classification": "harm",
    "text": "How to hack into a system"
  }
]]
```

**Key Fields:**
- `detection_type`: Broad category
- `detection`: Specific detection class
- `score`: Confidence (0.0-1.0)
- `sequence_classification`: Risk category identified

**How the Parser Handles Multiple Detections:**
1. Extracts all detections from the nested array `[[...]]`
2. Filters by threshold: keeps detections where `score >= threshold`
3. If any detection meets the threshold: `allowed = False`
4. Primary score: highest individual detection score
5. Label format: `"{detection_type}:{detection}"` from the highest scoring detection
6. All detections are included in metadata for the audit trail

**Example with threshold=0.5:**
- Detection 1: jailbreak, score=0.998 → triggers blocking
- Detection 2: harm, score=0.997 → also triggers
- Result: `allowed=False`, `score=0.998`, `label="causal_lm:causal_lm"`
- Metadata contains both detections with individual scores
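
To make the walk-through concrete, here is a simplified sketch of the filtering and aggregation applied to the response above (mirroring the logic in `detections_api.py`, not the module itself):

```python
# Simplified sketch of the threshold filtering and aggregation described above
response = [[
    {"detection_type": "causal_lm", "detection": "causal_lm",
     "sequence_classification": "jailbreak", "score": 0.998},
    {"detection_type": "causal_lm", "detection": "causal_lm",
     "sequence_classification": "harm", "score": 0.997},
]]
threshold = 0.5

detections = response[0]  # unwrap the nested array
filtered = [d for d in detections if d.get("score", 0.0) >= threshold]

allowed = not filtered  # blocked if ANY detection meets the threshold
highest = max(filtered, key=lambda d: d.get("score", 0.0))
label = f'{highest["detection_type"]}:{highest["detection"]}'
average = sum(d["score"] for d in filtered) / len(filtered)

print(allowed, highest["score"], label, round(average, 4))
# False 0.998 causal_lm:causal_lm 0.9975
```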

## Adding New Detectors

No code changes are required. The system is fully configuration-driven.

### Steps to Add a Detector

1. **Deploy a detector service** implementing the Detections API v1/text/contents protocol
2. **Determine the detector-id** required by the service
3. **Choose an appropriate threshold** for your use case
4. **Add the detector configuration** to the NeMo ConfigMap
5. **Apply the ConfigMap and restart** NeMo to load the new detector

### Example: Adding a New Detector

This example shows adding a hypothetical toxicity detector to complement Granite Guardian.

**Step 1: Deploy Detector Service**

Follow the detector service's deployment instructions. For TrustyAI guardrails-detectors, use the repository's deployment files, similar to Granite Guardian.

**Step 2: Test Detector Endpoint**

Identify the detector-id and test the endpoint directly:
```bash
# Port forward to the detector service
oc port-forward -n <namespace> svc/your-detector-predictor 8000:8000

# Test with sample content
curl -X POST http://localhost:8000/api/v1/text/contents \
  -H "detector-id: your-detector-id" \
  -H "Content-Type: application/json" \
  -d '{"contents": ["test content"], "detector_params": {}}'
```

Examine the response to understand:
- What `detector-id` value to use
- Detection score ranges
- What constitutes a detection (for threshold tuning)

**Step 3: Add to ConfigMap**

Edit `nemo-detections-configmap.yaml` and add your detector:
```yaml
detections_api_detectors:
  granite_hap:
    # ... existing detector ...

  your_detector:  # Detector name (used in logs and error messages)
    inference_endpoint: "http://your-detector-predictor.<namespace>.svc.cluster.local:8000/api/v1/text/contents"
    detector_id: "your-detector-id"
    threshold: 0.7
    timeout: 30
    detector_params: {}
```

**Step 4: Apply and Restart**
```bash
oc apply -f nemo-detections-configmap.yaml -n <namespace>
oc rollout restart deployment/nemo-guardrails-server -n <namespace>
```

**Step 5: Test New Detector**
```bash
curl -X POST $YOUR_ROUTE/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"config_id": "production", "messages": [{"role": "user", "content": "content that triggers your detector"}]}'
```

Check the logs to verify the detector executed:
```bash
oc logs -n <namespace> $(oc get pods -n <namespace> -l app=nemo-guardrails -o name | head -1) --tail=50 | grep "your_detector"
```

### Determining Configuration Values

**Threshold Selection:**
- Start with `0.5` (moderate sensitivity)
- Test with sample content
- Increase (e.g., 0.7) to reduce false positives
- Decrease (e.g., 0.3) to catch more potential issues
- Monitor blocking rates and adjust

**Timeout Selection:**
- CPU-based detectors: 60-120 seconds
- GPU-based detectors: 10-30 seconds
- Network latency considerations: add a 5-10 second buffer
- Monitor actual response times in the logs

**detector_params:**
- Consult the detector service documentation
- Used for detector-specific configuration
- Passed through to the detector service in the request body
- Example: `{"language": "en", "categories": ["pii", "toxicity"]}`

## Authentication (Optional)

Detections API services can be secured with authentication to restrict access.

### Prerequisites for Authentication

Authentication requires one of:
- Service Mesh (Istio) with Authorino (for OpenShift AI/OpenDataHub deployments)
- An API Gateway with authentication capabilities
- An alternative authentication mechanism (OAuth proxy, etc.)

### Enabling Authentication on Detector Services

Authentication configuration depends on your detector deployment method and infrastructure.
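
Once a detector is secured (for example, with the annotations shown below for OpenShift AI), you can smoke-test authenticated access directly before wiring tokens into NeMo. The route hostname here is a placeholder:

```bash
# Hypothetical smoke test against an auth-enabled detector endpoint
TOKEN=$(oc whoami -t)
curl -X POST https://<detector-route-host>/api/v1/text/contents \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "detector-id: granite-guardian-hap" \
  -H "Content-Type: application/json" \
  -d '{"contents": ["test content"], "detector_params": {}}'
```

Expect a `401`/`403` without the `Authorization` header and a normal detections array with it.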
+ +**For TrustyAI Guardrails Detectors with OpenShift AI:** + +Add authentication annotations to InferenceService: +```yaml +apiVersion: serving.kserve.io/v1beta1 +kind: InferenceService +metadata: + name: guardrails-detector-ibm-guardian + annotations: + security.opendatahub.io/enable-auth: 'true' + serving.kserve.io/deploymentMode: RawDeployment + serving.knative.openshift.io/enablePassthrough: 'true' + sidecar.istio.io/inject: 'true' +spec: + # ... rest of spec +``` + +**Note:** Authentication annotations vary by cluster infrastructure. Consult your cluster administrator for correct configuration. + +### Configuring NeMo Authentication to Detectors + +NeMo supports both global authentication tokens and per-detector tokens with automatic fallback. + +**Option 1: Global Token (All Detectors)** + +Set environment variable in NeMo deployment: +```yaml +env: + - name: CONFIG_ID + value: production + - name: DETECTIONS_API_KEY + value: "your-bearer-token" +``` + +All detectors without explicit `api_key` will use this token. + +**Option 2: Per-Detector Tokens** + +Specify in ConfigMap: +```yaml +detections_api_detectors: + granite_hap: + inference_endpoint: "..." + detector_id: "granite-guardian-hap" + api_key: "granite-specific-token" + threshold: 0.5 + + other_detector: + inference_endpoint: "..." + detector_id: "other-id" + # No api_key specified - falls back to DETECTIONS_API_KEY env var + threshold: 0.7 +``` + +**Token Priority:** Per-detector `api_key` → Global `DETECTIONS_API_KEY` env var → No authentication + +**Getting Tokens:** +```bash +# For OpenShift service accounts +oc sa get-token -n + +# For OpenShift AI secured services +oc whoami -t +``` \ No newline at end of file diff --git a/nemoguardrails/library/detector_clients/__init__.py b/nemoguardrails/library/detector_clients/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/nemoguardrails/library/detector_clients/actions.py b/nemoguardrails/library/detector_clients/actions.py new file mode 100644 index 000000000..1e658133f --- /dev/null +++ b/nemoguardrails/library/detector_clients/actions.py @@ -0,0 +1,253 @@ +""" +NeMo action functions for Detections API integration. +""" + +import asyncio +import logging +from typing import Any, Dict, List, Optional + +from pydantic import BaseModel, Field +from nemoguardrails.actions import action + +from nemoguardrails.library.detector_clients.base import DetectorResult +from nemoguardrails.library.detector_clients.detections_api import DetectionsAPIClient + +log = logging.getLogger(__name__) + + +class AggregatedDetectorResult(BaseModel): + """Aggregated result from multiple detectors""" + allowed: bool = Field(description="Whether content passed all detectors") + reason: str = Field(description="Summary of detection results") + blocking_detectors: List[DetectorResult] = Field( + default_factory=list, + description="Detectors that blocked content" + ) + allowing_detectors: List[DetectorResult] = Field( + default_factory=list, + description="Detectors that approved content" + ) + detector_count: int = Field(description="Total number of detectors run") + unavailable_detectors: Optional[List[str]] = Field( + default=None, + description="Detectors that encountered system errors" + ) + + +async def _run_detections_api_detector( + detector_name: str, + detector_config: Any, + text: str +) -> DetectorResult: + """ + Execute single Detections API detector. + + Internal helper function used by action functions. 
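
    Any exception raised by the client is caught and converted into a
    blocking DetectorResult with label "ERROR", so a single failing
    detector cannot crash the aggregation.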
+ + Args: + detector_name: Name of the detector + detector_config: DetectionsAPIConfig object + text: Input text to analyze + + Returns: + DetectorResult with detection outcome + """ + try: + client = DetectionsAPIClient(detector_config, detector_name) + result = await client.detect(text) + return result + + except Exception as e: + log.error(f"{detector_name} error: {e}") + return DetectorResult( + allowed=False, + score=0.0, + reason=f"{detector_name} not reachable: {str(e)}", + label="ERROR", + detector=detector_name, + metadata={"error": str(e)} + ) + + +@action() +async def detections_api_check_all_detectors( + context: Optional[Dict] = None, + config: Optional[Any] = None, + **kwargs +) -> Dict[str, Any]: + """ + Run all configured Detections API detectors in parallel. + + This is the main action function called by NeMo rails.co flows. + + Args: + context: NeMo context dict containing user_message, config, etc. + config: NeMo config object + **kwargs: Additional keyword arguments + + Returns: + Dict representation of AggregatedDetectorResult + """ + if context is None: + context = {} + + if not config: + config = context.get("config") + + if not config: + return {"allowed": False, "reason": "No configuration"} + + user_message = context.get("user_message", "") + if isinstance(user_message, dict): + user_message = user_message.get("content", "") + + detections_api_detectors = getattr( + config.rails.config, + 'detections_api_detectors', + {} + ) + + if not detections_api_detectors: + return { + "allowed": True, + "reason": "No Detections API detectors configured" + } + + log.info( + f"Running {len(detections_api_detectors)} Detections API detectors: " + f"{list(detections_api_detectors.keys())}" + ) + + tasks_with_names = [ + (name, _run_detections_api_detector(name, config_obj, user_message)) + for name, config_obj in detections_api_detectors.items() + ] + + results = await asyncio.gather( + *[task[1] for task in tasks_with_names], + return_exceptions=True + ) + + system_errors = [] + content_blocks = [] + allowing = [] + + for i, result in enumerate(results): + detector_name = tasks_with_names[i][0] + + if isinstance(result, Exception): + log.error(f"{detector_name} exception: {result}") + error_result = DetectorResult( + allowed=False, + score=0.0, + reason=f"Exception: {result}", + label="ERROR", + detector=detector_name, + metadata={"error": str(result)} + ) + system_errors.append(error_result) + elif result.label == "ERROR": + system_errors.append(result) + elif not result.allowed: + content_blocks.append(result) + else: + allowing.append(result) + + if system_errors: + unavailable = [e.detector for e in system_errors] + reason = ( + f"System error: {len(system_errors)} Detections API detector(s) " + f"unavailable - {', '.join(unavailable)}" + ) + log.warning(reason) + + return AggregatedDetectorResult( + allowed=False, + reason=reason, + unavailable_detectors=unavailable, + blocking_detectors=content_blocks, + allowing_detectors=allowing, + detector_count=len(detections_api_detectors) + ).dict() + + overall_allowed = len(content_blocks) == 0 + + if overall_allowed: + reason = f"Approved by all {len(allowing)} Detections API detectors" + else: + detector_names = [d.detector for d in content_blocks] + reason = ( + f"Blocked by {len(content_blocks)} Detections API detector(s): " + f"{', '.join(set(detector_names))}" + ) + + log.info(f"Detections API: {'ALLOWED' if overall_allowed else 'BLOCKED'}: {reason}") + + return AggregatedDetectorResult( + allowed=overall_allowed, + 
reason=reason, + blocking_detectors=content_blocks, + allowing_detectors=allowing, + detector_count=len(detections_api_detectors) + ).dict() + + +@action() +async def detections_api_check_detector( + context: Optional[Dict] = None, + config: Optional[Any] = None, + detector_name: str = "mock_pii", + **kwargs +) -> Dict[str, Any]: + """ + Run specific Detections API detector by name. + + Args: + context: NeMo context dict + config: NeMo config object + detector_name: Name of detector to run + **kwargs: Additional keyword arguments + + Returns: + Dict representation of DetectorResult + """ + if context is None: + context = {} + + if not config: + config = context.get("config") + + if not config: + return {"allowed": False, "reason": "No configuration"} + + user_message = context.get("user_message", "") + if isinstance(user_message, dict): + user_message = user_message.get("content", "") + + detections_api_detectors = getattr( + config.rails.config, + 'detections_api_detectors', + {} + ) + + if detector_name not in detections_api_detectors: + return {"allowed": True, "score": 0.0, "label": "NOT_CONFIGURED"} + + detector_config = detections_api_detectors[detector_name] + + if detector_config is None: + return {"allowed": True, "score": 0.0, "label": "NONE"} + + result = await _run_detections_api_detector( + detector_name, + detector_config, + user_message + ) + + log.info( + f"Detections API {detector_name}: " + f"{'allowed' if result.allowed else 'blocked'} " + f"(score={result.score:.3f})" + ) + + return result.dict() \ No newline at end of file diff --git a/nemoguardrails/library/detector_clients/base.py b/nemoguardrails/library/detector_clients/base.py new file mode 100644 index 000000000..95045a0a7 --- /dev/null +++ b/nemoguardrails/library/detector_clients/base.py @@ -0,0 +1,191 @@ +""" +Base interface for detector clients. +All detector implementations must inherit from this class. +""" + +from abc import ABC, abstractmethod +from typing import Any, Dict, Optional +import os +import asyncio +import aiohttp +import logging + +from pydantic import BaseModel, Field + +log = logging.getLogger(__name__) + +# Global HTTP session for connection pooling +_http_session: Optional[aiohttp.ClientSession] = None +_session_lock = asyncio.Lock() + + +class DetectorResult(BaseModel): + """Standardized result from detector execution""" + allowed: bool = Field(description="Whether content is allowed") + score: float = Field(description="Detection confidence score (0.0-1.0)") + reason: str = Field(description="Human-readable explanation") + label: str = Field(description="Detection label or category") + detector: str = Field(description="Detector name") + metadata: Optional[Dict[str, Any]] = Field(default=None, description="Additional detection metadata") + + +class BaseDetectorClient(ABC): + """ + Abstract base class for all detector clients. + Defines the interface that all detector implementations must follow. + """ + + def __init__(self, config: Any, detector_name: str): + """ + Initialize detector client with configuration. + + Args: + config: Detector-specific configuration object + """ + self.config = config + self.detector_name = detector_name + self.endpoint = getattr(config, 'inference_endpoint', '') + self.timeout = getattr(config, 'timeout', 30) + self.api_key = getattr(config, 'api_key', None) + + @abstractmethod + async def detect(self, text: str) -> DetectorResult: + """ + Main entry point for detection. + Orchestrates the detection flow: build request -> call endpoint -> parse response. 
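
        Implementations are expected to route failures through
        _handle_error() so callers always receive a DetectorResult
        rather than an exception.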
+ + Args: + text: Input text to analyze + + Returns: + DetectorResult with detection outcome + """ + pass + + @abstractmethod + def build_request(self, text: str) -> Dict[str, Any]: + """ + Build API-specific request payload. + + Args: + text: Input text to analyze + + Returns: + Request payload dict in API-specific format + """ + pass + + @abstractmethod + def parse_response(self, response: Any, http_status: int) -> DetectorResult: + """ + Parse API-specific response into standardized DetectorResult. + + Args: + response: API response data + http_status: HTTP status code from response + + Returns: + DetectorResult with parsed detection outcome + """ + pass + + async def _call_endpoint( + self, + endpoint: str, + payload: Dict[str, Any], + timeout: int, + headers: Optional[Dict[str, str]] = None + ) -> tuple[Any, int]: + """ + Make HTTP POST request to detector endpoint. + Shared implementation for all detector types. + + Args: + endpoint: API endpoint URL + payload: Request payload + timeout: Request timeout in seconds + headers: Optional HTTP headers + + Returns: + Tuple of (response_data, http_status_code) + + Raises: + Exception: On HTTP errors or timeouts + """ + global _http_session + + # Lazy session initialization + if _http_session is None: + async with _session_lock: + if _http_session is None: + _http_session = aiohttp.ClientSession() + + # Build headers + request_headers = {"Content-Type": "application/json"} + if headers: + request_headers.update(headers) + + # Add auth if configured (per-detector key or global env var) + token = self.api_key or os.getenv("DETECTIONS_API_KEY") + if token: + request_headers["Authorization"] = f"Bearer {token}" + + + timeout_config = aiohttp.ClientTimeout(total=timeout) + + try: + async with _http_session.post( + endpoint, + json=payload, + headers=request_headers, + timeout=timeout_config + ) as response: + http_status = response.status + + if http_status == 200: + response_data = await response.json() + return response_data, http_status + else: + error_text = await response.text() + raise Exception(f"HTTP {http_status}: {error_text}") + + except asyncio.TimeoutError: + raise Exception(f"Request timeout after {timeout}s") + except aiohttp.ClientError as e: + raise Exception(f"HTTP client error: {str(e)}") + + def _handle_error(self, error: Exception, detector_name: str) -> DetectorResult: + """ + Convert exceptions into DetectorResult with error state. + Shared error handling for all detector types. 
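
        Timeout errors map to label "TIMEOUT"; HTTP and other failures
        map to label "ERROR". The result always has allowed=False.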
+ + Args: + error: Exception that occurred + detector_name: Name of detector for error reporting + + Returns: + DetectorResult indicating system error (blocked state) + """ + error_message = str(error) + + # Check if it's an HTTP error + if "HTTP 404" in error_message or "HTTP 500" in error_message or "HTTP 503" in error_message: + label = "ERROR" + reason = f"{detector_name} unavailable: {error_message}" + elif "timeout" in error_message.lower(): + label = "TIMEOUT" + reason = f"{detector_name} timeout: {error_message}" + else: + label = "ERROR" + reason = f"{detector_name} error: {error_message}" + + log.error(f"{detector_name} error: {error}") + + return DetectorResult( + allowed=False, + score=0.0, + reason=reason, + label=label, + detector=detector_name, + metadata={"error": error_message} + ) \ No newline at end of file diff --git a/nemoguardrails/library/detector_clients/detections_api.py b/nemoguardrails/library/detector_clients/detections_api.py new file mode 100644 index 000000000..b47660cc4 --- /dev/null +++ b/nemoguardrails/library/detector_clients/detections_api.py @@ -0,0 +1,287 @@ +""" +Detections API v1/text/contents client implementation. +Handles communication with FMS-style detection endpoints. +""" + +import logging +from typing import Any, Dict, List + +from .base import BaseDetectorClient, DetectorResult + +log = logging.getLogger(__name__) + + +class DetectionsAPIClient(BaseDetectorClient): + """ + Client for Detections API v1/text/contents endpoint. + + Expected API format: + - Request: POST with detector-id header, {"contents": [text], "detector_params": {}} + - Response: [[{detection1}, {detection2}, ...]] + """ + + def __init__(self, config: Any, detector_name: str): + """ + Initialize Detections API client. + + Args: + config: DetectionsAPIConfig with endpoint, detector_id, threshold, etc. + """ + super().__init__(config, detector_name) + self.detector_id = getattr(config, 'detector_id', '') + self.threshold = getattr(config, 'threshold', 0.5) + self.detector_params = getattr(config, 'detector_params', {}) + + if not self.detector_id: + raise ValueError("detector_id is required for DetectionsAPIClient") + + async def detect(self, text: str) -> DetectorResult: + """ + Run detection on input text. + + Args: + text: Input text to analyze + + Returns: + DetectorResult with detection outcome + """ + try: + payload = self.build_request(text) + headers = {"detector-id": self.detector_id} + + response_data, http_status = await self._call_endpoint( + endpoint=self.endpoint, + payload=payload, + timeout=self.timeout, + headers=headers + ) + + result = self.parse_response(response_data, http_status) + + log.info( + f"{self.detector_name}: {'allowed' if result.allowed else 'blocked'} " + f"(score={result.score:.3f}, " + f"detections={result.metadata.get('detection_count', 0) if result.metadata else 0})" + ) + + return result + + except Exception as e: + return self._handle_error(e, self.detector_name) + + def build_request(self, text: str) -> Dict[str, Any]: + """ + Build Detections API request payload. + + Args: + text: Input text to analyze + + Returns: + Request dict: {"contents": [text], "detector_params": {...}} + """ + return { + "contents": [text], + "detector_params": self.detector_params + } + + def parse_response(self, response: Any, http_status: int) -> DetectorResult: + """ + Parse Detections API response into DetectorResult. 
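
        Returns allowed=True when no detection meets the threshold,
        and allowed=False with the highest-scoring detection's label
        otherwise.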
+ + Response format: [[{detection1}, {detection2}, ...]] + Each detection: {start, end, text, detection_type, detection, score, evidence, metadata} + + Args: + response: API response data + http_status: HTTP status code + + Returns: + DetectorResult with parsed detection outcome + """ + if http_status != 200: + return DetectorResult( + allowed=False, + score=0.0, + reason=f"HTTP {http_status} error", + label="ERROR", + detector=self.detector_name, + metadata={"http_status": http_status} + ) + + if not isinstance(response, list): + return DetectorResult( + allowed=False, + score=0.0, + reason="Invalid response format: expected list", + label="INVALID_RESPONSE", + detector=self.detector_name, + metadata={"response_type": type(response).__name__} + ) + + detections = self._extract_detections_from_response(response) + + if not detections: + return DetectorResult( + allowed=True, + score=0.0, + reason="No detections found", + label="NONE", + detector=self.detector_name, + metadata={"detection_count": 0} + ) + + filtered_detections = [ + d for d in detections + if d.get("score", 0.0) >= self.threshold + ] + + if not filtered_detections: + return DetectorResult( + allowed=True, + score=self._calculate_highest_score(detections), + reason=f"All detections below threshold {self.threshold}", + label="BELOW_THRESHOLD", + detector=self.detector_name, + metadata={ + "detection_count": len(detections), + "detections_above_threshold": 0, + "average_score": self._calculate_average_score(detections), + "detections": detections + } + ) + + highest_detection = self._get_highest_score_detection(filtered_detections) + highest_score = highest_detection.get("score", 0.0) + + detection_type = highest_detection.get("detection_type", "unknown") + detection_name = highest_detection.get("detection", "unknown") + label = f"{detection_type}:{detection_name}" + + reason = self._build_reason_message(filtered_detections) + individual_scores = [d.get("score", 0.0) for d in filtered_detections] + average_score = self._calculate_average_score(filtered_detections) + + return DetectorResult( + allowed=False, + score=highest_score, + reason=reason, + label=label, + detector=self.detector_name, + metadata={ + "detection_count": len(filtered_detections), + "total_detections": len(detections), + "average_score": average_score, + "individual_scores": individual_scores, + "highest_detection": highest_detection, + "detections": filtered_detections + } + ) + + def _extract_detections_from_response( + self, + response: List + ) -> List[Dict[str, Any]]: + """ + Extract detections from nested array structure. + + Response format: [[{detection1}, {detection2}]] + + Args: + response: API response list + + Returns: + Flat list of detection dicts + """ + if not response: + return [] + + if isinstance(response[0], list): + return response[0] + + return response + + def _calculate_highest_score(self, detections: List[Dict[str, Any]]) -> float: + """ + Get highest score from detections. + + Args: + detections: List of detection dicts + + Returns: + Highest score value + """ + if not detections: + return 0.0 + + scores = [d.get("score", 0.0) for d in detections] + return max(scores) if scores else 0.0 + + def _calculate_average_score(self, detections: List[Dict[str, Any]]) -> float: + """ + Calculate average score from detections. 
+ + Args: + detections: List of detection dicts + + Returns: + Average score value + """ + if not detections: + return 0.0 + + scores = [d.get("score", 0.0) for d in detections] + return sum(scores) / len(scores) if scores else 0.0 + + def _get_highest_score_detection( + self, + detections: List[Dict[str, Any]] + ) -> Dict[str, Any]: + """ + Get detection with highest score. + + Args: + detections: List of detection dicts + + Returns: + Detection dict with highest score + """ + if not detections: + return {} + + return max(detections, key=lambda d: d.get("score", 0.0)) + + def _build_reason_message(self, detections: List[Dict[str, Any]]) -> str: + """ + Build human-readable reason message from detections. + + Args: + detections: List of detection dicts + + Returns: + Formatted reason string + """ + count = len(detections) + + if count == 0: + return "No detections found" + + if count == 1: + det = detections[0] + detection_type = det.get("detection_type", "unknown") + detection_name = det.get("detection", "unknown") + score = det.get("score", 0.0) + return ( + f"Blocked by {detection_type}:{detection_name} " + f"(score={score:.2f})" + ) + + detection_types = set( + d.get("detection_type", "unknown") for d in detections + ) + highest = self._get_highest_score_detection(detections) + highest_score = highest.get("score", 0.0) + + return ( + f"Blocked by {count} detections across {len(detection_types)} type(s) " + f"(highest score={highest_score:.2f})" + ) \ No newline at end of file diff --git a/nemoguardrails/rails/llm/config.py b/nemoguardrails/rails/llm/config.py index 6e463f963..dfe289776 100644 --- a/nemoguardrails/rails/llm/config.py +++ b/nemoguardrails/rails/llm/config.py @@ -839,6 +839,66 @@ def get_validator_config(self, name: str) -> Optional[GuardrailsAIValidatorConfi return _validator return None +class KServeDetectorConfig(BaseModel): + """Configuration for single KServe detector.""" + + inference_endpoint: str = Field( + description="The KServe API endpoint for the detector" + ) + model_name: Optional[str] = Field( + default=None, + description="The name of the KServe model" + ) + threshold: float = Field( + default=0.5, + description="Probability threshold for detection" + ) + timeout: int = Field( + default=30, + description="HTTP request timeout in seconds" + ) + api_key: Optional[str] = Field( + default=None, + description="Bearer token for authenticating to this detector. If not specified, uses KSERVE_API_KEY environment variable." + ) + safe_labels: List[int] = Field( + default_factory=lambda: [0], + description="Class indices considered safe" + ) + +class DetectionsAPIConfig(BaseModel): + """Configuration for Detections API v1/text/contents detector.""" + + inference_endpoint: str = Field( + description="Detections API endpoint URL (e.g., http://service.com/v1/text/contents)" + ) + + detector_id: str = Field( + description="Detector ID to send in detector-id header (e.g., dummy-en-pii-v1)" + ) + + threshold: float = Field( + default=0.5, + ge=0.0, + le=1.0, + description="Detection threshold (0.0-1.0). 
Block if any detection score >= threshold" + ) + + timeout: int = Field( + default=30, + gt=0, + description="Request timeout in seconds" + ) + + api_key: Optional[str] = Field( + default=None, + description="Optional API key for authentication (Bearer token)" + ) + + detector_params: Optional[Dict[str, Any]] = Field( + default_factory=dict, + description="Optional detector-specific parameters to send in request" + ) class TrendMicroRailConfig(BaseModel): """Configuration data for the Trend Micro AI Guard API""" @@ -945,6 +1005,16 @@ class RailsConfigData(BaseModel): description="Configuration for Guardrails AI validators.", ) + kserve_detectors: Optional[Dict[str, KServeDetectorConfig]] = Field( + default_factory=dict, + description="Dynamic registry of KServe detectors. Keys are detector names, values are detector configurations." + ) + + detections_api_detectors: Optional[Dict[str, DetectionsAPIConfig]] = Field( + default_factory=dict, + description="Dynamic registry of Detections API detectors. Keys are detector names, values are detector configurations." + ) + trend_micro: Optional[TrendMicroRailConfig] = Field( default_factory=TrendMicroRailConfig, description="Configuration for Trend Micro.",