feat(aws-observability): add aws-observability power #60

Open
gcacace wants to merge 5 commits into kirodotdev:main from gcacace:main

gcacace commented Feb 6, 2026

Description of changes:

This PR introduces the AWS Observability Power, a comprehensive observability platform that integrates CloudWatch Logs, Metrics, Alarms, Application Signals (APM), CloudTrail security auditing, and AWS Documentation access into a unified Kiro power.

Key Components:

  1. Power Configuration (POWER.md)

    • Comprehensive documentation covering all observability capabilities
    • Integration of 4 MCP servers: CloudWatch, Application Signals, CloudTrail, and AWS Documentation
    • Detailed setup instructions with prerequisites and configuration examples
    • Quick start examples and common workflow patterns
    • Codebase observability analysis capability for identifying instrumentation gaps
    • "When to load" guidance for each steering file moved into POWER.md for discoverability
  2. MCP Server Configuration (mcp.json)

    • CloudWatch MCP server for logs, metrics, and alarms
    • Application Signals MCP server for APM and distributed tracing
    • CloudTrail MCP server for security auditing
    • AWS Documentation MCP server for reference access
    • Configurable AWS profile and region settings
  3. Steering Files (8 comprehensive guides):

    • incident-response.md - Complete incident response framework with 6 phases (detection, investigation, mitigation, recovery, RCA, postmortem)
    • alerting-setup.md - Intelligent alarm configuration with recommended thresholds and best practices
    • application-signals-setup.md - Application Signals enablement guide and setup workflows
    • log-analysis.md - CloudWatch Logs Insights query patterns and syntax reference
    • performance-monitoring.md - Application Signals APM monitoring, SLO management, and trace analysis
    • security-auditing.md - CloudTrail security monitoring, compliance auditing, and incident investigation
    • observability-gap-analysis.md - Automated codebase analysis to identify observability gaps across logging, metrics, tracing, and error handling with multi-language support
    • cloudtrail-data-source-selection.md - Decision tree and priority strategy for CloudTrail data source selection (CloudTrail Lake → CloudWatch Logs → Lookup Events API), including query translation examples across all three sources

Features:

  • Unified observability across logs, metrics, traces, and security events
  • Automated pattern detection and anomaly analysis
  • SLO-based monitoring and alerting
  • Comprehensive incident response workflows
  • Security auditing and compliance tracking
  • Integration patterns for correlating multiple data sources
  • Codebase observability gap analysis with multi-language support (Python, Java, JS/TS, Go, Ruby, C#/.NET)
  • CloudTrail data source priority strategy for optimal query performance and cost

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

- `logs:*` for CloudWatch Logs operations
- `cloudwatch:*` for CloudWatch Metrics, Alarms, and Application Signals
- `xray:*` for distributed tracing
- `cloudtrail:LookupEvents` for CloudTrail queries

NIT: Missing prerequisite: Application Signals enablement

The existing cloudwatch-application-signals power requires this as a prerequisite:

Application Signals enabled in your AWS account (Getting started with Application Signals)

Without this, the awslabs.cloudwatch-applicationsignals-mcp-server will fail silently. This should be listed as a prerequisite.

Also, the broad cloudwatch:* permission doesn't cover several actions the power needs:

  • application-signals:* API actions (e.g., application-signals:ListServices, application-signals:GetService, application-signals:ListServiceOperations)
  • synthetics:GetCanary and synthetics:GetCanaryRuns for canary analysis
  • s3:GetObject and s3:ListBucket for canary artifacts
  • iam:GetRole, iam:ListAttachedRolePolicies, iam:GetPolicy, and iam:GetPolicyVersion for the enablement guide

See the existing power's POWER.md for the full 23-action IAM policy.
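
As a rough illustration of the gap, the sketch below collects only the actions named in this comment into a single IAM policy statement. It is not the full 23-action policy from the existing power. The helper name `build_policy_statement` and the `"*"` resource are assumptions for illustration; a real policy would likely scope resources more tightly.

```python
import json

# Actions named in this review comment. This is NOT the full 23-action
# policy from the existing power's POWER.md; it is a partial starting point.
EXTRA_ACTIONS = [
    "application-signals:ListServices",
    "application-signals:GetService",
    "application-signals:ListServiceOperations",
    "synthetics:GetCanary",
    "synthetics:GetCanaryRuns",
    "s3:GetObject",
    "s3:ListBucket",
    "iam:GetRole",
    "iam:ListAttachedRolePolicies",
    "iam:GetPolicy",
    "iam:GetPolicyVersion",
]


def build_policy_statement(actions, resource="*"):
    """Return a one-statement IAM policy document as a dict.

    resource="*" is an assumption made for brevity, not a recommendation.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy so it can be pasted into a prerequisites section.
    print(json.dumps(build_policy_statement(EXTRA_ACTIONS), indent=2))
```

Listing the actions explicitly, rather than widening to more wildcards, keeps the prerequisite section auditable against the existing power's policy.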

- Error rate and latency tracking (P50, P90, P95, P99)
- Automatic service discovery
- SLO compliance monitoring and error budget tracking
- Enablement guide for setup assistance

Missing key Application Signals capabilities

This section is missing several important capabilities that are documented in the existing power:

  1. 100% Trace Visibility: search_transaction_spans queries OpenTelemetry spans via CloudWatch Logs Insights with 100% sampling (vs. X-Ray's 5%). This is a major differentiator.
  2. Canary Failure Analysis: analyze_canary_failures provides root cause investigation for CloudWatch Synthetics canaries.
  3. Primary Audit Tools: audit_services, audit_slos, and audit_service_operations are the recommended entry points for all investigation workflows. They support wildcard targeting and multi-auditor analysis (7 auditor types: slo, operation_metric, trace, log, dependency_metric, top_contributor, service_quota).

These are the most powerful tools in the Application Signals MCP server and should be prominently featured.

Comment on lines 119 to 129
### Available Tools

1. **list_services**: Get all monitored services
2. **get_service_metrics**: Retrieve metrics for a specific service
3. **get_service_operations**: List operations within a service
4. **get_operation_metrics**: Get metrics for specific operation
5. **list_slos**: View configured SLOs
6. **get_slo_status**: Check SLO compliance
7. **search_traces**: Find traces matching criteria
8. **get_trace_details**: View complete trace information
9. **get_service_map**: Visualize service dependencies

8 of the 9 tool names listed here are incorrect: only list_slos exists in any of the 4 MCP servers

I verified against the actual READMEs in awslabs/mcp. Here's the mapping:

| Listed (non-existent) | Actual tool in cloudwatch-applicationsignals-mcp-server |
| --- | --- |
| list_services | list_monitored_services |
| get_service_metrics | query_service_metrics |
| get_service_operations | list_service_operations |
| get_operation_metrics | No equivalent; use audit_service_operations |
| list_slos | list_slos ✅ (this one is correct) |
| get_slo_status | get_slo or audit_slos |
| search_traces | search_transaction_spans (100% sampled) or query_sampled_traces (5% sampled) |
| get_trace_details | No equivalent; trace data is returned by search_transaction_spans |
| get_service_map | No equivalent; service map info is part of audit output |

Critically, the three primary audit tools are missing entirely: audit_services, audit_slos, audit_service_operations. These are the recommended entry points for every investigation workflow in the Application Signals MCP server.
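
One way to catch these across all steering files is a small lint pass. The sketch below is a hypothetical helper, not part of the power or of any MCP server: it encodes the mapping from this comment as a dictionary and reports each line of a text that still uses a non-existent tool name. The names `STALE_TOOL_NAMES` and `find_stale_tool_names` are assumptions for illustration.

```python
import re

# Non-existent tool name -> suggestion, taken from the mapping in this comment.
# list_slos is omitted because it is already correct.
STALE_TOOL_NAMES = {
    "list_services": "list_monitored_services",
    "get_service_metrics": "query_service_metrics",
    "get_service_operations": "list_service_operations",
    "get_operation_metrics": "audit_service_operations (no direct equivalent)",
    "get_slo_status": "get_slo or audit_slos",
    "search_traces": "search_transaction_spans or query_sampled_traces",
    "get_trace_details": "search_transaction_spans (no direct equivalent)",
    "get_service_map": "audit output (no direct equivalent)",
}

# Word boundaries keep list_services from matching inside longer identifiers.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(name) for name in STALE_TOOL_NAMES) + r")\b"
)


def find_stale_tool_names(text):
    """Return (line_number, stale_name, suggestion) for each occurrence."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            old = match.group(1)
            hits.append((line_no, old, STALE_TOOL_NAMES[old]))
    return hits


if __name__ == "__main__":
    sample = 'Query: list_services(sort_by="error_rate")'
    for line_no, old, new in find_stale_tool_names(sample):
        print(f"line {line_no}: {old} -> {new}")
```

Running it over each file under the power's steering directory would flag the remaining occurrences before the next review round.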

Also missing from this steering file:

  • Target format reference — the complex JSON target format required by audit tools (e.g., [{"Type":"service","Data":{"Service":{"Type":"Service","Name":"*"}}}])
  • Auditor selection guide — which of the 7 auditor types to use for different scenarios
  • Transaction Search query patterns — error analysis, latency analysis, dependency calls, GenAI token usage
  • X-Ray filter expressions — for query_sampled_traces
  • Pagination (next_token) guidance — wildcard patterns process in batches

The existing cloudwatch-application-signals/steering/steering.md has all of this and could serve as a reference.
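
For the target format in particular, a short sketch may help show why a reference is needed. The helper below builds the service target JSON string quoted above; only that one shape appears in this review, so the function name `build_service_targets` and the assumption that only the service name varies are illustrative, not taken from the MCP server documentation.

```python
import json


def build_service_targets(name_pattern="*"):
    """Build the JSON target string for a service audit.

    Only the service-target shape quoted in this review is covered;
    other target types are out of scope for this sketch.
    """
    targets = [
        {
            "Type": "service",
            "Data": {"Service": {"Type": "Service", "Name": name_pattern}},
        }
    ]
    # Compact separators reproduce the single-line form used in the review.
    return json.dumps(targets, separators=(",", ":"))


if __name__ == "__main__":
    print(build_service_targets())          # wildcard: all services
    print(build_service_targets("api-service"))
```

Building the string from a dictionary avoids the quoting mistakes that are easy to make when hand-writing nested JSON inside a tool call.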

Comment on lines 434 to 438
**Step 2: Check Application Signals**
```
Query: list_services(sort_by="error_rate")
Result: "api-service" has 15% error rate (normal: 0.1%)
```

list_services does not exist in any of the 4 MCP servers. The correct tool is list_monitored_services (from cloudwatch-applicationsignals-mcp-server), or better yet, use the primary audit tool:

audit_services(service_targets='[{"Type":"service","Data":{"Service":{"Type":"Service","Name":"*"}}}]')

Comment on lines 448 to 452
**Step 4: Analyze Traces**
```
Query: search_traces(service="api-service", error=true)
Result: Traces show timeout calling database
```

search_traces does not exist in any of the 4 MCP servers. The correct tools are:

  • search_transaction_spans — 100% sampled OpenTelemetry spans via CloudWatch Logs Insights
  • query_sampled_traces — 5% sampled X-Ray traces

Example with the real tool:

search_transaction_spans(query_string='FILTER attributes.aws.local.service = "api-service" and attributes.http.status_code >= 500 | LIMIT 20')
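
If the steering file wants a reusable pattern rather than a single hard-coded query, something like the following sketch could generate the query string shown above. The function name `build_error_span_query` and its parameters are hypothetical; the field names are copied from the example query and are not otherwise verified here.

```python
def build_error_span_query(service, min_status=500, limit=20):
    """Return a query string filtering spans for one service by HTTP status.

    Mirrors the example query in this comment; field names come from that
    example and are assumed to match the span schema.
    """
    if '"' in service:
        # Reject embedded quotes rather than guess at an escaping rule.
        raise ValueError("service name must not contain double quotes")
    return (
        f'FILTER attributes.aws.local.service = "{service}" '
        f"and attributes.http.status_code >= {min_status} | LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_error_span_query("api-service"))
```

The resulting string would be passed as the query_string argument of search_transaction_spans, as in the example above.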

**First Step**: Always start by getting the official enablement guide from AWS:

```
Use the aws-application-signals MCP server's get_enablement_guide tool to retrieve the latest setup instructions and requirements.
```

The MCP server name referenced here is incorrect. "aws-application-signals MCP server" should be awslabs.cloudwatch-applicationsignals-mcp-server.

Also, the get_enablement_guide tool has required parameters that should be documented:

  • service_platform (required): ec2, ecs, lambda, or eks
  • service_language (required): python, nodejs, java, or dotnet
  • iac_directory (required): Absolute path to IaC code
  • app_directory (required): Absolute path to application code
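
To make the parameter requirements concrete, here is a minimal sketch of a pre-flight check that could run before calling get_enablement_guide. The allowed values are the ones listed above; the helper name `build_enablement_args` and the choice to raise `ValueError` are assumptions for illustration, not behavior of the MCP server itself.

```python
import os

# Allowed values as listed in this review comment.
ALLOWED_PLATFORMS = {"ec2", "ecs", "lambda", "eks"}
ALLOWED_LANGUAGES = {"python", "nodejs", "java", "dotnet"}


def build_enablement_args(service_platform, service_language,
                          iac_directory, app_directory):
    """Validate and return the four required get_enablement_guide arguments."""
    if service_platform not in ALLOWED_PLATFORMS:
        raise ValueError(f"unsupported service_platform: {service_platform!r}")
    if service_language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported service_language: {service_language!r}")
    # Both directories must be absolute paths, per the comment above.
    for name, path in (("iac_directory", iac_directory),
                       ("app_directory", app_directory)):
        if not os.path.isabs(path):
            raise ValueError(f"{name} must be an absolute path: {path!r}")
    return {
        "service_platform": service_platform,
        "service_language": service_language,
        "iac_directory": iac_directory,
        "app_directory": app_directory,
    }


if __name__ == "__main__":
    args = build_enablement_args(
        "ecs", "python", os.path.abspath("infra"), os.path.abspath("app")
    )
    print(args)
```

Documenting the same constraints in the steering file would let the agent fail fast on a bad platform or a relative path instead of discovering it from a tool error.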
