feat(aws-observability): add aws-observability power #60
gcacace wants to merge 5 commits into kirodotdev:main
aws-observability/POWER.md
| - `logs:*` for CloudWatch Logs operations | ||
| - `cloudwatch:*` for CloudWatch Metrics, Alarms, and Application Signals | ||
| - `xray:*` for distributed tracing | ||
| - `cloudtrail:LookupEvents` for CloudTrail queries |
NIT: Missing prerequisite: Application Signals enablement
The existing cloudwatch-application-signals power requires this as a prerequisite:
Application Signals enabled in your AWS account (Getting started with Application Signals)
Without this, the awslabs.cloudwatch-applicationsignals-mcp-server will fail silently. This should be listed as a prerequisite.
Also, the broad cloudwatch:* permission doesn't cover application-signals:* API actions (e.g., application-signals:ListServices, application-signals:GetService, application-signals:ListServiceOperations, etc.), nor synthetics:GetCanary/synthetics:GetCanaryRuns for canary analysis, nor s3:GetObject/s3:ListBucket for canary artifacts, nor iam:GetRole/iam:ListAttachedRolePolicies/iam:GetPolicy/iam:GetPolicyVersion for the enablement guide. See the existing power's POWER.md for the full 23-action IAM policy.
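To make the gap concrete, here is a minimal sketch that assembles an IAM policy document from only the actions named in the PR's POWER.md and in this comment. It is not the full 23-action policy from the existing power; the `build_policy` helper and the `"Resource": "*"` scope are illustrative assumptions, and a real policy should be scoped more tightly.

```python
import json

# Actions named in the PR's POWER.md and in this review comment.
# Assumption: this is a partial list, NOT the full 23-action policy
# documented in the existing cloudwatch-application-signals power.
ACTIONS = [
    "logs:*",
    "cloudwatch:*",
    "xray:*",
    "cloudtrail:LookupEvents",
    "application-signals:ListServices",
    "application-signals:GetService",
    "application-signals:ListServiceOperations",
    "synthetics:GetCanary",
    "synthetics:GetCanaryRuns",
    "s3:GetObject",
    "s3:ListBucket",
    "iam:GetRole",
    "iam:ListAttachedRolePolicies",
    "iam:GetPolicy",
    "iam:GetPolicyVersion",
]


def build_policy(actions, resource="*"):
    """Hypothetical helper: wrap a list of actions in one Allow statement."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": sorted(set(actions)),
                # Assumption: "*" keeps the sketch short; scope this in practice.
                "Resource": resource,
            }
        ],
    }


policy = build_policy(ACTIONS)
print(json.dumps(policy, indent=2))
```

The point of the sketch is that `cloudwatch:*` and the `application-signals:*` actions sit in separate service namespaces, so the former cannot stand in for the latter.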
| - Error rate and latency tracking (P50, P90, P95, P99) | ||
| - Automatic service discovery | ||
| - SLO compliance monitoring and error budget tracking | ||
| - Enablement guide for setup assistance |
Missing key Application Signals capabilities
This section is missing several important capabilities that are documented in the existing power:
- **100% Trace Visibility**: `search_transaction_spans` queries OpenTelemetry spans via CloudWatch Logs Insights with 100% sampling (vs X-Ray's 5%). This is a major differentiator.
- **Canary Failure Analysis**: `analyze_canary_failures` provides root cause investigation for CloudWatch Synthetics canaries.
- **Primary Audit Tools**: `audit_services`, `audit_slos`, `audit_service_operations` are the recommended entry points for all investigation workflows. They support wildcard targeting and multi-auditor analysis (7 auditor types: `slo`, `operation_metric`, `trace`, `log`, `dependency_metric`, `top_contributor`, `service_quota`).
These are the most powerful tools in the Application Signals MCP server and should be prominently featured.
| ### Available Tools | ||
| 1. **list_services**: Get all monitored services | ||
| 2. **get_service_metrics**: Retrieve metrics for a specific service | ||
| 3. **get_service_operations**: List operations within a service | ||
| 4. **get_operation_metrics**: Get metrics for specific operation | ||
| 5. **list_slos**: View configured SLOs | ||
| 6. **get_slo_status**: Check SLO compliance | ||
| 7. **search_traces**: Find traces matching criteria | ||
| 8. **get_trace_details**: View complete trace information | ||
| 9. **get_service_map**: Visualize service dependencies |
Eight of the 9 tool names listed here are incorrect: they do not exist in any of the 4 MCP servers (only list_slos is real)
I verified against the actual READMEs in awslabs/mcp. Here's the mapping:
| Listed (non-existent) | Actual tool in cloudwatch-applicationsignals-mcp-server |
|---|---|
| list_services | list_monitored_services |
| get_service_metrics | query_service_metrics |
| get_service_operations | list_service_operations |
| get_operation_metrics | No equivalent; use audit_service_operations |
| list_slos | list_slos ✅ (this one is correct) |
| get_slo_status | get_slo or audit_slos |
| search_traces | search_transaction_spans (100% sampled) or query_sampled_traces (5% sampled) |
| get_trace_details | No equivalent; trace data is returned by search_transaction_spans |
| get_service_map | No equivalent; service map info is part of audit output |
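
A mapping like this can also be checked mechanically. The following is a rough sketch, assuming the steering files are plain text; the `TOOL_RENAMES` dictionary is transcribed from the table in this comment, and `find_stale_tool_names` is a hypothetical helper, not part of any MCP server.

```python
import re

# Transcribed from the review table: listed name -> suggested replacements.
# An empty list means the table gives no named replacement tool.
TOOL_RENAMES = {
    "list_services": ["list_monitored_services"],
    "get_service_metrics": ["query_service_metrics"],
    "get_service_operations": ["list_service_operations"],
    "get_operation_metrics": ["audit_service_operations"],
    "list_slos": ["list_slos"],  # already correct
    "get_slo_status": ["get_slo", "audit_slos"],
    "search_traces": ["search_transaction_spans", "query_sampled_traces"],
    "get_trace_details": ["search_transaction_spans"],
    "get_service_map": [],  # service map info is part of audit output
}


def find_stale_tool_names(text):
    """Return the listed names that appear in text and need replacing."""
    stale = []
    for old, replacements in TOOL_RENAMES.items():
        if replacements == [old]:
            continue  # name is already valid
        # Whole-word match so e.g. list_monitored_services is not flagged.
        if re.search(rf"\b{re.escape(old)}\b", text):
            stale.append(old)
    return sorted(stale)


sample = (
    'Query: list_services(sort_by="error_rate")\n'
    'Query: search_traces(service="api-service", error=true)\n'
    "Use list_slos and list_monitored_services."
)
for name in find_stale_tool_names(sample):
    print(name, "->", TOOL_RENAMES[name] or "no direct equivalent")
```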
Critically, the three primary audit tools are missing entirely: audit_services, audit_slos, audit_service_operations. These are the recommended entry points for every investigation workflow in the Application Signals MCP server.
Also missing from this steering file:
- Target format reference: the complex JSON target format required by audit tools (e.g., `[{"Type":"service","Data":{"Service":{"Type":"Service","Name":"*"}}}]`)
- Auditor selection guide: which of the 7 auditor types to use for different scenarios
- Transaction Search query patterns: error analysis, latency analysis, dependency calls, GenAI token usage
- X-Ray filter expressions: for `query_sampled_traces`
- Pagination (`next_token`) guidance: wildcard patterns process in batches
The existing cloudwatch-application-signals/steering/steering.md has all of this and could serve as a reference.
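
Because the target format is easy to mistype by hand, a small builder may help. This is a minimal sketch that reproduces only the service-target shape quoted in this comment; the function names are hypothetical, other target types are not covered, and passing several service names in one list is an assumption based on the value being a JSON array.

```python
import json


def service_target(name="*"):
    """Build one service target; "*" is the wildcard used in the review."""
    return {
        "Type": "service",
        "Data": {"Service": {"Type": "Service", "Name": name}},
    }


def service_targets_json(names):
    """Serialize targets compactly, matching the string form in the review."""
    targets = [service_target(n) for n in names]
    return json.dumps(targets, separators=(",", ":"))


print(service_targets_json(["*"]))
```

The resulting string is what would be passed as the `service_targets` argument of `audit_services`.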
| **Step 2: Check Application Signals** | ||
| ``` | ||
| Query: list_services(sort_by="error_rate") | ||
| Result: "api-service" has 15% error rate (normal: 0.1%) | ||
| ``` |
list_services does not exist in any of the 4 MCP servers. The correct tool is list_monitored_services (from cloudwatch-applicationsignals-mcp-server), or better yet, use the primary audit tool:
audit_services(service_targets='[{"Type":"service","Data":{"Service":{"Type":"Service","Name":"*"}}}]')
| **Step 4: Analyze Traces** | ||
| ``` | ||
| Query: search_traces(service="api-service", error=true) | ||
| Result: Traces show timeout calling database | ||
| ``` |
search_traces does not exist in any of the 4 MCP servers. The correct tools are:
- `search_transaction_spans`: 100% sampled OpenTelemetry spans via CloudWatch Logs Insights
- `query_sampled_traces`: 5% sampled X-Ray traces
Example with the real tool:
search_transaction_spans(query_string='FILTER attributes.aws.local.service = "api-service" and attributes.http.status_code >= 500 | LIMIT 20')
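
If the steering file ends up with several variants of this query, a tiny builder keeps them consistent. The sketch below only reproduces the query string shown above; `error_spans_query` and its parameters are hypothetical, and the attribute names are taken from this comment rather than verified against a live account.

```python
def error_spans_query(service, min_status=500, limit=20):
    """Build the Logs Insights filter used in the example above."""
    if '"' in service:
        # Keep the sketch simple: refuse names that would break the quoting.
        raise ValueError("service name must not contain double quotes")
    return (
        f'FILTER attributes.aws.local.service = "{service}" '
        f"and attributes.http.status_code >= {min_status} "
        f"| LIMIT {limit}"
    )


print(error_spans_query("api-service"))
```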
| **First Step**: Always start by getting the official enablement guide from AWS: | ||
| ``` | ||
| Use the aws-application-signals MCP server's get_enablement_guide tool to retrieve the latest setup instructions and requirements. |
The MCP server name referenced here is incorrect. "aws-application-signals MCP server" should be awslabs.cloudwatch-applicationsignals-mcp-server.
Also, the get_enablement_guide tool has required parameters that should be documented:
- `service_platform` (required): `ec2`, `ecs`, `lambda`, or `eks`
- `service_language` (required): `python`, `nodejs`, `java`, or `dotnet`
- `iac_directory` (required): Absolute path to IaC code
- `app_directory` (required): Absolute path to application code
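
As an illustration of what documenting these parameters buys, here is a sketch of a pre-flight check built only from the values listed in this comment. `check_enablement_args` is a hypothetical helper, not part of the MCP server, and the example paths are placeholders.

```python
import os

# Allowed values as listed in this review comment.
PLATFORMS = {"ec2", "ecs", "lambda", "eks"}
LANGUAGES = {"python", "nodejs", "java", "dotnet"}


def check_enablement_args(service_platform, service_language,
                          iac_directory, app_directory):
    """Return a list of problems; an empty list means the arguments look valid."""
    problems = []
    if service_platform not in PLATFORMS:
        problems.append(f"service_platform must be one of {sorted(PLATFORMS)}")
    if service_language not in LANGUAGES:
        problems.append(f"service_language must be one of {sorted(LANGUAGES)}")
    for label, path in (("iac_directory", iac_directory),
                        ("app_directory", app_directory)):
        # The tool expects absolute paths; existence is not checked here.
        if not os.path.isabs(path):
            problems.append(f"{label} must be an absolute path")
    return problems


# Placeholder paths, for illustration only.
print(check_enablement_args("eks", "python", "/srv/iac", "/srv/app"))
print(check_enablement_args("fargate", "python", "iac", "/srv/app"))
```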
Description of changes:
This PR introduces the AWS Observability Power, a comprehensive observability platform that integrates CloudWatch Logs, Metrics, Alarms, Application Signals (APM), CloudTrail security auditing, and AWS Documentation access into a unified Kiro power.
Key Components:
Power Configuration (POWER.md)
MCP Server Configuration (mcp.json)
Steering Files (8 comprehensive guides):
- incident-response.md - Complete incident response framework with 6 phases (detection, investigation, mitigation, recovery, RCA, postmortem)
- alerting-setup.md - Intelligent alarm configuration with recommended thresholds and best practices
- application-signals-setup.md - Application Signals enablement guide and setup workflows
- log-analysis.md - CloudWatch Logs Insights query patterns and syntax reference
- performance-monitoring.md - Application Signals APM monitoring, SLO management, and trace analysis
- security-auditing.md - CloudTrail security monitoring, compliance auditing, and incident investigation
- observability-gap-analysis.md - Automated codebase analysis to identify observability gaps across logging, metrics, tracing, and error handling with multi-language support
- cloudtrail-data-source-selection.md - Decision tree and priority strategy for CloudTrail data source selection (CloudTrail Lake → CloudWatch Logs → Lookup Events API), including query translation examples across all three sources

Features:
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.