Support

Platform Engineering includes five engineering teams, each responsible for the health and maintenance of several systems. These systems range in criticality, and each team provides support accordingly.

Business Hours: Monday-Friday, 9:00 AM to 8:00 PM ET

Business hours support:

During business hours, or for issues that are Moderate severity or lower, create a support request. Platform Engineering teams handle these types of requests during normal business hours.

Non-business hours support:

When an incident occurs outside of business hours, page the appropriate team in Opsgenie. To determine which Platform Engineering schedule to page, see Teams, schedules, and supported systems.

Note

We understand that situations may occur that require exceptions. If a lower severity issue or system is blocking incident resolution, don't hesitate to page the appropriate team.

To create an alert in Opsgenie:

  1. Click the Alerts tab in Opsgenie.
  2. Click Create alert.
  3. Provide a detailed message and set the Responder to the appropriate Opsgenie team.
  4. Click Create.
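
The same alert can also be created programmatically. The following is a minimal sketch, not part of the documented process, assuming the standard Opsgenie Alert API endpoint (`https://api.opsgenie.com/v2/alerts`) and an API integration key. The function name `build_alert_request`, the key, and the team name are hypothetical placeholders; EU-hosted Opsgenie accounts use a different hostname.

```python
import json
import urllib.request

# Assumption: US-hosted Opsgenie account using the public Alert API.
OPSGENIE_ALERTS_URL = "https://api.opsgenie.com/v2/alerts"


def build_alert_request(api_key, message, team_name, description=""):
    """Build (but do not send) a request that creates an Opsgenie alert.

    Mirrors the UI steps: a detailed message plus a team set as Responder.
    """
    payload = {
        "message": message,          # short summary shown in the alert list
        "description": description,  # longer details for the on-call engineer
        "responders": [{"name": team_name, "type": "team"}],
    }
    return urllib.request.Request(
        OPSGENIE_ALERTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"GenieKey {api_key}",
        },
        method="POST",
    )


# Hypothetical usage; the key and team name below are placeholders.
# request = build_alert_request("YOUR_API_KEY", "Datadog alerting is down", "Example Team")
# urllib.request.urlopen(request)  # sends the alert and pages the team
```

Building the request separately from sending it makes it possible to inspect the payload before anyone is paged.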

Incident severity levels and support process

| Severity | Description | Support window | Contact method |
| --- | --- | --- | --- |
| Critical | Widespread, complete loss of functionality, or severe or persistent performance degradation. | 24/7 | Business hours: Open a support request in your cloud-platform-<workload> Slack channel. Non-business hours: Create an alert in Opsgenie to page the on-call engineer. |
| High | Substantial reduction in performance, or an issue that impacts multiple users. | 24/7 | Business hours: Open a support request in your cloud-platform-<workload> Slack channel. Non-business hours: Create an alert in Opsgenie to page the on-call engineer. |
| Moderate | Potential instability or moderate reduction in performance. | Business hours | Open a support request in your cloud-platform-<workload> Slack channel. |
| Low | Inconvenient, but doesn't prevent users from continuing to work. | Business hours | Open a support request in your cloud-platform-<workload> Slack channel. |
| Feature Requests | Suggestions, requests for improvements, or feedback on existing features. Please use for general inquiries, such as workload access. Has no impact on performance or existing functionality. | Business hours | Open a feature request. |
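
The routing rules in the table above can be sketched as a small decision function. This is an illustrative sketch only: the function names are hypothetical, the input time is assumed to already be expressed in Eastern Time, 8:00 PM is treated as the first moment outside business hours, and holidays are ignored. It also does not model the exception described in the note above, where a lower severity issue that blocks incident resolution may still warrant a page.

```python
from datetime import datetime, time

# Business hours from this page: Monday-Friday, 9:00 AM to 8:00 PM ET.
OPEN, CLOSE = time(9, 0), time(20, 0)

SUPPORT_REQUEST = "Open a support request in your cloud-platform-<workload> Slack channel"
PAGE_ON_CALL = "Create an alert in Opsgenie to page the on-call engineer"
FEATURE_REQUEST = "Open a feature request"

PAGEABLE = {"critical", "high"}    # severities with a 24/7 support window
BUSINESS_ONLY = {"moderate", "low"}  # severities handled in business hours


def is_business_hours(moment_et: datetime) -> bool:
    """Return True if a datetime (assumed to be in ET) is within business hours."""
    is_weekday = moment_et.weekday() < 5  # Monday is 0, Friday is 4
    return is_weekday and OPEN <= moment_et.time() < CLOSE


def contact_method(severity: str, moment_et: datetime) -> str:
    """Pick the contact method for a severity level at a given ET time."""
    level = severity.strip().lower()
    if level in {"feature request", "feature requests"}:
        return FEATURE_REQUEST
    if level in PAGEABLE:
        return SUPPORT_REQUEST if is_business_hours(moment_et) else PAGE_ON_CALL
    if level in BUSINESS_ONLY:
        return SUPPORT_REQUEST
    raise ValueError(f"Unknown severity: {severity!r}")
```

For example, a Critical issue on a weekday evening after 8:00 PM ET resolves to paging the on-call engineer, while a Low issue at the same time still resolves to a support request that is handled during the next business day.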

Teams, schedules, and supported systems

Escalate directly to our partners

In case of emergencies where critical (internal or external) partner services are in a degraded or failing state:

Otherwise, page the schedule(s) below as indicated in these instructions.

Cloud Foundation

| System | Severity | Impact |
| --- | --- | --- |
| IAM Permissions | High | Inability to deploy, roll back, or alter production-impacting features & infrastructure |
| AWS Network | High | Core network-as-service infrastructure stability issues (DNS, ENI Counts, VPC-related issues) that are production-impacting |
| IAM Permissions | Low | Inability to build, deploy, or provision resources for non-production-impacting changes and features |

Delivery

| System | Severity | Impact |
| --- | --- | --- |
| GitHub Actions | High | Inability to perform builds or deployments |
| Artifactory | High | Inability to build or deploy new versions |
| Base Images | Low | Build or deployment failures |

DevEx

| System | Severity | Impact |
| --- | --- | --- |
| SDE | High | Inability to log in to AWS from workstation |
| CP Docs | Moderate | Inability to access Cloud Platform documentation, including troubleshooting guides |
| SonarQube | Low | Build failures for services running quality checks |
| Sourcegraph | Low | Inability for some (internal) services and engineers to execute code search queries |
| ghe-team-sync | Low | Users lacking desired permissions in GitHub |
| Rancher Desktop | Low | Inability to run containers on workstations |

Enablement

| System | Severity | Impact |
| --- | --- | --- |
| Region Evacuation | High | PSO is unable to execute region evacuation |
| JWT Key Management | High | Many core services depend on this system to retrieve public and private JWTs. Failure will manifest as issues downloading JWTs or deployment failures on some user services, core services, and commerce-services. |
| IaC libraries | Low | Deployment failures related to outdated or incorrect use of abc-workloads, abc-cdk-constructs, or abc-cdk-lib libraries |
| Slack support bot | Low | Engineering teams unable to open Cloud Platform support requests |

Observability

| System | Severity | Impact |
| --- | --- | --- |
| Datadog | High | Loss of observability, monitoring, and alerting capabilities |

GitHub

Note

Page this schedule only for system failures of the type described under Impact below. For other GitHub-related topics, page Delivery or DevEx when necessary.

| System | Severity | Impact |
| --- | --- | --- |
| GitHub | Critical | Engineers unable to pull, push, or review code. Engineers unable to perform builds or deployments. |

Additional resources