Access logs for instances containing PHI are maintained via infrastructure, application, and operating system logging mechanisms. Monitoring, audit controls, and system activity review are documented and comply with 45 CFR 164.308(a)(5)(ii)(C), 45 CFR 164.312(b), and 45 CFR 164.308(a)(1)(ii)(D).

Uptime, monitoring, and alerts

Tidepool monitors systems proactively for the following concerns, though this is not an exhaustive list. Tidepool continuously evaluates environment and risk criteria and updates monitoring and alerting based on risk-based analysis.

Network Performance - latency, response time, errors
System Performance - CPU, memory, disk, network usage
Application Performance - latency, errors, critical conditions
Security - anomalous connections, suspicious connections, intrusion detection, admin activity, logins/logouts/lockouts, audit, policy changes
Capacity - system resource usage, overhead, disk usage, failover and redundancy

Monitoring tools and services

DataDog
- service availability/uptime
- system availability metrics
MongoDB Atlas
- performance - slow queries and application performance
- security monitoring - access changes
- availability - cluster operations and performance
Prometheus/Alertmanager and Grafana (Kubernetes)
- logging/metrics/system health
- custom alerting
Sumo Logic
- aggregate and archive logs
- logging/metrics/system health
- custom alerting
AWS Services
- CloudTrail - captures application, access, audit and activity logs
- CloudWatch - alerting on events
- SNS - distribute notifications to services and humans (email, web hooks)
- SQS - distribute notifications via queueing mechanisms
- Config - monitor and detect changes in configuration or posture
- GuardDuty - threat detection
- Inspector - automated security analysis of network, configuration, security posture
- SecurityHub - monitors and aggregates posture and threat information from these AWS services:
  - GuardDuty
  - Inspector
  - Identity and Access Management (IAM) Access Analyzer
  - Firewall Manager

An "on-call" rotation schedule is maintained to ensure that a primary responder and multiple backup responders are always available to respond to potential issues, 24x7.

Tidepool is a fully distributed and remote company, employing engineers in multiple time zones. As a result, an engineer is always available.
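A rotation with one primary and multiple backups can be pictured as a simple round-robin over the list of engineers. The sketch below is illustrative only: the function name `on_call_for`, the engineer names, and the choice of two backups per shift are assumptions, not Tidepool's actual schedule, which is managed in PagerDuty.

```python
from typing import List, Tuple


def on_call_for(engineers: List[str], shift_index: int,
                backups: int = 2) -> Tuple[str, List[str]]:
    """Return (primary, backup list) for a shift in a round-robin rotation.

    Hypothetical helper: the rotation order and backup count are assumptions.
    """
    if len(engineers) < backups + 1:
        raise ValueError("need at least one primary plus the requested backups")
    n = len(engineers)
    primary = engineers[shift_index % n]
    # The next engineers in the list act as backups, wrapping around at the end.
    backup_list = [engineers[(shift_index + i) % n] for i in range(1, backups + 1)]
    return primary, backup_list


if __name__ == "__main__":
    team = ["ana", "ben", "chi", "dev"]  # hypothetical names
    for shift in range(4):
        print(shift, on_call_for(team, shift))
```

Because the backups are drawn from the same list with wraparound, every shift has distinct people in the primary and backup roles as long as the team is larger than the number of backups.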

Based on Pingdom (legacy uptime monitoring platform), DataDog, and Statuspage metrics, Tidepool has maintained 100% user-facing uptime of our production environment over the last year, and over 99.9% uptime since inception.
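To put the uptime figures in concrete terms, the following sketch converts an uptime percentage into the downtime it permits over a period, and back. The function names and the 365-day year are assumptions chosen for illustration; the figures are arithmetic, not measurements from Tidepool's monitoring tools.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float = 365.0) -> float:
    """Minutes of downtime permitted over the period at the given uptime percentage."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - uptime_percent / 100)


def uptime_percent(downtime_minutes: float, period_days: float = 365.0) -> float:
    """Uptime percentage implied by a given amount of downtime over the period."""
    total_minutes = period_days * 24 * 60
    return 100 * (1 - downtime_minutes / total_minutes)


if __name__ == "__main__":
    # 99.9% uptime over a 365-day year allows roughly 525.6 minutes of downtime.
    print(round(allowed_downtime_minutes(99.9), 1))
    # Zero recorded downtime corresponds to 100% uptime.
    print(uptime_percent(0))
```

At 99.9% uptime, the permitted downtime works out to a little under nine hours per year.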

Individual instances are only taken down momentarily for rolling software installations. User app and API requests continue to be fulfilled by redundant instances, and updates are rolled back via automation in case of deployment problems.
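The rolling installation and automated rollback described above can be sketched as follows. This is a minimal in-memory model, not Tidepool's deployment tooling: `rolling_update`, the `versions` dictionary, and the `health_check` callback are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List


def rolling_update(instances: List[str],
                   versions: Dict[str, str],
                   new_version: str,
                   health_check: Callable[[str, str], bool]) -> bool:
    """Update instances one at a time; roll back every change if a check fails.

    Only one instance is changed at a time, so the remaining instances
    keep serving requests during the rollout.
    """
    previous = dict(versions)  # snapshot used for rollback
    updated: List[str] = []
    for name in instances:
        versions[name] = new_version
        updated.append(name)
        if not health_check(name, new_version):
            # Deployment problem: restore every instance touched so far.
            for done in updated:
                versions[done] = previous[done]
            return False
    return True


if __name__ == "__main__":
    fleet = {"api-1": "v1", "api-2": "v1", "api-3": "v1"}  # hypothetical fleet
    ok = rolling_update(list(fleet), fleet, "v2", lambda name, version: True)
    print(ok, fleet)
```

In a real cluster the orchestrator typically performs this loop; the sketch only shows the ordering of update, health check, and rollback.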

Alerting

Tidepool integrates multiple alerting mechanisms to deliver notification of problems and issues via ChatOps, email, and cell phone/SMS:

PagerDuty - On-call scheduling, alerting, and incident tracking
Sumo Logic - log aggregation, metrics, and alerts
AWS - metrics, usage, and alerting
DataDog - MongoDB Atlas performance and security monitoring
Prometheus/Alertmanager - Kubernetes alerts
Slack - Alert delivery/notification from PagerDuty, Sumo Logic, DataDog, MongoDB
Atlassian Statuspage - public alerting to anyone interested in Tidepool system status
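One way to picture how alerts fan out across these mechanisms is a routing table keyed by severity. The sketch below is purely illustrative: the severity levels, the channel identifiers, and the rule that unknown severities escalate are all assumptions, not Tidepool's actual routing configuration.

```python
from typing import Dict, List

# Hypothetical severity-to-channel mapping; not taken from any real configuration.
ROUTES: Dict[str, List[str]] = {
    "critical": ["pagerduty", "slack"],
    "warning": ["slack", "email"],
    "info": ["email"],
}


def route_alert(severity: str) -> List[str]:
    """Return the channels an alert of the given severity is delivered to.

    Unrecognized severities are treated as critical so that nothing is
    silently dropped (an assumption made for this sketch).
    """
    return list(ROUTES.get(severity, ROUTES["critical"]))


if __name__ == "__main__":
    for level in ("critical", "warning", "info", "unclassified"):
        print(level, route_alert(level))
```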

In accordance with legal, statutory, and regulatory compliance obligations, Tidepool plans, prepares, and measures availability, quality, capacity, and resources to deliver the required system performance.