Access logs for instances containing PHI are maintained through infrastructure, application, and operating system logging mechanisms. Monitoring, audit controls, and system activity review are documented and comply with 45 CFR 164.308(a)(5)(ii)(C), 45 CFR 164.312(b), and 45 CFR 164.308(a)(1)(ii)(D).

Uptime, monitoring, and alerts

Tidepool proactively monitors systems for the concerns listed below; the list is not exhaustive. Tidepool continuously evaluates environment and risk criteria and updates monitoring and alerting based on that risk analysis.

  • Network Performance - latency, response time, errors

  • System Performance - CPU, memory, disk, network usage

  • Application Performance - latency, errors, critical conditions

  • Security - anomalous connections, suspicious connections, intrusion detection, admin activity, logins/logouts/lockouts, audit, policy changes

  • Capacity - system resource usage, overhead, disk usage, failover and redundancy

Monitoring tools and services

  • DataDog

    • service availability/uptime

    • system availability metrics

  • MongoDB Atlas

    • performance - slow queries and application performance

    • security monitoring - access changes

    • availability - cluster operations and performance

  • Prometheus/Alertmanager and Grafana (Kubernetes)

    • logging/metrics/system health

    • custom alerting

  • Sumo Logic

    • aggregate and archive logs

    • logging/metrics/system health

    • custom alerting

  • AWS Services

    • CloudTrail - captures application, access, audit and activity logs

    • CloudWatch - alerting on events

    • SNS - distribute notifications to services and humans (email, web hooks)

    • SQS - distribute notifications via queueing mechanisms

    • Config - monitor and detect changes in configuration or posture

    • GuardDuty - threat detection

    • Inspector - automated security analysis of network, configuration, security posture

    • SecurityHub - monitors and aggregates posture and threat information from these AWS services:

      • GuardDuty

      • Inspector

      • Identity and Access Management (IAM) Access Analyzer

      • Firewall Manager

An on-call rotation schedule is maintained so that a primary responder and multiple backup responders are always available to respond to potential issues, 24x7.

Tidepool is a fully distributed, remote company that employs engineers in multiple time zones. As a result, an engineer is always available.

Based on Pingdom (legacy uptime monitoring platform), DataDog, and Statuspage metrics, Tidepool has maintained 100% user-facing uptime of our production environment over the last year, and over 99.9% uptime since inception.
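To make the uptime percentages above concrete, the following minimal sketch converts an uptime percentage into a downtime budget and back. The function names and the 365-day period are assumptions chosen for illustration; they are not taken from Tidepool's monitoring tools.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: float = 365.0) -> float:
    """Return the downtime, in minutes, permitted by an uptime percentage.

    Assumption: the period is measured in whole calendar days (365 by default).
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - uptime_percent / 100.0)


def measured_uptime_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Return the uptime percentage for an observed amount of downtime."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0.0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return 100.0 * (1.0 - downtime_minutes / total_minutes)


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.9)
    print(f"99.9% uptime over one year allows about {budget:.1f} minutes of downtime")
```

Under these assumptions, 99.9% uptime over a 365-day year corresponds to about 525.6 minutes (roughly 8.76 hours) of downtime, and 100% uptime corresponds to none.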

Individual instances are taken down only momentarily for rolling software installations. User app and API requests continue to be fulfilled by redundant instances, and updates are rolled back automatically if a deployment problem occurs.
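The rolling installation and automated rollback described above can be illustrated with a toy simulation. This is a sketch of the general technique only, not Tidepool's deployment tooling: the `rolling_update` function, the version strings, and the `health_check` callback are all hypothetical.

```python
from typing import Callable, List


def rolling_update(
    instances: List[str],
    new_version: str,
    health_check: Callable[[int, str], bool],
) -> List[str]:
    """Update instances one at a time and roll back if any update is unhealthy.

    `instances` holds the version currently running on each instance.
    `health_check(index, version)` is a hypothetical probe that returns True
    when the instance at `index` is healthy on `version`.
    """
    original = list(instances)  # kept so every instance can be restored
    current = list(instances)
    for index in range(len(current)):
        # Only this one instance is out of service; the others keep serving.
        current[index] = new_version
        if not health_check(index, new_version):
            # Deployment problem: restore the original version everywhere.
            return original
    return current


if __name__ == "__main__":
    fleet = ["v1", "v1", "v1"]
    print(rolling_update(fleet, "v2", lambda index, version: True))
    print(rolling_update(fleet, "v2", lambda index, version: index != 1))
```

Because only one instance changes at a time, the remaining instances stay available for user app and API requests throughout the update.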

Alerting

Tidepool integrates multiple alerting mechanisms that deliver notifications of problems and issues via ChatOps, email, and cell phone/SMS:

  • PagerDuty - On-call scheduling, alerting, and incident tracking

  • Sumo Logic - log aggregation, metrics, and alerts

  • AWS - metrics, usage, and alerting

  • DataDog - MongoDB Atlas performance and security monitoring

  • Prometheus/Alertmanager - Kubernetes alerts

  • Slack - Alert delivery/notification from PagerDuty, Sumo Logic, DataDog, MongoDB

  • Atlassian Statuspage - public alerting to anyone interested in Tidepool system status


In accordance with legal, statutory, and regulatory compliance obligations, system availability, quality, capacity, and resources are planned, prepared, and measured to deliver the required system performance.
