Info

Access logs for instances containing PHI are maintained via infrastructure, application, and operating system logging mechanisms. Monitoring, audit controls, and system activity review are documented and comply with 45 CFR 164.308(a)(5)(ii)(C), 45 CFR 164.312(b) and 45 CFR 164.308(a)(1)(ii)(D).

Uptime, monitoring, and alerts

Alerting and Status

Tidepool integrates multiple internal alerting mechanisms for notification of problems and issues via ChatOps, email, and cell phone/SMS. For public status information, please see:

  • Atlassian StatusPage - public alerting to anyone interested in Tidepool system status

An "on-call" rotation schedule for engineers is maintained to ensure that there is always a primary and multiple backup employees available to respond to potential issues, 24x7.
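As a rough illustration of how a rotation with one primary and multiple backups might be computed, the Python sketch below rotates a roster on a fixed schedule. The roster names, the weekly hand-off, the number of backups, and the `on_call_for` helper are all hypothetical; they do not describe Tidepool's actual on-call configuration.

```python
from datetime import date

# Hypothetical roster; names are placeholders, not real staff.
ROSTER = ["engineer_a", "engineer_b", "engineer_c", "engineer_d"]
ROTATION_START = date(2024, 1, 1)  # assumed start of the first shift
SHIFT_DAYS = 7                     # assumed weekly hand-off
BACKUP_COUNT = 2                   # assumed number of backups


def on_call_for(day: date) -> dict:
    """Return the primary and backup responders for the given day."""
    if day < ROTATION_START:
        raise ValueError("day precedes the start of the rotation")
    shift_index = (day - ROTATION_START).days // SHIFT_DAYS
    offset = shift_index % len(ROSTER)
    # Rotate the roster so the current primary comes first.
    ordered = ROSTER[offset:] + ROSTER[:offset]
    return {"primary": ordered[0], "backups": ordered[1:1 + BACKUP_COUNT]}
```

Because the backups are simply the next entries in the rotated roster, every shift has a primary and the same number of backups, which is the property the schedule is meant to guarantee.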

Logging

Tidepool implements remote logging to a HIPAA-compliant service for all application, security, audit, and compliance logs.

Monitoring

Tidepool monitors systems proactively for the following concerns, though this is not an exhaustive list. Tidepool continuously evaluates environment and risk criteria and updates monitoring and alerting based on risk-based analysis.

  • Network Performance - latency, response time, errors

  • System Performance - CPU, memory, disk, network usage

  • Application Performance - latency, errors, critical conditions

  • Security - anomalous connections, suspicious connections, intrusion detection, admin activity, logins/logouts/lockouts, audit, policy changes, logging

  • Capacity - system resource usage, overhead, disk usage, failover and redundancy
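A minimal sketch of how threshold-based checks over metrics like these might look is shown below. The metric names and threshold values are assumptions chosen for illustration; in practice they would come from the risk-based analysis described above and from the monitoring tools themselves.

```python
# Hypothetical thresholds; real values would be set by risk-based analysis.
THRESHOLDS = {
    "cpu_percent": 85.0,
    "memory_percent": 90.0,
    "disk_percent": 80.0,
    "p95_latency_ms": 500.0,
    "error_rate_percent": 1.0,
}


def breached(sample: dict) -> list:
    """Return the sorted names of metrics in `sample` that exceed their threshold."""
    return sorted(
        name
        for name, value in sample.items()
        if name in THRESHOLDS and value > THRESHOLDS[name]
    )
```

A sample such as `{"cpu_percent": 91.0, "disk_percent": 40.0}` would report only `cpu_percent`; metrics without a configured threshold are ignored rather than treated as failures.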

Monitoring tools and services

  • DataDog

    • service availability/uptime

    • system availability metrics

  • MongoDB Atlas

    • performance - slow queries and application performance

    • security monitoring - access changes

    • availability - cluster operations and performance

  • Prometheus/Alertmanager and Grafana (Kubernetes)

    • logging/metrics/system health

    • custom alerting

  • Sumo Logic

    • aggregate and archive logs

    • logging/metrics/system health

    • custom alerting

  • AWS Services

    • CloudTrail - captures application, access, audit and activity logs

    • CloudWatch - alerting on events

    • SNS - distribute notifications to services and humans (email, web hooks)

    • SQS - distribute notifications via queueing mechanisms

    • Config - monitor and detect changes in configuration or posture

    • GuardDuty - threat detection

    • Inspector - automated security analysis of network, configuration, security posture

    • SecurityHub - monitors and aggregates posture and threat information from these AWS services:

      • GuardDuty

      • Inspector

      • Identity and Access Management (IAM) Access Analyzer

      • Firewall Manager

Tip

Tidepool is a fully distributed and remote company, employing engineers in multiple time zones. As a result, an engineer is always available.

Based on metrics from our monitoring tools (Pingdom, a legacy uptime monitoring platform; DataDog; and Statuspage), Tidepool has maintained 100% user-facing uptime of our production environment over the last year, and over 99.9% uptime since inception.
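To put the percentages in perspective, the sketch below converts an uptime percentage into the downtime it allows over a period. It assumes a 365-day year (8,760 hours); the function names are illustrative only. At 99.9%, the allowance works out to roughly 8.76 hours per year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Maximum downtime (in hours) consistent with a given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (1 - uptime_percent / 100)


def uptime_percent_from_downtime(downtime_hours: float,
                                 period_hours: float = HOURS_PER_YEAR) -> float:
    """Uptime percentage implied by a measured amount of downtime."""
    if not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime_hours must be between 0 and period_hours")
    return 100 * (1 - downtime_hours / period_hours)
```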

No downtime for software/system updates

Individual instances are only taken down momentarily for rolling software installations.

  • Under normal circumstances, all user application and API requests continue to be fulfilled by redundant instances, and updates are rolled back via automation in case of deployment problems
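The general pattern of a rolling update with automated rollback can be sketched as follows. This is a simplified simulation under stated assumptions, not Tidepool's deployment tooling: the `rolling_update` function, the instance names, and the `is_healthy` callback are all hypothetical.

```python
from typing import Callable, Dict


def rolling_update(
    instances: Dict[str, str],
    new_version: str,
    is_healthy: Callable[[str, str], bool],
) -> bool:
    """Update instances one at a time; roll back every change on failure.

    `instances` maps an instance name to its running version and is
    modified in place. Returns True if the rollout completed.
    """
    previous = dict(instances)  # snapshot used for rollback
    for name in list(instances):
        # Only this instance is out of service; the others keep serving requests.
        instances[name] = new_version
        if not is_healthy(name, new_version):
            # Restore every instance to its pre-rollout version.
            instances.update(previous)
            return False
    return True
```

Updating one instance at a time is what lets the remaining redundant instances keep fulfilling requests, and the snapshot taken before the rollout is what makes the rollback automatic.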

Alerting

Tidepool integrates multiple alerting mechanisms for notification of problems and issues via ChatOps, email, and cell phone/SMS:

  • PagerDuty - On-call scheduling, alerting, and incident tracking

  • Sumo Logic - log aggregation, metrics, and alerts

  • AWS - metrics, usage, and alerting

  • DataDog - MongoDB Atlas performance and security monitoring

  • Prometheus/Alertmanager - Kubernetes alerts

  • Slack - Alert delivery/notification from PagerDuty, Sumo Logic, DataDog, MongoDB

  • Atlassian StatusPage - public alerting to anyone interested in Tidepool system status


In accordance with legal, statutory, and regulatory compliance obligations, system availability, quality, capacity, and resources are planned, prepared, and measured to deliver the required system performance.
