06.01 Operational Visibility and Reliability

Status: draft for discussion

1. Goal

This document describes business expectations for reliability, issue investigation, and operational transparency.

It does not define exact metrics, dashboards, or monitoring implementation.

2. What Does Not Work Today

Today, when errors happen, it is hard to understand:

  • what exactly went wrong;
  • at which step the transaction failed;
  • whether the issue lies in our system, the provider, the merchant request, or the configuration;
  • whether the webhook was sent to the merchant;
  • who changed what in Back Office.

The new system must make transaction and configuration flows investigable.

3. Reliability Expectation

Business expectations:

  • merchants can create deposits whenever the system is available;
  • internal processes do not block the merchant flow unnecessarily;
  • provider issues are distinguished from platform issues;
  • if a provider is unavailable, routing tries the available fallback options;
  • if processing is impossible, the transaction receives a clear result and reason.
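
As an illustration of the last two expectations, the following is a minimal sketch, not a proposed implementation. It assumes routing holds an ordered list of providers; the names ProviderUnavailable, RoutingResult, route_deposit, and the reason codes are hypothetical and do not come from this document.

```python
from dataclasses import dataclass, field
from typing import Optional


class ProviderUnavailable(Exception):
    """Hypothetical error raised when a provider cannot be reached."""


@dataclass
class RoutingResult:
    status: str                      # "accepted" or "failed"
    provider: Optional[str] = None   # provider that accepted the deposit
    reason: Optional[str] = None     # clear reason when processing failed
    attempts: list = field(default_factory=list)  # (provider, outcome) pairs


def route_deposit(providers, amount):
    """Try each (name, send) pair in priority order and record every attempt."""
    result = RoutingResult(status="failed")
    for name, send in providers:
        try:
            send(amount)
        except ProviderUnavailable as exc:
            # Provider issue: note it and move on to the next fallback.
            result.attempts.append((name, f"unavailable: {exc}"))
            continue
        result.attempts.append((name, "accepted"))
        result.status = "accepted"
        result.provider = name
        return result
    # No provider accepted: return an explicit reason instead of a bare error.
    result.reason = "no_provider_available" if providers else "no_route_configured"
    return result


# Stand-in provider calls used only for this example.
def primary_down(amount):
    raise ProviderUnavailable("timeout")


def backup_ok(amount):
    return None


outcome = route_deposit([("primary", primary_down), ("backup", backup_ok)], 100)
```

In this sketch the attempts list keeps the provider failure visible even when a fallback succeeds, which is one way to keep provider issues apart from platform issues.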

4. Transaction Investigation

For each transaction, it must be clear:

  • when it was created;
  • which merchant, brand, and payment method were involved;
  • which routing decisions were made;
  • which final result was received;
  • whether the webhook was sent to the merchant;
  • whether a manual correction was made;
  • whether it is a migrated legacy transaction.

Merchant users see only information that is safe to share with them, without internal details.

Platform users can see more internal information if their role allows it.
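
One possible shape for such a per-transaction history is sketched below, assuming each step is stored as an event with a flag marking it as internal. The TransactionEvent type, the event kinds, and the role names "merchant" and "platform" are hypothetical; the actual tracing model is for the development team to propose.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TransactionEvent:
    at: datetime    # when the step happened
    kind: str       # e.g. "created", "routing_decision", "final_result"
    detail: str     # short description of the step
    internal: bool  # True if only platform users may see it


def visible_events(events, role):
    """Return events in time order, hiding internal ones from non-platform roles."""
    ordered = sorted(events, key=lambda e: e.at)
    if role == "platform":
        return ordered
    return [e for e in ordered if not e.internal]


# Example timeline for one transaction (all values are made up).
timeline = [
    TransactionEvent(datetime(2024, 1, 1, 12, 0, 2), "webhook_sent", "delivered to merchant", False),
    TransactionEvent(datetime(2024, 1, 1, 12, 0, 0), "created", "deposit created", False),
    TransactionEvent(datetime(2024, 1, 1, 12, 0, 1), "routing_decision", "fallback provider chosen", True),
]
merchant_view = [e.kind for e in visible_events(timeline, "merchant")]
platform_view = [e.kind for e in visible_events(timeline, "platform")]
```

Here the merchant view contains only the "created" and "webhook_sent" steps, while the platform view also includes the routing decision.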

5. Configuration Investigation

For important configuration changes, it must be clear:

  • who changed the configuration;
  • when it happened;
  • which entity was changed;
  • which non-secret values changed;
  • which version was active at the time of the transaction.

Secret values are not exposed.
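
The sketch below shows one way these questions could be answered, assuming configuration changes are stored as an ordered version history. ConfigChange, SECRET_FIELDS, changed_values, and version_active_at are hypothetical names, and the field names are invented for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime

# Hypothetical list of fields whose values must never be shown.
SECRET_FIELDS = {"api_key", "signing_secret"}


@dataclass(frozen=True)
class ConfigChange:
    version: int
    at: datetime   # when the change happened
    actor: str     # who changed the configuration
    entity: str    # which entity was changed
    values: dict   # full configuration after the change


def changed_values(old, new):
    """Return {field: (before, after)} for changed fields, masking secret ones."""
    diff = {}
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            diff[key] = ("***", "***") if key in SECRET_FIELDS else (before, after)
    return diff


def version_active_at(history, moment):
    """Return the latest change made at or before `moment`, or None.

    Assumes `history` is sorted by the `at` timestamp.
    """
    times = [change.at for change in history]
    idx = bisect_right(times, moment)
    return history[idx - 1] if idx else None


history = [
    ConfigChange(1, datetime(2024, 1, 1), "alice", "provider:acme", {"timeout": 30, "api_key": "a"}),
    ConfigChange(2, datetime(2024, 2, 1), "bob", "provider:acme", {"timeout": 60, "api_key": "b"}),
]
active = version_active_at(history, datetime(2024, 1, 15))
diff = changed_values(history[0].values, history[1].values)
```

This sketch reports that a secret field changed but masks both values. Whether even the fact of a secret change should be visible is a decision this document leaves open.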

6. Alerts and Monitoring

The team must be able to see:

  • drops in the success rate;
  • webhook delivery issues;
  • suspicious provider callbacks;
  • system issues affecting transaction processing.

The technical implementation of monitoring and alerts is proposed by the development and operations teams.
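
Purely as an illustration of what detecting a success rate drop could mean, the sketch below tracks the most recent outcomes in a sliding window. The class name SuccessRateMonitor and the window size, minimum sample count, and threshold are placeholders chosen for the example, not proposed metrics.

```python
from collections import deque


class SuccessRateMonitor:
    """Tracks the last `window` transaction outcomes (hypothetical helper)."""

    def __init__(self, window=100, min_samples=20, threshold=0.8):
        self.outcomes = deque(maxlen=window)  # oldest outcomes drop off automatically
        self.min_samples = min_samples        # avoid alerting on too little data
        self.threshold = threshold            # alert when the rate falls below this

    def record(self, success):
        self.outcomes.append(bool(success))

    def success_rate(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self):
        if len(self.outcomes) < self.min_samples:
            return False
        return self.success_rate() < self.threshold


monitor = SuccessRateMonitor(window=10, min_samples=5, threshold=0.8)
for ok in [False, False, False, False]:
    monitor.record(ok)
too_few_samples = monitor.should_alert()   # only 4 samples so far
monitor.record(True)
alert_after_fifth = monitor.should_alert() # 1 success out of 5
```

The minimum sample count keeps a single failed transaction during a quiet period from raising an alert. Real alert rules would likely also need to separate rates per provider and per merchant.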

7. Audit Logs

The audit log must help answer:

  • who changed data;
  • what changed;
  • when it happened.

Audit visibility depends on the user's business access rights.

The audit log does not expose secrets.

8. What the Development Team Decides

The development team proposes the monitoring implementation, alert rules, transaction tracing, audit storage, and overall reliability approach.
