
Application Layer Stability Guarantee

JitAi supports industry-standard and advanced stability assurance measures at the application layer.

Application layer updates are characterized by relatively localized impact, high update frequency, and sensitivity to user experience. This requires balancing stability with the ability to iterate rapidly.

Tip:
  • 🎯 Localized Impact: Updates touch one or a few applications, so risk stays controllable
  • ⚡ Rapid Iteration: Supports frequent updates in response to business needs
  • 👤 User Control: Users choose when to upgrade, reducing forced upgrade risk
  • 🔄 Independent Deployment: Updates don't affect other applications, enabling fault isolation

Progressive validation process

Multiple runtime environments

Create multiple runtime environments in the JitAi operations platform and adopt a progressive validation workflow: Test Environment → Beta Environment → Production Environment.

Environment configuration strategy

Test Environment Configuration

Purpose: Functional validation and basic performance testing

🔧 Environment Characteristics

  • Data Source: Simulated or desensitized data
  • Traffic Source: Testing team and developers
  • Resource Allocation: Medium-scale resources for functional testing

✅ Validation Focus

  • Business logic correctness
  • User interface and interaction experience
  • Basic performance and response time
  • Integration with other systems

Version management and release strategy

| Release Stage | Version Status | Validation Cycle | Pass Criteria | Failure Handling |
| --- | --- | --- | --- | --- |
| App Repository | Development completed | Code review | Code standards + functional completeness | Return to development for fixes |
| Test Environment | Functional testing | 1-2 days | Functional correctness + basic performance | Return to development stage |
| Beta Environment | Pre-production | 3-5 days | Real data compatibility + production performance | Analyze data issues |
| Production Environment | Production | Continuous monitoring | Stability metrics + user experience | Canary rollback |
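
The stage table can be read as a simple promotion rule: a version moves forward one stage when it meets the pass criteria, and otherwise follows that stage's failure handling. Below is a minimal Python sketch of that reading. The names `STAGES`, `FAILURE_HANDLING`, and `next_step` are illustrative assumptions and not part of any JitAi API; only the stage names and failure actions come from the table.

```python
# Illustrative only: stage names and failure handling are copied from the
# release-stage table; the data structures and function are hypothetical.
STAGES = [
    "App Repository",
    "Test Environment",
    "Beta Environment",
    "Production Environment",
]

FAILURE_HANDLING = {
    "App Repository": "Return to development for fixes",
    "Test Environment": "Return to development stage",
    "Beta Environment": "Analyze data issues",
    "Production Environment": "Canary rollback",
}


def next_step(stage: str, passed: bool) -> str:
    """Return the next stage on success, or the failure handling otherwise."""
    if stage not in FAILURE_HANDLING:
        raise ValueError(f"unknown stage: {stage}")
    if not passed:
        return FAILURE_HANDLING[stage]
    index = STAGES.index(stage)
    if index == len(STAGES) - 1:
        # Production is the last stage, so a pass means "keep monitoring".
        return "Continuous monitoring"
    return STAGES[index + 1]
```

For example, `next_step("Test Environment", True)` yields `"Beta Environment"`, while a failed Beta validation yields `"Analyze data issues"`.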

Canary release mechanism

Node-level canary release

In the JitAi cluster architecture, one JitNode serves as the load balancer, controlling traffic distribution. The runtime environment entry address resolves to this node.
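
To make the idea of weight-based traffic distribution concrete, here is a small Python sketch of weighted node selection. It is not JitAi's load-balancing code: the function `pick_node` and the node names `node-a` and `node-b` are hypothetical, and weighted random choice is only one plausible way a load balancer could honor per-node traffic weights.

```python
import random


def pick_node(weights: dict[str, float], rng: random.Random) -> str:
    """Pick a node with probability proportional to its traffic weight.

    Nodes whose weight is 0 never receive a request.
    """
    eligible = {name: w for name, w in weights.items() if w > 0}
    if not eligible:
        raise RuntimeError("no node has a positive traffic weight")
    names = list(eligible)
    return rng.choices(names, weights=[eligible[n] for n in names], k=1)[0]


# Hypothetical two-node cluster: node-b is the canary at a 5% weight.
weights = {"node-a": 95, "node-b": 5}
rng = random.Random(0)  # seeded so the simulation is repeatable
sample = [pick_node(weights, rng) for _ in range(10_000)]
canary_share = sample.count("node-b") / len(sample)
```

Over many requests `canary_share` settles close to 0.05, and setting a node's weight to 0 removes it from selection entirely.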

Controlling canary release process

Assessing stability and availability

Canary releases require simultaneous assessment of two dimensions: stability and availability.

  • Stability: Technical metrics such as error rates and response times
  • Availability: Business function uptime and user experience metrics

| Canary Stage | Canary Nodes | Traffic Ratio | Observation Period | Stability Standard | Availability Standard | Exception Handling |
| --- | --- | --- | --- | --- | --- | --- |
| Initial canary | 1 node | 5% | 2 hours | Error rate < 0.01% | Business availability > 99.9% | Set traffic weight to 0% |
| Small-scale expansion | 2 nodes | 20% | 4 hours | Error rate < 0.005% | Business availability > 99.95% | Set traffic weight to 0% |
| Medium scale | 50% of nodes | 50% | 8 hours | Error rate < 0.001% | Business availability > 99.98% | Immediate rollback or set traffic to 0% |
| Full release | All nodes | 100% | Continuous monitoring | System stable | Business functioning normally | Emergency rollback |

Traffic zeroing mechanism

When canary nodes exhibit abnormal behavior, immediately set their traffic weight to 0% for instant fault isolation:

  • 🚨 Instant Response: Cut off traffic to abnormal nodes without waiting for rollback deployment
  • 🛡️ User Protection: Ensures user requests aren't routed to problematic nodes
  • 🔄 Quick Recovery: Traffic can be rapidly restored once issues are resolved
  • 📊 Data Retention: Nodes remain running for analysis and debugging
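
The dual assessment and the zeroing rule can be combined into one small check. The sketch below is an assumption-laden illustration, not platform code: `Gate`, `GATES`, and `canary_weight_after_check` are hypothetical names, and only the threshold values are taken from the canary stage table. It models the "set traffic weight to 0%" action only, not a rollback.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    max_error_rate: float    # percent; the observed rate must be below this
    min_availability: float  # percent; the observed value must be above this


# Thresholds copied from the canary stage table (values are percentages).
GATES = {
    "Initial canary": Gate(0.01, 99.9),
    "Small-scale expansion": Gate(0.005, 99.95),
    "Medium scale": Gate(0.001, 99.98),
}


def canary_weight_after_check(
    stage: str, error_rate: float, availability: float, current_weight: int
) -> int:
    """Return the canary traffic weight after the dual assessment.

    Both stability (error rate) and availability must pass; if either
    fails, the weight drops to 0 so the node is isolated but keeps running.
    """
    gate = GATES[stage]
    healthy = (
        error_rate < gate.max_error_rate
        and availability > gate.min_availability
    )
    return current_weight if healthy else 0
```

A canary at 5% with a 0.02% error rate fails the initial gate and is zeroed, even if its availability is still above 99.9%.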

Operating canary release process

Standard release workflow:

  1. Select canary node: Choose one node as the initial canary
  2. Adjust traffic weight: Set the node's traffic weight to 5%
  3. Deploy new version: Deploy the new application version on the canary node
  4. Start monitoring: Enable comprehensive monitoring and alerting
  5. Dual assessment: Simultaneously assess stability and availability metrics
  6. Execute decision: Determine next steps based on assessment results
  7. Gradual expansion: Progressively increase canary nodes and traffic ratio after stabilization
  8. Complete release: Upgrade all nodes and restore normal traffic distribution
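
As a planning aid, the workflow above can be turned into a per-cluster schedule. This Python sketch derives node counts and traffic ratios from the canary stage table for a given cluster size. `rollout_plan` is a hypothetical helper; rounding "50% of nodes" up and capping counts at the cluster size are assumptions made here, not documented JitAi behavior.

```python
import math


def rollout_plan(total_nodes: int) -> list[tuple[str, int, int]]:
    """Return (stage, canary_nodes, traffic_percent) for each canary stage."""
    if total_nodes < 1:
        raise ValueError("cluster needs at least one node")
    # Assumption: "50% of nodes" rounds up and never shrinks below the
    # two nodes already used in the small-scale stage.
    medium_nodes = max(2, math.ceil(total_nodes / 2))
    raw = [
        ("Initial canary", 1, 5),
        ("Small-scale expansion", 2, 20),
        ("Medium scale", medium_nodes, 50),
        ("Full release", total_nodes, 100),
    ]
    # Never plan more canary nodes than the cluster actually has.
    return [(stage, min(nodes, total_nodes), pct) for stage, nodes, pct in raw]
```

For a 10-node cluster this yields 1, 2, 5, and 10 canary nodes at 5%, 20%, 50%, and 100% of traffic respectively.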

Exception handling workflow:

Traffic zeroing steps:

  1. Anomaly detection: Monitoring system detects stability or availability metric anomalies
  2. Instant isolation: Set canary node traffic weight to 0% (takes < 10 seconds)
  3. Status confirmation: Verify user traffic has completely switched to stable nodes
  4. Problem diagnosis: Analyze and debug issues in the isolated state
  5. Fix verification: Validate functionality after resolving problems
  6. Traffic recovery: Gradually restore traffic allocation to the node after verification
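
Step 6 calls for restoring traffic gradually rather than in one jump. One way to sketch that in Python is to split the climb from 0% back to the target weight into equal increments. The helper `recovery_steps` and its default of three steps are assumptions for illustration; the source does not specify how many steps to use or how long to hold each one.

```python
def recovery_steps(target_weight: int, steps: int = 3) -> list[int]:
    """Return increasing traffic weights that end at target_weight.

    The climb from 0 is split into roughly equal increments; zero and
    repeated values produced by rounding are skipped.
    """
    if target_weight < 0:
        raise ValueError("target_weight must not be negative")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    weights: list[int] = []
    for i in range(1, steps + 1):
        weight = round(target_weight * i / steps)
        if weight > 0 and (not weights or weight != weights[-1]):
            weights.append(weight)
    return weights
```

For a node that previously carried 20% of traffic, `recovery_steps(20, 4)` gives `[5, 10, 15, 20]`; each step would be held while the stability and availability metrics are re-checked.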

Observability

Info: Observability features are currently under development and will be available soon.

Integrating with OpenTelemetry and APM ecosystem

The JitAi Application Runtime Platform supports OpenTelemetry, the industry-standard framework for observability. Because OpenTelemetry is vendor-neutral and covers traces, metrics, and logs, it lets the platform integrate with the wider APM ecosystem and follow industry best practices.
