Distributed Inventory Management at Scale
Building mission-critical real-time inventory systems serving as the single source of truth for multi-city marketplace operations
Inventory management in a distributed marketplace isn't just about tracking items—it's about building the foundational system that every part of the business depends on. When customers search for vehicles, hosts manage their fleet, and multiple business verticals make real-time decisions, they all rely on one truth: the inventory system.
This is the technical story of a distributed inventory management system that handled 300K+ daily operations across 25 cities, serving as the authoritative source of truth while maintaining real-time consistency and graceful conflict resolution.
The Mission-Critical Systems Challenge
Inventory systems in marketplaces face a unique engineering challenge: they must serve as the single source of truth for multiple stakeholders with different requirements, all while maintaining real-time accuracy under high concurrency.
System Requirements: Serve as the authoritative inventory source for customers (search/booking), vendors (fleet management), and business verticals (analytics/operations) while handling 300K+ daily operations with sub-second response times.
Scale and Complexity:
- 300K+ daily operations with 50:1 read-to-write ratio
- 25 cities with geo-distributed operations
- 10K+ vehicles with real-time availability tracking
- Multiple stakeholders requiring different data views and update patterns
- Zero tolerance for stale data in customer-facing applications
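As a rough sanity check on these figures, the sketch below converts the daily volume into an average per-second rate and splits it by the 50:1 ratio. It assumes operations are spread evenly over 24 hours, which real marketplace traffic is not, so peak rates will be higher. The class and method names are illustrative only.

```java
/** Back-of-the-envelope load figures derived from the numbers above. */
final class LoadEstimate {
    static final long DAILY_OPERATIONS = 300_000L;      // lower bound quoted in the article
    static final int READS_PER_WRITE = 50;              // 50:1 read-to-write ratio
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    /** Average operations per second, assuming a uniform spread over the day. */
    static double averageOpsPerSecond() {
        return (double) DAILY_OPERATIONS / SECONDS_PER_DAY;
    }

    /** Writes per day: one write out of every 51 operations (50 reads + 1 write). */
    static double writesPerDay() {
        return (double) DAILY_OPERATIONS / (READS_PER_WRITE + 1);
    }

    /** Reads per day: everything that is not a write. */
    static double readsPerDay() {
        return DAILY_OPERATIONS - writesPerDay();
    }

    public static void main(String[] args) {
        System.out.printf("avg ops/s:  %.2f%n", averageOpsPerSecond());
        System.out.printf("writes/day: %.0f%n", writesPerDay());
        System.out.printf("reads/day:  %.0f%n", readsPerDay());
    }
}
```

Under these assumptions the average works out to about 3.5 operations per second, with roughly 5,900 writes per day. That suggests the difficulty lies less in raw throughput than in keeping concurrent updates to the same vehicle correct.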
Stakeholder Requirements
Customer-Facing Systems:
- Search Results: Real-time availability with accurate inventory status
- Booking Flow: Immediate inventory reservation with conflict detection
- Dynamic Updates: Live availability changes during browsing sessions
Vendor/Host Operations:
- Fleet Management: Real-time vehicle status updates (available, booked, maintenance)
- Availability Control: Immediate inventory blocking for maintenance or personal use
- Booking Notifications: Instant updates when vehicles are reserved
Business Verticals:
- Analytics: Consistent inventory metrics across all reporting systems
- Operations: Real-time fleet utilization and availability insights
- Finance: Accurate revenue attribution and utilization calculations
Distributed Concurrency Architecture
The core challenge in distributed inventory management is handling simultaneous operations from multiple actors while maintaining data consistency and preventing race conditions.
Multi-Level Concurrency Control
The system employs a layered approach to concurrency control, combining JVM-level synchronization with distributed atomic operations:
CONCURRENCY CONTROL ARCHITECTURE
APPLICATION LAYER
+----------------+    +----------------+    +----------------+    +----------------+
| Customer       |--->| ReentrantLock  |--->| Booking        |--->| Status         |
| Booking        |    | (JVM Level)    |    | Validation     |    | Update         |
| Request        |    |                |    |                |    |                |
+----------------+    +----------------+    +----------------+    +----------------+
                              |                     |                     |
                              v                     v                     v
+----------------+    +----------------+    +----------------+    +----------------+
| Host Fleet     |--->| Thread         |--->| Business       |--->| Conflict       |
| Management     |    | Synchronization|    | Logic          |    | Resolution     |
+----------------+    +----------------+    +----------------+    +----------------+
DISTRIBUTED LAYER
+----------------+    +----------------+    +----------------+    +----------------+
| Redis          |--->| Lua Script     |--->| Atomic         |--->| Database       |
| Coordination   |    | Execution      |    | Operations     |    | Propagation    |
+----------------+    +----------------+    +----------------+    +----------------+
ReentrantLock for JVM-Level Synchronization
At the application level, ReentrantLock provides fine-grained concurrency control for critical sections involving inventory state changes:
Locking Strategy: Vehicle-level locks prevent simultaneous booking attempts for the same inventory item while allowing concurrent operations on different vehicles to proceed without blocking.
Concurrency Control Logic:
1. Acquire ReentrantLock for specific vehicle_id
2. Validate inventory state and business rules
3. Execute booking/blocking/status change logic
4. Trigger atomic Redis update via Lua script
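5. Release the lock in a finally block, whether or not the operation succeeded
Lock Granularity: Per-vehicle locking keeps contention low. A ReentrantLock only serializes threads inside one JVM, so consistency across application instances relies on the atomic Redis step rather than on this lock.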
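The locking steps above can be sketched as a small per-vehicle lock registry. This is a minimal illustration, not the production code: `VehicleLockRegistry`, `withVehicleLock`, and the timeout parameter are hypothetical names, and the critical section is passed in as a `Supplier` so that validation and the Redis call stay outside the sketch.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical helper: one ReentrantLock per vehicle id, visible to this JVM only. */
final class VehicleLockRegistry {
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /**
     * Runs the critical section while holding the lock for vehicleId.
     * Throws IllegalStateException if the lock cannot be acquired in time.
     */
    <T> T withVehicleLock(String vehicleId, long timeoutMillis, Supplier<T> criticalSection) {
        // Different vehicles map to different locks, so they never block each other.
        ReentrantLock lock = locks.computeIfAbsent(vehicleId, id -> new ReentrantLock());
        boolean acquired;
        try {
            acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            throw new IllegalStateException("Interrupted while waiting for vehicle " + vehicleId, e);
        }
        if (!acquired) {
            throw new IllegalStateException("Timed out waiting for lock on vehicle " + vehicleId);
        }
        try {
            // Steps 2-4 (validate, apply the change, run the Redis Lua script) belong here.
            return criticalSection.get();
        } finally {
            // Step 5: always release, even when validation or the Redis call throws.
            lock.unlock();
        }
    }

    /** True while some thread holds the lock for this vehicle. */
    boolean isLocked(String vehicleId) {
        ReentrantLock lock = locks.get(vehicleId);
        return lock != null && lock.isLocked();
    }
}
```

A caller would wrap a booking as `registry.withVehicleLock(vehicleId, 200, () -> bookVehicle(vehicleId))`, where `bookVehicle` stands in for the real booking logic. The sketch never evicts locks, so a long-running service would need some cleanup policy for the map.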
Lua Scripts for Atomic Distributed Operations
Redis Lua scripts provide atomic execution of complex operations that span multiple data structures and business logic:
Atomic Operations: Lua scripts handle vehicle booking, completion, accident blocking, maintenance scheduling, and status transitions as single atomic operations in Redis.
Lua Script Responsibilities:
- Vehicle Booking: Atomic reservation with availability validation and status updates
- Booking Completion: Status transitions with timeline updates and availability restoration
- Accident/Maintenance Blocking: Immediate inventory removal with proper status tracking
- Status Synchronization: Ensuring Redis state consistency across multiple data structures
Example Lua Script Logic (Vehicle Booking):
-- Atomic booking operation in Redis
-- KEYS[1] = vehicle id, ARGV[1] = booking details,
-- ARGV[2] = city id (assumed to be supplied by the caller)
local vehicle_id = KEYS[1]
local booking_details = ARGV[1]
local city_id = ARGV[2]
local vehicle_key = 'vehicle:' .. vehicle_id
-- Check current availability (HGET yields false when the field is missing)
local current_status = redis.call('HGET', vehicle_key, 'status')
if current_status ~= 'available' then
    return {err = 'Vehicle not available'}
end
-- Atomic status update
redis.call('HSET', vehicle_key, 'status', 'booked')
redis.call('HSET', vehicle_key, 'booking_id', booking_details)
redis.call('SADD', 'booked_vehicles', vehicle_id)
redis.call('SREM', 'available_vehicles', vehicle_id)
-- Update city-level availability counters
redis.call('HINCRBY', 'city_inventory', city_id, -1)
return {ok = 'Booking successful'}
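The script rejects a booking unless the vehicle is currently available. The same idea generalizes to a transition table that the application layer can check during validation, before the script runs. The sketch below is one possible shape: the status names come from the article, but the set of allowed transitions is an assumption made for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Vehicle states named in the article. */
enum VehicleStatus { AVAILABLE, BOOKED, MAINTENANCE, ACCIDENT_BLOCKED }

/** Hypothetical transition table; the allowed moves below are assumptions. */
final class StatusTransitions {
    private static final Map<VehicleStatus, Set<VehicleStatus>> ALLOWED =
            new EnumMap<>(VehicleStatus.class);

    static {
        // An available vehicle can be booked or pulled out of inventory.
        ALLOWED.put(VehicleStatus.AVAILABLE,
                EnumSet.of(VehicleStatus.BOOKED, VehicleStatus.MAINTENANCE, VehicleStatus.ACCIDENT_BLOCKED));
        // A booking ends by completion (back to available) or by an accident.
        ALLOWED.put(VehicleStatus.BOOKED,
                EnumSet.of(VehicleStatus.AVAILABLE, VehicleStatus.ACCIDENT_BLOCKED));
        // Maintenance ends by returning the vehicle to inventory.
        ALLOWED.put(VehicleStatus.MAINTENANCE,
                EnumSet.of(VehicleStatus.AVAILABLE));
        // An accident block clears into maintenance or straight back to inventory.
        ALLOWED.put(VehicleStatus.ACCIDENT_BLOCKED,
                EnumSet.of(VehicleStatus.MAINTENANCE, VehicleStatus.AVAILABLE));
    }

    /** Returns true if moving from one status to the other is permitted. */
    static boolean isAllowed(VehicleStatus from, VehicleStatus to) {
        return ALLOWED.get(from).contains(to);
    }
}
```

Checking `StatusTransitions.isAllowed(current, requested)` in the application gives early, descriptive rejections, while the check inside the Lua script remains the authoritative one because only it runs atomically against Redis.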
CQRS and Event-Driven Architecture
With a 50:1 read-to-write ratio, the system employs Command Query Responsibility Segregation (CQRS) to optimize for different access patterns while maintaining strong consistency where required.
Read-Write Separation Strategy
Architectural Decision: Separate read and write paths with dedicated infrastructure for each pattern, enabling optimization for high-volume reads while ensuring write consistency and durability.
CQRS ARCHITECTURE
WRITE PATH
+----------------+    +----------------+    +----------------+    +----------------+
| Write          |--->| Business       |--->| Concurrency    |--->| Redis          |
| Commands       |    | Validation     |    | Control        |    | Update         |
| (Booking/      |    |                |    | (Lock + Lua)   |    | (Atomic)       |
| Status Change) |    |                |    |                |    |                |
+----------------+    +----------------+    +----------------+    +----------------+
                                                                          |
                                                                          v
                                                                  +----------------+
                                                                  | Event          |
                                                                  | Publishing     |
                                                                  | (Status        |
                                                                  | Changes)       |
                                                                  +----------------+
                                                                          |
                                                                          v
READ PATH                                                         +----------------+
+----------------+    +----------------+    +----------------+    | Database       |
| Read           |--->| Read           |--->| Cached         |<---| Sync           |
| Queries        |    | Replicas       |    | Responses      |    | (Eventual)     |
| (Search/       |    | (Optimized     |    | (High          |    |                |
| Availability)  |    | for Reads)     |    | Performance)   |    |                |
+----------------+    +----------------+    +----------------+    +----------------+
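In code, the separation in the diagram might look like the following in-memory sketch. A `ConcurrentHashMap` stands in for the Redis operational store and direct callbacks stand in for the event bus; both are simplifications, and every class and method name here is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;

/** Write side: applies commands, then publishes a status-change event. */
final class InventoryCommandSide {
    // Stand-in for the Redis operational store (vehicleId -> status).
    private final Map<String, String> store = new ConcurrentHashMap<>();
    // Subscribers receive (vehicleId, newStatus); a real system would use a message broker.
    private final List<BiConsumer<String, String>> subscribers = new CopyOnWriteArrayList<>();

    void subscribe(BiConsumer<String, String> subscriber) {
        subscribers.add(subscriber);
    }

    /** Adds a vehicle to inventory as available. */
    void register(String vehicleId) {
        store.put(vehicleId, "available");
        publish(vehicleId, "available");
    }

    /** Books the vehicle only if it is currently available; returns whether it succeeded. */
    boolean book(String vehicleId) {
        // replace(key, expected, next) is an atomic compare-and-set on ConcurrentHashMap.
        boolean booked = store.replace(vehicleId, "available", "booked");
        if (booked) {
            publish(vehicleId, "booked");
        }
        return booked;
    }

    private void publish(String vehicleId, String status) {
        for (BiConsumer<String, String> subscriber : subscribers) {
            subscriber.accept(vehicleId, status);
        }
    }
}

/** Read side: a query-optimized view that changes only in response to events. */
final class AvailabilityReadModel {
    private final Set<String> available = ConcurrentHashMap.newKeySet();

    void onStatusChanged(String vehicleId, String status) {
        if ("available".equals(status)) {
            available.add(vehicleId);
        } else {
            available.remove(vehicleId);
        }
    }

    boolean isAvailable(String vehicleId) {
        return available.contains(vehicleId);
    }

    int availableCount() {
        return available.size();
    }
}
```

Wiring the two together is a single call, `commands.subscribe(readModel::onStatusChanged)`. The point of the split is that the read model can be reshaped for search (by city, by vehicle type) without touching the write path.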
Event-Driven Synchronization
The system maintains consistency between Redis (operational data store) and PostgreSQL (persistent storage) through event-driven synchronization rather than polling mechanisms:
Event Flow Architecture:
- Write Operations: Update Redis immediately for real-time availability
- Event Publishing: Publish status change events after successful Redis updates
- Database Sync: Consume events to update PostgreSQL with eventual consistency
- Read Operations: Serve from read replicas, replicated synchronously for customer-facing queries and asynchronously for analytics
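Event consumers commonly have to tolerate duplicate or out-of-order delivery, so the database sync step is usually written to be idempotent. The sketch below shows one way to do that, assuming each event carries a per-vehicle version number that increases with every change. The article does not describe the event schema, so `StatusChangeEvent`, its `version` field, and the `Map` standing in for the PostgreSQL table are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical event published after a successful Redis update. */
final class StatusChangeEvent {
    final String vehicleId;
    final String status;
    final long version; // assumed to increase with every change to the same vehicle

    StatusChangeEvent(String vehicleId, String status, long version) {
        this.vehicleId = vehicleId;
        this.status = status;
        this.version = version;
    }
}

/** Applies events to a stand-in for the PostgreSQL table (a Map, for illustration). */
final class DatabaseSyncConsumer {
    private final Map<String, StatusChangeEvent> rows = new HashMap<>();

    /** Returns true if the event changed stored state, false if it was a duplicate or stale. */
    synchronized boolean apply(StatusChangeEvent event) {
        StatusChangeEvent current = rows.get(event.vehicleId);
        if (current != null && current.version >= event.version) {
            return false; // already applied, or an older event arrived late
        }
        rows.put(event.vehicleId, event);
        return true;
    }

    /** Latest persisted status, or null if the vehicle has never been synced. */
    synchronized String statusOf(String vehicleId) {
        StatusChangeEvent current = rows.get(vehicleId);
        return current == null ? null : current.status;
    }
}
```

Against a real database the same rule could be expressed as a conditional update that only writes when the incoming version is greater than the stored one, which keeps replays harmless.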
Optimistic Business Model for Conflict Resolution
Rather than implementing strict pessimistic locking that could impact availability, the system employs an optimistic approach aligned with business reality.
Strategic Trade-off: Accept potential double-booking scenarios and resolve conflicts gracefully through user communication and alternative vehicle offers, prioritizing business continuity over perfect technical consistency.
Conflict Resolution Strategy
Optimistic Acceptance Model:
- Accept Bookings: Allow potentially conflicting bookings rather than rejecting requests
- Detect Conflicts: Identify double-booking or unavailability issues post-acceptance
- Graceful Resolution: Offer alternative vehicles with user communication and incentives
- Business Continuity: Convert potential rejections into customer retention opportunities
Conflict Resolution Flow:
1. Accept booking request optimistically
2. Process through normal booking validation
3. Detect conflicts during post-processing validation
4. If conflict detected:
   a. Maintain user engagement (don't cancel immediately)
   b. Search for equivalent or upgraded vehicle alternatives
   c. Present options with potential incentives/upgrades
   d. Allow user choice rather than forced cancellation
5. Track conflict resolution success rates for system optimization
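Step 4b, the search for equivalent or upgraded alternatives, can be sketched as a simple filter and sort. The matching criteria are assumptions: the article does not say how "equivalent" is defined, so this example uses same city plus an equal or higher numeric `tier`, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Minimal vehicle view for matching; tier is a hypothetical ranking (higher means upgrade). */
final class CandidateVehicle {
    final String id;
    final String cityId;
    final int tier;
    final boolean available;

    CandidateVehicle(String id, String cityId, int tier, boolean available) {
        this.id = id;
        this.cityId = cityId;
        this.tier = tier;
        this.available = available;
    }
}

final class AlternativeFinder {
    /** Returns up to limit available vehicles in the same city with an equal or higher tier. */
    static List<CandidateVehicle> findAlternatives(
            CandidateVehicle requested, List<CandidateVehicle> inventory, int limit) {
        List<CandidateVehicle> matches = new ArrayList<>();
        for (CandidateVehicle v : inventory) {
            boolean sameCity = v.cityId.equals(requested.cityId);
            boolean equivalentOrBetter = v.tier >= requested.tier;
            if (v.available && sameCity && equivalentOrBetter && !v.id.equals(requested.id)) {
                matches.add(v);
            }
        }
        // Closest tier first, so equivalents are offered before upgrades; id breaks ties.
        matches.sort(Comparator.comparingInt((CandidateVehicle v) -> v.tier)
                .thenComparing((CandidateVehicle v) -> v.id));
        int count = Math.max(0, Math.min(limit, matches.size()));
        return new ArrayList<>(matches.subList(0, count));
    }
}
```

Sorting equivalents ahead of upgrades keeps the cost of incentives down while still leaving an upgrade on the list when nothing equivalent is free.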
Business Impact of Optimistic Model
Revenue Protection: Converting 70%+ of potential booking conflicts into successful alternative bookings through proactive customer service rather than technical rejection.
This approach recognizes that in marketplace businesses, customer acquisition and retention often outweigh perfect technical consistency, especially when conflicts can be resolved satisfactorily.
API Design for Multi-Consumer Patterns
Supporting diverse stakeholders requires carefully designed APIs that optimize for different usage patterns while maintaining a consistent data model.
Consumer-Specific API Design
Customer-Facing APIs:
- Search API: Optimized for bulk availability queries with geographic filtering
- Real-time Availability: WebSocket connections for live inventory updates during browsing
- Booking API: Immediate reservation with optimistic conflict handling
Vendor/Host APIs:
- Fleet Management: Batch operations for managing multiple vehicle statuses
- Status Control: Immediate blocking/unblocking for maintenance or personal use
- Revenue Tracking: Real-time utilization and booking analytics
Business Vertical APIs:
- Analytics APIs: Aggregate data queries optimized for reporting and dashboards
- Operations APIs: City-wide inventory insights and utilization metrics
- Integration APIs: Standardized data access for downstream business systems
API Design Patterns:
• Real-time APIs: WebSocket connections for live updates
• Batch APIs: Bulk operations for administrative tasks
• Analytics APIs: Aggregated data with appropriate caching
• Integration APIs: Standardized formats for cross-system compatibility
Rate Limiting: Consumer-specific limits based on usage patterns
Authentication: Role-based access with fine-grained permissions
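Consumer-specific rate limiting is often implemented with a token bucket per client, where the refill rate depends on the consumer's role. The following is a minimal sketch of that technique rather than a description of the actual system: the role names, the limits, and the choice of token bucket are all assumptions. Time is passed in explicitly to keep the behavior deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Classic token bucket; the caller supplies the current time in nanoseconds. */
final class TokenBucket {
    private final double capacity;
    private final double refillPerSecond;
    private double tokens;
    private long lastNanos;

    TokenBucket(double capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;   // start full so short bursts are allowed
        this.lastNanos = nowNanos;
    }

    /** Refills according to elapsed time, then spends one token if one is available. */
    synchronized boolean tryAcquire(long nowNanos) {
        long elapsedNanos = Math.max(0L, nowNanos - lastNanos);
        tokens = Math.min(capacity, tokens + (elapsedNanos / 1_000_000_000.0) * refillPerSecond);
        lastNanos = Math.max(lastNanos, nowNanos);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}

/** Per-client buckets whose rate depends on the consumer role (limits are hypothetical). */
final class ConsumerRateLimiter {
    private final Map<String, Double> requestsPerSecondByRole;
    private final Map<String, TokenBucket> buckets = new ConcurrentHashMap<>();

    ConsumerRateLimiter(Map<String, Double> requestsPerSecondByRole) {
        this.requestsPerSecondByRole = requestsPerSecondByRole;
    }

    boolean tryAcquire(String role, String clientId, long nowNanos) {
        Double rate = requestsPerSecondByRole.get(role);
        if (rate == null) {
            return false; // unknown roles are rejected outright
        }
        // One bucket per (role, client); capacity equals one second's worth of requests.
        TokenBucket bucket = buckets.computeIfAbsent(
                role + ":" + clientId, key -> new TokenBucket(rate, rate, nowNanos));
        return bucket.tryAcquire(nowNanos);
    }
}
```

In production the clock argument would come from `System.nanoTime()`, and a high-volume search client would typically get a far larger limit than a host dashboard.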
Strong Consistency Requirements
Despite the optimistic business model, certain aspects of the inventory system require strong consistency to maintain user trust and business integrity.
Non-Negotiable Consistency: Customer-facing search results must reflect real-time inventory status. Displaying unavailable vehicles as available damages user experience and business credibility.
Consistency Guarantees
Strong Consistency Domains:
- Search Results: Real-time availability reflection in customer-facing applications
- Booking Validation: Immediate inventory status verification during reservation flow
- Host Dashboard: Current fleet status for host decision-making
Eventual Consistency Domains:
- Analytics Reporting: Slight delays acceptable for aggregate metrics and dashboards
- Historical Data: Audit trails and long-term analytics can tolerate sync delays
- Cross-System Integration: Downstream systems can handle eventual consistency patterns
Read Replica Strategy
All stakeholders read from designated read replicas that maintain strong consistency for customer-facing operations while providing eventual consistency for analytical workloads:
Read Replica Architecture:
• Primary Read Replica: Strong consistency for customer-facing operations
• Analytics Replica: Optimized for complex queries with eventual consistency
• Geographic Replicas: City-specific replicas for performance optimization
• Cross-Region Replicas: Disaster recovery and geographic distribution
Replication Strategy:
• Synchronous replication for customer-facing reads
• Asynchronous replication for analytics and reporting
• Automatic failover with consistency verification
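One way to make the consistency split explicit in code is to route each read by workload type. The sketch below maps the strong-consistency domains listed earlier to synchronously replicated reads and the eventual-consistency domains to asynchronous replicas. The enum and class names are illustrative, not taken from the system.

```java
/** Read workloads named in the article's consistency domains. */
enum ReadWorkload { SEARCH, BOOKING_VALIDATION, HOST_DASHBOARD, ANALYTICS, HISTORICAL, INTEGRATION }

/** Replica pools by replication mode; the names are illustrative. */
enum ReplicaPool { SYNCHRONOUS, ASYNCHRONOUS }

final class ReadRouter {
    /** Picks the replica pool whose consistency guarantee matches the workload. */
    static ReplicaPool route(ReadWorkload workload) {
        switch (workload) {
            case SEARCH:
            case BOOKING_VALIDATION:
            case HOST_DASHBOARD:
                // Customer- and host-facing reads must not show stale availability.
                return ReplicaPool.SYNCHRONOUS;
            case ANALYTICS:
            case HISTORICAL:
            case INTEGRATION:
                // Reporting and downstream systems tolerate replication lag.
                return ReplicaPool.ASYNCHRONOUS;
            default:
                throw new IllegalArgumentException("Unknown workload: " + workload);
        }
    }
}
```

Keeping this decision in one place means a new workload has to declare its consistency requirement up front instead of inheriting whichever replica a connection string happens to point at.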
Performance Results and System Impact
Mission-Critical Reliability: Achieved 99.9%+ availability while serving as the authoritative inventory source for all marketplace operations across 25 cities.
Performance Characteristics:
- Throughput: 300K+ daily operations with sub-second response times
- Concurrency: Handled simultaneous operations from multiple business verticals
- Consistency: Strong consistency for customer-facing reads, eventual consistency for analytics
- Conflict Resolution: 70%+ success rate in converting booking conflicts to alternative bookings
Business Impact:
- Revenue Protection: Optimistic booking model prevented revenue loss from technical rejections
- Operational Efficiency: Single source of truth eliminated data inconsistencies across business units
- User Experience: Real-time availability updates maintained customer trust and engagement
- System Reliability: Mission-critical uptime requirements met consistently across all markets
Key Engineering Insights
Business-aligned technical decisions outperform perfect consistency: Optimistic conflict resolution with graceful degradation provides better business outcomes than strict technical consistency in marketplace environments.
CQRS enables optimization for different stakeholder patterns: Separating read and write paths allows optimization for high-volume customer reads while maintaining consistency for critical operations.
Multi-level concurrency control scales with complexity: Combining JVM-level locks with distributed atomic operations provides both performance and consistency at different system layers.
Evolution and Scaling Considerations
Geographic Distribution: As marketplace operations expand globally, the system would need to evolve toward geo-distributed consistency models with regional autonomy and cross-region synchronization.
Real-time Personalization: Future iterations could incorporate real-time inventory allocation based on user preferences and booking patterns, optimizing availability presentation for individual users.
Predictive Availability: Advanced systems could leverage machine learning to predict inventory availability patterns and proactively manage conflicts before they occur.