Traditional fraud detection looks at transactions in isolation. But fraud doesn't happen in isolation - it spreads through networks of connected users, shared devices, and coordinated attacks.
This is the technical story of building a graph-based fraud detection system that processed 110+ million nodes in real-time, achieving 70%+ high-risk detection accuracy while maintaining sub-second response times for booking decisions.
The Fraud Network Problem
Fraud in digital marketplaces follows predictable patterns: stolen credit cards get shared among networks of bad actors, device fingerprints appear across multiple fake accounts, and fraudulent booking behaviors cluster around specific geographic or temporal patterns.
Core Insight: Fraudsters don't operate in isolation. They form networks connected by shared devices, payment methods, phone numbers, locations, and referral chains. Traditional rule-based systems miss these network effects entirely.
Our approach models the entire user ecosystem as a graph where:
- Nodes represent users with rich attribute sets (user_id, phone, email, cards, device fingerprints, location data)
- Edges represent relationships (shared devices, referrals, similar booking patterns, geographic proximity)
- Node classification evolves in real-time as fraud patterns emerge and propagate through the network
Graph Construction and Data Model
Building a fraud detection graph at this scale requires careful consideration of both data model design and performance characteristics.
Node Attributes and Feature Engineering
Each user node contains multiple layers of identity and behavioral signals:
// Node structure in Neo4j
CREATE (user:User {
user_id: "abc123",
phone_hash: "ph_89a7b2c1",
email_hash: "em_4f5a6b7c",
device_fingerprint: "dev_aa1bb2cc",
sim_fingerprint: "sim_8877aabb",
registration_location: [lat, lng],
signup_timestamp: datetime(),
account_age_days: 45,
booking_count: 12,
cancellation_rate: 0.08,
risk_score: 0.23,
classification: "unknown" // red, green, unknown
})
Edge Relationships: Connections form based on shared attributes and behavioral patterns:
// Relationship types
(:User)-[:SHARES_DEVICE]->(:User) // Same device fingerprint
(:User)-[:SHARES_PHONE]->(:User) // Same phone number
(:User)-[:REFERRED_BY]->(:User) // Referral chain
(:User)-[:SIMILAR_LOCATION]->(:User) // Geographic proximity
(:User)-[:SHARES_PAYMENT]->(:User) // Same payment method
(:User)-[:BOOKING_PATTERN]->(:User) // Similar timing/behavior
The New User Challenge
100 million first-time users present the greatest fraud risk - they lack historical behavior data but represent the primary attack vector for stolen credentials and coordinated fraud attempts.
Behavioral Bootstrapping for New Users:
- Device fingerprinting: Primary connection signal for graph insertion
- Geographic anomalies: Registration location vs booking location analysis
- Booking patterns: Short-term bookings with immediate start times = high risk
- App usage analysis: Rushed behavior vs normal exploration patterns
- Channel analysis: Referral source and acquisition pathway
- Temporal signals: Time of day, day of week patterns
Multi-Algorithm Fraud Detection
Rather than relying on a single detection method, the system employs multiple graph algorithms simultaneously, each targeting different fraud patterns:
Algorithm Ensemble
1. Community Detection
Identifies tightly connected clusters of users who may be coordinating fraudulent activities. Fraud rings often exhibit high intra-cluster connectivity through shared devices, payment methods, or referral patterns.
2. Shortest Path Analysis
Measures degrees of separation from known fraudulent nodes. Users connected to confirmed fraudsters through short paths receive elevated risk scores based on relationship proximity.
3. Centrality Measures
Identifies key players in potential fraud networks using betweenness centrality and PageRank-style algorithms. Users who serve as bridges between multiple suspicious clusters warrant investigation.
4. Connected Components
Discovers isolated fraud clusters that operate independently from the main user graph. These often represent organized fraud attempts using completely fabricated identities.
Ensemble Scoring and Weight Learning
Individual algorithm outputs combine through learned weights based on historical fraud data:
def calculate_ensemble_risk_score(user_node):
# Individual algorithm scores
community_score = detect_fraud_communities(user_node)
path_score = shortest_path_to_fraudsters(user_node)
centrality_score = calculate_centrality_risk(user_node)
isolation_score = connected_component_analysis(user_node)
# Learned weights from historical fraud cases
weights = {
'community': 0.35,
'path': 0.25,
'centrality': 0.20,
'isolation': 0.20
}
# Weighted ensemble
final_score = (
community_score * weights['community'] +
path_score * weights['path'] +
centrality_score * weights['centrality'] +
isolation_score * weights['isolation']
)
return normalize_score(final_score)
Weight Learning: Supervised learning on historical theft data determines optimal algorithm weights. The system continuously updates these weights as new fraud patterns emerge and algorithm effectiveness evolves.
Performance Architecture at Scale
Achieving sub-second response times on graphs with 110+ million nodes requires sophisticated performance optimization strategies.
Score-Based Partitioning Strategy
Key Insight: "Thieves have a habit of repeat offense." This behavioral pattern enables performance optimization through risk-based data partitioning.
The system partitions the graph based on user risk scores rather than traditional sharding approaches:
+-------------------+ +-------------------+ +-------------------+
| HOT PARTITION | | WARM PARTITION | | COLD PARTITION |
| High-Risk Users | | Medium-Risk Users | | Clean Users |
+-------------------+ +-------------------+ +-------------------+
| - Known fraudsters| | - New users | | - Verified users |
| - Repeat offenders| | - Partial matches | | - Long history |
| - Recent flags | | - Moderate scores | | - Low risk scores |
| | | | | |
| In-Memory Caching | | SSD Storage | | Cold Storage |
| Sub-10ms queries | | ~50ms queries | | ~200ms queries |
+-------------------+ +-------------------+ +-------------------+
LFU Caching with Fraud Pattern Awareness
Traditional LRU caching doesn't optimize for fraud detection patterns. The system implements Least Frequently Used (LFU) caching with domain-specific optimizations:
class FraudAwareLFUCache:
def __init__(self, capacity):
self.capacity = capacity
self.cache = {}
self.frequencies = {}
self.fraud_boost_factor = 2.0
def get_user_risk(self, user_id):
if user_id in self.cache:
# Boost frequency for high-risk users
base_freq = self.frequencies[user_id]
risk_score = self.cache[user_id]['risk_score']
if risk_score > HIGH_RISK_THRESHOLD:
self.frequencies[user_id] = base_freq * self.fraud_boost_factor
else:
self.frequencies[user_id] = base_freq + 1
return self.cache[user_id]
# Cache miss - fetch from graph database
return self.fetch_and_cache(user_id)
Neo4j Performance Optimization
Graph database optimization for real-time fraud detection requires multiple strategies:
// Pre-computed risk scores as node properties
MATCH (u:User)
SET u.risk_score = calculate_ensemble_score(u)
SET u.last_updated = datetime()
// Strategic indexing for hot queries
CREATE INDEX user_device_idx FOR (u:User) ON (u.device_fingerprint)
CREATE INDEX user_risk_idx FOR (u:User) ON (u.risk_score)
CREATE INDEX user_phone_idx FOR (u:User) ON (u.phone_hash)
// Optimized fraud detection query
MATCH (suspect:User {user_id: $userId})
MATCH (suspect)-[:SHARES_DEVICE|SHARES_PHONE*1..2]-(connected:User)
WHERE connected.risk_score > 0.7
RETURN suspect.risk_score,
collect(connected.user_id) as high_risk_connections
LIMIT 50
System Architecture Overview
The fraud detection system integrates multiple components for real-time risk assessment and graph maintenance:
FRAUD DETECTION SYSTEM ARCHITECTURE
USER EVENTS & DATA INGESTION
+-------------+ +-------------+ +-------------+ +-------------+
| New Booking |--->| User |--->| Kafka |--->| Feature |
| Events | | Behavior | | Streaming | | Extraction |
+-------------+ | Tracking | +-------------+ +-------------+
+-------------+ | |
| v
+-------------+ +-------------+ | +-------------+
| Device |--->| Identity | | | Graph Node |
| Fingerprint | | Resolution | | | Properties |
+-------------+ +-------------+ | +-------------+
| |
+-------------+ +-------------+ | v
| External |--->| Fraud | | +-------------+
| Fraud | | Reports | | | Neo4j |
| Reports | +-------------+ | | Database |
+-------------+ | +-------------+
| |
REAL-TIME PROCESSING | |
v v
+-------------+ +-------------+ +-------------+
| Risk |<---| Multi-Algo |<---| Graph |
| Assessment | | Ensemble | | Queries |
| Engine | +-------------+ +-------------+
+-------------+ | ^
| | |
v v |
+-------------+ +-------------+ |
| Booking | | Graph |------------+
| Decision | | Updates |
+-------------+ +-------------+
PERFORMANCE LAYER
+-------------------+ +-------------------+ +-------------------+
| HOT PARTITION | | WARM PARTITION | | COLD PARTITION |
| High-Risk Users | | Medium-Risk Users | | Clean Users |
| In-Memory Cache | | SSD Storage | | Cold Storage |
| <10ms response | | ~50ms response | | ~200ms response |
+-------------------+ +-------------------+ +-------------------+
Real-Time Graph Updates and Streaming
The system maintains graph consistency while processing continuous updates from booking events, fraud discoveries, and user behavior changes.
Event-Driven Graph Maintenance
def handle_fraud_discovery(fraudster_user_id):
"""Handle real-time fraud discovery and graph updates"""
# 1. Update fraudster node properties
update_node_classification(fraudster_user_id, "red")
# 2. Find all connected nodes within 2 degrees
connected_users = find_connected_nodes(
fraudster_user_id,
max_depth=2
)
# 3. Re-compute risk scores for connected nodes
for user_id in connected_users:
new_risk_score = calculate_ensemble_risk_score(user_id)
update_node_risk_score(user_id, new_risk_score)
# 4. Update partitioning if risk level changed significantly
if risk_level_changed(user_id, new_risk_score):
migrate_partition(user_id, new_risk_score)
# 5. Invalidate relevant cache entries
invalidate_cache_for_users(connected_users)
# 6. Trigger alerts for newly high-risk users
trigger_review_queue(get_high_risk_users(connected_users))
Streaming Feature Updates
User behavior changes continuously update graph features through Kafka streams:
@kafka_consumer('user-behavior-events')
def update_behavioral_features(event):
user_id = event['user_id']
behavioral_updates = {
'last_booking_time': event['timestamp'],
'booking_count': increment_counter(user_id, 'bookings'),
'cancellation_rate': calculate_cancellation_rate(user_id),
'location_variance': update_location_variance(user_id, event['location'])
}
# Update Neo4j node properties
update_user_properties(user_id, behavioral_updates)
# Re-evaluate risk if significant behavior change
if significant_change_detected(behavioral_updates):
new_risk_score = calculate_ensemble_risk_score(user_id)
propagate_risk_updates(user_id, new_risk_score)
Fraud Pattern Recognition
Device Sharing Networks
Pattern: Multiple accounts using the same device fingerprint, often with rapid account creation and immediate booking attempts.
Detection: Graph analysis reveals star-pattern connectivity where one device node connects to many user nodes with recent registration timestamps.
Referral Chain Exploitation
Pattern: Fraudsters create fake referral chains to exploit signup bonuses or referral rewards, often using slight variations of personal information.
Detection: Connected component analysis identifies isolated clusters of users with suspicious referral patterns and minimal external connectivity.
Geographic Clustering
Pattern: Coordinated attacks from specific geographic regions, often targeting high-value inventory during peak demand periods.
Detection: Spatial clustering algorithms identify unusual concentrations of new accounts with similar booking patterns in localized geographic areas.
Performance Results and Business Impact
Detection Accuracy: 70%+ high-risk detection rate with false positive rates maintained below 5% through ensemble algorithm balancing and continuous threshold tuning.
System Performance Characteristics:
- Query Response Time: Sub-second risk assessment for 95% of booking requests
- Graph Update Latency: New fraud discoveries propagate to connected nodes within 30 seconds
- Throughput: Processing 50,000+ booking risk assessments per hour during peak periods
- Scalability: Linear performance scaling through score-based partitioning and caching strategies
Business Impact:
- Fraud Loss Reduction: Prevented estimated 85% of potential fraud losses through proactive detection
- Investigation Efficiency: Reduced manual review burden by 60% through automated high-confidence classifications
- User Experience: Maintained seamless booking experience for legitimate users while blocking fraudulent attempts
- Network Effect Discovery: Uncovered previously unknown fraud networks through graph analysis, leading to systematic cleanup of organized fraud attempts
Key Engineering Insights
Graph-based approaches reveal fraud patterns invisible to traditional methods: Individual transaction analysis misses coordinated attacks and shared infrastructure that becomes obvious in graph representation.
Performance optimization must align with domain patterns: Score-based partitioning and fraud-aware caching leverage behavioral insights ("repeat offenders") for system optimization.
Real-time graph updates require careful consistency management: Balancing immediate fraud response with system performance requires sophisticated event-driven architectures and update propagation strategies.
Evolution and Future Considerations
Algorithm Adaptation: Fraud patterns evolve continuously, requiring dynamic algorithm weights and new detection methods. The ensemble approach enables adding new algorithms without disrupting existing detection capabilities.
Privacy and Compliance: Graph-based analysis of user relationships raises privacy considerations that require careful balance between fraud detection effectiveness and user data protection.
Adversarial Resistance: As fraudsters adapt to detection methods, the system requires continuous evolution to stay ahead of new attack patterns and evasion techniques.