Violation Weights & Scoring System
VoxelAI's ChatWarden scores each violation by combining its severity, a per-type weight, and special modifiers, then scales the result by an escalation multiplier to determine the appropriate punishment.
Core Concepts
Base Score Formula
Base Score = Violation Severity × Violation Weight × Special Modifiers
Final Score Formula
Final Score = Base Score × Escalation Multiplier
Violation Weights
Violation Weight Configuration
violation_weights:
  spam: 0.5          # Lower weight due to frequency
  toxicity: 1.0      # Standard weight
  harassment: 1.4    # Higher due to targeting
  hate_speech: 2.0   # Highest weight
  profanity: 0.8     # Moderate weight
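The two formulas and the weight table above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not ChatWarden's implementation: the function names `base_score` and `final_score` are hypothetical, and the example inputs (a severity-3 harassment violation with an escalation multiplier of 1.9) are assumptions chosen for the walkthrough.

```python
# Weights copied from the violation_weights configuration above.
VIOLATION_WEIGHTS = {
    "spam": 0.5,
    "toxicity": 1.0,
    "harassment": 1.4,
    "hate_speech": 2.0,
    "profanity": 0.8,
}


def base_score(severity: int, violation_type: str, modifier: float = 1.0) -> float:
    """Base Score = Violation Severity x Violation Weight x Special Modifiers."""
    if not 1 <= severity <= 5:
        raise ValueError("severity must be between 1 and 5")
    return severity * VIOLATION_WEIGHTS[violation_type] * modifier


def final_score(base: float, escalation_multiplier: float = 1.0) -> float:
    """Final Score = Base Score x Escalation Multiplier."""
    return base * escalation_multiplier


# Hypothetical case: severity-3 harassment, no special modifiers,
# escalation multiplier of 1.9.
base = base_score(3, "harassment")          # 3 x 1.4
print(round(base, 2))
print(round(final_score(base, 1.9), 2))     # base x 1.9
```

Keeping the two steps as separate functions mirrors the split between the base and final formulas, so each factor can be inspected on its own.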
Severity Levels (1-5)
Level 1: Minor Violation
Accidental or unintentional, minimal impact, first occurrence, easy to correct
Level 2: Low Violation
Mild disruption, some intent, minor impact, quick to resolve
Level 3: Medium Violation
Clear intent, moderate impact, repeated behavior, needs intervention
Level 4: High Violation
Strong intent, significant impact, pattern of behavior, requires action
Level 5: Severe Violation
Malicious intent, maximum impact, coordinated action, requires immediate response
Escalation Calculation Methods
New Severity-Weighted Method
Severity-Weighted Escalation
Escalation is based on the cumulative severity of a user's recent violations.
escalation:
  calculation_method: "severity"
  base_multiplier: 1.0
  severity_factor: 0.1
  max_multiplier: 3.0

Formula: Escalation = 1.0 + (0.1 × Sum of Recent Severities)
Example: Recent violations with severities 3, 2, and 4
Escalation = 1.0 + (0.1 × 9) = 1.9×
Advantages
• Accurate: Escalation reflects the severity of past violations
• Fair: Minor violations don't escalate as much as severe ones
• Responsive: Severe violations immediately increase future escalation
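The severity-weighted escalation formula can be sketched as follows. The function name `escalation_multiplier` is hypothetical, and treating `max_multiplier` as an upper cap on the result is an assumption based on the setting's name; the documentation above does not spell out how it is applied.

```python
def escalation_multiplier(recent_severities,
                          base_multiplier=1.0,
                          severity_factor=0.1,
                          max_multiplier=3.0):
    """Escalation = base_multiplier + severity_factor x sum of recent severities."""
    raw = base_multiplier + severity_factor * sum(recent_severities)
    # Assumption: max_multiplier caps the result.
    return min(raw, max_multiplier)


# The example from the text: recent violations with severities 3, 2, and 4.
print(round(escalation_multiplier([3, 2, 4]), 2))

# A long history of severe violations hits the assumed cap.
print(escalation_multiplier([5] * 30))
```

With the default settings, a user with no recent violations gets a multiplier of exactly 1.0, so the final score equals the base score.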
Special Modifiers
Spam Modifier
Applied to consolidated spam violations
spam_modifier: 1.5
Batch Modifiers
For multiple violations in batch
batch_modifiers:
  enabled: true
  threshold: 5       # Messages in batch
  multiplier: 1.2    # Score multiplier
Time-based Modifiers
For rapid or persistent violations
time_modifiers:
  rapid_repeat: 1.3   # Quick repeated violations
  persistent: 1.4     # Violations over time
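The sketch below shows one way the special modifiers could be folded into a single factor for the base score formula. It rests on two assumptions the documentation does not confirm: that applicable modifiers multiply together, and that the batch modifier applies once a batch contains at least `threshold` messages. The function `special_modifier` and its parameters are hypothetical.

```python
# Values copied from the modifier configuration above.
SPAM_MODIFIER = 1.5
BATCH_THRESHOLD = 5
BATCH_MULTIPLIER = 1.2
RAPID_REPEAT = 1.3
PERSISTENT = 1.4


def special_modifier(consolidated_spam=False, batch_size=0,
                     rapid_repeat=False, persistent=False):
    """Combine the applicable modifiers into one factor (assumed multiplicative)."""
    modifier = 1.0
    if consolidated_spam:
        modifier *= SPAM_MODIFIER
    # Assumption: the threshold is inclusive.
    if batch_size >= BATCH_THRESHOLD:
        modifier *= BATCH_MULTIPLIER
    if rapid_repeat:
        modifier *= RAPID_REPEAT
    if persistent:
        modifier *= PERSISTENT
    return modifier


# Hypothetical case: consolidated spam arriving in a batch of 6 messages.
print(round(special_modifier(consolidated_spam=True, batch_size=6), 2))
```

If modifiers are in fact applied differently (for example, only the largest one), the combined factor would be smaller, so it is worth confirming the behavior against logged scores.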
Scoring Examples
Simple Spam
Toxic Behavior
Hate Speech
Threshold Mapping
Score to Punishment
punishment_score_thresholds:
  warn: 1.0       # Even low scores
  mute: 3.0       # Medium severity
  tempban: 8.0    # High severity
  ban: 20.0       # Extreme cases
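A minimal sketch of the score-to-punishment lookup, assuming that a score triggers the harshest punishment whose threshold it meets or exceeds, and that scores below the `warn` threshold result in no action. Both assumptions and the function name `punishment_for` are illustrative; the thresholds themselves are copied from the configuration above.

```python
PUNISHMENT_SCORE_THRESHOLDS = {
    "warn": 1.0,
    "mute": 3.0,
    "tempban": 8.0,
    "ban": 20.0,
}


def punishment_for(score):
    """Return the harshest punishment whose threshold the score reaches, or None."""
    chosen = None
    # Walk thresholds from lowest to highest, keeping the last one reached.
    for name, threshold in sorted(PUNISHMENT_SCORE_THRESHOLDS.items(),
                                  key=lambda item: item[1]):
        if score >= threshold:
            chosen = name
    return chosen


for score in (0.5, 1.5, 7.98, 8.0, 25.0):
    print(score, punishment_for(score))
```

Sorting by threshold value rather than relying on key order keeps the lookup correct even if the entries are listed in a different order in the configuration file.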
Example Mappings
Configuration Tips
Weight Balancing Guidelines
• Keep weights relative to each other
• Consider community impact of each violation type
• Test combinations thoroughly
• Monitor punishment distribution
Best Practices
Regular Review
• Monitor punishment distribution
• Check false positive rates
• Review edge cases
• Adjust as needed
Testing
• Use test environment
• Try edge cases
• Verify escalation
• Check combinations
Documentation
• Log changes
• Track effectiveness
• Document rationale
• Share findings
Troubleshooting
Scores Too High
• Lower weights for problematic violation types
• Reduce special modifiers
• Adjust punishment thresholds upward
• Check escalation multipliers
Scores Too Low
• Increase weights for under-punished violations
• Add appropriate modifiers
• Lower punishment thresholds
• Review severity assignments
Inconsistent Results
• Check calculation logic
• Verify weight configurations
• Test edge cases manually
• Enable detailed logging