Violation Weights & Scoring System

Advanced

VoxelAI's ChatWarden scores each violation by multiplying its severity, the weight of its violation type, and any special modifiers. The result is then scaled by an escalation multiplier, and the final score determines the punishment.

Core Concepts

Base Score Formula

Base Score = Violation Severity × Violation Weight × Special Modifiers

Final Score Formula

Final Score = Base Score × Escalation Multiplier
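The two formulas above can be sketched in a few lines of Python. This is an illustration only, not ChatWarden's actual code: the function names `base_score` and `final_score` are hypothetical, and the range check on severity is an assumption based on the 1-5 scale described in this document.

```python
def base_score(severity: int, weight: float, modifiers: float = 1.0) -> float:
    """Base Score = Violation Severity x Violation Weight x Special Modifiers."""
    # Assumption: severity is restricted to the documented 1-5 scale.
    if not 1 <= severity <= 5:
        raise ValueError("severity must be between 1 and 5")
    return severity * weight * modifiers


def final_score(base: float, escalation_multiplier: float = 1.0) -> float:
    """Final Score = Base Score x Escalation Multiplier."""
    return base * escalation_multiplier


# Severity 2 spam (weight 0.5) with the 1.5 spam modifier and no escalation.
print(final_score(base_score(2, 0.5, 1.5)))  # 1.5
```

An escalation multiplier of 1.0 leaves the base score unchanged, which is why a user with no prior history gets a final score equal to the base score.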

Violation Weights

Violation Weight Configuration

violation_weights:
  spam: 0.5        # Lower weight due to frequency
  toxicity: 1.0    # Standard weight
  harassment: 1.4  # Higher due to targeting
  hate_speech: 2.0 # Highest weight
  profanity: 0.8   # Moderate weight

Severity Levels (1-5)

Level 1 (Minor Violation): Accidental or unintentional, minimal impact, first occurrence, easy to correct

Level 2 (Low Violation): Mild disruption, some intent, minor impact, quick to resolve

Level 3 (Medium Violation): Clear intent, moderate impact, repeated behavior, needs intervention

Level 4 (High Violation): Strong intent, significant impact, pattern of behavior, requires action

Level 5 (Severe Violation): Malicious intent, maximum impact, coordinated action, immediate response

Escalation Calculation Methods

Severity-Weighted Escalation

The escalation multiplier grows with the cumulative severity of a user's recent violations.

escalation:
  calculation_method: "severity"
  base_multiplier: 1.0
  severity_factor: 0.1
  max_multiplier: 3.0

Formula:
Escalation = 1.0 + (0.1 × Sum of Recent Severities)

Example:
Recent violations: severity 3, 2, 4
Escalation = 1.0 + (0.1 × 9) = 1.9×

Advantages:

• Accurate: Escalation reflects the severity of recent violations

• Fair: Minor violations don't escalate as much as severe ones

• Responsive: Severe violations immediately increase future escalation
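A minimal sketch of the severity-weighted calculation, using the configuration values shown above. Two points are assumptions rather than documented behavior: `max_multiplier` is treated as a hard cap on the result, and the caller is expected to decide which violations count as "recent". The function name is hypothetical.

```python
def escalation_multiplier(
    recent_severities,
    base_multiplier=1.0,
    severity_factor=0.1,
    max_multiplier=3.0,
):
    """Escalation = base_multiplier + severity_factor x sum of recent severities."""
    raw = base_multiplier + severity_factor * sum(recent_severities)
    # Assumption: max_multiplier caps the result; the formula above does not say so.
    return min(raw, max_multiplier)


# The documented example: recent severities 3, 2 and 4 give roughly 1.9.
print(round(escalation_multiplier([3, 2, 4]), 2))

# A long history of severe violations hits the assumed cap of 3.0.
print(escalation_multiplier([5] * 30))
```

With no recent violations the sum is zero, so the multiplier falls back to `base_multiplier` and the final score equals the base score.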

Special Modifiers

Spam Modifier

Applied to consolidated spam violations

spam_modifier: 1.5

Batch Modifiers

For multiple violations in a single batch

batch_modifiers:
  enabled: true
  threshold: 5      # Messages in batch
  multiplier: 1.2   # Score multiplier

Time-based Modifiers

For rapid or persistent violations

time_modifiers:
  rapid_repeat: 1.3  # Quick repeated violations
  persistent: 1.4    # Violations over time
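This document does not state how several modifiers combine when more than one applies. The sketch below assumes they multiply together and that the batch modifier applies once a batch reaches the threshold (inclusive). The `ViolationContext` class and its fields are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ViolationContext:
    """Hypothetical description of the circumstances around a violation."""
    is_consolidated_spam: bool = False
    batch_size: int = 1
    rapid_repeat: bool = False
    persistent: bool = False


def combined_modifier(ctx: ViolationContext) -> float:
    """Multiply every applicable modifier (assumed combination rule)."""
    modifier = 1.0
    if ctx.is_consolidated_spam:
        modifier *= 1.5  # spam_modifier
    if ctx.batch_size >= 5:  # batch_modifiers.threshold, assumed inclusive
        modifier *= 1.2  # batch_modifiers.multiplier
    if ctx.rapid_repeat:
        modifier *= 1.3  # time_modifiers.rapid_repeat
    if ctx.persistent:
        modifier *= 1.4  # time_modifiers.persistent
    return modifier


# Consolidated spam in a batch of six messages: 1.5 x 1.2, roughly 1.8.
print(round(combined_modifier(ViolationContext(is_consolidated_spam=True, batch_size=6)), 2))
```

If the modifiers are in fact additive or mutually exclusive, only the body of `combined_modifier` would need to change.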

Scoring Examples

Simple Spam

Severity: 2 (low)
Weight: 0.5 (spam)
Modifier: 1.5 (spam)

Base Score: 2 × 0.5 × 1.5 = 1.5

Toxic Behavior

Severity: 4 (high)
Weight: 1.0 (toxicity)
Modifier: 1.0 (none)

Base Score: 4 × 1.0 × 1.0 = 4.0

Hate Speech

Severity: 5 (severe)
Weight: 2.0 (hate speech)
Modifier: 1.0 (none)

Base Score: 5 × 2.0 × 1.0 = 10.0
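The three examples can be reproduced with a short script that looks up each weight from the violation_weights configuration. This is a sketch for checking the arithmetic, not part of ChatWarden. The names `VIOLATION_WEIGHTS`, `EXAMPLES`, and `base_score` are hypothetical, and the 1.9× escalation at the end reuses the escalation example from this document.

```python
# Weights copied from the violation_weights configuration.
VIOLATION_WEIGHTS = {
    "spam": 0.5,
    "toxicity": 1.0,
    "harassment": 1.4,
    "hate_speech": 2.0,
    "profanity": 0.8,
}

# (label, violation type, severity, modifier, documented base score)
EXAMPLES = [
    ("Simple Spam", "spam", 2, 1.5, 1.5),
    ("Toxic Behavior", "toxicity", 4, 1.0, 4.0),
    ("Hate Speech", "hate_speech", 5, 1.0, 10.0),
]


def base_score(violation_type, severity, modifier=1.0):
    """Severity x weight for the violation type x special modifier."""
    return severity * VIOLATION_WEIGHTS[violation_type] * modifier


for label, vtype, severity, modifier, documented in EXAMPLES:
    score = base_score(vtype, severity, modifier)
    status = "matches" if abs(score - documented) < 1e-9 else "differs from"
    print(f"{label}: base score {score:.1f} {status} the documented value")

# Hate Speech for a user whose recent history gives a 1.9x escalation.
escalated = base_score("hate_speech", 5) * 1.9
print(f"Hate Speech after 1.9x escalation: {escalated:.1f}")
```

The last line shows how much history matters: the same message moves from a base score of 10.0 to a final score of about 19.0 once escalation is applied.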

Threshold Mapping

Score to Punishment

punishment_score_thresholds:
  warn: 1.0     # Even low scores
  mute: 3.0     # Medium severity
  tempban: 8.0  # High severity
  ban: 20.0     # Extreme cases

Example Mappings

Score 1.5 → Warning
Score 4.0 → Mute
Score 9.0 → Tempban
Score 21.0 → Ban
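One way to express the mapping is to pick the highest threshold that the score reaches. The sketch below assumes thresholds are inclusive (a score of exactly 3.0 results in a mute) and that scores below the `warn` threshold result in no action. Neither detail is stated in this document, and `punishment_for` is a hypothetical name.

```python
# Values copied from punishment_score_thresholds.
THRESHOLDS = {"warn": 1.0, "mute": 3.0, "tempban": 8.0, "ban": 20.0}


def punishment_for(score, thresholds=THRESHOLDS):
    """Return the punishment with the highest threshold reached, or None."""
    chosen = None
    # Walk thresholds from lowest to highest and keep the last one reached.
    for name, minimum in sorted(thresholds.items(), key=lambda item: item[1]):
        if score >= minimum:  # assumption: thresholds are inclusive
            chosen = name
    return chosen


for score in (1.5, 4.0, 9.0, 21.0):
    print(score, "->", punishment_for(score))
```

Sorting by threshold value rather than relying on dictionary order keeps the result correct even if the configuration lists the punishments in a different order.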

Configuration Tips

Weight Balancing Guidelines

• Keep weights relative to each other

• Consider community impact of each violation type

• Test combinations thoroughly

• Monitor punishment distribution

Best Practices

Regular Review

• Monitor punishment distribution

• Check false positive rates

• Review edge cases

• Adjust as needed

Testing

• Use test environment

• Try edge cases

• Verify escalation

• Check combinations

Documentation

• Log changes

• Track effectiveness

• Document rationale

• Share findings

Troubleshooting

Scores Too High

• Lower weights for problematic violation types

• Reduce special modifiers

• Adjust punishment thresholds upward

• Check escalation multipliers

Scores Too Low

• Increase weights for under-punished violations

• Add appropriate modifiers

• Lower punishment thresholds

• Review severity assignments

Inconsistent Results

• Check calculation logic

• Verify weight configurations

• Test edge cases manually

• Enable detailed logging
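When testing edge cases manually, it can help to recompute a score outside the plugin and log every factor. The helper below is a standalone sketch built on Python's standard `logging` module. It is not ChatWarden's own logging option, which this document does not name, and `explain_score` and the logger name are hypothetical.

```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(message)s")
log = logging.getLogger("score_debug")  # hypothetical logger name


def explain_score(severity, weight, modifier=1.0, escalation=1.0):
    """Recompute a score step by step and log each intermediate value."""
    base = severity * weight * modifier
    final = base * escalation
    log.debug(
        "severity=%s weight=%s modifier=%s -> base=%.2f",
        severity, weight, modifier, base,
    )
    log.debug("escalation=%s -> final=%.2f", escalation, final)
    return {"base": base, "final": final}


# Severity 3 harassment (weight 1.4) for a user with a 1.9x escalation.
result = explain_score(3, 1.4, escalation=1.9)
```

Comparing the logged base and final values against the punishment actually issued shows whether an unexpected result comes from the weights, the modifiers, or the escalation multiplier.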