Violation Detection System

VoxelAI's ChatWarden uses an AI model to detect and classify chat violations in real time, using conversation context to interpret each message.

Violation Types

Spam

Weight: 0.5

• Repeated identical messages

• Excessive caps

• Chat flooding

• Server advertising

Toxicity

Weight: 1.0

• General insults

• Personal attacks

• Negative behavior

• Threats

Harassment

Weight: 1.4

• Targeted behavior

• Persistent attacks

• Following/stalking

• Coordinated targeting

Hate Speech

Weight: 2.0

• Discriminatory language

• Slurs

• Targeted group attacks

• Extremist content

Profanity

Weight: 0.8

• Inappropriate language

• Sexual content

• Explicit material

• Contextual swearing

AI Analysis Process

Batch Processing Configuration

batch_processing:
  enabled: true
  chunk_size: 75                # Messages per batch
  process_interval_seconds: 15  # Processing frequency
  max_batch_delay_minutes: 1    # Maximum wait time
  queue_limit: 300              # Maximum queued messages
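
The interaction of these four settings can be sketched as a small queue. This is a minimal illustration, not ChatWarden's implementation: the class `BatchQueue`, its method names, and the policies it uses (reject messages once the queue is full, send a batch when the interval has elapsed or the oldest message has waited past the maximum delay) are all assumptions made for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class BatchQueue:
    """Hypothetical queue mirroring the batch_processing settings."""

    chunk_size: int = 75                    # chunk_size
    process_interval_seconds: float = 15.0  # process_interval_seconds
    max_batch_delay_seconds: float = 60.0   # max_batch_delay_minutes: 1
    queue_limit: int = 300                  # queue_limit
    _queue: deque = field(default_factory=deque, init=False)
    _last_flush: float = field(default=0.0, init=False)

    def add(self, message: str, now: float) -> bool:
        """Queue a message; return False when the queue is full (assumed policy)."""
        if len(self._queue) >= self.queue_limit:
            return False
        self._queue.append((now, message))
        return True

    def next_batch(self, now: float) -> list[str]:
        """Return up to chunk_size messages if a batch is due, else an empty list."""
        if not self._queue:
            return []
        interval_due = now - self._last_flush >= self.process_interval_seconds
        oldest_wait = now - self._queue[0][0]
        overdue = oldest_wait >= self.max_batch_delay_seconds
        if not (interval_due or overdue):
            return []
        self._last_flush = now
        count = min(self.chunk_size, len(self._queue))
        return [self._queue.popleft()[1] for _ in range(count)]
```

Timestamps are passed in explicitly (`now`) so the behavior can be checked without waiting on a real clock.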

Severity Levels

Spam

Weight: 0.5

• Level 1: 2-3 repeated messages

• Level 2: 4-5 repeated messages

• Level 3: 6+ repeated messages

• Level 4: Extreme flooding (10+ messages)

• Level 5: Bot-like behavior or advertising

Toxicity

Weight: 1.0

• Level 1: Mild criticism

• Level 2: General insults

• Level 3: Personal attacks

• Level 4: Severe attacks/threats

• Level 5: Extreme toxicity campaigns

Harassment

Weight: 1.4

• Level 1: Mild targeting

• Level 2: Clear targeting pattern

• Level 3: Persistent harassment

• Level 4: Severe harassment/threats

• Level 5: Extreme coordinated harassment

Hate Speech

Weight: 2.0

• Level 1: Mild discriminatory language

• Level 2: Clear discriminatory intent

• Level 3: Targeted hate speech

• Level 4: Violent hate speech

• Level 5: Extreme hate speech/calls for violence

Profanity

Weight: 0.8

• Level 1: Mild swearing

• Level 2: Moderate profanity

• Level 3: Directed profanity

• Level 4: Extreme profanity

• Level 5: Graphic sexual content
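
This page does not say how a category's weight and a severity level are combined. One plausible reading is a simple product, sketched below. The function `weighted_score` and the multiplication itself are assumptions for illustration; only the weight values come from the Violation Types section.

```python
# Weights copied from the Violation Types section.
VIOLATION_WEIGHTS = {
    "spam": 0.5,
    "toxicity": 1.0,
    "harassment": 1.4,
    "hate_speech": 2.0,
    "profanity": 0.8,
}


def weighted_score(violation_type: str, severity: int) -> float:
    """Assumed formula: category weight multiplied by severity level (1-5)."""
    if not 1 <= severity <= 5:
        raise ValueError("severity must be between 1 and 5")
    return VIOLATION_WEIGHTS[violation_type] * severity
```

Under this assumption, level 3 hate speech (2.0 × 3 = 6.0) would outrank level 5 spam (0.5 × 5 = 2.5), which matches the intent of giving heavier categories a larger weight.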

Confidence Thresholds

Threshold Configuration

Minimum confidence score required before a message is flagged as a violation, set per violation type.

confidence_thresholds:
  toxicity: 0.8       # High threshold
  harassment: 0.85    # Very high
  hate_speech: 0.9    # Highest threshold
  spam: 0.8           # High threshold
  profanity: 0.75     # Moderate
  custom_rules: 0.8   # High threshold
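
As a rough illustration of how such thresholds gate detections, the sketch below flags a classification only when its confidence reaches the configured minimum for its type. The function `passes_threshold` is hypothetical, and treating a confidence equal to the threshold as passing is an assumption; the values are copied from the configuration above.

```python
# Values copied from the confidence_thresholds configuration.
CONFIDENCE_THRESHOLDS = {
    "toxicity": 0.8,
    "harassment": 0.85,
    "hate_speech": 0.9,
    "spam": 0.8,
    "profanity": 0.75,
    "custom_rules": 0.8,
}


def passes_threshold(violation_type: str, confidence: float) -> bool:
    """Assumed rule: flag only when confidence meets or exceeds the threshold."""
    return confidence >= CONFIDENCE_THRESHOLDS[violation_type]
```

For example, a profanity classification at 0.78 confidence would be flagged, while a hate speech classification at 0.88 would not, because hate speech requires 0.9.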

False Positive Prevention

Gaming Context

• Understands competitive banter

• Recognizes gaming terminology

• Considers game events

• Adapts to server culture

Friendly Banter

• Distinguishes intent

• Considers relationships

• Analyzes conversation flow

• Respects friend dynamics

Message Context

• Conversation flow

• Player relationships

• Game events

• Server events

Custom Rules

Server-Specific Rules

Add custom violation types for your server

custom_rules:
  enabled: true
  rules:
    - name: "server_promotion"
      pattern: "play.competitor.com"
      severity: 4
      confidence: 1.0
    - name: "forbidden_words"
      pattern: ["word1", "word2"]
      severity: 2
      confidence: 0.9
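
Note that `pattern` appears both as a single string and as a list in the example above. The sketch below shows one way a loader could normalize the two forms and match messages. It assumes case-insensitive substring matching, which this page does not confirm (the plugin may use regular expressions or another scheme), and the names `CustomRule`, `load_rule`, and `match_rules` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CustomRule:
    name: str
    patterns: list[str]
    severity: int
    confidence: float


def load_rule(raw: dict) -> CustomRule:
    """Accept `pattern` as either a single string or a list of strings."""
    pattern = raw["pattern"]
    patterns = [pattern] if isinstance(pattern, str) else list(pattern)
    return CustomRule(raw["name"], patterns, raw["severity"], raw["confidence"])


def match_rules(message: str, rules: list[CustomRule]) -> list[CustomRule]:
    """Assumed matching: case-insensitive substring search."""
    lowered = message.lower()
    return [r for r in rules if any(p.lower() in lowered for p in r.patterns)]


# The two rules from the configuration example, as parsed dictionaries.
RULES = [
    load_rule({"name": "server_promotion", "pattern": "play.competitor.com",
               "severity": 4, "confidence": 1.0}),
    load_rule({"name": "forbidden_words", "pattern": ["word1", "word2"],
               "severity": 2, "confidence": 0.9}),
]
```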

Monitoring & Debugging

Debug Mode

core:
  debug_mode: true
  log_violations: true

These settings enable detailed logging for analysis and debugging.

Statistics Command

/voxelai stats

The command reports:

• Detection rates

• False positive rates

• Most common violations

• AI response times

Best Practices

Regular Monitoring

• Check violation patterns

• Review false positives

• Adjust thresholds as needed

• Monitor system performance

Configuration Tuning

• Start with high confidence thresholds

• Adjust based on server needs

• Balance sensitivity against false positives

• Test changes gradually
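
One way to make gradual tuning concrete is to raise a threshold by a small step only when staff reviews show too many false positives. The sketch below is a hypothetical helper, not part of ChatWarden: the function names, the 5% target rate, the 0.02 step, and the 0.99 ceiling are all assumptions chosen for the example.

```python
def false_positive_rate(reviews: list[bool]) -> float:
    """Share of reviewed detections that staff marked as false positives."""
    return sum(reviews) / len(reviews) if reviews else 0.0


def suggest_threshold(
    current: float,
    reviews: list[bool],
    target_rate: float = 0.05,  # assumed acceptable false positive rate
    step: float = 0.02,         # assumed size of one gradual adjustment
    ceiling: float = 0.99,      # never suggest a threshold above this
) -> float:
    """Raise the threshold by one step when false positives exceed the target."""
    if false_positive_rate(reviews) > target_rate:
        return round(min(current + step, ceiling), 2)
    return current
```

Each entry in `reviews` is `True` when a reviewed detection turned out to be a false positive. Applying one step at a time and then reviewing again keeps changes small enough to observe their effect.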

Staff Training

• Understand detection logic

• Know how to handle appeals

• Monitor for abuse patterns

• Review the system regularly