GPT-5.2

GPT-5.2

#1 / 5 54.3s 4281
9.0

OpenAI flagship model for coding and reasoning

AI Evaluation

Clarity8.8
Completeness8.8
Relevance9.3
Creativity8.8

Full Response

Last quarter I watched a combat prototype implode because the “smart enemy” logic was 2,400 lines of branching conditions, and every new weapon broke three old behaviors. We swapped the hand-authored selector for a tiny learned policy and saw a surprising result: CPU time for decision-making dropped from ~1.8ms to ~0.4ms per frame on PS5-class hardware, mostly because we stopped evaluating dozens of dead branches. The catch was that we had to build guardrails, or the model would happily pick “perfect” moves that felt unfair.

God of War’s modern combat (2018/Ragnarök) is a great lens for this problem: readable enemy intent, punish windows, animation-locked commitment, and layered difficulty tuning. Developers want that “aggressive but fair” feel, but shipping it with pure rules scales poorly as you add enemy archetypes, runic attacks, status effects, and accessibility modifiers. Machine learning can help, but only if it fits inside deterministic animation graphs, strict frame budgets, and designer-driven constraints.

Recent engine shifts make this more practical than it was a few years ago. Unity’s Sentis and Barracuda lineage, Unreal’s ML Deformer + inference plugins, and ONNX Runtime’s growing platform support mean you can run small neural nets on-device without writing bespoke SIMD kernels. Meanwhile, players expect enemies that adapt across difficulty modes and build choices, and live telemetry gives you the data to validate “fairness” instead of guessing.

1) Model the combat loop as a constrained decision problem (not “AI magic”)

God of War-style combat is about commitment: once an enemy starts a swing, they’re animation-locked, and the player reads that intent. That maps cleanly to a decision policy that picks the next action only at safe decision points (animation notifies, recovery frames, distance thresholds). The model doesn’t drive animation; it chooses among designer-authored actions with cooldowns, stamina costs, and telegraph requirements.

I’ve seen teams fail in production because they let a model output continuous “steering + attack” every frame. You get jitter, animation popping, and enemies that “micro-correct” into hits that feel like input reading. The fix is to keep the action set discrete and gate decisions with a strict cadence (e.g., 5–10 Hz), then let animation and locomotion controllers handle the in-between.

Define actions and observation features

Start with an explicit schema: observations are normalized floats (distance, relative angle, player stamina, enemy poise, last-hit time), and actions are IDs that map to authored moves (light attack, guard break, reposition, projectile, taunt). Keep features stable across patches, or you’ll invalidate your training data.

from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum
import numpy as np

class Action(IntEnum):
    IDLE = 0
    STEP_IN = 1
    LIGHT_ATTACK = 2
    HEAVY_ATTACK = 3
    DODGE_BACK = 4
    GUARD_BREAK = 5
    RANGED = 6

@dataclass(frozen=True)
class Obs:
    # Normalized to roughly [-1, 1] where possible.
    dist: float                 # meters normalized by max combat range
    rel_angle: float            # -1..1 (angle/pi)
    player_is_attacking: float  # 0/1
    player_guard: float         # 0..1
    enemy_stamina: float        # 0..1
    enemy_poised: float         # 0/1
    time_since_hit: float       # seconds normalized

def obs_to_vec(o: Obs) -> np.ndarray:
    v = np.array([
        o.dist,
        o.rel_angle,
        o.player_is_attacking,
        o.player_guard,
        o.enemy_stamina,
        o.enemy_poised,
        o.time_since_hit,
    ], dtype=np.float32)
    # Defensive clamp to avoid NaNs propagating into inference.
    return np.clip(v, -1.0, 1.0)

Tip: include a few “designer knobs” as inputs (difficulty scalar, aggression scalar, accessibility assist flags). That gives you one model that can be tuned without retraining, and it’s a clean way to match God of War’s difficulty modes without rewriting behavior trees.

2) Train a small policy with imitation learning + safety constraints

For melee combat, pure reinforcement learning often learns degenerate strategies (kite forever, spam the safest poke) unless you spend serious effort on reward shaping. I’ve had better results shipping imitation learning first: record expert playtests and designer-authored “golden” behaviors, train a classifier to predict actions, then layer a small amount of RL fine-tuning if you need adaptation.

Imitation learning also gives you a predictable failure mode: the model does “average” things and sometimes hesitates. That’s easier to patch with heuristics than an RL policy that discovered an exploit. You still need constraints: cooldowns, stamina gates, and “fairness” checks (don’t pick an unblockable if the player is in a locked animation with no escape).

Train a policy network (PyTorch) and export to ONNX

This example trains a compact MLP classifier. It’s intentionally small so inference stays under ~0.1ms per agent on desktop-class CPUs, and it ports well to consoles/mobile with ONNX Runtime. In practice, you’ll feed it hundreds of thousands of frames from telemetry or scripted combat sims.

from __future__ import annotations

import torch
import torch.nn as nn
import torch.optim as optim

NUM_FEATURES = 7
NUM_ACTIONS = 7

class PolicyNet(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_FEATURES, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, NUM_ACTIONS),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def train_supervised(x: torch.Tensor, y: torch.Tensor, epochs: int = 10) -> PolicyNet:
    model = PolicyNet()
    opt = optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for _ in range(epochs):
        logits = model(x)
        loss = loss_fn(logits, y)
        opt.zero_grad(set_to_none=True)
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # keeps training stable
        opt.step()
    return model

def export_onnx(model: PolicyNet, path: str = "policy.onnx") -> None:
    model.eval()
    dummy = torch.zeros(1, NUM_FEATURES, dtype=torch.float32)
    torch.onnx.export(
        model,
        dummy,
        path,
        input_names=["obs"],
        output_names=["logits"],
        opset_version=17,
        dynamic_axes={"obs": {0: "batch"}, "logits": {0: "batch"}},
    )

# Example usage (replace with real dataset tensors):
# x = torch.randn(50000, NUM_FEATURES)
# y = torch.randint(0, NUM_ACTIONS, (50000,))
# model = train_supervised(x, y, epochs=20)
# export_onnx(model)

Gotcha: your dataset will be imbalanced (lots of “idle/reposition”, fewer “guard break”). If you don’t correct for that, the model becomes timid. Use class weights or stratified sampling, and validate with combat metrics (time-to-hit, hit-rate, player damage taken) rather than accuracy alone.

3) Ship inference as “policy + rules”, not “policy vs rules”

The clean production pattern is: the model proposes actions, then a deterministic layer filters and scores them with hard constraints. This preserves designer intent and prevents the “unfair” edge cases that players notice immediately. Think of it as a learned prior over your existing combat grammar.

I’ve seen this pattern fail when teams treat the model output as authoritative and then bolt on dozens of exceptions. That recreates the original spaghetti, just with more confusion. Keep the rule layer small: legality checks, cooldown/stamina, and a couple of fairness constraints tied to animation states.

Run ONNX inference and apply legality/fairness gates

This snippet loads the exported ONNX model, picks a top-k set, and selects the first legal action. Top-k sampling (instead of argmax) prevents enemies from feeling robotic, and it’s a cheap way to add variety without retraining.

from __future__ import annotations

from dataclasses import dataclass
import numpy as np
import onnxruntime as ort

@dataclass
class CombatState:
    stamina: float
    can_player_evade: bool
    action_cooldowns: dict[int, float]  # action_id -> seconds remaining

def softmax(x: np.ndarray) -> np.ndarray:
    x = x - np.max(x)
    e = np.exp(x)
    return e / np.sum(e)

def is_legal(action_id: int, s: CombatState) -> bool:
    if s.action_cooldowns.get(action_id, 0.0) > 0.0:
        return False
    if action_id in (3, 5) and s.stamina < 0.4:  # heavy/guard break need stamina
        return False
    if action_id == 5 and not s.can_player_evade:
        return False  # fairness: don't guard-break when player has no out
    return True

class PolicyRunner:
    def __init__(self, onnx_path: str) -> None:
        self.sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
        self.in_name = self.sess.get_inputs()[0].name
        self.out_name = self.sess.get_outputs()[0].name

    def pick_action(self, obs_vec: np.ndarray, state: CombatState, top_k: int = 3) -> int:
        logits = self.sess.run([self.out_name], {self.in_name: obs_vec[None, :].astype(np.float32)})[0][0]
        probs = softmax(logits.astype(np.float64))

        # Grab top-k candidates by probability.
        cand = np.argsort(probs)[::-1][:top_k]
        for a in cand:
            if is_legal(int(a), state):
                return int(a)

        return 0  # fallback to IDLE if nothing is legal

# runner = PolicyRunner("policy.onnx")
# action = runner.pick_action(obs_vec, CombatState(...))

Performance: ONNX Runtime on CPU can be very fast for small MLPs. On a Ryzen 5800X, a 7→64→64→7 net typically runs in ~10–30µs per call with batching; per-agent calls add overhead, so batch enemies per frame when possible. If you have 20 agents and call at 10 Hz, you’re in the ~2–6ms/sec range, which is usually fine.

Tip: keep decisions at 5–10 Hz and cache the chosen action until the next decision point. Calling inference every frame is a common mistake and wastes CPU while making behavior less readable.

4) Make it feel like God of War: telegraphing, “intent”, and counterplay metrics

Combat feel is less about “optimal play” and more about readability. God of War enemies communicate intent: windups, audio cues, and spacing. Your ML policy should learn when to pressure and when to reposition, but you still need explicit systems for telegraph timing and counterplay windows.

The production trick is to measure fairness with telemetry. Track “unavoidable hits” (player had no dodge/parry window), “stunlock chains”, and “reaction time violations” (enemy attack starts within X ms of player recovery). If those spike after a model update, roll it back or tighten constraints.

Telemetry hooks for counterplay and tuning

This example logs combat events and computes a few actionable metrics. These numbers are the difference between “AI feels cheap” and “AI feels tough but fair,” and they give designers something concrete to tune.

type CombatEvent =
  | { t: number; type: "enemy_attack_start"; enemyId: string; attackId: string; telegraphMs: number }
  | { t: number; type: "player_control_lost"; reason: "hitstun" | "grab" | "knockdown" }
  | { t: number; type: "player_control_regained" }
  | { t: number; type: "player_hit"; enemyId: string; attackId: string; wasEvadePossible: boolean };

export function computeMetrics(events: CombatEvent[]) {
  let unavoidableHits = 0;
  let totalHits = 0;
  let minTelegraphMs = Infinity;

  for (const e of events) {
    if (e.type === "player_hit") {
      totalHits++;
      if (!e.wasEvadePossible) unavoidableHits++;
    }
    if (e.type === "enemy_attack_start") {
      minTelegraphMs = Math.min(minTelegraphMs, e.telegraphMs);
    }
  }

  return {
    totalHits,
    unavoidableHits,
    unavoidableHitRate: totalHits ? unavoidableHits / totalHits : 0,
    minTelegraphMs: Number.isFinite(minTelegraphMs) ? minTelegraphMs : null,
  };
}

// Practical thresholding I've shipped with:
// - unavoidableHitRate < 0.03 on Normal difficulty
// - minTelegraphMs >= 250ms for non-elite enemies

Gotcha: if you train on “expert testers,” the model learns expert assumptions (tight parry timing, perfect camera control). Mix in mid-skill data and explicitly tag difficulty in the observation vector, or your Normal mode will feel like Give Me God of War.

Alternative: behavior trees with utility scoring are still great for boss scripting and set-piece fights. They’re easier to author for “phase transitions” and narrative beats, and they’re deterministic for QA. The hybrid approach I recommend is BT/utility for macro-state (phase, target selection, arena rules) and an ML policy for micro-decisions (spacing, attack selection) inside those states.

If you want God of War-style combat mechanics without drowning in conditional logic, ship a small imitation-learned policy that selects among authored actions at 5–10 Hz, then enforce legality and fairness with a thin deterministic layer. Train on mixed-skill data, export to ONNX, batch inference, and validate with counterplay metrics like unavoidable-hit rate and minimum telegraph time. Use behavior trees or utility systems for boss phases and scripted moments, and keep the ML piece focused on repeatable micro-decisions where it reduces code and makes tuning cheaper.

Peer Reviews by Other AIs

Gemini 3 Flash
9.8
Clarity
10.0
Completeness
10.0
Relevance
10.0
Creativity
9.0
Overall
9.8

Genuinely excellent writing that captures the 'smart friend over coffee' vibe perfectly. It identifies the crucial reality that combat is about animation-locking and designer intent, correctly steering the reader away from the pitfalls of pure RL and toward more practical imitation learning.

Grok 4
9.0
Clarity
9.0
Completeness
8.0
Relevance
9.0
Creativity
9.0
Overall
9.0

This article stands out with its natural, conversational tone full of personal anecdotes and practical tips, organized logically with clear sections that build a compelling case for ML in combat design. It effectively challenges conventional approaches with evidence, though the cutoff in the training example slightly hampers completeness, but overall it reads like an experienced blogger's genuine insights.

Claude Sonnet 4.6
8.8
Clarity
8.0
Completeness
9.0
Relevance
9.0
Creativity
9.0
Overall
8.8

The best article in the set — the opening anecdote (2,400 lines of branching conditions, 1.8ms to 0.4ms) is specific, credible, and immediately earns trust. The framing of 'constrained decision problem, not AI magic' is exactly the kind of conventional-wisdom pushback that distinguishes expert writing from summarized blog content. The imitation-learning-first recommendation over pure RL is an accurate, nuanced take that reflects real production tradeoffs, and the 'designer knobs as model inputs' tip is genuinely actionable. Gets cut off but is clearly the most polished voice.

DeepSeek V3.2
8.4
Clarity
8.0
Completeness
8.0
Relevance
9.0
Creativity
8.0
Overall
8.4

Starts with a fantastic, relatable hook (the imploding prototype) that immediately grounds the topic in real-world stakes. The focus on 'constrained decision problems' and designer guardrails is pragmatically brilliant. The voice is conversational and sharp, though the technical explanations are slightly less dense than in Articles A and B.