Last quarter I watched a combat prototype implode because the “smart enemy” logic was 2,400 lines of branching conditions, and every new weapon broke three old behaviors. We swapped the hand-authored selector for a tiny learned policy and saw a surprising result: CPU time for decision-making dropped from ~1.8ms to ~0.4ms per frame on PS5-class hardware, mostly because we stopped evaluating dozens of dead branches. The catch was that we had to build guardrails, or the model would happily pick “perfect” moves that felt unfair.
God of War’s modern combat (2018/Ragnarök) is a great lens for this problem: readable enemy intent, punish windows, animation-locked commitment, and layered difficulty tuning. Developers want that “aggressive but fair” feel, but shipping it with pure rules scales poorly as you add enemy archetypes, runic attacks, status effects, and accessibility modifiers. Machine learning can help, but only if it fits inside deterministic animation graphs, strict frame budgets, and designer-driven constraints.
Recent engine shifts make this more practical than it was a few years ago. Unity’s Sentis and Barracuda lineage, Unreal’s ML Deformer + inference plugins, and ONNX Runtime’s growing platform support mean you can run small neural nets on-device without writing bespoke SIMD kernels. Meanwhile, players expect enemies that adapt across difficulty modes and build choices, and live telemetry gives you the data to validate “fairness” instead of guessing.
1) Model the combat loop as a constrained decision problem (not “AI magic”)
God of War-style combat is about commitment: once an enemy starts a swing, they’re animation-locked, and the player reads that intent. That maps cleanly to a decision policy that picks the next action only at safe decision points (animation notifies, recovery frames, distance thresholds). The model doesn’t drive animation; it chooses among designer-authored actions with cooldowns, stamina costs, and telegraph requirements.
I’ve seen teams fail in production because they let a model output continuous “steering + attack” every frame. You get jitter, animation popping, and enemies that “micro-correct” into hits that feel like input reading. The fix is to keep the action set discrete and gate decisions with a strict cadence (e.g., 5–10 Hz), then let animation and locomotion controllers handle the in-between.
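One cheap way to enforce that cadence is a tiny gate that re-decides only at safe animation points and otherwise returns the cached action. A minimal sketch, with illustrative names (this is not an engine API):

```python
from typing import Callable


class DecisionGate:
    """Re-decides only at a fixed cadence AND at safe animation points;
    between decision ticks it holds the last chosen action."""

    def __init__(self, hz: float = 8.0) -> None:
        self.interval = 1.0 / hz
        self.next_decision_t = 0.0
        self.current_action = 0  # Action.IDLE

    def tick(self, t: float, at_safe_point: bool, choose: Callable[[], int]) -> int:
        # Only invoke the (possibly expensive) policy at the cadence,
        # and only when the animation graph says a new choice is legal.
        if at_safe_point and t >= self.next_decision_t:
            self.current_action = choose()  # e.g., wraps the policy's pick_action
            self.next_decision_t = t + self.interval
        return self.current_action
```

In practice `choose` wraps the inference call; everything between ticks is handled by the locomotion and animation controllers.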
Define actions and observation features
Start with an explicit schema: observations are normalized floats (distance, relative angle, player stamina, enemy poise, last-hit time), and actions are IDs that map to authored moves (light attack, guard break, reposition, projectile, taunt). Keep features stable across patches, or you’ll invalidate your training data.
```python
from __future__ import annotations

from dataclasses import dataclass
from enum import IntEnum

import numpy as np


class Action(IntEnum):
    IDLE = 0
    STEP_IN = 1
    LIGHT_ATTACK = 2
    HEAVY_ATTACK = 3
    DODGE_BACK = 4
    GUARD_BREAK = 5
    RANGED = 6


@dataclass(frozen=True)
class Obs:
    # Normalized to roughly [-1, 1] where possible.
    dist: float                 # meters normalized by max combat range
    rel_angle: float            # -1..1 (angle/pi)
    player_is_attacking: float  # 0/1
    player_guard: float         # 0..1
    enemy_stamina: float        # 0..1
    enemy_poised: float         # 0/1
    time_since_hit: float       # seconds normalized


def obs_to_vec(o: Obs) -> np.ndarray:
    v = np.array([
        o.dist,
        o.rel_angle,
        o.player_is_attacking,
        o.player_guard,
        o.enemy_stamina,
        o.enemy_poised,
        o.time_since_hit,
    ], dtype=np.float32)
    # Defensive: replace NaNs/infs first (clip alone won't catch NaN),
    # then clamp before the vector reaches inference.
    return np.clip(np.nan_to_num(v), -1.0, 1.0)
```
Tip: include a few “designer knobs” as inputs (difficulty scalar, aggression scalar, accessibility assist flags). That gives you one model that can be tuned without retraining, and it’s a clean way to match God of War’s difficulty modes without rewriting behavior trees.
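As a sketch of that idea, the knobs can simply be appended to the observation vector as extra clamped features. The `Knobs` fields here are hypothetical placeholders for whatever dials your designers need; remember that the model's input width must grow to match:

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Knobs:
    difficulty: float  # 0..1 scalar across difficulty modes
    aggression: float  # 0..1 designer dial
    assist: float      # 0/1 accessibility assist flag


def obs_with_knobs(obs_vec: np.ndarray, k: Knobs) -> np.ndarray:
    # Append clamped knob values as extra features; the same trained model
    # can then be tuned at runtime by moving these dials.
    extra = np.array([k.difficulty, k.aggression, k.assist], dtype=np.float32)
    return np.concatenate([obs_vec.astype(np.float32), np.clip(extra, 0.0, 1.0)])
```

Because the knobs are inputs rather than separate models, one export covers every difficulty mode, and QA can sweep the dials without a retrain.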
2) Train a small policy with imitation learning + safety constraints
For melee combat, pure reinforcement learning often learns degenerate strategies (kite forever, spam the safest poke) unless you spend serious effort on reward shaping. I’ve had better results shipping imitation learning first: record expert playtests and designer-authored “golden” behaviors, train a classifier to predict actions, then layer a small amount of RL fine-tuning if you need adaptation.
Imitation learning also gives you a predictable failure mode: the model does “average” things and sometimes hesitates. That’s easier to patch with heuristics than an RL policy that discovered an exploit. You still need constraints: cooldowns, stamina gates, and “fairness” checks (don’t pick an unblockable if the player is in a locked animation with no escape).
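The recording side can be as simple as snapshotting the observation vector whenever the expert-driven enemy commits to an action at a decision point. A minimal sketch (class and method names are illustrative):

```python
import numpy as np


class DemoRecorder:
    """Accumulates (observation, expert_action) pairs at decision points."""

    def __init__(self) -> None:
        self._obs: list = []
        self._actions: list = []

    def record(self, obs_vec: np.ndarray, action_id: int) -> None:
        # Call this at the same decision points the shipped policy will use,
        # so the training distribution matches inference.
        self._obs.append(np.asarray(obs_vec, dtype=np.float32))
        self._actions.append(int(action_id))

    def dataset(self):
        # Returns (N, F) float32 features and (N,) int64 labels,
        # ready to wrap in torch tensors for supervised training.
        return np.stack(self._obs), np.array(self._actions, dtype=np.int64)
```

Tagging each session with player skill and difficulty at record time pays off later, when you need to rebalance the dataset.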
Train a policy network (PyTorch) and export to ONNX
This example trains a compact MLP classifier. It’s intentionally small so inference stays under ~0.1ms per agent on desktop-class CPUs, and it ports well to consoles/mobile with ONNX Runtime. In practice, you’ll feed it hundreds of thousands of frames from telemetry or scripted combat sims.
```python
from __future__ import annotations

import torch
import torch.nn as nn
import torch.optim as optim

NUM_FEATURES = 7
NUM_ACTIONS = 7


class PolicyNet(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(NUM_FEATURES, 64),
            nn.ReLU(),
            nn.Linear(64, 64),
            nn.ReLU(),
            nn.Linear(64, NUM_ACTIONS),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def train_supervised(x: torch.Tensor, y: torch.Tensor, epochs: int = 10) -> PolicyNet:
    model = PolicyNet()
    opt = optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        logits = model(x)
        loss = loss_fn(logits, y)
        opt.zero_grad(set_to_none=True)
        loss.backward()
        nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # keeps training stable
        opt.step()
    return model


def export_onnx(model: PolicyNet, path: str = "policy.onnx") -> None:
    model.eval()
    dummy = torch.zeros(1, NUM_FEATURES, dtype=torch.float32)
    torch.onnx.export(
        model,
        dummy,
        path,
        input_names=["obs"],
        output_names=["logits"],
        opset_version=17,
        dynamic_axes={"obs": {0: "batch"}, "logits": {0: "batch"}},
    )


# Example usage (replace with real dataset tensors):
# x = torch.randn(50000, NUM_FEATURES)
# y = torch.randint(0, NUM_ACTIONS, (50000,))
# model = train_supervised(x, y, epochs=20)
# export_onnx(model)
```
Gotcha: your dataset will be imbalanced (lots of “idle/reposition”, fewer “guard break”). If you don’t correct for that, the model becomes timid. Use class weights or stratified sampling, and validate with combat metrics (time-to-hit, hit-rate, player damage taken) rather than accuracy alone.
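One hedge against that imbalance is inverse-frequency class weights computed from the recorded labels. The helper below is a plain-numpy sketch; its output can be passed to the loss as `nn.CrossEntropyLoss(weight=torch.tensor(w, dtype=torch.float32))`:

```python
import numpy as np


def inverse_frequency_weights(labels: np.ndarray, num_classes: int) -> np.ndarray:
    # Weight each class by total / (num_classes * count), so rare actions
    # (guard break) contribute more to the loss than common ones (idle).
    counts = np.bincount(labels, minlength=num_classes).astype(np.float64)
    counts = np.maximum(counts, 1.0)  # guard against unseen actions
    return counts.sum() / (num_classes * counts)
```

Whether you weight the loss or stratify the sampler, re-run your combat metrics afterwards: over-correcting makes enemies reckless in the same way under-correcting makes them timid.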
3) Ship inference as “policy + rules”, not “policy vs rules”
The clean production pattern is: the model proposes actions, then a deterministic layer filters and scores them with hard constraints. This preserves designer intent and prevents the “unfair” edge cases that players notice immediately. Think of it as a learned prior over your existing combat grammar.
I’ve seen this pattern fail when teams treat the model output as authoritative and then bolt on dozens of exceptions. That recreates the original spaghetti, just with more confusion. Keep the rule layer small: legality checks, cooldown/stamina, and a couple of fairness constraints tied to animation states.
Run ONNX inference and apply legality/fairness gates
This snippet loads the exported ONNX model, keeps the top-k candidates, filters out illegal ones, and samples among the survivors in proportion to their probability. Sampling within the top-k (instead of always taking the argmax) keeps enemies from feeling robotic, and it's a cheap way to add variety without retraining.
```python
from __future__ import annotations

from dataclasses import dataclass

import numpy as np
import onnxruntime as ort


@dataclass
class CombatState:
    stamina: float
    can_player_evade: bool
    action_cooldowns: dict[int, float]  # action_id -> seconds remaining


def softmax(x: np.ndarray) -> np.ndarray:
    x = x - np.max(x)
    e = np.exp(x)
    return e / np.sum(e)


def is_legal(action_id: int, s: CombatState) -> bool:
    if s.action_cooldowns.get(action_id, 0.0) > 0.0:
        return False
    if action_id in (3, 5) and s.stamina < 0.4:  # heavy/guard break need stamina
        return False
    if action_id == 5 and not s.can_player_evade:
        return False  # fairness: don't guard-break when player has no out
    return True


class PolicyRunner:
    def __init__(self, onnx_path: str) -> None:
        self.sess = ort.InferenceSession(onnx_path, providers=["CPUExecutionProvider"])
        self.in_name = self.sess.get_inputs()[0].name
        self.out_name = self.sess.get_outputs()[0].name
        self.rng = np.random.default_rng()

    def pick_action(self, obs_vec: np.ndarray, state: CombatState, top_k: int = 3) -> int:
        logits = self.sess.run(
            [self.out_name], {self.in_name: obs_vec[None, :].astype(np.float32)}
        )[0][0]
        probs = softmax(logits.astype(np.float64))
        # Keep the top-k candidates, drop illegal ones, then sample
        # proportionally to probability for cheap behavioral variety.
        cand = np.argsort(probs)[::-1][:top_k]
        legal = [int(a) for a in cand if is_legal(int(a), state)]
        if not legal:
            return 0  # fallback to IDLE if nothing is legal
        p = probs[legal] / probs[legal].sum()
        return int(self.rng.choice(legal, p=p))


# runner = PolicyRunner("policy.onnx")
# action = runner.pick_action(obs_vec, CombatState(...))
```
Performance: ONNX Runtime on CPU is very fast for small MLPs. On a Ryzen 5800X, a 7→64→64→7 net typically runs in ~10–30µs per call when batched; per-agent calls add fixed session overhead, so batch all enemies into one inference per decision tick where possible. With 20 agents deciding at 10 Hz, you spend roughly 2–6ms of CPU per second of gameplay, which is usually fine.
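One way to batch is to stack every agent's observation and run a single inference per decision tick. The sketch below hides the model behind a `run_logits` callable, which in practice would wrap one ONNX Runtime `session.run` over the whole batch; the legality predicates stand in for per-agent combat state, and all names are illustrative:

```python
import numpy as np


def batch_pick_actions(run_logits, obs_list, legal_fns, top_k: int = 3) -> list:
    """run_logits: callable mapping (N, F) float32 -> (N, A) logits,
    e.g. one onnxruntime session.run instead of N separate calls."""
    batch = np.stack(obs_list).astype(np.float32)
    logits = run_logits(batch)
    picks = []
    for row, legal in zip(logits, legal_fns):
        # Per-agent: rank by logit, take the first legal top-k candidate,
        # fall back to IDLE (0) if none are legal.
        cand = np.argsort(row)[::-1][:top_k]
        picks.append(next((int(a) for a in cand if legal(int(a))), 0))
    return picks
```

This keeps the expensive part (the forward pass) amortized while the cheap, deterministic legality checks stay per-agent.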
Tip: keep decisions at 5–10 Hz and cache the chosen action until the next decision point. Calling inference every frame is a common mistake and wastes CPU while making behavior less readable.
4) Make it feel like God of War: telegraphing, “intent”, and counterplay metrics
Combat feel is less about “optimal play” and more about readability. God of War enemies communicate intent: windups, audio cues, and spacing. Your ML policy should learn when to pressure and when to reposition, but you still need explicit systems for telegraph timing and counterplay windows.
The production trick is to measure fairness with telemetry. Track “unavoidable hits” (player had no dodge/parry window), “stunlock chains”, and “reaction time violations” (enemy attack starts within X ms of player recovery). If those spike after a model update, roll it back or tighten constraints.
Telemetry hooks for counterplay and tuning
This example logs combat events and computes a few actionable metrics. These numbers are the difference between “AI feels cheap” and “AI feels tough but fair,” and they give designers something concrete to tune.
```typescript
type CombatEvent =
  | { t: number; type: "enemy_attack_start"; enemyId: string; attackId: string; telegraphMs: number }
  | { t: number; type: "player_control_lost"; reason: "hitstun" | "grab" | "knockdown" }
  | { t: number; type: "player_control_regained" }
  | { t: number; type: "player_hit"; enemyId: string; attackId: string; wasEvadePossible: boolean };

export function computeMetrics(events: CombatEvent[]) {
  let unavoidableHits = 0;
  let totalHits = 0;
  let minTelegraphMs = Infinity;

  for (const e of events) {
    if (e.type === "player_hit") {
      totalHits++;
      if (!e.wasEvadePossible) unavoidableHits++;
    }
    if (e.type === "enemy_attack_start") {
      minTelegraphMs = Math.min(minTelegraphMs, e.telegraphMs);
    }
  }

  return {
    totalHits,
    unavoidableHits,
    unavoidableHitRate: totalHits ? unavoidableHits / totalHits : 0,
    minTelegraphMs: Number.isFinite(minTelegraphMs) ? minTelegraphMs : null,
  };
}

// Practical thresholding I've shipped with:
// - unavoidableHitRate < 0.03 on Normal difficulty
// - minTelegraphMs >= 250ms for non-elite enemies
```
Gotcha: if you train on “expert testers,” the model learns expert assumptions (tight parry timing, perfect camera control). Mix in mid-skill data and explicitly tag difficulty in the observation vector, or your Normal mode will feel like Give Me God of War.
Alternative: behavior trees with utility scoring are still great for boss scripting and set-piece fights. They’re easier to author for “phase transitions” and narrative beats, and they’re deterministic for QA. The hybrid approach I recommend is BT/utility for macro-state (phase, target selection, arena rules) and an ML policy for micro-decisions (spacing, attack selection) inside those states.
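As a sketch of that hybrid split, the macro layer can simply restrict which action IDs the policy may return in each phase. The phase names and allowed sets below are made up for illustration; in production they'd come from the behavior tree or utility system:

```python
from enum import IntEnum


class Phase(IntEnum):
    NEUTRAL = 0
    ENRAGED = 1


# Macro layer (BT/utility in practice) owns the allowed action set per phase;
# the ML policy only chooses within that subset.
ALLOWED = {
    Phase.NEUTRAL: {0, 1, 2, 4},     # idle, step in, light attack, dodge
    Phase.ENRAGED: {1, 2, 3, 5, 6},  # adds heavy, guard break, ranged
}


def pick_in_phase(ranked_actions, phase: Phase) -> int:
    # ranked_actions: policy candidates in descending preference order.
    allowed = ALLOWED[phase]
    return next((a for a in ranked_actions if a in allowed), 0)  # 0 = IDLE fallback
```

This keeps phase transitions and set-piece beats fully deterministic for QA while the policy still shapes moment-to-moment pressure inside each phase.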
If you want God of War-style combat mechanics without drowning in conditional logic, ship a small imitation-learned policy that selects among authored actions at 5–10 Hz, then enforce legality and fairness with a thin deterministic layer. Train on mixed-skill data, export to ONNX, batch inference, and validate with counterplay metrics like unavoidable-hit rate and minimum telegraph time. Use behavior trees or utility systems for boss phases and scripted moments, and keep the ML piece focused on repeatable micro-decisions where it reduces code and makes tuning cheaper.