Visualizing Neural Network Forward and Backpropagation in Three.js

I wanted a simple visual representation of how neural networks actually work: not the static diagrams you see in textbooks, but something that shows signals flowing forward and errors propagating backward. The result is the 3D animation on my homepage.

The challenge was representing the math visually while keeping it performant in the browser. Here's how I approached it.

The Animation State Machine

The visualization cycles through six phases that mirror actual neural network training:

phase: "idle" | "input" | "propagate" | "output" | "backprop" | "weight_update";

Each active phase has its own duration in milliseconds, and the full cycle runs about 12 seconds:

export const ANIMATION_CONFIG = {
  cycleDuration: 12000,
  inputDuration: 800,
  propagationDuration: 1000,
  outputDuration: 600,
  backpropDuration: 1000,
  weightUpdateDuration: 600,
};
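
One way to turn these durations into a current phase is to walk a schedule with the elapsed time. This is a minimal sketch rather than the code behind the animation: `PHASE_SCHEDULE` and `phaseAt` are hypothetical names, the phase order is assumed, `propagationDuration` is treated as a single block (the real animation may apply it per layer), and idle is assumed to fill whatever time is left in the cycle.

```typescript
type Phase =
  | "idle"
  | "input"
  | "propagate"
  | "output"
  | "backprop"
  | "weight_update";

const ANIMATION_CONFIG = {
  cycleDuration: 12000,
  inputDuration: 800,
  propagationDuration: 1000,
  outputDuration: 600,
  backpropDuration: 1000,
  weightUpdateDuration: 600,
};

// Assumed order of the active phases (hypothetical); idle is whatever remains.
const PHASE_SCHEDULE: ReadonlyArray<{ phase: Phase; duration: number }> = [
  { phase: "input", duration: ANIMATION_CONFIG.inputDuration },
  { phase: "propagate", duration: ANIMATION_CONFIG.propagationDuration },
  { phase: "output", duration: ANIMATION_CONFIG.outputDuration },
  { phase: "backprop", duration: ANIMATION_CONFIG.backpropDuration },
  { phase: "weight_update", duration: ANIMATION_CONFIG.weightUpdateDuration },
];

function phaseAt(elapsedMs: number): Phase {
  const cycle = ANIMATION_CONFIG.cycleDuration;
  // Wrap the elapsed time into the current cycle (also handles negative input).
  let t = ((elapsedMs % cycle) + cycle) % cycle;
  for (const { phase, duration } of PHASE_SCHEDULE) {
    if (t < duration) return phase;
    t -= duration;
  }
  // Past the last active phase: rest until the cycle restarts.
  return "idle";
}
```

Calling `phaseAt(performance.now() - startTime)` once per frame (with `startTime` captured when the animation begins) would then select which phase to render.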

Forward Propagation: The Math

During forward propagation, each neuron computes a weighted sum of its inputs, then applies an activation function. For hidden layers I use ReLU, which returns the input if it is positive and zero otherwise:

function relu(x: number): number {
  return Math.max(0, x);
}

The output layer uses softmax to convert raw scores into probabilities. Each score x_i becomes e^(x_i) / sum_j e^(x_j). Subtracting the maximum score first leaves the result unchanged but keeps Math.exp from overflowing:

function softmax(values: number[]): number[] {
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max)); // numerical stability
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}
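
To see why the max subtraction matters, here is a small comparison. The `naiveSoftmax` function and the sample logits are made up for illustration; `stableSoftmax` repeats the version above under a different name so the block is self-contained.

```typescript
// Naive version, shown only for contrast: exponentiates the raw scores.
function naiveSoftmax(values: number[]): number[] {
  const exps = values.map((v) => Math.exp(v));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Stable version: shifting by the max keeps every exponent <= 0.
function stableSoftmax(values: number[]): number[] {
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const largeLogits = [1000, 1001, 1002];

// Math.exp(1000) overflows to Infinity, so each entry is Infinity / Infinity = NaN.
const naiveResult = naiveSoftmax(largeLogits);

// Equivalent to softmax([-2, -1, 0]): finite values that sum to 1.
const stableResult = stableSoftmax(largeLogits);

console.log(naiveResult, stableResult);
```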

One important implementation detail: in the actual code I don’t repeatedly scan the full connections array while animating. Instead I build a lookup map once so I can go from (fromNeuron, toNeuron) → weight in O(1). That keeps the animation smooth.

const connectionWeightByKey = new Map<string, number>();
for (const c of connections) {
  connectionWeightByKey.set(`${c.from}->${c.to}`, c.weight);
}
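
As a rough illustration of what the map replaces, the sketch below puts the keyed lookup next to the linear scan. The `Connection` shape is inferred from the snippet above; the sample data and the helper names (`connectionKey`, `getWeight`, `getWeightByScan`) are hypothetical.

```typescript
interface Connection {
  from: number;
  to: number;
  weight: number;
}

// Hypothetical sample data; the real network defines its own connections.
const sampleConnections: Connection[] = [
  { from: 0, to: 2, weight: 0.5 },
  { from: 1, to: 2, weight: -0.25 },
  { from: 2, to: 3, weight: 0.8 },
];

// One place that defines the key format, so writers and readers cannot drift apart.
function connectionKey(from: number, to: number): string {
  return `${from}->${to}`;
}

const weightByKey = new Map<string, number>();
for (const c of sampleConnections) {
  weightByKey.set(connectionKey(c.from, c.to), c.weight);
}

// Constant-time (average case) lookup; undefined means "not connected".
function getWeight(from: number, to: number): number | undefined {
  return weightByKey.get(connectionKey(from, to));
}

// The linear scan that the map avoids, kept here only for comparison.
function getWeightByScan(from: number, to: number): number | undefined {
  return sampleConnections.find((c) => c.from === from && c.to === to)?.weight;
}
```

Both functions return the same values; the difference is that the scan touches every connection on each call, while the map does one hash lookup.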

Each layer's activation is computed by summing weighted inputs from the previous layer:

function propagateLayer(layerIndex: number): void {
  if (layerIndex === 0 || layerIndex >= layerCount) return;

  const prevLayer = neuronsByLayer[layerIndex - 1];
  const currentLayer = neuronsByLayer[layerIndex];

  state.activations[layerIndex] = currentLayer.map((neuron) => {
    let sum = 0;
    prevLayer.forEach((prevNeuron, j) => {
      const w = connectionWeightByKey.get(
        `${prevNeuron.index}->${neuron.index}`,
      );
      if (w !== undefined) {
        const prevActivation = state.activations[layerIndex - 1][j];
        sum += prevActivation * w;
      }
    });

    // Hidden layers: ReLU
    // Output layer: raw logits (softmax applied after)
    return layerIndex === layerCount - 1 ? sum : relu(sum + 0.5);
  });

  // Softmax on output layer
  if (layerIndex === layerCount - 1) {
    state.activations[layerIndex] = softmax(state.activations[layerIndex]);
  }
}

A quick note on the + 0.5: in a real neural network you’d typically have a learnable bias term per neuron. Here it’s just a small constant offset so more hidden neurons light up (and the visualization looks better).
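
For comparison, a per-neuron bias could look roughly like this. It is only a sketch: `activateHidden` and the `biases` array are hypothetical and not part of the visualization, which uses the constant offset described above.

```typescript
function relu(x: number): number {
  return Math.max(0, x);
}

// Hypothetical helper: applies ReLU to each weighted sum plus that neuron's bias.
// A missing bias entry is treated as 0.
function activateHidden(weightedSums: number[], biases: number[]): number[] {
  return weightedSums.map((sum, i) => relu(sum + (biases[i] ?? 0)));
}

const weightedSums = [-0.3, 0.2, -0.8];

// No bias: only the neuron with a positive sum fires.
const withoutBias = activateHidden(weightedSums, [0, 0, 0]);

// Constant 0.5 for every neuron, as in the visualization: the first neuron now fires too.
const withConstantOffset = activateHidden(
  weightedSums,
  weightedSums.map(() => 0.5),
);

console.log(withoutBias, withConstantOffset);
```

In a trained network each entry of `biases` would be learned alongside the weights instead of being fixed.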

Backpropagation: Computing Gradients

The backpropagation phase is where the network "learns." In the visualization I treat the output error as:

gradient = activation - target

This mirrors the softmax + cross-entropy gradient with respect to the output logits (the familiar p - y form), which holds when the target is a one-hot label or any probability distribution that sums to 1. In my case the target is random, because the goal is to show the flow rather than to train on a dataset, so it’s a visualization-friendly error signal rather than a true loss gradient.
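
The p - y form can be checked numerically with a finite-difference test. The sketch below is not part of the visualization; `crossEntropy`, `numericalGradient`, and the sample logits and target are assumptions made for the check.

```typescript
function softmax(values: number[]): number[] {
  const max = Math.max(...values);
  const exps = values.map((v) => Math.exp(v - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Cross-entropy between a target distribution and softmax(logits).
function crossEntropy(logits: number[], target: number[]): number {
  const p = softmax(logits);
  return -target.reduce((acc, y, i) => acc + y * Math.log(p[i]), 0);
}

// Central finite difference of the loss with respect to each logit.
function numericalGradient(
  logits: number[],
  target: number[],
  eps = 1e-5,
): number[] {
  return logits.map((_, i) => {
    const plus = logits.slice();
    const minus = logits.slice();
    plus[i] += eps;
    minus[i] -= eps;
    return (crossEntropy(plus, target) - crossEntropy(minus, target)) / (2 * eps);
  });
}

const sampleLogits = [0.2, -1.0, 0.7];
const sampleTarget = [0.1, 0.6, 0.3]; // must sum to 1 for p - y to hold

// Analytic gradient: softmax output minus target.
const analytic = softmax(sampleLogits).map((p, i) => p - sampleTarget[i]);
const numeric = numericalGradient(sampleLogits, sampleTarget);

console.log(analytic, numeric);
```

The two arrays should agree to several decimal places. If the target did not sum to 1, the analytic form would no longer match.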

For hidden layers, the error is propagated backward through the weights, multiplied by the derivative of ReLU (which is 1 if the activation was positive, 0 otherwise).

Also important: the input layer has no incoming weights or activation function here, so there is nothing to update behind it and I skip computing gradients for it.

function calculateGradients(): void {
  // Random "target" distribution for visualization
  const rawTarget = neuronsByLayer[layerCount - 1].map(() => Math.random());
  const targetSum = rawTarget.reduce((acc, value) => acc + value, 0);
  state.target =
    targetSum > 0 ? rawTarget.map((value) => value / targetSum) : rawTarget;

  // Output layer error
  state.gradients[layerCount - 1] = state.activations[layerCount - 1].map(
    (a, i) => a - state.target[i],
  );

  // Backpropagate through hidden layers (skip input layer l = 0)
  for (let l = layerCount - 2; l >= 1; l--) {
    state.gradients[l] = neuronsByLayer[l].map((neuron, j) => {
      let error = 0;
      neuronsByLayer[l + 1].forEach((nextNeuron, k) => {
        const w = connectionWeightByKey.get(
          `${neuron.index}->${nextNeuron.index}`,
        );
        if (w !== undefined) {
          error += state.gradients[l + 1][k] * w;
        }
      });

      // ReLU derivative
      const reluDerivative = state.activations[l][j] > 0 ? 1 : 0;
      return error * reluDerivative;
    });
  }

  // Keep input gradients at 0
  state.gradients[0] = neuronsByLayer[0].map(() => 0);
}

Weight Updates (Visualized)

I also visualize a "weight update" signal without actually mutating weights each cycle. For each connection, I compute a weight-gradient-like value:

dW ≈ error_at_to * activation_at_from

This is enough to color/flash the connections in a way that looks like learning.

function getWeightGradient(connectionIndex: number): number {
  const conn = connections[connectionIndex];

  const toNeuron = neurons[conn.to];
  const fromNeuron = neurons[conn.from];

  const toLayer = toNeuron.layer;
  const fromLayer = fromNeuron.layer;

  const indexInToLayer = neuronsByLayer[toLayer].findIndex(
    (n) => n.index === toNeuron.index,
  );
  const indexInFromLayer = neuronsByLayer[fromLayer].findIndex(
    (n) => n.index === fromNeuron.index,
  );

  const error = state.gradients[toLayer]?.[indexInToLayer] ?? 0;
  const activation = state.activations[fromLayer]?.[indexInFromLayer] ?? 0;

  return error * activation;
}
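
The two `findIndex` calls scan a layer on every call. If that ever showed up in a profile, the same precompute-once idea used for the weight lookup could be applied to layer positions. This is a sketch under assumptions: the `Neuron` shape is inferred from the code above, and the 2-3-2 sample network, `positionInLayer`, and `getPositionInLayer` are hypothetical.

```typescript
interface Neuron {
  index: number; // global neuron index
  layer: number; // layer the neuron belongs to
}

// Hypothetical 2-3-2 network with globally numbered neurons (0..6).
const layerSizes = [2, 3, 2];
const sampleNeuronsByLayer: Neuron[][] = [];
let nextIndex = 0;
layerSizes.forEach((size, layer) => {
  const layerNeurons: Neuron[] = [];
  for (let i = 0; i < size; i++) {
    layerNeurons.push({ index: nextIndex++, layer });
  }
  sampleNeuronsByLayer.push(layerNeurons);
});

// Built once: global neuron index -> position inside its own layer.
const positionInLayer = new Map<number, number>();
for (const layerNeurons of sampleNeuronsByLayer) {
  layerNeurons.forEach((n, position) => positionInLayer.set(n.index, position));
}

// Drop-in replacement for the findIndex calls; -1 mirrors findIndex's "not found".
function getPositionInLayer(neuron: Neuron): number {
  return positionInLayer.get(neuron.index) ?? -1;
}
```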

If I ever want to turn this into a tiny “real trainer”, the missing piece is just applying weight -= lr * dW.
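
A minimal sketch of that update step, assuming plain gradient descent with a fixed learning rate. `applyWeightUpdates`, `buildWeightLookup`, and the sample values are hypothetical. One detail worth noting: a `Map<string, number>` stores copies of the weights, so a lookup like `connectionWeightByKey` would have to be rebuilt (or updated in the same loop) after the weights change.

```typescript
interface Connection {
  from: number;
  to: number;
  weight: number;
}

// Applies weight -= lr * dW in place, one gradient per connection.
function applyWeightUpdates(
  connections: Connection[],
  weightGradients: number[],
  learningRate: number,
): void {
  connections.forEach((c, i) => {
    c.weight -= learningRate * (weightGradients[i] ?? 0);
  });
}

// Rebuilds the (from -> to) lookup so it reflects the updated weights.
function buildWeightLookup(connections: Connection[]): Map<string, number> {
  const lookup = new Map<string, number>();
  for (const c of connections) {
    lookup.set(`${c.from}->${c.to}`, c.weight);
  }
  return lookup;
}

// Hypothetical data: two connections and their dW values.
const demoConnections: Connection[] = [
  { from: 0, to: 2, weight: 0.5 },
  { from: 1, to: 2, weight: -0.25 },
];
const demoGradients = [0.2, -0.4];

applyWeightUpdates(demoConnections, demoGradients, 0.1);
const refreshedLookup = buildWeightLookup(demoConnections);
```

A positive gradient lowers the weight and a negative one raises it, which is the direction that reduces the error signal.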

Visual Representation with Sparks

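The sparks traveling along connections represent signal flow. Their color and size encode the weight: the color blends from yellow toward cyan for negative weights or toward orange for positive ones as the magnitude grows, and brightness is boosted by the activation: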

import * as THREE from "three";

export function getSparkColor(
  weight: number,
  activation: number = 1,
): THREE.Color {
  const absWeight = Math.abs(weight);

  const negativeColor = new THREE.Color(0x0891b2); // Cyan - negative weights
  const neutralColor = new THREE.Color(0xeab308); // Yellow - neutral
  const positiveColor = new THREE.Color(0xea580c); // Orange - positive weights

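  // Clamp the blend factor: THREE.Color.lerp does not clamp alpha itself,
  // so values above 1 would overshoot the target color.
  const blend = Math.min(1, absWeight * 1.2);

  let baseColor: THREE.Color;
  if (weight >= 0) {
    baseColor = neutralColor.clone().lerp(positiveColor, blend);
  } else {
    baseColor = neutralColor.clone().lerp(negativeColor, blend);
  }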

  const brightBoost = 2.5 + activation * 2.0;
  baseColor.multiplyScalar(brightBoost);

  return baseColor;
}

During backpropagation, sparks travel in reverse with a different color palette (teal → pink → orange) based on gradient magnitude:

export function getBackpropSparkColor(
  gradient: number,
  intensity: number = 1,
): THREE.Color {
  // Clamp to [0, 1] so the lerp factors below stay in range
  const absGradient = Math.min(1, Math.abs(gradient));

  const weakError = new THREE.Color(0x14b8a6); // Teal - weak gradient
  const midError = new THREE.Color(0xec4899); // Pink - medium gradient
  const strongError = new THREE.Color(0xf97316); // Orange - strong gradient

  let baseColor: THREE.Color;
  if (absGradient < 0.5) {
    baseColor = weakError.clone().lerp(midError, absGradient * 2);
  } else {
    baseColor = midError.clone().lerp(strongError, (absGradient - 0.5) * 2);
  }

  return baseColor;
}
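
Both branches assume the gradient magnitude lies in [0, 1]. Hidden-layer gradients are sums over outgoing connections and may fall outside that range, so one option is to rescale all gradients by the largest magnitude before mapping them to colors. This is a sketch of one possible approach, not what the visualization does; `normalizeGradients` and the sample values are hypothetical.

```typescript
// Rescales every gradient so the largest magnitude across all layers becomes 1.
// Signs are preserved; an all-zero input stays all zero.
function normalizeGradients(gradientsByLayer: number[][]): number[][] {
  let maxAbs = 0;
  for (const layer of gradientsByLayer) {
    for (const g of layer) {
      maxAbs = Math.max(maxAbs, Math.abs(g));
    }
  }
  if (maxAbs === 0) {
    return gradientsByLayer.map((layer) => layer.map(() => 0));
  }
  return gradientsByLayer.map((layer) => layer.map((g) => g / maxAbs));
}

// Hypothetical gradients: input layer zeros, a hidden layer, an output layer.
const sampleGradients = [
  [0, 0],
  [1.5, -3],
  [0.25, -0.25],
];
const normalizedGradients = normalizeGradients(sampleGradients);
```

The trade-off is that colors then show relative rather than absolute gradient strength, so a cycle with uniformly tiny gradients would look as vivid as one with large ones.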

Performance Considerations

The visualization uses Three.js InstancedMesh for neurons and a particle system for sparks. This lets the GPU handle thousands of objects efficiently. Post-processing with UnrealBloomPass adds the glow effect, with different settings for light and dark modes.

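The biggest CPU-side optimization is avoiding repeated connection scans: the forward pass and backprop both use a precomputed (from → to) weight lookup, so each weight is fetched in O(1) instead of by searching the whole connections array. That keeps the cost of a pass proportional to the number of connections, and the animation stays smooth as the network grows.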

The entire animation runs at 60fps on most devices, making the math behind neural networks tangible and interactive.