The history of artificial neural networks is inseparable from the history of neuroscience. McCulloch and Pitts published their mathematical model of a neuron in 1943, attempting to formalise what they understood about biological neural computation. Rosenblatt's perceptron in 1958 was explicitly inspired by hypotheses about how the brain learns. The connection propagated forward into the deep learning era: the field kept the vocabulary — neurons, layers, weights, activation — and some of the high-level architectural intuitions.

But modern deep learning and biological neural circuits have diverged substantially. The analogy is useful for some purposes and actively misleading for others. Understanding where the parallel holds and where it breaks down is practically relevant if you're designing systems at the intersection of the two fields — and it matters for setting realistic expectations about what current AI can and cannot do.

The Artificial Neuron: What It Actually Is

A unit in a standard deep learning network is mathematically simple. It takes a vector of inputs, computes a weighted sum, adds a bias, and passes the result through a nonlinear activation function. Common activations are ReLU (zero for negative inputs, identity for positive ones), sigmoid (squashes any input into the range 0 to 1), and GELU (a smooth, ReLU-like function widely used in transformers). The output is a scalar that feeds forward to the next layer.
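To make the description concrete, here is a minimal sketch of one such unit in plain Python. The function names and the example numbers are illustrative assumptions, not taken from any particular framework.

```python
import math


def relu(z):
    # Zero for negative inputs, identity for positive ones.
    return max(0.0, z)


def sigmoid(z):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))


def gelu(z):
    # Exact GELU: z times the standard normal CDF evaluated at z.
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))


def unit_output(inputs, weights, bias, activation):
    """Weighted sum of the inputs, plus a bias, passed through an activation."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(pre_activation)


# Hypothetical example values; the pre-activation works out to about -0.7.
example_inputs = [0.5, -1.0, 2.0]
example_weights = [0.8, 0.2, -0.5]
example_bias = 0.1

relu_out = unit_output(example_inputs, example_weights, example_bias, relu)
sigmoid_out = unit_output(example_inputs, example_weights, example_bias, sigmoid)
gelu_out = unit_output(example_inputs, example_weights, example_bias, gelu)
```

With a negative pre-activation, ReLU outputs exactly zero, sigmoid outputs a value below 0.5, and GELU outputs a small negative number. That is the entire computation of the unit: no internal state, no timing.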

The learning mechanism is backpropagation with gradient descent: compute the error at the output, propagate error gradients backward through the network, nudge each weight slightly in the direction that reduces error. Repeat millions of times. This is the engine behind essentially all practical deep learning systems today.
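The loop described above can be sketched for the smallest possible case: a single sigmoid unit trained on the logical OR function. This is an illustrative toy under assumed settings (squared-error loss, learning rate 0.5, 2000 steps, zero initial weights); in a real multi-layer network the same chain-rule step is repeated layer by layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_step(weights, bias, data, learning_rate):
    """One full-batch gradient descent step on 0.5 * squared error."""
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    loss = 0.0
    for x, target in data:
        y = predict(weights, bias, x)
        error = y - target                # dLoss/dy
        loss += 0.5 * error ** 2
        delta = error * y * (1.0 - y)     # chain rule through the sigmoid
        for i, xi in enumerate(x):
            grad_w[i] += delta * xi       # dLoss/dw_i
        grad_b += delta                   # dLoss/dbias
    # Nudge every parameter against its gradient.
    new_weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
    new_bias = bias - learning_rate * grad_b
    return new_weights, new_bias, loss


# Logical OR as a toy dataset (an assumption chosen for illustration).
or_data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
           ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

weights, bias = [0.0, 0.0], 0.0
losses = []
for _ in range(2000):
    weights, bias, loss = train_step(weights, bias, or_data, learning_rate=0.5)
    losses.append(loss)
```

The recorded loss falls from its initial value as the weights grow, and thresholding `predict` at 0.5 then reproduces OR. Note that every update needs the error computed at the output, which is the "global" ingredient that has no obvious counterpart at a biological synapse.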

The Biological Neuron: Considerably More Complicated

A real neuron is a cell with a soma (cell body), an axon (output cable), and a dendritic tree (input branches). It receives thousands of synaptic inputs, mostly on its dendrites. Each synapse releases neurotransmitters that cause local changes in ion permeability, producing small electrical potentials — excitatory (EPSPs) or inhibitory (IPSPs) — that propagate toward the soma.

If the integrated voltage at the axon hillock, where the axon leaves the soma, crosses a threshold, the neuron fires an action potential: a brief all-or-nothing spike that travels down the axon to its targets. Both the rate at which a neuron fires (its firing rate) and the precise timing of individual spikes carry information, though neuroscientists have debated the relative importance of rate coding and temporal coding in different brain regions for decades.
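The integrate-then-threshold behaviour can be sketched with a leaky integrate-and-fire (LIF) model. To be clear, LIF is a standard textbook simplification swapped in here for illustration; it ignores dendrites, ion channel kinetics, and almost everything else about a real cell. All parameter values below are assumed, round-number choices.

```python
def simulate_lif(drive_mv, dt_ms=1.0, tau_ms=20.0,
                 v_rest_mv=-70.0, v_threshold_mv=-50.0, v_reset_mv=-75.0):
    """Leaky integrate-and-fire neuron, integrated with forward Euler.

    drive_mv: input per time step, expressed as the steady-state
              depolarisation (in mV) it would produce if held constant.
    Returns the list of spike times in milliseconds.
    """
    voltage = v_rest_mv
    spike_times_ms = []
    for step, drive in enumerate(drive_mv):
        # Leak pulls the voltage back to rest; the input pushes it away.
        voltage += (-(voltage - v_rest_mv) + drive) * dt_ms / tau_ms
        if voltage >= v_threshold_mv:
            spike_times_ms.append(step * dt_ms)  # all-or-nothing event
            voltage = v_reset_mv                 # reset after the spike
    return spike_times_ms


one_second = 1000  # number of 1 ms steps
weak_spikes = simulate_lif([10.0] * one_second)      # settles below threshold
moderate_spikes = simulate_lif([30.0] * one_second)
strong_spikes = simulate_lif([50.0] * one_second)
```

A weak drive never reaches threshold and produces no output at all, while stronger drives produce progressively higher firing rates. Even this stripped-down model has state (the membrane voltage) and an event-based output, neither of which appears in the standard artificial unit.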

Several properties of this computation have no clean analog in the standard artificial neuron:

Dendritic Computation

Dendrites are not passive wires. They have voltage-gated ion channels that can produce local spikes independent of the soma. Research since the 1990s has shown that individual dendritic branches can perform their own nonlinear computations, so that a single neuron behaves more like a small multi-layer network than like a single unit. Work from the Poirazi and Häusser labs has shown that the computational complexity of a single pyramidal neuron substantially exceeds that of a single artificial unit.
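One way to see why branch-level nonlinearity matters is a toy comparison between a point neuron and a neuron whose branches each apply their own nonlinearity before the soma sums them. This is a hedged abstraction written for this post, not a model taken from the cited labs; the sigmoid shape, its gain, and its midpoint are assumptions.

```python
import math


def branch_nonlinearity(branch_sum, gain=2.0, midpoint=3.0):
    # Assumed sigmoidal branch response: weak for scattered input,
    # strong once enough synapses on the same branch are active together.
    return 1.0 / (1.0 + math.exp(-gain * (branch_sum - midpoint)))


def point_neuron_drive(branches):
    """Point-neuron view: every synapse is summed in one pool."""
    return sum(sum(branch) for branch in branches)


def dendritic_neuron_drive(branches):
    """Two-stage view: each branch is transformed first, then summed."""
    return sum(branch_nonlinearity(sum(branch)) for branch in branches)


# Four active synapses, either clustered on one branch or spread over four.
clustered = [[1.0, 1.0, 1.0, 1.0], [0.0] * 4, [0.0] * 4, [0.0] * 4]
distributed = [[1.0, 0.0, 0.0, 0.0]] * 4

point_clustered = point_neuron_drive(clustered)
point_distributed = point_neuron_drive(distributed)
dendritic_clustered = dendritic_neuron_drive(clustered)
dendritic_distributed = dendritic_neuron_drive(distributed)
```

The point neuron receives the same total drive in both cases and cannot tell them apart. The branch-aware version responds far more strongly to the clustered pattern, so the spatial arrangement of synapses becomes part of the computation.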

Spike Timing and Temporal Dynamics

Artificial neurons in feedforward networks are static — they map inputs to outputs without memory (ignoring recurrent architectures). Biological neurons are dynamic. The timing of a spike relative to other spikes, the bursting patterns, the after-hyperpolarisation that follows a spike, the adaptation of firing rate during sustained stimulation — all of these are information-carrying features that standard rate-coded artificial neurons discard entirely.

Synaptic Diversity

Artificial networks have synaptic weights that are scalars. Biological synapses are extremely heterogeneous. There are hundreds of receptor subtypes with different kinetics. Synaptic strength is dynamic: short-term facilitation and depression change the effective weight on timescales of milliseconds to seconds. Long-term potentiation (LTP) and depression (LTD) implement slower learning. Neuromodulatory systems (dopamine, serotonin, acetylcholine, norepinephrine) act as gain controls across large brain regions, creating a context-dependent computation that has no standard analog in deep learning architectures.
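As one illustration of a weight that is not a fixed scalar, the sketch below models short-term depression with a simple resource-depletion scheme: each presynaptic spike uses a fraction of the available resources, which then recover exponentially. This is a simplified, depression-only model chosen for brevity, and the parameter values (`use_fraction`, `tau_recovery_ms`) are assumptions rather than measurements.

```python
import math


def depressing_synapse(spike_times_ms, base_weight=1.0,
                       use_fraction=0.5, tau_recovery_ms=500.0):
    """Return the effective weight seen by each presynaptic spike."""
    resources = 1.0          # fraction of synaptic resources available
    last_spike_ms = None
    effective_weights = []
    for t in spike_times_ms:
        if last_spike_ms is not None:
            # Resources recover exponentially toward 1 between spikes.
            elapsed = t - last_spike_ms
            resources = 1.0 - (1.0 - resources) * math.exp(-elapsed / tau_recovery_ms)
        released = use_fraction * resources
        effective_weights.append(base_weight * released)
        resources -= released  # this spike depletes the pool
        last_spike_ms = t
    return effective_weights


fast_train = depressing_synapse([0.0, 20.0, 40.0, 60.0])     # 50 Hz burst
slow_train = depressing_synapse([0.0, 5000.0, 10000.0])      # one spike per 5 s
```

During the fast burst each successive spike has a smaller effect than the one before, while widely spaced spikes all see nearly the full weight. The same synapse therefore transmits differently depending on recent history, which a single stored scalar cannot express.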

Where the Analogy Actually Holds

Despite these differences, the analogy is not empty. Several things transfer well:

Hierarchical Feature Representation

Deep convolutional networks processing visual information learn representations that are strikingly similar to those in primate visual cortex — V1-like orientation detectors in early layers, object-selective responses in deeper layers. This parallelism is real, not coincidental. The statistical structure of the world constrains what efficient visual representations look like, and both biological evolution and gradient descent converged on similar solutions from different directions.

Distributed, Redundant Representations

Both artificial and biological networks represent information in distributed patterns across many units rather than in single dedicated cells. Both show graceful degradation under partial damage. The principle of population coding, developed in neuroscience to describe how groups of neurons collectively represent continuous variables, is directly applicable to understanding why deep networks are robust to individual weight perturbations.
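Population coding and graceful degradation can be sketched with a classic toy setup: a ring of units with rectified-cosine tuning to an angle, decoded with a population vector. The tuning shape, the population size of 100, and the 20% lesion are all assumptions picked for illustration.

```python
import math
import random


def preferred_directions(n_neurons):
    # Preferred angles spaced evenly around the circle.
    return [2.0 * math.pi * i / n_neurons for i in range(n_neurons)]


def population_response(stimulus, preferred):
    # Rectified cosine tuning: each unit fires most at its preferred angle.
    return [max(0.0, math.cos(stimulus - p)) for p in preferred]


def population_vector_decode(rates, preferred):
    # Each unit votes for its preferred direction, weighted by its rate.
    x = sum(r * math.cos(p) for r, p in zip(rates, preferred))
    y = sum(r * math.sin(p) for r, p in zip(rates, preferred))
    return math.atan2(y, x) % (2.0 * math.pi)


def angular_error(a, b):
    # Absolute difference between two angles, wrapped into [0, pi].
    return abs((a - b + math.pi) % (2.0 * math.pi) - math.pi)


stimulus = 1.0  # radians
preferred = preferred_directions(100)
rates = population_response(stimulus, preferred)
full_error = angular_error(population_vector_decode(rates, preferred), stimulus)

# Silence a random 20% of the population and decode again.
rng = random.Random(0)
silenced = set(rng.sample(range(len(preferred)), 20))
damaged_rates = [0.0 if i in silenced else r for i, r in enumerate(rates)]
damaged_error = angular_error(
    population_vector_decode(damaged_rates, preferred), stimulus)
```

With the full population the estimate is essentially exact. After the lesion it shifts by a modest amount rather than failing outright, because no single unit carries the answer on its own.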

Transfer Learning

Large neural networks pretrained on diverse data can be fine-tuned for specific tasks with relatively small datasets. The biological analog — that general representations learned during development and early experience support rapid specialisation for new tasks — is well supported by neuroscience literature on critical periods, schema formation, and fast mapping in language acquisition.

The Core Divergence

Biological brains learn continuously from a stream of experience, largely without supervision, using mechanisms that remain poorly understood. Deep learning systems are typically trained offline on fixed datasets with explicit supervision or reinforcement. Bridging this gap is one of the central problems in both AI research and computational neuroscience.

Comparison Table

| Property | Artificial Neuron / Network | Biological Neuron / Brain |
| --- | --- | --- |
| Basic computation | Weighted sum + activation | Spatiotemporally integrated dendrite + threshold spike |
| Output | Continuous scalar | Binary spikes (timing and rate both informative) |
| Learning rule | Backpropagation (global, offline) | Local Hebbian rules + neuromodulation (largely online) |
| Energy consumption | 10s–1000s of watts (GPU) | ~20 watts (whole brain) |
| Sample efficiency | Typically requires millions of examples | Humans learn many tasks from single examples |
| Robustness to noise | Brittle to adversarial perturbations | Robust to noisy, ambiguous input |
| Continual learning | Prone to catastrophic forgetting | Integrates new knowledge while preserving old |
| Architecture | Fixed at training time | Continuously rewired throughout life |

Why This Matters for System Design

If you're building systems that process biosignals (EEG, EMG, neural spike trains), the mismatch between artificial and biological computation shows up in specific practical ways. Standard deep learning architectures expect fixed-length, uniformly sampled input tensors. EEG and EMG are continuous, uniformly sampled signals that fit that mould once windowed, but recordings from individual neurons are spike trains: sparse, asynchronous events with irregular timing. You can convert spikes to rate codes and feed them to a conventional network, but you lose the temporal precision that may carry the information you care about.
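The loss of temporal precision is easy to demonstrate. The sketch below bins two made-up spike trains into counts, the usual first step toward a rate code; the spike times and bin widths are hypothetical values chosen to make the point.

```python
def bin_spikes(spike_times_ms, duration_ms, bin_ms):
    """Convert a list of spike times into per-bin spike counts."""
    n_bins = -(-duration_ms // bin_ms)  # ceiling division
    counts = [0] * n_bins
    for t in spike_times_ms:
        if 0 <= t < duration_ms:
            counts[int(t // bin_ms)] += 1
    return counts


# Two hypothetical spike trains with clearly different timing.
train_a = [3, 12, 27, 41]
train_b = [9, 18, 21, 49]

coarse_a = bin_spikes(train_a, duration_ms=50, bin_ms=10)  # [1, 1, 1, 0, 1]
coarse_b = bin_spikes(train_b, duration_ms=50, bin_ms=10)  # [1, 1, 1, 0, 1]

fine_a = bin_spikes(train_a, duration_ms=50, bin_ms=5)
fine_b = bin_spikes(train_b, duration_ms=50, bin_ms=5)
```

At 10 ms resolution the two trains become identical tensors, so any downstream network is blind to the difference. Halving the bin width separates them again, at the cost of a longer and sparser input. Choosing the bin width is therefore a decision about which timing information you are willing to throw away.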

Spiking neural networks (SNNs), which process explicit spike trains and propagate information through spike timing, are architecturally closer to biology. They're also harder to train — backpropagation through discrete spike events requires approximations — and current SNN performance on most benchmarks still lags behind conventional deep networks. But for edge-deployed neural decoding with strict power budgets, the energy efficiency advantages of SNNs (which only compute when spikes arrive, not continuously) are real and significant.
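One common family of approximations is the surrogate gradient: keep the hard threshold in the forward pass, but substitute a smooth stand-in for its derivative in the backward pass. The sketch below uses a fast-sigmoid-style surrogate on a single weight. The steepness value, learning rate, and the one-weight setup are assumptions for illustration, not a description of any specific SNN library.

```python
def spike(membrane, threshold=1.0):
    # Forward pass: hard threshold. Its true derivative is zero
    # everywhere except at the threshold, where it is undefined.
    return 1.0 if membrane >= threshold else 0.0


def surrogate_grad(membrane, threshold=1.0, steepness=10.0):
    # Backward pass stand-in: peaks at the threshold and decays with
    # distance, so units close to firing receive a usable gradient.
    return 1.0 / (1.0 + steepness * abs(membrane - threshold)) ** 2


# Toy problem: one weight, constant input, and we want the unit to spike.
weight = 0.5
input_value = 1.0
target = 1.0
learning_rate = 1.0
updates = 0

for _ in range(200):
    membrane = weight * input_value
    error = spike(membrane) - target
    if error == 0.0:
        break
    # Chain rule with the surrogate replacing d(spike)/d(membrane).
    weight -= learning_rate * error * surrogate_grad(membrane) * input_value
    updates += 1
```

With the true derivative the weight would never move, because the gradient is zero whenever the unit is silent. With the surrogate, the weight climbs until the unit crosses threshold after a handful of updates. The price is that the gradient no longer matches the function being optimised, which is part of why SNN training remains harder to get right.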

Neuromorphic hardware such as Intel Loihi, SpiNNaker, and BrainScaleS is designed to run SNNs efficiently. These are still research and early-industrial tools, not production infrastructure. But for applications where you need to process neural signals at very low power, such as wearable monitors and implanted devices, they represent a fundamentally different compute paradigm from GPU-based deep learning, and one more naturally matched to the signal domain.

What the Gap Means for AI Research

The properties where biological computation outperforms artificial networks — energy efficiency, sample efficiency, continual learning, robustness — are precisely the properties that matter for deploying AI in real-world settings with limited data, limited power, and ongoing distribution shift. This is not coincidental. The brain is the product of enormous selective pressure to solve exactly these problems.

Whether neuroscience will continue to provide the key insights that push AI forward, or whether scale and data will close the gaps through empirical engineering, is genuinely contested. My personal view is that the biological mechanisms for memory consolidation, sparse representation, and neuromodulation contain engineering principles that the AI field hasn't yet extracted — but extracting them requires much closer collaboration between the two communities than currently exists.
