What IQ Actually Measures

If you took an IQ test today and a stranger born in 1920 took the same test, scored against the same norms, that stranger would on average land roughly thirty points below you. Thirty points is enormous. It is the gap that separates the statistically average person from someone in the bottom few percent. Take this at face value and our great-grandparents look like they were struggling on the edge of disability, which is absurd, because they built power grids, won wars, wrote symphonies, and split the atom. Something is clearly wrong with the literal reading, and the puzzle of what is wrong turns out to be one of the most revealing questions in all of psychology.

The pattern is real and it is stubborn. Plot the mean score on intelligence tests across the twentieth century in any industrialized country, and the line climbs by roughly three points every decade, with no sign of slowing for most of that period. The scores went up; the people did not obviously change. To understand why, and to understand what an IQ score is actually telling you, we have to look carefully at what these tests measure, what that measurement predicts, and the long list of things it quietly leaves out.

A Number Built From a Statistical Coincidence

Psychologists have never agreed on a clean definition of intelligence. Ask ten experts and you will get answers ranging from "the ability to reason abstractly" to "the capacity to adapt to one's environment," none of them fully satisfying. Rather than wait for the philosophers to settle the matter, cognitive psychology took a pragmatic route and chose to operationalize intelligence: to define it, for working purposes, as whatever a well-constructed test of mental ability reliably measures. That move sounds like a dodge, and in part it is, but it rests on a genuine and surprising empirical discovery.

In 1904 the British psychologist Charles Spearman noticed something that did not have to be true. When he gave people a battery of unrelated mental tasks, vocabulary, arithmetic, pattern completion, memory, the scores all tended to correlate positively. People who did well on one tended to do well on the others, and those who struggled on one tended to struggle across the board. There was no obvious reason a knack for word definitions should travel with a knack for spotting visual patterns, yet it did. Spearman proposed that a single underlying factor was leaking into every task, and he named it g, for general intelligence. The discovery of g became the workhorse of the entire field, and it remains the most replicated finding in the study of human cognition.

The Layered Architecture of Mental Ability

Modern intelligence research does not treat g as the whole story, because that would flatten a structure that is plainly layered. Instead, the dominant picture is hierarchical. At the apex sits g, the general factor that touches everything. Beneath it sit a small number of broad abilities, the most important distinction being between fluid and crystallized intelligence. Fluid intelligence is the capacity to reason, to see novel patterns, and to solve unfamiliar problems without relying on prior knowledge; it is what an abstract-reasoning puzzle is built to tap. Crystallized intelligence is the accumulated stock of knowledge, vocabulary, and learned procedures that a lifetime in a culture deposits in you. Below these broad factors lie more specific abilities, verbal, spatial, mathematical, and processing speed, each measurable on its own.

The two broad factors age very differently, and the contrast is one of the more humane findings in the field. Fluid intelligence tends to peak in young adulthood and then drifts gently downward across the decades, which is why raw problem-solving speed often feels sharpest in one's twenties. Crystallized intelligence does the opposite, continuing to grow well into later life as knowledge and experience accumulate. The older expert who solves a problem more slowly but more wisely than a quick young rival is not a sentimental cliché; it is roughly what the structure of cognitive aging predicts.

Pinning a Score to the Population

An IQ number means nothing in isolation, because it is not a count of anything. It is a position. Modern tests are standardized against population norms, which means a large representative sample is tested first, and an individual's raw performance is then translated into where it falls within that distribution. The convention sets the population mean at 100 and the standard deviation, the typical spread of scores around that mean, at 15. By construction, then, the average person scores 100, and most adults, about two-thirds of them, fall between 85 and 115, within one standard deviation of the center.

The further out you go, the rarer scores become, very quickly. A score above 145 or below 55 sits three standard deviations from the mean and is extremely rare, found in only a fraction of a percent of the population. This is worth keeping in mind whenever someone quotes a dramatic three-figure IQ, because the bell curve makes extreme scores far scarcer than casual conversation implies. The score is genuinely a ranking against everyone else, which is precisely why the norms have to be periodically restandardized, and which is precisely how the twentieth-century puzzle becomes visible at all.

Why the Scores Kept Climbing

Return now to the thirty points. Because each generation's scores are anchored to the previous norms before retesting, researchers could see something extraordinary: across most of the twentieth century, mean scores rose by roughly three points per decade, a trend now named the Flynn effect after the political scientist James Flynn, who documented it most thoroughly. The gains were not uniform. They were strongest on tests of abstract reasoning, the fluid-intelligence puzzles that ask you to find patterns in shapes you have never seen, and weaker on tests of accumulated knowledge like vocabulary and arithmetic.

That uneven pattern is the key to dissolving the absurdity. Our great-grandparents were not cognitively impaired; they simply lived in a world that asked far less of the abstract, classify-everything-into-categories style of thinking that these tests reward. The explanations remain genuinely contested, and honesty requires admitting no single cause has won. Candidates include better childhood nutrition, dramatically expanded schooling, the spread of cognitively demanding work, smaller families with more adult attention per child, and a modern environment saturated with abstract symbols and puzzle-like media. Flynn himself argued that modern life trained people to put on what he called scientific spectacles, to treat the world in terms of abstract categories and hypotheticals, exactly the habit those tests reward. The effect is a powerful reminder that a population's average can move enormously within a few generations without any change in the underlying genes.

The Rivals, and Why g Keeps Winning

The hierarchical, g-centered model has prominent challengers, and they are worth taking seriously, if also worth weighing honestly. Howard Gardner's theory of multiple intelligences, which proposes that there are separate and largely independent intelligences, musical, bodily-kinesthetic, interpersonal, and several more, has been enormously popular in education, where it offers an appealing message that everyone is smart in their own way. Its empirical support, however, is weak. When researchers actually measure these supposed independent abilities and run the numbers, factor analyses keep finding the same strong general factor reasserting itself; the abilities correlate rather than standing apart. Gardner's framework functions better as a humane teaching philosophy than as a validated model of mental structure.

Robert Sternberg's triarchic theory has fared somewhat better. It distinguishes analytical intelligence, the kind academic tests measure, from practical intelligence, the street-smart capacity to navigate real-world problems, and creative intelligence. The distinction between practical and analytical ability has accumulated meaningfully more empirical backing than Gardner's scheme, capturing something real about people who reason poorly on paper yet thrive in messy situations, or the reverse. Even so, no rival has displaced g, because the stubborn positive correlations Spearman found in 1904 keep showing up no matter how the tests are sliced.

What Heritability Does and Does Not Tell Us

Few statistics in psychology are as routinely misunderstood as the heritability of IQ, so it is worth slowing down. Behavioral-genetic studies, drawing on twins, adoptees, and families, estimate the heritability of IQ at somewhere between 50 and 80 percent in adults, and notably lower in young children, where shared family environment matters more. The number rising with age, counterintuitively, reflects people increasingly selecting and shaping environments that fit their dispositions as they grow up.

Here is the crucial part. Heritability is a population-level statistic about the sources of variance, that is, about why people within a group differ from one another. It is not an individual-level statement about what caused any one person's intelligence, and it carries no fixed implication that a trait is unchangeable. A heritability of 70 percent does not mean 70 percent of your intelligence came from your genes and 30 percent from your upbringing; that sentence is meaningless. It means that, within the studied population and its particular range of environments, about 70 percent of the differences among people trace to genetic differences.

The misuse that matters most concerns differences between populations, and the logic here is decisive. A trait can be highly heritable within each of two groups while the average gap between those groups is entirely environmental. The geneticist Richard Lewontin made the point unforgettable in 1970 with a thought experiment: take genetically varied seeds, split them, and grow one batch in rich soil and the other in poor soil. Within each pot, height differences are purely genetic, so heritability is 100 percent, yet the average difference between the two pots is caused entirely by the soil. The within-group statistic simply does not license any conclusion across groups. Applied to human intelligence, the high heritability of IQ within populations tells us nothing about the causes of average differences between populations, and contemporary research on race-related test-score gaps points firmly toward environmental rather than genetic explanations.

A Real Predictor With Real Limits

None of this would matter if IQ scores predicted nothing, but they do, which is part of why the construct has survived a century of criticism. IQ correlates moderately with academic achievement, with performance across a wide range of occupations, and even with several health and longevity outcomes. By the unglamorous standards of social science, it is one of the most predictively valid measures psychology has ever produced, and pretending otherwise is a form of denial.

Moderate, though, is the operative word, and the honest framing is that IQ is one strong predictor among several rather than a verdict on a life. Conscientiousness, the tendency to be disciplined and reliable, predicts long-term success at least as well in many domains. Social skill, motivation, sheer opportunity, and luck all carry real weight, and none of them shows up on a reasoning test. The point is not to dismiss IQ but to situate it: it captures something important and stable about a person's cognitive capacity, and it leaves vast and consequential territory unmeasured.

That unmeasured territory includes some of the abilities we most admire. Creativity, which leans heavily on divergent thinking, the generation of many varied and original possibilities, is only partly tied to IQ; beyond a moderate threshold the two go their separate ways. Expertise is something else again, built less by raw ability than by accumulated effort, the long apprenticeship of deliberate practice that the psychologist Anders Ericsson studied and that popular writing compressed into the rough slogan of ten thousand hours. And wisdom, the capacity to integrate knowledge, experience, and balanced judgment about how to live, sits largely outside the reach of any reasoning test. A score can tell you something true about how fast and abstractly a mind reasons under timed conditions, but it cannot tell you whether that mind is creative, expert, or wise, and it was never built to.

Key Takeaways

Intelligence resists clean definition, so psychology operationalizes it through standardized tests built around Spearman's 1904 discovery that diverse mental tasks correlate positively, revealing a general factor called g; the modern picture is hierarchical, with g atop broad fluid and crystallized abilities (the former peaking young, the latter growing across life) and specific verbal, spatial, mathematical, and speed factors below, all scored against population norms set to a mean of 100 and a standard deviation of 15, where most adults fall between 85 and 115 and scores beyond 55 or 145 are extremely rare. The roughly three-points-per-decade Flynn effect, strongest on abstract reasoning, shows scores are sensitive to environment even as genes drive much within-population variance, with heritability estimated at 50 to 80 percent in adults; crucially, heritability describes variance within a group, not individual causation, and Lewontin's logic shows it says nothing about between-population gaps, which evidence attributes to environment. Rivals like Gardner's multiple intelligences have weak support against the stubborn general factor while Sternberg's practical-versus-analytical distinction fares better, and although IQ is among psychology's most predictively valid constructs, it is only a moderate, partial predictor, leaving conscientiousness, opportunity, luck, creativity, expertise, and wisdom largely beyond its measure.

Learn more with Mindoria

Bite-sized lessons, spaced repetition, and live PvP trivia battles. Free on Android.

Download Free