The first IQ tests were developed in 1904 by Alfred Binet and his student Theodore Simon to assess children with developmental disabilities. When Lewis Terman revised the Binet-Simon test to create the Stanford-Binet test in 1916, he incorporated (from the work of another psychologist) the idea of expressing aptitude as the ratio of the child’s mental age to his or her chronological age, multiplied by 100. Thus a child who was ten years old but solved problems that normally could be solved only by fourteen-year-olds had a “ratio IQ” of 140, that is, (14/10) × 100. Terman’s Stanford-Binet test could thus yield IQ scores of 200 or more (for example, a six-year-old might test at a mental age of twelve or even higher). Because this method of scoring compared younger children with older ones, it became less meaningful in testing older children: was it helpful to say that a fifteen-year-old had a mental age of thirty as opposed to twenty-five? In 1937, working with Maud Merrill, Terman published two separate forms for the Stanford-Binet, form L (for Lewis) and form M (for Maud). This was the last Stanford-Binet to rely on the ratio-based reporting system.[1]
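The ratio calculation described above can be written out in a few lines. This is only an illustrative sketch: the function name `ratio_iq` is made up for this example and does not come from any test manual.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Return the ratio IQ: mental age over chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    # Multiply before dividing so whole-number ages give exact results.
    return mental_age * 100 / chronological_age


# The ten-year-old working at a fourteen-year-old level.
print(ratio_iq(14, 10))  # 140.0
# A six-year-old testing at a mental age of twelve.
print(ratio_iq(12, 6))   # 200.0
```

The same arithmetic shows why the ratio loses meaning for older children: the chronological age in the denominator keeps growing, while "mental age" has no clear meaning beyond adulthood.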

Terman noted that an unexpectedly large number of children received scores of more than 170 but did not see this as a problem. Later authors thought that these scores should be distributed in a pattern that matched the “normal” or “bell” curve so that half the children tested would fall below and half above the point where their mental age and their chronological age were the same (an IQ of 100). Based on this assumption they estimated the likelihood that any given IQ score would be found in the population. Thus, for example, it was estimated that an IQ of 180 would be found less frequently than once in one million children. However, over the years, and as more children were tested, it became evident that these assumptions about the distribution of IQ were incorrect.
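To see where an estimate such as "less than once in one million" comes from, the tail of a bell curve can be computed directly. The sketch below assumes a mean of 100 and a standard deviation of 16; that value is an assumption made for illustration, since the text does not say which figure those authors used. The function name `rarity_under_bell_curve` is likewise hypothetical.

```python
from statistics import NormalDist


def rarity_under_bell_curve(iq: float, mean: float = 100.0, sd: float = 16.0) -> float:
    """Fraction of a normally distributed population scoring at or above `iq`."""
    z = (iq - mean) / sd
    # By symmetry, the upper tail beyond +z equals the lower tail below -z.
    return NormalDist().cdf(-z)


tail = rarity_under_bell_curve(180)
print(f"predicted frequency: about 1 in {round(1 / tail):,}")
```

Under these assumptions the predicted frequency is far below one in a million, which is why a noticeable number of children scoring this high suggested that the assumed curve was wrong.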

First, in the 1980s, psychologist James Flynn evaluated test results from older versions of the Stanford-Binet, Raven, and Wechsler tests, comparing them to newer versions of these tests taken by the same children. To his surprise he found that the *average* scores on the older tests had risen steadily over several decades. The center of the bell curve was moving slowly to the right: the average IQ of children around the world was rising.

Second, evidence continued to grow that more children obtained very high IQ scores on the Stanford-Binet test than was predicted by the bell curve. The right end of the curve had a bulge.

Modern aptitude tests do not use a ratio score. Instead, they are
constructed so that children of the same age taking the same test will have
scores that do fall along a bell curve. The scoring is adjusted so that every
15 points (or 16 in some tests) represents a “standard deviation,” a number that
measures the extent of the spread among the scores. The tests are “normed” by
giving them to a large number of children, and the scoring is adjusted until it
fits a bell curve. If ten percent of the children were to obtain a score of
160, the questions would be made more difficult or the scoring would be adjusted
until the numbers fit the curve.[2] On a modern normed test with a standard
deviation of 15, half of all children score at or below 100 and approximately
two-thirds score between 85 and 115.
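The idea behind a deviation score can be sketched as follows: a child's standing among same-age children (a percentile rank) is mapped onto a bell curve centered at 100. This is a simplified illustration, not the publishers' actual norming procedure; the function names and the default standard deviation of 15 are assumptions for the example.

```python
from statistics import NormalDist


def deviation_iq(percentile: float, sd: float = 15.0) -> float:
    """Map a percentile rank (strictly between 0 and 1) to a deviation IQ."""
    return NormalDist(mu=100.0, sigma=sd).inv_cdf(percentile)


def share_between(low: float, high: float, sd: float = 15.0) -> float:
    """Fraction of children expected to score between `low` and `high`."""
    curve = NormalDist(mu=100.0, sigma=sd)
    return curve.cdf(high) - curve.cdf(low)


print(deviation_iq(0.5))                 # the median child scores 100
print(round(share_between(85, 115), 3))  # roughly two-thirds of children
```

Because the score is defined by rank, it says how unusual a child's performance is among age peers rather than how many years ahead the child is working.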

There are problems with these tests from the point of view of parents of gifted children. First of all, the norming procedure involves many children, but it does not involve tens of millions. If, say, a test is given to ten thousand children for norming purposes, the child who is one in a hundred thousand probably won’t be among them. Perhaps no child, or only one, who is one in ten thousand will be in the norming group. So when the test is actually administered every day, the child who is one in ten thousand is being compared with only one other child, and probably on only one or two questions, whereas the child who is one in two is being compared with thousands of other children who correctly answered more than twenty questions. As a result, the highest scores are estimates with a higher margin of error than average scores. Unfortunately, the point where the reliability of the test begins to drop off significantly is often right about the level that is used to establish a “cutoff” for many gifted programs. Commercial test companies are simply not interested in investing millions of dollars to improve a test for the top two percent of children.
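The sample-size argument above can be checked with a little arithmetic. The sketch below assumes the norming group is a simple random sample of ten thousand children, which is a simplification (real norming samples are usually assembled more carefully); the function names are made up for illustration.

```python
def expected_in_sample(sample_size: int, one_in: int) -> float:
    """Average number of 'one in `one_in`' children in a random sample."""
    return sample_size / one_in


def chance_of_at_least_one(sample_size: int, one_in: int) -> float:
    """Probability that the sample contains at least one such child."""
    return 1 - (1 - 1 / one_in) ** sample_size


for one_in in (2, 10_000, 100_000):
    print(
        f"one in {one_in:,}: expect {expected_in_sample(10_000, one_in):g}, "
        f"chance of at least one {chance_of_at_least_one(10_000, one_in):.3f}"
    )
```

On these assumptions the sample holds thousands of average children, about one child at the one-in-ten-thousand level, and most likely none at the one-in-a-hundred-thousand level, so the top of the scale rests on very few data points.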

Second, the “ratio IQ” may be more meaningful than a “deviation IQ” to families planning instruction. In selecting a textbook or a class it is more helpful to know that a seven-year-old is approximately four years beyond his or her age than to know that he or she is one in four hundred. The new Stanford-Binet V and the WISC-IV provide age equivalents for scores, but because of the way they are constructed, the scores along the ends of the curve are much more compressed than the old Stanford-Binet scores and so provide less information. However, achievement tests, particularly tests taken out of level, may be even more helpful for this purpose, specifically individual achievement tests and the tests chosen by the national Talent Searches, because they cover a broad range of information.

This year (2003/4), two new Wechsler tests (WPPSI 3rd edition and WISC 4th
edition) and a new Stanford-Binet test (5th edition)
have been published. The publishers claim that these new tests overcome the
difficulties that were caused by using their more recent predecessors for
testing gifted students. Test publishers also assert that the newest tests
provide more meaningful scores than much earlier versions of the same tests that
were normed or revised more than thirty years ago and are considered outdated by
some psychologists. At this point, not enough information has been offered by
psychologists working with gifted students for us to make a
recommendation. Parents are advised to select practitioners with extensive
experience in working with gifted children and to discuss test choices and test
interpretation carefully with their practitioner.[3]

[1] The next Stanford-Binet form L-M, published in 1960, yielded scores that were adjusted to fall along a normal "bell" curve.

[2] In the case of the Wechsler IV, more time penalty points were added to lower the scores of students who answered all the questions correctly but took longer to answer.

[3] I would like to thank other parents and correspondents for advice and editorial help.

For more information, see Becker, K. A. (2003). *History of the Stanford-Binet intelligence scales: Content and psychometrics* (Stanford-Binet Intelligence Scales, Fifth Edition Assessment Service Bulletin No. 1). Itasca, IL: Riverside Publishing. Riverside publishes the Stanford-Binet. See also "Testing and Assessment" on the Hoagies Gifted website at http://www.hoagiesgifted.org/testing.htm