As any assessment system, including this one, unfolds, it will contain risks, and knowing what the risks are—you just mentioned one of the most serious risks—is really important. Any and every system of collecting and aggregating data about a child is imperfect.
I remember the first test that I ever took, at the age of seven—you might remember the first test that you ever took. I was called up to the headteacher’s desk to do a reading test. I was in P3. I can remember the last word that I could pronounce and the first and only word that I could not pronounce, when the test stopped. The last word that I could pronounce was “pneumonia”. I had to give the meaning of it, which, frankly, was not bad for a seven-year-old. The first word that I could not pronounce—I still cannot pronounce it—was “phthisis”. It is beyond me why they had a test for a seven-year-old that listed, in successive order, two words about pulmonary wasting diseases; however, I felt that the test was important.
Until 10 years ago, when the governors of my former school sent me class lists from the time when I was at the school, I did not know for sure that the test was used to decide who went into the A stream and who went into the B stream. I had the class lists for the same children at 11 years old, which were almost identical; the evidence from the time shows that only about 2 per cent of children transferred streams between those ages. Then the lists showed which secondary schools they went to: 70 per cent of the A stream went to grammar schools and zero per cent of the B stream did; they went to vocational secondary modern schools. That was all decided at the age of seven. We know that those tests were flawed and that the 11-plus was flawed. When the 11-plus was abolished, or replaced with teachers’ judgment, we found that the results of the selection according to teachers’ and headteachers’ judgment had more social class bias than the results of an objective test.
The first thing that I will reaffirm is that, if you are looking for a nirvana of the perfectly consistent way of making judgments or doing tests, you will be disappointed. They will all be imperfect to different degrees and in different ways. We should avoid treating teachers’ judgment as individual, autonomous judgment. In the teaching profession, we need collective autonomy, not individual autonomy—we have argued about that here. That means that we might have more autonomy from the bureaucracy but less autonomy from each other. By looking at the ways in which we make judgments together and moderate them, over time, we will create some consistency. The data can help teachers to do that. However, the data will always be imperfect, depending on whether a student is sick on the day, whether they are tired and whether they take the test at the end of the week or at the end of the day rather than at the beginning of the day.
You have outlined the risk associated with the tests, and, for me, the biggest risk is not that what you describe might happen accidentally but that it might happen systemically. The risk is that, if there is undue pressure from the Scottish Government or from local authorities to drive results up in a short period of time, in order to demonstrate success within a period of taking on leadership or before an election, that pressure will and does lead teachers to do strange but utterly predictable things.
If I were cynically advising a school now, I would say that, if it wanted to show improvement in its results over three years, it should, first, introduce a test without any preparation or professional development so that, in the first year, the students would do badly and the school would have an artificial low for its baseline. Following some professional development, everybody would do better in the test, so there would be the appearance of an improvement over time. Secondly, in the first or second year, the school should test all the children early in the year, when they are younger. A couple of years later, it should test them all at the end of the year, when they have had a bit more practice and preparation and have learned a bit more. The school would then get better results over time.
10:30
Across the world, where truly high-stakes tests are used and punitive consequences can follow, such practices go on. Technically, you cannot alter that much, although it is a good thing to allow children to take the test at different times because of things such as student anxiety and unreadiness and the possibility of a dramatic event undermining the validity of the result. You can deal with those imperfections by creating a culture of assessment and improvement in which everybody is genuinely focused on improvement, which includes accepting those moments when they are unsuccessful and they need to identify a different way of moving forward.