Friday, June 28, 2019

IEA IRC 2019: Day 2

The morning plenary on day 2 of the IEA IRC was Aaron Benavot's talk "How can IEA make a difference in measuring and monitoring learning in the 2030 agenda for sustainable development?" He is a former director of UNESCO's Global Education Monitoring (GEM) Report. The yearly GEM Reports previously monitored progress on the six Education for All goals; now they monitor the educational targets in the 2030 Agenda for Sustainable Development.

He discussed the history behind the merging of several processes into the Sustainable Development Framework with its 17 goals, 169 targets and 230 indicators, and stressed that this is the most aspirational and comprehensive international education agenda ever. Such a comprehensive agenda reinvigorates earlier debates on how to measure and monitor learning. The countries are supposed to conduct voluntary national reviews, and there will be an elaborate indicator framework with different indicators and measures: at least one global indicator per target, a number of thematic indicators (globally comparable indicators), regional indicators and national indicators. For instance, target 4.1 talks about relevant and effective learning outcomes, while the global indicator narrows this down to reading and mathematics. However, the measurement of the global indicators is to be done in close cooperation with each state.

He stressed how different ways of measuring give very different results. For instance, the traditional way of measuring literacy is through census data, where often the head of the household is asked who in the household is literate. Now, a few countries are moving towards testing, for instance asking people to read a sentence to the census taker. This reduces the literacy estimates. He gave examples of how IEA might help in developing ways of measuring.

He also discussed how the international assessments are increasingly supplemented by regional and national assessments; more than 150 countries have performed national assessments since 2000.

There was a discussion after the talk about the country-led nature of the reviews and measurement. A researcher from South Africa stressed the importance of each country being able to determine what the education priorities are in its own context. If South Africa cannot decide for itself but has to adopt measures from Western "North" countries, those measures would not be suited to the local context.

Again, I decided to skip the panel (which was on PIRLS and not part of my research interest presently).

After lunch, there was a Norwegian symposium on TIMSS, with three presentations from the University of Oslo research team (CEMO: Centre for Educational Measurement). The first was by Rolf Vegar Olsen and Sigrid Blömeke: "Predicting change in mathematics achievement in Norway over time". He started out by pointing out the dramatic fall in Norwegian TIMSS results from 1995 to 2003 (comparable to twice the difference between 8th and 9th grade in 2015), followed by an increase towards 2015.
The method used for this paper was the Oaxaca-Blinder decomposition, a method for studying the mean difference between two groups - basically looking at both the constant and the slope of the regression lines for the two groups. (Actually, they used a threefold OBD, with "endowments", "coefficients" and "interaction" terms.) They wanted to include predictors which had changed over the period. However, fairly little of the change in scores could be explained by the included predictors. (A lot of possible predictors had to be excluded because the questions differed between 2003 and 2015.)
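For my own reference, the threefold decomposition splits the mean gap between two groups into exactly the "endowments", "coefficients" and "interaction" terms mentioned above. A minimal sketch on simulated data (the group sizes, predictor and effect sizes are all invented for illustration, not the actual TIMSS variables):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data standing in for two assessment cycles (all numbers invented)
n = 500
X_a = np.column_stack([np.ones(n), rng.normal(2.0, 1.0, n)])  # intercept + one predictor
X_b = np.column_stack([np.ones(n), rng.normal(1.5, 1.0, n)])
y_a = X_a @ np.array([500.0, 20.0]) + rng.normal(0, 10, n)
y_b = X_b @ np.array([480.0, 15.0]) + rng.normal(0, 10, n)

# OLS fits per group
b_a, *_ = np.linalg.lstsq(X_a, y_a, rcond=None)
b_b, *_ = np.linalg.lstsq(X_b, y_b, rcond=None)
xbar_a, xbar_b = X_a.mean(axis=0), X_b.mean(axis=0)

# Threefold decomposition of the mean gap (group A minus group B)
endowments   = (xbar_a - xbar_b) @ b_b          # different predictor levels
coefficients = xbar_b @ (b_a - b_b)             # different returns to predictors
interaction  = (xbar_a - xbar_b) @ (b_a - b_b)  # joint effect of both

gap = y_a.mean() - y_b.mean()
print(gap, endowments + coefficients + interaction)
```

With an intercept in both regressions, the three terms sum to the mean gap exactly, which is the whole point of the method.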

The second talk was by Trude Nilsen, Julius Björnsson and Rolf Vegar Olsen: "Has equity changed in Norway over the last decades?" First, they discussed the definition of equity: it could be defined as a lack of achievement differences between schools, a small SES effect on achievement, or a low proportion of pupils getting low scores. They looked at all cycles of TIMSS and PISA. While many measures had changed over time, "number of books at home" had stayed stable in both TIMSS and PISA. The finding was that the total variance had decreased over time (which may, however, be because the proportion of high-performing students has decreased), while on the school level the developments differ between the studies. The variance explained by SES has increased over time. The main problem, however, is the lack of stability in the SES measures. A solution could be to combine ILSAs (international large-scale assessments) with register data (but that could be controversial for privacy reasons).
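The first two equity notions translate directly into variance quantities: the between-school share of the total variance and the variance explained by SES. A minimal sketch on simulated data (school counts, effect sizes and the SES coefficient are all invented, not the actual analysis):

```python
import numpy as np

rng = np.random.default_rng(2)
n_schools, pupils = 50, 30

# Invented population: school effects, pupil-level SES, and scores
school_effect = rng.normal(0, 20, n_schools)
ses = rng.normal(0, 1, (n_schools, pupils))
score = 500 + school_effect[:, None] + 15 * ses + rng.normal(0, 60, (n_schools, pupils))

# Between-school share of variance (a crude estimate: school means
# still contain some within-school sampling variance)
between_share = score.mean(axis=1).var() / score.var()

# Variance explained by SES at the pupil level (squared correlation)
r2_ses = np.corrcoef(ses.ravel(), score.ravel())[0, 1] ** 2
print(f"between-school share = {between_share:.2f}, SES R^2 = {r2_ses:.2f}")
```

The real analyses use multilevel models rather than raw school means, but the quantities being tracked over time are essentially these shares.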

The third talk of this symposium was by Hege Kaarstein and Trude Nilsen: "Twenty years of science motivation mirrored through TIMSS: Examples of Norway". Their goal was to look at the development of science motivation. Methodologically, every TIMSS cycle was compared to 1995; in addition, comparisons between 4th and 8th grade and between girls and boys were planned for all cycles. They studied intrinsic motivation, self-concept and extrinsic motivation (the third not measured in 4th grade). It is an important point that it must be checked (within the means available) that questions are understood similarly over time, but the details of the scalar measurement invariance (MI) analyses I am not able to repeat. The results of the study were mixed, but motivation seems to have increased. Self-concept has the highest correlation with performance, but self-concept did not increase significantly in 8th grade. (However, Norwegian students already reported very high self-concept from the beginning.)

Jan-Eric Gustafsson was the discussant. He picked up on the difficulties of looking at change over time and asked how the ILSAs could be improved to make such analyses easier. He also pointed out that many of the independent variables used here are prone to large measurement errors (as they are self-reported by students), which can lead to regression coefficients being underestimated. He also pointed out that the PISA scales vary in reliability from year to year, while the TIMSS scales have higher reliability. He also noted that "number of books at home" has been shown to work differently in different countries, so it may also be assumed to work differently over time within one country. (It was actually pointed out in the plenary on Friday that the proportion of pupils with many books at home is decreasing in rich countries.) Also, he criticised cutting out lots of the comparisons based on the MI analyses, as these analyses have such high power that they detect substantively insignificant differences. (A very interesting point, although he himself admitted that including these comparisons might make the paper impossible to publish.)

He also provided a fun example of measurement problems: when a new grade scale was introduced in Sweden, confusion followed as teachers did not use the new scale consistently. This led to increased variance in the grades (and less correlation with the underlying competence of pupils, I guess), leading to a decrease in the SES effect. (Of course, if grades are assigned more randomly, all correlations between grades and other variables will decrease.)

Then there was the last session of the second day. The first talk was Samo Varsik: "Differences in students' and teachers' characteristics between high and low performing classes in Slovakia". He used PIRLS 2016 4th grade data from the Slovak Republic, with a methodological approach based on similar research done in the Czech Republic. He first showed how SES has a huge impact in Slovakia. He also looked for differences in teaching methods between high- and low-performing classes, but found very few significant correlations. The only two significant differences were that high-performing classes are tested more often and are more often asked to summarize the main ideas. The second part of his work was regression models, showing for instance that students' confidence in reading is, not so surprisingly, correlated with performance, also when controlling for SES, gender and so on. However, he did not find significant results regarding teachers' characteristics. (Other than this, I did not manage to write down much of his results.) At the end, he noted that an important limitation of his method is the "Modern Teaching Methods" variable, which is based on a few self-reported questions.

The second paper of the session was supposed to be Bieke De Fraine et al: "Reading comprehension growth from PIRLS Grade 4 to Grade 6", but this was cancelled.

The third paper was Marie Wiberg and Ewa Rolfsman: "Nordic students' achievement and school effectiveness in TIMSS 2015". (Nordic = Sweden & Norway.) They looked at student variables (sex, native father (NF) and number of books) and school variables (student behaviour, urban school location, school climate (teacher, student, parents), aggregated SES, aggregated NF, general resources at school and resources in mathematics) and used linear regression. They included the concept of "effective schools", defined as schools having better results than expected based on background data. Students' background was important everywhere. In Norway, school location and school climate were significant, while in Sweden only school climate was significant. For future research, the possibility of linking to register data will make other analyses possible.

The final paper of the day was André Rognes' talk on "Birth month and mathematics performance relationships in Norway", written with Annette Hessen Bjerke, Elisabeta Eriksen and myself. Of course, I knew the paper quite well in advance: the main point is that the Relative Age Effect (RAE) is statistically significant in all content and cognitive domains of mathematics in 4th, 5th and 8th grade. There was no statistically significant RAE in 9th grade. We also tested whether there was a significant difference in RAE between 4th and 5th grade and between 8th and 9th grade - there was not.
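The basic shape of an RAE test is a regression of score on relative age within a cohort, with a t-test on the slope. A minimal sketch with simulated data (the effect size, score spread and the uniform birth-month distribution are invented for illustration, not our actual data or method):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000  # roughly one Norwegian cohort in TIMSS

# Invented data: pupils born later in the year are relatively younger
birth_month = rng.integers(1, 13, n)           # 1 = January ... 12 = December
relative_age = 12 - birth_month                # months older than a December-born pupil
score = 500 + 2.0 * relative_age + rng.normal(0, 80, n)

# OLS of score on relative age, with a t-test on the slope
X = np.column_stack([np.ones(n), relative_age])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
resid = score - X @ beta
sigma2 = resid @ resid / (n - 2)
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = beta[1] / se
print(f"RAE slope = {beta[1]:.2f} points/month, t = {t_stat:.1f}")
```

With about 4000 pupils, a slope of a couple of score points per month is easily detectable; splitting the sample by SES group shrinks each cell and the power drops quickly, which is exactly the worry below.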

We did get a question about whether we had looked at SES in our research. We had not. It is unlikely that birth month can be predicted by SES (and a colleague actually pointed out that he had checked this). Whether the RAE is larger in some SES groups than in others is another question that would be interesting to investigate. (Although I fear the Norwegian data alone would not provide enough power to find out. Even with more than 4000 students in each cohort, the number of students per month gets quite small when dividing into different SES groups.)

That was the end of the second day. For me, this day was more aligned with my research interests than the first, so I was happy about it.
