Last week I ran a “timed essay writing” practice with my intermediate-level students, and it went really badly.
We usually write essays through a process that involves multiple drafts and ongoing feedback from the teacher and peers. After reading and listening to some input materials that give my students ideas for their outlines, we write in class, and if they can’t finish in time they also work at home. Occasionally we have timed writing as well. This time, however, only two of them were able to finish within the given time frame (70 minutes, as in their exam), and I thought, “Well, they couldn’t do it because they didn’t want to…because they are not under exam conditions and they are not motivating themselves…etc.” But I have to admit that many students commented on the difficulty of the topic, which was gender inequality.
Following this experience, last week in our “Language Assessment” course, we focused on assessing writing. Our discussions and Sara Cushing Weigle’s book “Assessing Writing” helped me to see issues related to writing assessment in a different light. Here come the highlights…
Designing writing assessment tasks
According to Weigle (2002), the development process for a test of writing involves three stages: 1) design, 2) operationalization and 3) administration. In the table below I would like to summarize the points that Weigle (2002, pp. 78-82) suggests considering at each stage in order to avoid potential problems with the test at a later stage.
When Ece (a very dear classmate and friend) said, “The stimulus material should be picked with respect to the construct definition of writing. Choosing a textual, a pictorial or a personal experience as a prompt in writing tasks should be in accordance with the construct definition and test takers’ characteristics,” it rang a loud bell in my mind and explained the unsuccessful timed writing experience I told you about at the beginning.
I have to admit that I may have overlooked some of the points listed above. For instance, when I give my students a writing task to assess their language abilities, I often skip pre-testing the items or the writing prompt. But I have learned my lesson, as you will see in the Metamorphosis section below.
Importance of having test specifications
Test specifications are blueprints or guidelines that give brief information about a test, so that a group of educators working from them can design assessment tasks that assess the constructs in a standard way. Test specifications also provide a means of evaluating the finished test and its authenticity. There are many suggested formats for specifications, but according to Douglas (2000, cited in Weigle, 2002), at a minimum they should contain:
- A description of test content (how the test is organised, description of the number and type of test tasks, time given to each task, & description of items)
- The criteria for correctness
- Sample task items
I should also say that Weigle presents a particular format for test specifications in her book, originally developed by Popham (1978), which includes detailed descriptions and examples that could help in developing writing tests (2002, pp. 84-85).
Grading the writing papers
Weigle defines the “score in a writing assessment” as the outcome of an interaction between the test takers, the test (the prompt or task), the written outcome, the rater(s), and the rating scale. She distinguishes three types of rating scale according to two questions: whether the scale is specific to a single writing task (primary trait scoring) or generalized to a class of tasks (holistic or analytic scoring), and whether each written outcome receives a single score (primary trait or holistic) or multiple scores (analytic).
In addition, we discussed the advantages and disadvantages of holistic and analytic scales in our class meeting. It was an interesting discussion, reflecting the real-life difficulties that we all encounter as teachers who need to score students’ written outcomes.
Weigle argues that the advantages of holistic scales include 1) faster grading, because the rater assigns a single score rather than separate scores for different aspects of writing, 2) focusing the reader’s attention on the strengths of the writer rather than on deficiencies in the writing, and 3) being more authentic and valid than analytic scoring, because it better reflects the reader’s natural reaction to the text. On the other hand, the main disadvantage of holistic scales is that a single score may not provide useful diagnostic information about weaknesses in particular aspects of writing ability.
As for analytic scales, their advantages are that they provide useful diagnostic information about students’ writing abilities and offer a higher level of reliability, because the criteria are more detailed and comprise more items. As for the disadvantages, it is argued that analytic scoring takes longer than holistic scoring, and that raters may read holistically and then adjust their analytic scores to fit the criteria.
After our Thursday evening classes of ‘Language Assessment’ with Prof. Farhady and classmates (Ece, Volkan, Ece, Merve and Jerry) focusing on writing assessment, I thought about how we deal with this issue at my school.
This is a picture of me and my lovely colleagues just before the writing standardisation session.
Before grading the papers, we come together in a standardisation session and go over our criteria. Then we grade sample papers together in groups and discuss the rationale behind our grades.
Although they can be time-consuming, I really think that standardisation sessions help me, because they refresh my understanding of the scoring criteria and set the scene.
In standardisation sessions we have the opportunity to talk about how raters should arrive at their decisions independently and then compare and discuss their scores, how to treat students who responded to the writing question partially or fully off topic, and what to do about memorized and/or incomplete responses.
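For readers who like to see numbers, here is a minimal sketch of how two raters' independent scores could be compared before the discussion starts. It is only an illustration and not the procedure we use at my school: the function names, the example scores, and the rule that a difference of more than one point needs discussion are all my own assumptions.

```python
def agreement_rates(rater_a, rater_b, tolerance=1):
    """Return (exact, adjacent) agreement between two raters.

    exact    = share of papers where both raters gave the same score
    adjacent = share of papers where the scores differ by at most
               `tolerance` points (an assumed threshold, not a standard)
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same, non-empty set of papers")
    pairs = list(zip(rater_a, rater_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= tolerance for a, b in pairs) / len(pairs)
    return exact, adjacent


def papers_to_discuss(rater_a, rater_b, tolerance=1):
    """Return the positions of papers whose two scores differ by more than `tolerance`."""
    return [
        index
        for index, (a, b) in enumerate(zip(rater_a, rater_b))
        if abs(a - b) > tolerance
    ]


# Hypothetical scores for four papers, given independently by two raters
scores_rater_a = [4, 3, 5, 2]
scores_rater_b = [4, 4, 2, 2]

exact, adjacent = agreement_rates(scores_rater_a, scores_rater_b)
flagged = papers_to_discuss(scores_rater_a, scores_rater_b)
print(f"Exact agreement: {exact:.0%}, adjacent agreement: {adjacent:.0%}")
print(f"Papers to discuss (by position): {flagged}")
```

A simple count like this only shows where the raters disagree; deciding why they disagree still needs the conversation in the session.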
Metamorphosis: lessons learned
Piloting and pre-testing items with a sample group that represents the target group will become my routine in the future.
I will be much more careful about clarity, validity (“potential of the writing prompt for eliciting written products that span the range of ability of interest among test-takers” (Weigle, 2002, p. 90)), reliability of scoring, and the potential of the task to interest the test-takers.
When choosing the writing topic (personal or general), it’s always a good idea to take into consideration the homogeneity or heterogeneity of the test takers, the test purpose (general or academic writing), and the test takers’ interests, abilities and background knowledge.
To keep our practice fair, scoring procedures should be evaluated for the reliability of the scores they produce, for their validity, and for their practicality. Scoring and the issues related to these procedures should be revisited frequently.
I will definitely work on having a user-oriented scoring rubric and familiarising students with these criteria. I really believe that such an understanding will guide them in their writing.
How do you deal with assessing writing at your institution? What’s the students’ reaction to writing test(s)? Please feel free to comment.
Next week we will deal with assessment in ESP, and I am looking forward to our Thursday class with Prof. Farhady …
Reference: Weigle, S. C. (2002). Assessing Writing. Cambridge: Cambridge University Press.