I’ve been a bit swamped over the course of the semester and unfortunately haven’t made the time to write regularly. There were lots of factors converging, and nothing negative, so I accepted that it might be one of the things to slip. This is something I will adjust for semester two.

I’ve written in the past about my reassessment systems and use of WeinbergCloud to manage them. I knew something had to change and thought a lot about what I was going to do to make my system more reasonable, something the old system was not.

At the beginning of the year, I sat down and started to reprogram the site...and then stopped. As much as I enjoyed the process of tweaking its features and solving problems that arose with its use, it was not where I wanted to spend my time. I also knew that I was going to teach a course with a colleague who also was planning to do reassessment, but I was not ready to build my system to manage multiple teachers.

I made an executive decision and stepped away from the WeinbergCloud project. It served me well, but it was time to come up with a different solution. We use Google for Education at my school, and the students are well versed in the use of calendars for school events. I decided to make this the main platform for all sorts of reasons. Putting my full class and meeting schedule into Google Calendar meant that I could schedule student reassessments by actually seeing what my schedule looked like on a given week. Students last year would sign up to reassess at times when I had lunch duty or an after-school meeting because my site didn’t have any way to block out times. This was a major improvement.

I also limited students to one reassessment per week. They needed to email me before the beginning of any given week and tell me what standard they wanted to reassess over. I would then send them an invite to a time they would show up to do their reassessment. This improved both student preparation and my ability to plan ahead for reassessments knowing what my schedule looked like for the day. Students liked it up until the final week of the semester, when they really wanted to reassess multiple times. I think this is a feature, not a bug, and will incentivize planning ahead.

I recorded student reassessments in PowerSchool in the comment tab. Grades with comments appear with a small flag next to them. This meant I could scan across horizontally to see what an individual student had reassessed on. I could also look vertically to see which standards were being assessed most frequently. The visual record was much more effective for qualitative views of the system than what I had previously with WeinbergCloud.

The system above was for my IB and AP classes. For Algebra 2 (for which I teach two sections and share with the other teacher) we had a simpler system. Students would be quizzed on standards, usually two at a time. Exams would serve as reassessments of all of the standards. Students would then have a third opportunity to be quizzed on up to three of the standards from each unit later in the semester. Students who scored less than an 8 were required to reassess. This system worked well for the most part. Some students felt the types of questions on the quizzes and exams differed enough that they were not equivalent assessments of the standards. My colleague and I spent a lot of time talking through the questions, identifying the types of mistakes on individual questions that distinguish a 6 from an 8 from a 10, and unifying the feedback we gave students after assessments. The system isn’t perfect, but every student was given up to three opportunities to be assessed on every standard. That equity is not something I achieved in my previous implementations of SBG.

On the whole, both flavors of reassessment systems were much more reasonable and manageable, and I think they are here to stay. I’ll spend some time during the winter break thinking about what tweaks might be needed, if any, for the second half of the year.

Overview

I've used Standards Based Grading, or SBG, with most of my classes for the past five years. It transformed the way I think about planning, assessment, classroom activities...and pretty much everything else around my teaching practice. I have a difficult time imagining what would happen if I had to go back. I've written a lot about it this year - here are some of the posts:

As I wrote in that last post, I still wrestle with the details, but I'm fully invested in the philosophy. I'm glad my administrators support me in adapting it to work within the more traditional system. I've also had some great conversations with colleagues who are excited by the concept but wonder how to make it work in their courses.

Here's the rundown of how it went this year.

What worked:

Students really bought into the system. The most common responses on student surveys about what I should keep were the grade being defined by standards and the reassessment system. I found students were often the system's best advocates when other teachers and parents had questions, which made communication much easier.

The system was the gateway to many very positive conversations with students around learning, improvement, and the role of feedback. Conversations were about understanding concepts and applying them, not asking for points. Many students would finish a reassessment and tell me that their grade should stay the same, but that they would keep trying. Other students would try to argue their way to a higher score, but using the vocabulary I use to define my standard descriptors (linked here). They understood that mistakes are informative, not punitive. Transplanting this understanding to students in my new school was a major success of the year.

I developed a better understanding of what I'm looking for at each level on my 5-to-10 scale. Part of this came from being at a new school and needing to articulate it to students, parents, and administrators. The SBG and Leveling Up project (linked above) helped refine my definitions of what distinguishes a 9 from a 10, or a 6 from a 7.

What needs work:

I had way too many reassessments. Full stop. I wrote about this in my post Too Many Reassessments, Just in Time for Summer and am exhausted just thinking about doing it again. There are a couple of elements of this to unpack. One is that my credit system allows reassessments to occur more frequently than I believe deep learning can really take place. I'm thinking about locking students out of reassessing on a standard for a set period of time, at least when going for a score of 8 or above, where the goal is transfer of skills and flexibility of application. The other thing I am considering is limiting students to a single reassessment per week, or day, or some other interval. I have some time to decide on this, which is good, because both require a rewrite of my online signup tool, WeinbergCloud.
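If I go that route, the eligibility check might look something like this sketch. To be clear, the 14-day window, the function names, and the weekly cap of one are all hypothetical values for illustration, not anything WeinbergCloud currently does:

```javascript
// Sketch of an eligibility check combining a per-standard lockout
// with a one-reassessment-per-week cap. All numbers are illustrative.
const LOCKOUT_DAYS = 14;      // hypothetical cooling-off period per standard
const WEEKLY_LIMIT = 1;       // hypothetical cap per student per week
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function canReassess(now, lastAttemptOnStandard, attemptsThisWeek) {
  // Locked out if the student attempted this standard too recently.
  const lockedOut = lastAttemptOnStandard !== null &&
    (now - lastAttemptOnStandard) < LOCKOUT_DAYS * MS_PER_DAY;
  return !lockedOut && attemptsThisWeek < WEEKLY_LIMIT;
}
```

A check like this would run when a student tries to sign up, so the timeline pressure moves from the end of the grading period to the student's own planning.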

Long term retention was still not where it needs to be. I wrote about this already in my post about my IB Mathematics Year 1 course. As I have taught more and more in this system, I have come to believe ever more strongly that clear communication about what grades signify matters. A lot. Moving from quarter to semester grades is one part of improving this, a change my administrative team made for this coming year, but a lot of it still sits with me. I need to spiral, I need to reassess on old standards, and I need to hold students accountable for older material.

Communicating the role of semester exams was a major challenge for me this year. In a small school, I found it easy to talk with individual students and parents about what those exams are for. I based much of my outreach on what I understood about these exams and the role of learning standards grades throughout the year. A standards based grade book breaks the entire topic down into bite-sized pieces, which makes it easier both to communicate strengths and weaknesses and for students and teachers to decide on the best next step. Semester exams are opportunities to put all of these pieces together and assess a student's ability to decide which standards apply in a given problem. Another way of looking at it is a soccer-practice versus soccer-game mentality.

Ultimately, I do want students to be successful across the breadth of the content on which a course is based. Semester exams serve as one way to measure that progress in the bigger picture of an entire course, rather than a unit. This also serves as a third scale on which to consider assessment in my course. Quizzes assess a standard, exams assess a unit of standards (with a few older standards thrown in), and semester exams assess mastery of a portion of the course. That difference in scale is why the 80% quarter grade, 20% exam grade split that I've followed for seven years is entirely reasonable.

A student that aces all of the standards with a 100 but gets a 50 on the final ends up with a 90. This student receives the same semester grade as someone that has a 90 up until the final and gets a 90 on the final. I'm fine with this parity in grades. I would have very different conversations with those two students about their plans for the next semester of mathematics.
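The weighting behind those two examples is just a one-line calculation. Here it is as a sketch (the function name is mine, for illustration; the 80/20 split is the one described above):

```javascript
// Semester grade under an 80% standards / 20% exam split.
// quarterAvg: average of the standards-based quarter grades (0-100)
// examScore: semester exam score (0-100)
function semesterGrade(quarterAvg, examScore) {
  return 0.8 * quarterAvg + 0.2 * examScore;
}

semesterGrade(100, 50); // the "aced everything, bombed the final" student -> 90
semesterGrade(90, 90);  // the steady student -> also 90
```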

The main challenge I found was that students and parents often looked at the final exam grade in isolation from, rather than together with, the rest of the scores in the grade book. The parent of the first student (100, then 50) who asks me to explain that disparity is certainly justified in doing so. Where I fell short was communicating the reality that in a standards based system, grades usually drop after a semester exam. It's a fundamentally different brand of assessment.

I'll also point out that the report card presented a semester of assessment in table form as quarter 1 grade, quarter 2 grade, exam grade, and then semester grade. This artificially presents the exam grade as more consequential to the final grade than it actually is. This isn't in my realm of influence, so I'll stop talking about it. The bottom line is that I need to do a better job of communicating these realities to everyone involved.

Conclusion

I'm glad to be starting another year soon and to continue to make this system do good things for students. Cycle forward.

I posted this graph of cumulative reassessments versus the day of the semester on Twitter:

That, my friends, is a reassessment system gone wild. The appropriate title for that image, as one person pointed out, is Too Many Reassessments. The grand total for this semester was 711. There are obvious bunches of reassessments close to the ends of the quarters when the grade book closes.

Here is a histogram of the reassessment data for the semester. There is some discrepancy in the total number in the data here, but I haven't figured out exactly where that is.

I committed to transplanting the system I have used in the past to my new school this year, and didn't want to make a full change without seeing how this would play out. This semester I was much more consistent in the types of questions I gave students reassessing, changing grades based on a reassessment, and the choice I offered them for the level of reassessment. Some of this I wrote about at the beginning of the semester.

The most important observation I can make at this point is that this system is not sustainable as is. Making my sign-up and credit system more efficient won't solve the volume problem; that isn't the issue. I'm satisfied with the quality of the questions I give students. I've developed a pretty nice bank of questions that span the spectrum of application, understanding, and transfer. The bigger issue is my capacity to give the sort of feedback I want to give to students throughout the semester. I have many conversations about learning, and many of them are great, but I cannot multiply myself to have as many of those conversations as I want.

Here's a graph of the average learning standards grade for a sample of students compared with the number of reassessments:

This doesn't support the expectation that more reassessments imply a higher grade. Students are not necessarily doing machine-gun style reassessment. They are working on specific skills and showing me what they are doing. They are responding in a positive way to my feedback. Credits, which students earn by doing work and reviewing concepts, are still required for reassessment. Students are, for the most part, using their credits. Expiring credits, as much as I thought it would make a difference, is not changing much about behavior (i.e. signing up for reassessments) or course grade. I need to dig into the data more to be able to explain why.

In terms of moving forward, I have many things to think about.

The past three or four years have been an exercise in exploring a system centered on student-initiated reassessment. I'm not sure it's time for that to completely go away, but I wonder about shifting my focus to an assessment structure centered on teacher-initiated reassessment. I already do this on unit exams, but I wouldn't say it is the focus of where I spend my time.

I wonder if reducing the permitted number of reassessments to one per student per week would improve their effectiveness. The improvement could come from higher quality feedback from me, more focused effort from the student on a given learning standard, or something else entirely. This reduces the options for students to learn on their own timeline, which isn't a good thing. While we're being honest though, that exponential curve at the end of the assessment period is all the evidence I need to accept that the timeline is driven by the grading-period structure, not learning.

How do I most efficiently help the weak student that reassesses on the same standard multiple times and makes limited progress on each attempt?

How do I give meaningful guidance to the student that aces everything on the first try? How do I get them more involved in finding learning that is meaningful, rather than waiting for me to tell them what to learn?

What do the students think? I've collected all sorts of anecdotal evidence that students appreciate the opportunities to reassess, and not just in a superficial way related to their course grade. I've given students an end of year survey to complete, and those results are rolling in slowly as students complete their final exams.

These are the big picture questions that add one more reason to be thankful that the summer is ahead. Getting back to my main point, I am brought back to the idea that quality feedback is the main way we as teachers add value. This, like many things in education, is not easy to scale. This need for improving and scaling the transfer of feedback is really the only basis for innovation in the ed-tech realm that interests me at all these days. So far, despite the best intentions of many that are trying, machine learning is not the answer yet. Make it easy for me to organize and collect student thinking, respond to that thinking, and give helpful nudges to the resources needed to make progress, and then I'll consider your product.

We're almost at the end of the third quarter over here. Here's the current plot of number of reassessments over time for this semester:

I'm energized though that the students have bought into the system, and that my improved workflow from last semester is making the process manageable. My pile of reassessment papers grows faster than I'd like, but I've also improved the physical process of managing the paperwork.

While I'm battling performance issues on the site now that there's a lot of data moving around, the thing I'm more interested in is improving participation. Who are the students that aren't reassessing? Why aren't they doing so? How do I get them involved?

There are lots of issues at play here. I've loved experimenting lately with new ways of assessing, structuring classes, rethinking the grade book, and just plain trying new activities out on students. I'll do a better job of sharing out in the weeks to come.

...or you can read this quick review of where I've been going with this:

When a student asks to be reassessed on a learning standard, the most important inputs that contribute to the student's new achievement level are the student's previously assessed level, the difficulty of a given reassessment question, and the nature of any errors made during the reassessment.

Machine learning offers a convenient way to find patterns that I might not otherwise notice in these grading patterns.

Rather than design a flow chart that arbitrarily figures out the new grade given these inputs, my idea was to simply take different combinations of these inputs, and use my experience to determine what new grade I would assign. Any patterns that exist there (if there are any) would be determined by the machine learning algorithm.

I trained the neural network methodically. These were the general parameters:

I only did ten or twenty grades at any given time to avoid the effects of fatigue.

I graded in the morning, in the afternoon, before lunch, and after lunch, and also some at night.

I spread this out over a few days to minimize the effects of any one particular day on the training.

When I noticed there weren't many grades at the upper end of the scale, I changed the program to generate instances of just those grades.

The permutation-fanatics among you might be interested in the fact that there are 5*3*2*2*2 = 120 possibilities for numerical combinations. I ended up grading just over 200. Why not just grade every single possibility? Simple - I don't pretend to think I'm really consistent when I'm doing this. That's part of the problem. I want the algorithm to figure out what, on average, I tend to do in a number of different situations.
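For the permutation-fanatics, the 120 cases come from a straightforward Cartesian product. A quick sketch of how those input combinations could be enumerated (I'm assuming here that the five previous levels are 6 through 10, and the variable names are mine):

```javascript
// Enumerate every combination of the grading inputs:
// 5 previous levels x 3 question difficulties x 3 binary error flags.
const previousLevels = [6, 7, 8, 9, 10];         // assumed prior levels
const difficulties = ['low', 'middle', 'high'];  // question difficulty
const flags = [false, true];                     // error present or not

const combinations = [];
for (const level of previousLevels) {
  for (const difficulty of difficulties) {
    for (const conceptual of flags) {
      for (const arithmetic of flags) {
        for (const algebraic of flags) {
          combinations.push({ level, difficulty, conceptual, arithmetic, algebraic });
        }
      }
    }
  }
}

combinations.length; // 5 * 3 * 2 * 2 * 2 = 120
```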

After training for a while, I was ready to have the network make some predictions. I made a little visualizer to help me see the results:

You can also see this in action by going to the CodePen, clicking on the 'Load Trained Data' button, and playing around with it yourself. There's no limit to the values in the form, so some crazy results can occur.

The thing that makes me happiest about the results is that there's nothing surprising in them.

Conceptual errors are the most important ones limiting students from making progress from one level to the next. This makes sense. Once a student has made a conceptual error, I generally don't let that student increase their proficiency level.

Students with low scores that ask for the highest difficulty problems probably shouldn't.

Students that have an 8 can get to a 9 by doing a middle-difficulty problem, but can't get to a 10 in one reassessment without doing the highest-difficulty problem. On the other hand, a student at a 9 that makes a conceptual error on a middle-difficulty problem is brought back to a 7.

When I shared this with students, what they seemed most interested in was using it to decide what sort of problem to request for a given reassessment. Some students with a 6 have come in asking for the simplest level question so they can be guaranteed a rise to a 7 if they answer correctly. A lot of level 8 students want to become a 10 in one go, but often make a conceptual error along the way and are limited to a 9. I clearly have the freedom to classify these different types of errors as I see fit when a student comes to meet with me. When I ask students what they think about having this tool available, the response is usually that it's a good way to be fair. I'm pretty happy about that.
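The patterns described above can be paraphrased as a rule-of-thumb function. To be clear, this is my hand-written summary of the behavior the network settled on, not the network itself, and the edge cases outside these examples are my assumptions:

```javascript
// Hand-written paraphrase of the patterns the trained network learned.
// difficulty: 'low' | 'middle' | 'high'
function predictNewLevel(previous, difficulty, conceptualError) {
  if (conceptualError) {
    // Conceptual errors block any advance; a 9 slipping on a
    // middle-difficulty problem falls back to a 7.
    if (previous === 9 && difficulty === 'middle') return 7;
    return previous;
  }
  // Clean work typically advances one level; an 8 can jump to a 10
  // only by handling the highest-difficulty problem.
  if (previous === 8 && difficulty === 'high') return 10;
  return Math.min(previous + 1, 10);
}
```

Expressed this way, it's easy to see why a 6 asking for the simplest question is playing it safe: a clean answer is a guaranteed 7.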

I'll continue playing with this. It was an interesting way to analyze my thinking around something that I consider to still be pretty fuzzy, even this long after getting involved with SBG in my classes.

As I've gained experience, I've been able to hone my definitions of what it means to be a six, eight, or ten. Much of what happens when students sign up to do a reassessment is based on applying my experience to evaluating individual students against these definitions. I give a student a problem or two, ask him or her to talk to me about it, and based on the overall interaction, I decide where students are on that scale.

And yet, with all of that experience, I still sometimes fear that I might not be as consistent as I think I am. I've wondered whether my mood, fatigue level, or the time of day affects my assessment of that level. From a more cynical perspective, I also really, really hope that past experiences with a given student, gender, nationality, and other characteristics don't enter into the process. I don't know how I would measure these effects to confirm they are not significant, if they exist at all. I don't fully trust myself to be truly unbiased, as well intentioned as I might try to be or think I am.

Before the winter break, I came up with a new way to look at the problem. If I can define what demonstrated characteristics should matter for assessing a student's level, and test myself to decide how I would respond to different arrangements of those characteristics, I might have a way to better define this for myself, and more importantly, communicate those to my students.

I determined the following to be the parameters I use to decide where a student is on my scale based on a given reassessment session:

A student's previously assessed level. This is an indicator of past performance. With measurement error and a whole host of other factors affecting the connection between this level and where a student actually is at any given time, I don't think this is necessarily the most important. It is, in reality, information that I use to decide what type of question to give a student, and as such, is usually my starting point.

The difficulty of the question(s). A student that really struggled on the first assessment is not going to get a high level synthesis question. A student at the upper end of the scale is going to get a question that requires transfer and understanding. I think this is probably the most obvious out of the factors I'm listing here.

Conceptual errors made by the student during the reassessment. In the context of the previous two, this is key in whether a student should (or should not) advance. Is a conceptual error in the context of basic skills the same as one of application of those skills? These apply differently at a level six versus a level eight. I know this effect when I see it and feel pretty confident in my ability to identify one or more of these errors.

Arithmetic/Sign errors and Algebraic errors. I consider these separately when I look at a student's work. Using a calculator appropriately to check arithmetic is something students should be able to do. Deciding to do this when calculations don't make sense is a sign of a more skilled student in comparison to one that does not. Observing these errors is routinely something I identify as a barrier to advancement, but not necessarily in decreasing a student's level.

There are, of course, other factors to consider. I decided to settle on the ones mentioned above for the next steps of my winter break project.
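Feeding those four parameters to a neural network means turning each scenario into numbers. Here is a sketch of one plausible encoding, normalized to the 0-to-1 range most small networks expect; the ranges and names here are my assumptions for illustration, not the production code:

```javascript
// Encode one reassessment scenario as a normalized feature vector.
// previousLevel: prior assessed level on the 5-10 scale (assumed range)
// difficulty: 'low' | 'middle' | 'high'
// conceptual / arithmetic / algebraic: error flags from the session
function encode(previousLevel, difficulty, conceptual, arithmetic, algebraic) {
  return [
    (previousLevel - 5) / 5,                      // 5-10 mapped onto 0-1
    { low: 0, middle: 0.5, high: 1 }[difficulty], // difficulty as 0, 0.5, 1
    conceptual ? 1 : 0,                           // conceptual error flag
    arithmetic ? 1 : 0,                           // arithmetic/sign error flag
    algebraic ? 1 : 0,                            // algebraic error flag
  ];
}
```

Each graded scenario then becomes an (input vector, my assigned grade) training pair.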

I've been quietly complaining recently to myself about how the reassessment sign-up and quiz distribution tool I created (for myself) isn't meeting my needs. Desmos, Peardeck, and the other online tools I use have a pretty impressive responsiveness when it comes to requests for features or queries about bugs, and that's at least partly because they have teams of expert programmers ready to go at any given moment. When you are an ed-tech company of one, there's nobody else to blame.

This is the last week when reassessment is permitted, so the size of the groups I've had for reassessments has been pretty large. Knowing this, I worked hard this past Sunday to update the site's inner workings to be organized for efficiency.

Now I know what my day looks like in terms of which reassessments students have signed up for, what their current grade is, and when they plan to see me during the day:

With one or two reassessments at a time, I got along just fine with a small HTML select box with names that were roughly sorted. Clicking those one at a time and then assigning a given question does not scale well at all. I can now see all of the students that have signed up for a reassessment, and then easily assign a given question to groups of them at a time:

With the user experience smoother now, I have been able to focus this week on making sure that the questions I assign truly assess what students know. I could not do this without the computer helping me out. It feels great to know things are working at this higher scale, and I'm looking forward to having this in place when we get going again in January.

Given that I use standards based grading with most of my classes, the grades I assign to students change quickly. I'm modifying those scores multiple times a day in some cases in my school's instance of PowerSchool Pro.

What the system currently lacks is an easy way to get that data out. For whatever reason, the only export format is PDF. This makes it difficult to get things into a spreadsheet.

After some hacking around in the console, I was able to put together a script that scrapes a class scoresheet page for the student names and assignment names and stores the result in a variable called exportData. This code is included below, and is also here in a gist. Paste the entire code into the console and run it. Then type in exportData and the scraped data will appear.

You can then copy and paste the resulting string (leaving out the quotes) into Excel, OpenOffice, or Google Sheets and the data will appear there, ready to be spreadsheet-ified.
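The heart of that copy-and-paste trick is just joining the scraped values with tabs and newlines. Here is a stripped-down sketch of the assembly step; the DOM-scraping selectors are omitted, and the function and field names here are hypothetical, not the actual gist:

```javascript
// Given assignment names and scraped student rows, build a
// tab-separated string that pastes cleanly into Excel, OpenOffice,
// or Google Sheets.
function toTabSeparated(assignmentNames, studentRows) {
  const header = ['Student', ...assignmentNames].join('\t');
  const lines = studentRows.map(
    ({ name, scores }) => [name, ...scores].join('\t')
  );
  return [header, ...lines].join('\n');
}

// In the console, the result would be stored for easy copying:
// var exportData = toTabSeparated(names, rows);
```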

The only place where this doesn't work perfectly is when there are more students than will fit on the page. As far as I could tell after poking around, the grade data is re-rendered to fit the page as scrolling occurs. I didn't work that hard to see if the data is stored somewhere else on the page, so someone with a bit more insight might be able to improve upon my work.

For those of you that are readers of my blog, you already know that I've become a believer in the power of standards based grading, or SBG. It's amazing looking back at my first post on my decision to commit to it four years ago. Seeing how this system has changed the way I plan my lessons, think about class activities, and interact with students about learning makes me wonder where I would be at this point without it.

I'm now trying to help others see how standards based grading might bring a similar change to their classrooms. I'm running a one-hour workshop this Friday at 1:30 in room C315 to introduce Learning2 attendees to how a teacher might go about this. More important for those considering a change to such a system: I run mine in a non-standards-based PowerSchool environment. Here's the workshop description:

Suppose a student has earned a 75 in your class. How do you describe that student's progress? What has that student learned in your class? Obviously a student with an 85 has done better than the student with a 75, but what exactly has the 85 student achieved that the other student has not? Is it ten percent more understanding? Two more homework assignments during the quarter? Perhaps most importantly, what can the 75 student do to become an 85 student?

Grades are part of our school culture and likely aren't going anywhere soon. We can work to tweak how we generate and communicate the meaning of those grades in a way that better represents what students have actually learned. One approach for doing this is called Standards Based Grading, or SBG.

In this one hour workshop, you will learn about SBG and how it can clarify the meaning of grades, as well as how it can be implemented effectively within a traditional reporting system. You will also learn how an SBG mindset encourages productive changes to the process of planning units, activities, and assessments. We will also discuss the ways such a system can be run in the context of various subject areas.

It's a lot to cover in an hour, but I'm hoping I can nudge a few folks to try this out moving forward.

I'm really excited about the Learning 2.0 conference this year. I first attended back in 2011 in Shanghai and the experience was what prompted me to become active on Twitter and begin blogging back then. I know the next few days will be filled with inspiring conversations and ideas that challenge my thinking and push me to grow as a teacher.

Stay tuned to the blog and to Twitter to see what I'm up to over the weekend.

A quick comment before hitting the hay after another busy day: the reassessment system has hit it big in my new school.

Some facts to share:

In the month since my reassessment sign-up system went up, 87% of my students have done at least one self-initiated reassessment, with 69% doing more than one. This is much more usage than my system has had, well, ever.

Last Friday hit an all-time high of 53 reassessments over the course of a single day. I will not be doing that again, ever.

Students are not hoarding their credits, they are actually using them. I've committed to expiring them if they go unused, and they will all be expired by the end of the quarter, which is essentially tomorrow.

I need to come up with some new systems to manage the volume. I'll likely limit the number of slots available in the morning, at lunch, and after school to encourage them to spread these out throughout the upcoming units instead of waiting, but more needs to be done. This is what I've been hoping for, and I need to capitalize on the enthusiasm students are showing for the system. Now I need to make it so I don't pull all my hair out in the process.