Can AI Pass the Test, Part I: Of Centaurs and Cyborgs
By Lew Ludwig
Last week, as the semester was winding down, I found myself unexpectedly tearing up during a meeting with a student. I was pressing the student on their understanding of a mathematical concept and on how they had used AI in the process. But I’m getting ahead of myself. Let’s start at the beginning to see why I had such a visceral reaction. As an educator navigating the evolving landscape of technology, I've faced both unprecedented challenges and surprising breakthroughs. This story isn’t just about the tears or the technology; it’s about the unexpected ways that blending human insight with AI can deepen understanding and push the boundaries of what we believe is possible in education.
Challenging conventions with the “Cheat-proof” Calculus Test
Readers may recall my “Cheat-proof” Calculus Test, a take-home test where students create their own calculus test based on my specifications. Although I developed this approach as an alternative to an in-person test during the pandemic lockdown, it has remained remarkably robust in the era of generative AI: students cannot simply cut and paste the assignment and have AI do the work.
Centaurs vs. cyborgs in AI
Over the last year, two analogies have emerged to describe how one approaches AI: the centaur and the cyborg. Centaurs split tasks between AI and themselves; there is a distinct separation between human and AI responsibilities, like the clear division between the human torso and the horse body of the mythical centaur. In the context of my 'Cheat-proof' Calculus Test, this could mean students using AI to handle parts of their test creation while still relying significantly on their own understanding to complete the task. The cyborg approach, on the other hand, blends human and AI capabilities intimately and seamlessly. Ideally, this would involve students integrating AI to deepen their conceptual understanding and creatively solve complex problems, with the technology and the student’s thought processes tightly intertwined.
Our biggest concern as educators is that students are copying and pasting our prompts into ChatGPT and then copying and pasting the answers back to us. This is an extreme version of the centaur approach - the horse doing most of the work while the human does little. Let’s call this the Lazy Centaur.
I have seen Lazy Centaurs at work in my calculus class this semester - students responding to open prompts like “What are three goals you have this semester regarding class participation and community building?” with well-crafted, well-formatted answers from ChatGPT (one even left in the **bold** Markdown formatting, copied straight from the chatbot’s output). But to be fair, I have given my students permission to use AI broadly in my course. Clearly, I need to explain my expectations for AI use better.
The true promise of AI in education lies in the cyborg approach. We can create more impactful learning experiences that expect more of our students, because students can use the technology productively to push their abilities. This is exactly what led to my tearful exchange with my student.
A True Cyborg Moment: John’s Breakthrough
In the 12th week of the semester, we had our third test - an in-class portion plus the take-home portion described above. A key aspect of the cheat-proof calculus test is that it grows with the semester: the first test covers limits (20 points), the second covers limits and derivatives (30 points), and the third (40 points) covers the trifecta - limits, derivatives, and integrals. I use video grading to provide feedback, and on the second test there is a clear jump in scores for those who actually watch the video and take it to heart. This also means I can ratchet up the difficulty to challenge my students, being careful to stay within their zone of proximal development (ZPD). Usually I have a handful of As, a large chunk of Bs and Cs, and a smattering of Ds and Fs - but never a perfect score.
Needless to say, I was surprised to see John[1] earn a perfect score on the third test. He had earned a 32/35 on Test 2, so I was curious how he had aced the third, most challenging test. I suspected AI played a role, but as I demonstrated in my last post, my take-home has been remarkably centaur-proof.
For the second test, I required students to use AI to help improve the explanations of their solutions on the take-home. This had mixed results, but nothing particularly enlightening. So how did John pull this off?
As I argued in my last post, “the future's successful teachers will be those who prioritize their students' well-being, focusing on nurturing rapport and fostering community.” Luckily, John and I had a very good rapport based on trust and respect. So when I asked how he used AI on the last test, he gladly shared his prompt list with me.
It was stunning! John's prompts revealed a sophisticated dialogue with ChatGPT spanning over 14,000 words. Some interactions mirrored the Lazy Centaur approach - “create a horizontal asymptote at y = 1” or “a vertical asymptote at x = -5” - but others showcased true cyborg collaboration, especially on the more complex problems where AI alone wasn't sufficient. For example, one question had to use the product rule on f(x) · cos x and have an answer that could be computed without a calculator. This nuanced use of AI not only helped John excel but also highlighted the potential of cyborg strategies to foster deeper academic engagement and understanding.
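To give a flavor of what those specifications demand - and these are my own illustrative choices, not John's actual work - a single rational function satisfies both asymptote requests, and a simple polynomial choice of f makes the product-rule answer calculator-free:

\[
g(x) = \frac{x}{x+5} \quad \text{has a vertical asymptote at } x = -5 \text{ and, since } \lim_{x \to \pm\infty} g(x) = 1, \text{ a horizontal asymptote at } y = 1.
\]

\[
\text{With } f(x) = x^2: \quad \frac{d}{dx}\bigl[x^2 \cos x\bigr] = 2x\cos x - x^2\sin x, \quad \text{so at } x = \pi \text{ the answer is } 2\pi(-1) - \pi^2 \cdot 0 = -2\pi.
\]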
The true cyborg moment came on the last question. I wanted an example where the limit of f’’(x) as x approaches a exists, but the same limit of f’(x) does not. I’ve seen pairs of PhDs puzzle over this one in workshops, so I knew it was a challenging question. The exchange between John and the chatbot was intriguing. The chatbot proposed an example featuring a vertical tangent, but John was uncertain what this meant for the second derivative. He quickly learned that if the first derivative does not exist at a point, then neither does the second derivative.[2] This insight underscored the depth of understanding required to navigate such an intricate mathematical exchange.
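To see why the vertical tangent is a dead end here (my own worked example, not a line from John's transcript), take the classic vertical-tangent function at a = 0:

\[
f(x) = x^{1/3}, \qquad f'(x) = \tfrac{1}{3}x^{-2/3} \to +\infty \text{ as } x \to 0, \qquad f''(x) = -\tfrac{2}{9}x^{-5/3},
\]

and f’’ tends to \(-\infty\) from the right and \(+\infty\) from the left, so neither limit exists - the vertical tangent breaks both derivatives at once.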
No matter how he tried, the eager-to-please chatbot persistently returned to the vertical tangent example. Realizing he needed an alternative, John remembered two other ways a derivative can fail to exist: cusps and discontinuities. He then suggested a cusp to the chatbot, and within four more exchanges they arrived at a suggestion he could apply to his graph.
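For the mathematically curious, here is a minimal sketch of the kind of example a corner - a close cousin of the cusp John proposed - makes possible; this is my illustration, not the graph John submitted:

\[
f(x) = |x - a|, \qquad f'(x) = \begin{cases} -1, & x < a, \\ \phantom{-}1, & x > a, \end{cases} \qquad f''(x) = 0 \text{ for } x \neq a.
\]

The one-sided limits of f’ disagree, so \(\lim_{x \to a} f'(x)\) does not exist, yet \(\lim_{x \to a} f''(x) = 0\) certainly does.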
So, had John beaten my “cheat-proof” calculus test? Perhaps. On the one hand, he exposed some of the weaker parts of the test - create a vertical tangent, a horizontal asymptote, and so on - but his work on the later part of the test also showed me that I could expect more from my students.
Now what? We were headed into a take-home final, which would be much like the first three tests. Should I even bother having John go through the motions, or should I just give him a perfect score? I’ll tackle this question in Part II of this post on Thursday!
[1] Not my student’s actual name, but I have their express consent to share this story (DU IRB FA23 #22).
[2] Please keep in mind, dear readers, that this is a business calculus class, so we do not have time to press the finer analytic points of calculus.
Lew Ludwig is a professor of mathematics and the Director of the Center for Learning and Teaching at Denison University. An active member of the MAA, he recently served on the project team for the MAA Instructional Practices Guide and was the creator and senior editor of the MAA’s former Teaching Tidbits blog.