Across numerous school districts in the United States, educators have begun using software designed to detect the use of artificial intelligence (AI) in student work. These detection tools analyze assignments and return a likelihood score indicating whether portions of the text were generated by AI. Despite widespread adoption, including in states such as Utah, Ohio, and Alabama, recent research and expert assessments show that these tools are far from infallible.
One illustrative case involves Ailsa Ostovitz, a junior at Eleanor Roosevelt High School in Maryland. Since the start of this academic year, Ailsa has faced repeated accusations of using AI to complete her schoolwork. In one documented incident from September, a teacher shared with her a screenshot from an AI detection program showing a 30.76% probability that AI was involved in an assignment about her favorite music. Ailsa, who is passionate about music and writes her own reflections on the subject, was dismayed by the implication.
Feeling misjudged, Ailsa messaged her teacher through the school's online platform to dispute the allegation and asked that a different detection tool be run for confirmation. Her message went unanswered, and her grade on the assignment was reduced. Her mother, Stephanie Rizk, was troubled that the teacher had reached a conclusion so quickly without assessing Ailsa's actual skill level. When Rizk met with the teacher in mid-November, the teacher said the message had never been seen; after the meeting, the teacher reportedly no longer believed AI had played a role in Ailsa's work.
The Prince George's County Public Schools district clarified that the AI detection software was adopted independently by the teacher, not provided or paid for by the district. In staff training sessions, the district cautions educators against relying solely on such tools because of their documented inaccuracies.
Despite these concerns, a survey conducted by the Center for Democracy and Technology indicated that over 40% of teachers in grades 6 through 12 used AI detection tools during the last school year. This trend persists even though many studies underscore substantial reliability issues with these technologies.
Mike Perkins, who researches academic integrity and AI at British University Vietnam, characterizes popular AI detection software such as Turnitin, GPTZero, and Copyleaks as unreliable. His research found frequent misclassifications, including false positives where original student work was flagged as AI-generated and false negatives where AI content escaped detection. Moreover, attempts to mask AI-generated text by manipulating phrasing further reduce detection accuracy.
Financial investments in these tools continue at the district level. For example, Broward County Public Schools in Florida has committed over $550,000 to a three-year contract with Turnitin, a company known for plagiarism detection that has more recently added AI detection features. Turnitin reports a percentage score reflecting how much of a submission is suspected to be AI-generated, but cautions that scores of 20% or lower may not be reliable.
Sherri Wilson, director of innovative learning with the Broward district, stated that the Turnitin AI feature serves primarily as a resource to foster discussions between teachers and students rather than for grading enforcement. She acknowledged the limitations of the tool's accuracy and emphasized that teachers maintain the final judgment on the authenticity of student work.
Broward County offers International Baccalaureate and Cambridge programs, both of which require teachers to authenticate student assignments before they are submitted for external review. Neither program mandates AI detection software, but the district chose to equip teachers with Turnitin to support that authentication. Wilson underscores that the technology is supportive rather than determinative, intended to prompt constructive dialogue.
At Shaker Heights High School near Cleveland, language and literature teacher John Grady uses GPTZero, a different AI detection platform, as an initial filter. He runs all student essays through the software and investigates further when the probability exceeds 50%. Grady supplements this with a review of revision histories, such as time spent and edits made, to understand the writing process, and when anomalies arise he discusses the work directly with students. He estimates that in about 75% of flagged cases, students admit to using AI and then revise the work for reduced credit.
GPTZero's co-founder and CEO Edward Tian echoed the view that the tool should serve as an informative aid, not a punitive measure. He stresses that scores below 50% suggest the writing is more likely human, while higher scores signal the need for further inquiry. He acknowledges research showing imperfect reliability but maintains that, properly contextualized, the tool's output can help educators understand what is happening in their classrooms.
Nonetheless, skepticism about AI detection tools is widespread, particularly over potential bias against non-native English speakers. Zi Shi, a Shaker Heights junior whose first language is Mandarin, says his writing has sometimes been mistaken for AI-generated text because of repetitive word choice and limited vocabulary. He recounted one incident, confirmed by his teacher, in which an assignment was flagged by GPTZero, possibly because he had used Grammarly, an AI-powered grammar correction tool.
Shi suggests treating AI detectors like smoke alarms: prone to false alarms but still useful as a first alert. He questions whether spending thousands of dollars on these technologies is justified, arguing the money would be better spent on professional development for educators.
Carrie Cofer, a high school English teacher in the Cleveland Metropolitan School District, shares a similar stance. As an experiment, she submitted a chapter of her Ph.D. dissertation to GPTZero, which incorrectly identified it as predominantly AI-written. Cofer helps write her district's AI policies and currently advises against investing in AI detection software, citing the ease with which students can evade it, for example with AI "humanizer" programs designed to make machine-generated text pass as human.
Instead, she advocates adapting teaching and assessment methods to account for AI's presence in education, rather than relying on detection tools that can be sidestepped.
Reflecting these challenges, Ailsa Ostovitz now runs all of her homework through multiple AI detectors before submitting it. Though confident her work is original, she edits any sentences the software flags to reduce the chance of a mistaken accusation, an effort that adds significant time to her workload. She says she has become far more vigilant about defending her authorship, wary of consequences stemming from potentially flawed detection results.
This account illustrates the broader complexities surrounding AI detection software in schools. While such tools offer some help in identifying AI-generated content, their limitations create risks of false accusations, unfair treatment, and unnecessary burdens on students. These developments are fueling ongoing debate about the appropriate role of technology in academic integrity and the need for policies that balance detection with fairness, accuracy, and pedagogical innovation.