The Messy Middle: How AI Writing Tutors Close 6th Grade Achievement Gaps

W304CD, Table 4

Roundtable presentation
Research Paper

Session description

Explore recent research from two rigorous quasi-experimental (ESSA Moderate) studies demonstrating how AI writing tutors dramatically reduce 6th grade achievement gaps. Discover evidence-based strategies for implementing personalized writing instruction at scale, with compelling data showing gap reductions of two-thirds for students with disabilities and more than half for economically disadvantaged learners.

Framework

Integrated Theoretical Perspectives
Integrating an AI chatbot into instruction is grounded in a multifaceted theoretical framework that combines cognitive, social, and pedagogical perspectives to address persistent challenges in middle school literacy education. Four primary theoretical frameworks emerge from the LXD Research efficacy studies:
1. Strategy Modeling and Guided Practice Framework (Bereiter & Bird, 1985)
At its core, integrating technology into reading instruction is grounded in Bereiter and Bird's (1985) research demonstrating that effective reading comprehension requires both explicit modeling of strategies and structured guided practice. This theoretical perspective informs this study's implementation of:

AI-based reading tools that combine modeling and guided practice
"Smart supports" using graphic organizers to break down complex literacy tasks
Text highlighting capabilities for identifying key passages
Embedded dictionary functions supporting comprehension
AI-powered correction of misunderstandings with contextually appropriate scaffolding

This framework is particularly significant in addressing the persistent literacy challenges faced by middle school students, especially those with foundational skill gaps stemming from pandemic disruptions and other educational barriers.
2. Intelligent Tutoring Systems and Human-AI Collaboration (VanLehn, 2011)
The implementation of AI-supported writing tools draws heavily from VanLehn's (2011) meta-analysis demonstrating that intelligent tutoring systems can achieve comparable effectiveness to human tutoring (effect sizes of 0.76 vs. 0.79) when they provide step-based interaction and complement rather than replace human instruction.
This theoretical perspective shapes this study's approach as a "co-teaching environment" where:

AI extends teacher capacity to work with students across varied literacy proficiency ranges
Individualized feedback adapts to each student's specific responses
Technology enhances rather than replaces existing classroom routines
The platform integrates seamlessly with curriculum alignment

3. Explicit Strategy Instruction for Writing (Gillespie & Graham, 2014)
The writing instruction component in this study was informed by Gillespie and Graham's (2014) meta-analysis showing that explicit strategy instruction has a strong impact on student writing outcomes (effect size = 1.09), particularly when combined with systematic support and scaffolding. An AI-integrated platform can implement this through:

Breaking down the writing process into manageable components
Maintaining high expectations while providing tailored support
Guiding students through planning, drafting, and revision
Offering real-time feedback on student writing
Supporting the revision process with specific, actionable guidance

4. Retrieval Practice for Deepening Learning (Agarwal et al., 2021)
The platform used in this study has a three-tiered activity structure (Gist → Close Read → Write) that integrates retrieval practice principles, based on Agarwal et al.'s (2021) systematic review showing consistent benefits across content areas and formats. This approach emphasizes:

Active recall and application of understanding rather than passive review
Systematic scaffolding that promotes increasingly independent engagement with concepts
Graduated supports that build metacognitive awareness
Multiple opportunities to retrieve, apply, and extend knowledge

Empirical Validation Through Mixed Methods Research
The two efficacy studies employ a robust mixed-methods research approach that triangulates quantitative and qualitative evidence. The quasi-experimental design with a large sample (over 3,000 students in two districts, with different state tests) provides more methodological rigor than typical educational technology studies, enabling detection of meaningful effects while supporting generalizability in high-need contexts.
Key methodological strengths of the research framework include:

Established baseline equivalence (Hedges' g of .16 and .13, within WWC standards)
Low attrition rates (overall below 11%, differential below 7%)
Full academic year implementation tracking performance across multiple assessment points
Multiple data sources converging to explain implementation outcomes
Authentic implementation conditions reflecting real-world classroom contexts

This methodological approach aligns with the ESSA Level II evidence standards and addresses common limitations in previous AI-powered literacy research.

Methods

Our study employed a rigorous quasi-experimental design following ESSA Level II evidence standards to examine the effectiveness of Coursemojo's AI-powered literacy platform. The research was conducted across two major school districts during the 2024-2025 academic year: a large, diverse district in Texas (Aldine ISD) and Sumner County in Tennessee. This mixed-methods approach combined quantitative academic outcome analysis with qualitative stakeholder feedback to provide a comprehensive understanding of the platform's impact and implementation factors.
Primary Research Questions

What is the impact of Coursemojo on student ELA achievement after one year of use, comparing users to non-users?
How do usage patterns and implementation quality relate to student outcomes?
What are the experiences and perceptions of educators and students using the platform?

Participant Selection and Sample Description
Texas District Study
The Texas district sample included 3,327 sixth-grade students across 13 middle schools, with 541 students in 2 treatment schools using Coursemojo and 2,786 students in 11 comparison schools continuing with standard instruction. All schools used the Wit and Wisdom curriculum as their core English Language Arts program, ensuring curricular consistency across treatment and comparison conditions.

Student-level demographic characteristics were carefully documented to ensure representation and analyze differential effects.
Baseline equivalence was established between the treatment and comparison groups using fall NWEA MAP scores and previous-year (2024) STAAR scores. The standardized mean differences (Hedges' g) between groups were 0.16 and 0.13 standard deviations, respectively, both below the What Works Clearinghouse (WWC) baseline equivalence threshold of 0.25. Because these differences exceeded 0.05, pretest scores were included as covariates in all impact estimation models, consistent with WWC standards for statistical adjustment.
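
The baseline check described above can be illustrated with a minimal Python sketch. It assumes the usual pooled-standard-deviation form of Hedges' g with a small-sample correction and the WWC cut points of 0.05 and 0.25 mentioned in the text. The function names and the example means and standard deviations are hypothetical and are not taken from the study data; only the group sizes come from the Texas sample description.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardized mean difference with a small-sample correction."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    d = (mean_t - mean_c) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n_t + n_c) - 9)
    return d * correction


def wwc_baseline_category(g):
    """Classify a baseline difference against the 0.05 and 0.25 cut points."""
    size = abs(g)
    if size <= 0.05:
        return "equivalent, no adjustment required"
    if size <= 0.25:
        return "equivalent with statistical adjustment"
    return "not equivalent"


# Hypothetical pretest summary statistics (illustration only).
g = hedges_g(mean_t=210.0, sd_t=15.0, n_t=541, mean_c=208.0, sd_c=15.0, n_c=2786)
print(round(g, 3), wwc_baseline_category(g))
```

Under this classification, the reported differences of 0.16 and 0.13 both fall in the middle category, which is why pretest covariates appear in the impact models.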

Tennessee District Study
The Sumner County sample included 2,203 sixth-grade students across 12 schools, with balanced distribution between 6 Coursemojo schools (n=1,017) and 6 comparison schools (n=1,186). This secondary implementation site provided cross-context validation of findings.
Stakeholder Participation
To capture implementation experiences and contextual factors, we collected feedback from:

Students: Survey data was collected from 1,248 students (45 from the Texas district and the remainder from Tennessee) regarding perceptions of learning effectiveness, engagement, and satisfaction.
Teachers: 36 educators (12 from the Texas district) completed comprehensive surveys at the beginning (BOY) and middle (MOY) of the academic year.
Educational Leaders: In-depth interviews were conducted with 8 educational leaders across both districts, including:

Instructional coordinators
Instructional coaches
Principals
Language arts program directors

The educational leader participants were selected to represent diverse roles and experience levels, with experience ranging from 1 to 27 years. Interviews were conducted between April and May 2025 to allow for substantial implementation experience.

Data Collection Instruments and Procedures
Quantitative Academic Measures

NWEA MAP Reading Assessment: Administered at three time points (Fall, Winter, Spring) to track growth trajectories. The MAP Growth assessment is a computer-adaptive test that precisely measures student achievement and growth in reading.

aimswebPlus Reading Assessment & TN Ready Progress Monitoring: Administered multiple times throughout the year (Fall, Winter, Spring) to track progress towards grade-level skills.

State Standardized Assessments:

Texas: STAAR (State of Texas Assessments of Academic Readiness)
Tennessee: TCAP (Tennessee Comprehensive Assessment Program) and TN Ready benchmarks

Platform Usage Data: Comprehensive metrics were collected on:

Number of weeks using the platform
Average days per week of usage
Time spent on each component (Gist, Close Read, Write)
Task completion rates

Usage was categorized into Low, Medium, and High implementation tiers based on frequency and consistency of engagement.
Qualitative Data Collection

Student Surveys: Comprehensive feedback instruments captured:

Perceived learning impact (5-point scale)
Engagement and effort ratings
Persistence through challenging activities
Overall satisfaction

Teacher Surveys: Collected data on:

Implementation experiences
Observed student impacts across different subgroups
Platform effectiveness compared to traditional methods
Areas for improvement
Professional development experiences

Educational Leader Interviews: Semi-structured protocols explored:

Implementation context and support systems
Observed impacts on teaching and learning
Administrative factors influencing success
Professional development and coaching experiences
Recommendations for optimization

Interview sessions lasted 45-60 minutes and were conducted virtually. All sessions were recorded with permission, transcribed, and coded for thematic analysis.

Attrition Analysis
The study maintained low attrition rates throughout implementation, strengthening the validity of findings. These attrition rates fall within WWC standards, with overall attrition below 11% and differential attrition below 7% for all assessment points.
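
As a rough illustration of how the two attrition figures are typically derived, the sketch below computes overall and differential attrition from starting and ending sample counts. The counts are hypothetical, and the 11% and 7% limits are simply the bounds reported in this section, not a reproduction of the WWC attrition boundary table.

```python
def attrition_rates(t_start, t_end, c_start, c_end):
    """Return (overall, differential) attrition as proportions."""
    t_rate = 1 - t_end / t_start
    c_rate = 1 - c_end / c_start
    overall = 1 - (t_end + c_end) / (t_start + c_start)
    differential = abs(t_rate - c_rate)
    return overall, differential


def within_reported_bounds(overall, differential, max_overall=0.11, max_diff=0.07):
    """Check rates against the bounds reported in the text (an assumption here)."""
    return overall < max_overall and differential < max_diff


# Hypothetical counts: students at baseline vs. students with outcome data.
overall, differential = attrition_rates(t_start=1000, t_end=920, c_start=1000, c_end=900)
print(f"overall={overall:.1%}, differential={differential:.1%}")
print(within_reported_bounds(overall, differential))
```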
Data Analysis Methods
Quantitative Analysis

Linear Regression with Cluster-Robust Standard Errors: Primary analysis method to account for the nested structure of students within schools. OLS linear regression models included the following covariates:

Pretest scores (fall NWEA MAP or prior year state assessment)
Student demographic characteristics
School-level factors
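
To make the estimation approach concrete, here is a minimal pure-Python sketch of OLS with cluster-robust (sandwich) standard errors, clustering on school. It is not the authors' analysis code: the function names and the toy data are hypothetical, only two predictors (a treatment indicator and a pretest score) are shown, and it uses the uncorrected CR0 estimator, whereas statistical packages usually add a finite-sample correction.

```python
import math
from collections import defaultdict


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def ols_cluster_robust(rows, y, clusters):
    """Return (coefficients, CR0 cluster-robust standard errors).

    rows: predictor values per student (an intercept is prepended here)
    y: outcome scores; clusters: school label per student
    """
    X = [[1.0] + list(map(float, r)) for r in rows]
    k = len(X[0])
    xtx = [[sum(xi[a] * xi[b] for xi in X) for b in range(k)] for a in range(k)]
    bread = invert(xtx)
    xty = [sum(xi[a] * yi for xi, yi in zip(X, y)) for a in range(k)]
    beta = [sum(bread[a][b] * xty[b] for b in range(k)) for a in range(k)]

    # Sum the score vectors (predictors times residual) within each school.
    scores = defaultdict(lambda: [0.0] * k)
    for xi, yi, g in zip(X, y, clusters):
        resid = yi - sum(b * v for b, v in zip(beta, xi))
        for a in range(k):
            scores[g][a] += xi[a] * resid

    meat = [[sum(s[a] * s[b] for s in scores.values()) for b in range(k)]
            for a in range(k)]
    cov = matmul(matmul(bread, meat), bread)
    se = [math.sqrt(max(cov[a][a], 0.0)) for a in range(k)]
    return beta, se


# Hypothetical toy data: (treatment indicator, pretest score) per student.
rows = [(1, 10), (1, 20), (1, 30), (1, 40), (0, 15), (0, 25), (0, 35), (0, 45)]
y = [13.0, 16.5, 22.5, 26.0, 9.0, 15.5, 19.0, 25.0]
schools = ["A", "A", "B", "B", "C", "C", "D", "D"]
beta, se = ols_cluster_robust(rows, y, schools)
print("treatment coefficient:", round(beta[1], 3), "SE:", round(se[1], 3))
```

With only a small number of schools per condition, cluster-robust standard errors can be unstable, which is one reason a multilevel model is often reported alongside this kind of regression.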

Multilevel Modeling (HLM): Two-level hierarchical linear models were used to analyze outcomes while accounting for student- and school-level clustering effects.
Subgroup Analysis: Differential impacts were examined for:

Special education students
English language learners
Economically disadvantaged students
Gender groups
Students with varying baseline achievement levels

Dosage Analysis: Implementation quality was assessed by examining the relationship between usage patterns and student outcomes, specifically analyzing:

Threshold effects (minimum usage required for impact)
Dose-response relationships
Optimal implementation patterns

Usage tiers were defined as follows:

Low Usage:

Less than 12 weeks of use throughout the school year
AND/OR
Less than 1.7 days per week (on weeks when the platform was used)

Medium Usage:

12 to 20 weeks of use throughout the school year
AND
1.7 to 2.5 days per week (on weeks when the platform was used)

High Usage:

More than 20 weeks of use throughout the school year
AND/OR
2.5+ days per week (on weeks when the platform was used)
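
The tier definitions above can be sketched as a small classification function. Because the "AND/OR" wording leaves some combinations ambiguous (for example, fewer than 12 weeks but 2.5+ days per week), this sketch assumes that the Low rule is checked first and the High rule second. That precedence and the function name are assumptions, not something the definitions state.

```python
def usage_tier(weeks_used, days_per_week):
    """Assign a usage tier; precedence Low > High > Medium is an assumption."""
    # Assumption: falling below either minimum puts a student in the Low tier.
    if weeks_used < 12 or days_per_week < 1.7:
        return "Low"
    # Otherwise, exceeding either upper cut point counts as High.
    if weeks_used > 20 or days_per_week >= 2.5:
        return "High"
    # Remaining students have 12 to 20 weeks and 1.7 to under 2.5 days per week.
    return "Medium"


for weeks, days in [(10, 3.0), (15, 2.0), (25, 1.8)]:
    print(weeks, days, usage_tier(weeks, days))
```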

Effect sizes were calculated using Cohen's d to standardize the magnitude of observed differences and enable comparison with other educational interventions.
Qualitative Analysis

Thematic Analysis: Interview transcripts and open-ended survey responses were coded using a constant comparative method to identify recurring patterns and themes.
Framework Analysis: A structured approach organized qualitative findings around key research questions related to implementation, student impact, and contextual factors.
Triangulation: Multiple data sources were systematically compared to validate findings and provide a comprehensive understanding of implementation factors and outcomes.

The qualitative coding process employed both deductive codes derived from research questions and inductive codes that emerged from the data. Two researchers independently coded data samples to establish inter-rater reliability before finalizing the coding framework.
Validity Measures
Several strategies were employed to strengthen the validity of findings:

Baseline Equivalence: Established using multiple pretest measures to ensure comparability between treatment and comparison groups.
Multiple Outcome Measures: Use of both formative (NWEA MAP or aimsweb) and summative (state assessments) measures to triangulate impact findings.
Low Attrition Rates: Maintained sample integrity throughout the study period.
Mixed Methods Validation: Qualitative findings were used to explain and validate quantitative outcomes.
WWC Standards Alignment: The research design adhered to What Works Clearinghouse standards for quasi-experimental studies, including appropriate statistical adjustments for baseline differences.

Results

The results of this study indicate that Coursemojo students significantly outperformed comparison students on TCAP state assessments by 8 points (β = 7.461, p < .001, Cohen's d = 0.12), with consistent advantages across quarterly benchmarks and progress monitoring assessments. In the Aldine district, similar positive outcomes were observed, with Coursemojo students outperforming comparison students on the Texas STAAR assessment by 10.51 points, and significantly higher proportions achieving passing rates (70% vs. 60%) and meeting grade level expectations (41% vs. 37%).
Dosage analysis revealed dramatic differences based on implementation frequency: students using Coursemojo 2.5+ days per week scored 9.5 points higher than comparison students, while those using it less than 1.7 days per week performed similarly to non-users. The Aldine study confirmed these threshold effects, with students who used Coursemojo for at least 1.7 days per week scoring significantly higher on STAAR than less frequent users (F(1, 482) = 20.53, p < .001, Cohen's d = 0.41). Students with at least medium usage (minimum 12 weeks) significantly outperformed the comparison group (F(1, 2802) = 20.53, p < .001, Cohen's d = 0.30).
The platform demonstrated exceptional effectiveness in closing achievement gaps across both sites. In Tennessee, students with disabilities reduced achievement gaps by two-thirds, and economically disadvantaged students reduced gaps by more than half. The Texas findings were similar, with special education students showing substantial NWEA MAP gains (effect sizes of 0.21–0.36) and doubling their rate of meeting grade level on STAAR, while economically disadvantaged students significantly outperformed their peers by 13.8 points.
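
The abstract does not state how the gap reductions were computed. One common reading, shown in the hedged sketch below, compares the subgroup gap among platform users with the same gap among comparison students. The function name and the example gaps are hypothetical and are not taken from the study.

```python
def gap_reduction(comparison_gap, treatment_gap):
    """Share of the comparison-group gap that is absent among platform users.

    Each gap is the mean score of the reference group minus the mean score
    of the subgroup (for example, students with disabilities) within one
    study condition. This definition is an assumption.
    """
    if comparison_gap == 0:
        raise ValueError("comparison gap must be non-zero")
    return 1 - treatment_gap / comparison_gap


# Hypothetical gaps in scale-score points (illustration only).
print(f"{gap_reduction(comparison_gap=30.0, treatment_gap=10.0):.0%}")
```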
Teacher feedback was consistently positive across both implementations. In Tennessee, teacher surveys revealed high satisfaction levels (69% expressing high enjoyment, average 3.8/5) and positive impacts on workload (average 3.7/5). In Aldine, teacher satisfaction was universal (100%), with 75% rating Coursemojo as more effective than traditional methods. Students in both districts demonstrated high engagement, with 80% of Tennessee students providing high positive ratings for effort (average 4.2/5), and Aldine students showing similar patterns with 91.1% providing high effort ratings and 84.4% persisting through challenging activities.

Importance

The study addresses the growing need for evidence on AI tool effectiveness in authentic classroom settings. It highlights the critical link between implementation quality and outcomes, demonstrating that consistent use is key to achieving positive results. Dosage findings suggest that occasional or supplemental use may be insufficient for meaningful impact. Evaluated through the EVER framework, the study demonstrates strong methodological quality, measurable achievement gains, and generalizability across diverse educational contexts. It also contributes to understanding how AI-powered tools can support equity goals when implemented with adequate support and systematic integration into regular instruction.
This research provides ISTE attendees with the evidence-based decision-making tools they urgently need as AI tools flood the educational technology market. Rather than relying on vendor claims or anecdotal success stories, attendees gain access to rigorous, methodologically sound research that offers credible evidence of what actually works in authentic classroom settings. The dosage findings deliver immediately actionable insights: occasional or supplemental AI use proves insufficient for meaningful impact, requiring systematic, consistent integration into regular instruction—critical knowledge for teachers, coaches and administrators planning professional development, scheduling, and resource allocation. For ISTE members committed to educational equity, the research provides concrete evidence that AI tools can close achievement gaps for students with disabilities and economically disadvantaged learners when implemented with adequate support, moving beyond theoretical promises to documented results. Attendees leave with frameworks for evaluating AI effectiveness in their own settings, understanding the implementation quality thresholds necessary for success, and practical guidance for transforming AI adoption from experimental add-ons into evidence-based instructional strategies that genuinely advance equity and achievement goals.

References

Agarwal, P. K., Nunes, L. D., & Blunt, J. R. (2021). Retrieval practice consistently benefits student learning: A systematic review of applied research in schools and classrooms. Educational Psychology Review, 33, 1409–1453.
Bereiter, C., & Bird, M. (1985). Use of thinking aloud in identification and teaching of reading comprehension strategies. Cognition and Instruction, 2(2), 131–156.
Gillespie, A., & Graham, S. (2014). A meta-analysis of writing interventions for students with learning disabilities. Exceptional Children, 80(4), 454–473.
Schechter, R., Ackerman, C., Gulemetova, M., & Janakiefski, L. (2025). The impact of Coursemojo AI-powered literacy instruction on sixth-grade English language arts achievement: A quasi-experimental study. LXD Research.
VanLehn, K. (2011). The relative effectiveness of human tutoring, intelligent tutoring systems, and other tutoring systems. Educational Psychologist, 46(4), 197–221.

Presenters

Lead Researcher
LXD Research
Lead Researcher
LXD Research
Founder LXD Research
LXD Research at Charles River Media, Inc

Session specifications

Topic:

Personalized Learning

Grade level:

6-8

Audience:

Curriculum Designer/Director, District-Level Leadership, Technology Coach/Trainer

Attendee devices:

Devices not needed

Subject area:

Language Arts

ISTE Standards:

For Education Leaders: Equity and Citizenship Advocate
For Educators: Designer, Analyst

Transformational Learning Principles:

Ensure Opportunity, Ignite Agency

Disclosure:

The submitter of this session has been supported by a company whose product is being included in the session

Influencer Disclosure:

This session includes a presenter that indicated a “material connection” to a brand that includes a personal, family or employment relationship, or a financial relationship. See individual speaker menu for disclosure information.