December 15, 2024

HomeworkBot: What should we finish today?

The Pointable engineering team: Mike Wu, Scott Wey, Brian Wu, Jerry Wang, Justin Sung, Tyler Duong, and Andrew Maas

Introduction

This is the second blog post in a series on building a real-world LLM agent with a focus on education. If you are new to this series, you can find the first blog post here. This series aims to provide perspective on what issues really matter when developing real-world, complex agents. In this post, we will introduce the user experience for HomeworkBot and product learnings from Wizard of Oz experiments with students. 

Product and user experience goals for HomeworkBot 

Unlike most educational tools today, the north star for HomeworkBot has nothing to do with content tutoring (helping students learn and understand coursework topics). Rather, this agent helps students develop good long-term habits, particularly around planning, organization, and study skills.

Our overall goal for the HomeworkBot agent is to be a low-friction, always-available, helpful, and positive supporter of student learning. Students should enjoy interacting with HomeworkBot and feel as if a helpful friend is making the process of schoolwork and homework easier. In terms of specific tasks and functions, we think HomeworkBot will achieve this overall goal if we start with a set of important student support tasks, outlined in the high-level objectives below.

A first HomeworkBot user experience: Daily check-ins

These tasks span a wide spectrum of features, and building a reliable LLM agent to cover all of them would be a challenging technical endeavor. Instead, we decided to focus on a single feature for a minimum viable product: daily check-ins.

From early need-finding conversations, we learned that even something as basic as reminding students of their most imminent due dates is extremely valuable to their success. You see something like this often in popular mobile applications, spanning education to gaming. For example, Duolingo effectively leverages daily notifications to remind students to complete a learning goal.

Formalizing a daily check-in user experience hypothesis

HomeworkBot aims to be a trusted, easy-to-use organizational agent for students, empowering them to manage assignments, deadlines, and academic progress independently. Unlike typical educational apps, HomeworkBot emphasizes developing long-term skills in planning, organization, and goal-setting to build positive, lifelong academic habits.

Product Hypothesis

If students have access to a non-intrusive, engaging, and consistently available agent to help them manage school logistics, they will develop stronger organizational skills, achieve higher academic satisfaction, and feel more in control of their learning process.

User Experience Goals
User Experience Hypothesis

By providing a friendly, low-friction experience that allows students to set goals, track assignments, and receive timely, encouraging reminders, HomeworkBot will increase student engagement with their own organizational processes and reduce the stress associated with school logistics. A personalized, adaptable experience will foster a positive association with HomeworkBot, encouraging ongoing use.

High-Level Objectives
  1. Task and Deadline Management: Enable students to track and manage their assignments, exams, and projects in a centralized, easy-to-access space.
  2. Progress and Performance Tracking: Help students monitor their academic progress, identifying areas of strength and improvement without requiring manual input of grades.
  3. Goal Setting and Achievement: Offer tools for students to set academic goals (e.g., grade improvement, timely assignment submissions) and track their progress toward these goals.
  4. Reminders and Notifications: Provide timely, helpful reminders about key deadlines, upcoming assignments, and logistics such as submitting work, ensuring students stay on top of their responsibilities.
  5. Feedback and Reflection: Enable students to log and revisit instructor and peer feedback to foster a continuous improvement mindset, with guided prompts for self-reflection.

Minimum viable product for early feedback

We set out to build a minimum viable product (MVP) capable of generating daily check-ins and interacting with students through conversation. For the student experience, we think an easy-to-implement and familiar texting interface works well for a first attempt. In particular, students interacted with the bot through iMessage in our first tests (in a group chat, so our product team could easily observe the interactions too). This avoids asking students to download yet another app and work it into their routine. It also avoids the need for Pointable’s team to invest development time in custom messaging or phone app software.

Daily reminder checklist

Based on research into tutoring best practices and discussions with both students and tutors, here is a first set of topics and questions HomeworkBot can ask students each day to create a daily homework organization session.

For our first user tests, we plan to have HomeworkBot guide a discussion over iMessage on the following topics (a sketch of how this script could be encoded for the agent follows the list):

  1. Start with a quick review
    • Ask the student about yesterday’s tasks: What did they complete? What challenges did they face?
    • Review any changes to their schedule, such as new assignments, test dates, or extracurricular commitments.
  2. Assignment Review
    • Go over each subject and make sure the student knows all the upcoming deadlines (homework, projects, tests, etc.).
    • Check for missing assignments or past due work.
  3. Prioritization
    • Help the student prioritize tasks based on deadlines and difficulty.
    • Discuss any high-priority tasks for the day (e.g., studying for a test tomorrow).
    • Break down large projects or papers into smaller, manageable tasks.
  4. (Optional) Deeper support topics if student asks:
    • Time Management & Scheduling (5-10 minutes)
      1. Look at the student’s schedule for the day or week. Help them block out specific time slots for studying and completing assignments.
      2. Discuss how much time they realistically need for each task and whether they need to adjust any plans (e.g., extracurriculars or family activities).
    • Study Strategy
      1. Ask the student if they have any challenging subjects or tasks and discuss how they will approach them.
      2. Suggest specific strategies like starting with the hardest task first or using breaks between study sessions to stay fresh.
  5. Wrap-Up 
    • Summarize the day’s plan and tasks.
    • Ask the student to confirm their next steps and express confidence in their ability to complete them.
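
Here is a minimal sketch of how this script could be encoded as plain data for the agent (or for the human wizard) to walk through. The stage names and fields below are our own illustration, not part of any particular framework.

```python
# A minimal sketch: the daily check-in script encoded as data so it is easy
# to iterate on and, eventually, to hand to an automated agent.
# Stage names and fields are illustrative, not from a specific framework.
CHECKIN_SCRIPT = [
    {
        "stage": "quick_review",
        "optional": False,
        "prompts": [
            "What did you complete yesterday? Any challenges?",
            "Any changes to your schedule (new assignments, test dates, activities)?",
        ],
    },
    {
        "stage": "assignment_review",
        "optional": False,
        "prompts": [
            "Let's go over upcoming deadlines for each subject.",
            "Is there any missing or past-due work we should look at?",
        ],
    },
    {
        "stage": "prioritization",
        "optional": False,
        "prompts": [
            "Which tasks are highest priority today?",
            "Should we break any big project into smaller steps?",
        ],
    },
    {
        "stage": "deeper_support",  # time management and study strategy
        "optional": True,           # only if the student asks
        "prompts": ["Want to block out study time or talk through a tough subject?"],
    },
    {
        "stage": "wrap_up",
        "optional": False,
        "prompts": ["Here's today's plan. Can you confirm your next steps?"],
    },
]
```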

Making this experience reliable and automated turns out to be very difficult. Given that any off-the-shelf LLM would likely hallucinate, we knew that building a robust system would require a custom RAG pipeline or finetuning on extensive student conversations. To avoid investing in a long engineering project before putting anything in front of users, we opted for a Wizard of Oz setup.

Building daily reminders like the example above requires three kinds of data: the student’s enrolled courses, each course’s assignments and due dates, and grades or feedback on submitted work.

A course platform like Canvas, Ed, Piazza, or Khan Academy should contain all three data requirements. Paired with a large language model, we in theory have all the pieces to build daily reminders for the student and interactively assist them in navigating multiple courses.
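
As a concrete (and simplified) illustration, the sketch below pulls upcoming assignments from the Canvas REST API with the `requests` library. The `/api/v1/courses` and `/api/v1/courses/:id/assignments` endpoints are part of Canvas’s public API, but pagination and error handling are omitted here, and `CANVAS_URL` / `CANVAS_TOKEN` are placeholders you would supply.

```python
# A simplified sketch of gathering the data needed for a daily reminder
# from the Canvas REST API. Pagination and error handling are omitted.
from datetime import datetime, timedelta, timezone

import requests

CANVAS_URL = "https://your-school.instructure.com"  # placeholder
CANVAS_TOKEN = "student-access-token"               # placeholder
HEADERS = {"Authorization": f"Bearer {CANVAS_TOKEN}"}


def upcoming_assignments(days_ahead: int = 3) -> list[dict]:
    """Return assignments due within the next `days_ahead` days, across all courses."""
    now = datetime.now(timezone.utc)
    horizon = now + timedelta(days=days_ahead)
    upcoming = []

    courses = requests.get(
        f"{CANVAS_URL}/api/v1/courses",
        headers=HEADERS,
        params={"enrollment_state": "active"},
    ).json()

    for course in courses:
        assignments = requests.get(
            f"{CANVAS_URL}/api/v1/courses/{course['id']}/assignments",
            headers=HEADERS,
        ).json()
        for a in assignments:
            if not a.get("due_at"):
                continue
            due = datetime.fromisoformat(a["due_at"].replace("Z", "+00:00"))
            if now <= due <= horizon:
                upcoming.append(
                    {"course": course["name"], "name": a["name"], "due_at": due}
                )

    return sorted(upcoming, key=lambda item: item["due_at"])
```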

A screenshot of the Canvas UI containing student coursework.

The Canvas dashboard contains feedback, to-dos, and links to each of the student’s courses. While this page may contain all the information needed for a student’s learning success, students may find this presentation difficult to navigate. One of the primary benefits of HomeworkBot is a better user experience for students, one that is both simple and two-way. Instead of students having to remember to comb through Canvas, HomeworkBot can surface the most important items daily and answer any clarification questions the student may have.

Wizard of Oz automation for early iterations

Rather than building an automated agent at the outset, the initial “automated bot” is simply a human with browser access to the student’s Canvas account. Behind the curtain of this Wizard of Oz system, the human operator can respond to any question or request from the student, and a human operator sets an upper bound on the quality we expect an LLM to achieve when answering questions or summarizing information from the online course platform.

If you are not familiar with Wizard of Oz (WoZ) experiments, they are a design paradigm, originating from a research lab at Johns Hopkins in the 1970s, in which a human (a “wizard”) simulates the behavior of a software application, often without the end user knowing that a human is involved. WoZ experiments are vital to iterative design, providing the ability to quickly obtain user feedback and test product hypotheses.

The setup for a Wizard of Oz experiment for HomeworkBot. Students are told to text a number to chat with an AI homework tutor. Messages are sent to a human-in-the-loop who has knowledge of the student’s course data and template messages from an LLM. The human-in-the-loop is able to craft/edit messages to send back to the student.

An LLM agent is given the student’s course and assignment metadata in context and prompted to output daily check-ins and candidate responses to student messages. Importantly, a human-in-the-loop verifies and edits the LLM outputs before manually texting the student, so hallucinations never reach the student. The human-in-the-loop may also choose to initiate a conversation with the student, akin to a reminder that can be automated in the future. The wording and tone of all messages are up to the human-in-the-loop. From the student’s perspective, they are communicating with an AI agent.
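
A minimal sketch of this human-in-the-loop flow, under our own naming, looks something like the following. `draft_reply` stands in for whatever LLM call produces a candidate message (with the student’s course data in its prompt), and `send_imessage` stands in for the messaging integration; neither is a real library API.

```python
# A minimal sketch of the human-in-the-loop review step used in WoZ testing.
# `draft_reply` and `send_imessage` are placeholders, not real library APIs.

def draft_reply(student_message: str, course_context: str) -> str:
    """Call an LLM with the student's course data in context (placeholder)."""
    raise NotImplementedError("plug in your LLM client here")


def send_imessage(text: str) -> None:
    """Send the approved message to the student (placeholder)."""
    raise NotImplementedError("plug in your messaging integration here")


def handle_student_message(student_message: str, course_context: str) -> None:
    draft = draft_reply(student_message, course_context)
    print(f"LLM draft:\n{draft}\n")

    # The wizard checks the draft against Canvas before anything is sent,
    # editing the wording/tone or rewriting it entirely if it looks wrong.
    edited = input("Press Enter to send as-is, or type a replacement: ").strip()
    send_imessage(edited or draft)
```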

This design mimics the quality of a well-tuned agent and allows us to run experiments with early users without significant time or resource investment. After piloting this WoZ MVP with a core group of early beta testers, we share three early findings from a case study with a middle school student, along with anonymized snippets of the student’s conversations.

Learnings from MVP testing with students

Students might not play along with a complex check-in

Our initial approach to designing daily check-ins focused on specific, detailed questions to help a student stay organized and think proactively about coursework. Our plans for daily sessions covered several topics including upcoming work, prioritizing projects for study time, and possibly going deeper on a study plan for a larger project or course. However, we quickly learned from initial student sessions that our daily check-in script was too complicated! Students did not engage with the complex check-in script we prepared.

Exactly zero of our first ten sessions with students allowed HomeworkBot (the human agent) to cover all topics in the daily check-in script. Students would sometimes not engage with probes for deeper conversations, especially since some students were in transit while texting with HomeworkBot. Further, even with an engaged student, the questions around work prioritization and planning for a study session were too long for a quick, helpful conversation about homework. And many of the topics in our check-in script simply were not relevant to a given student’s situation.

It’s a good thing we ran this WoZ experiment before building a full LLM agent for the complex checklist! To continue iterative testing, we simplified the daily check-in script to focus on graded work and upcoming assignments while keeping the conversation short overall. Of course, with our human WoZ system, we can still respond to any requests or topics students raise during early testing.

Overall, students seem to much prefer bite-sized pieces of timely information over something that might be more “complete” or “powerful” but creates a lengthy user interaction.

Students value easy conversational access to graded work

Many of the questions students asked concerned grades and scores on previous assignments. Understanding one’s performance is an important part of planning future work and deciding where to focus study time in order to improve grades and understanding.

Student: Ok is there anything for Spanish?
HomeworkBot: Let me take a look for you. It does look like you have an assignment today but that it was in class I believe.
Examencito- vocab. 'Los Días Inolvidables'
Due Oct 9 at 11:59pm. You scored 43/55
Student: Aww :( I was hoping to get a B, but to be fair, it was really confusing
HomeworkBot: Nice! On Friday you also have a quiz in that class on “How the Sun Works”
Student: I’m excited. Do I have any zeros outside of geometry which is to be ignored? In geometry, if it is classified under homework assignment it counts as 0% towards our grade I usually only do them if I feel I need extra help
HomeworkBot: I see a few zeros in Earth & Space Systems. You may want to take a look at those. Also English as well. Were there any unexpected issues in completing any of those assignments?
Student: What do I have for English?
HomeworkBot: There are some items marked “late” for The Most Dangerous Game Essay assignment. And a “missing” for Anticipation Guide for Wes Moore's The Other Wes Moore which was due today.

These examples suggest the conversational agent may be a convenient way for students to receive updates on graded work. While this information was all available on Canvas, this student was not aware of their grades on recent assignments, yet could easily engage with graded work via chat. They found it significantly easier to chat with HomeworkBot about performance than to dig through the web application. This looks like a promising theme to build upon – an LLM-based agent should be able to access and discuss graded work in Canvas as a way to keep students engaged with their courses and grades.
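
To make this concrete, here is a hedged sketch of pulling a student’s recent submission statuses from Canvas’s Submissions API (“List submissions for multiple assignments”). The `student_ids[]=self` and `include[]=assignment` parameters reflect our reading of the Canvas docs and should be verified; `CANVAS_URL` and the access token are placeholders as before.

```python
# A sketch of summarizing scored, late, or missing work for one course via the
# Canvas Submissions API. Verify the exact endpoint/parameters before relying on it.
import requests

CANVAS_URL = "https://your-school.instructure.com"          # placeholder
HEADERS = {"Authorization": "Bearer student-access-token"}  # placeholder


def grade_summary(course_id: int) -> list[str]:
    """Return human-readable lines about scored, late, or missing submissions."""
    submissions = requests.get(
        f"{CANVAS_URL}/api/v1/courses/{course_id}/students/submissions",
        headers=HEADERS,
        params={"student_ids[]": "self", "include[]": "assignment"},
    ).json()

    lines = []
    for s in submissions:
        name = s.get("assignment", {}).get("name", "Unknown assignment")
        if s.get("missing"):
            lines.append(f"“{name}” is marked missing.")
        elif s.get("late"):
            lines.append(f"“{name}” was submitted late.")
        elif s.get("score") is not None:
            points = s.get("assignment", {}).get("points_possible")
            lines.append(f"“{name}”: scored {s['score']}/{points}.")
    return lines
```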

Last-mile reminders about next-day work and turning in assignments stand out as helpful

Some students face challenges not with completing an assignment but with remembering to turn it in, resulting in late penalties. Having HomeworkBot ask and remind about assignments due the next day provided significant benefit in some cases.

Student: Yes i finished a little late for most items but still submitted them
HomeworkBot: Have you been able to complete those?
Student: I have done that I need to submit it
HomeworkBot: That’s great you were able to complete them. [...]
HomeworkBot: How are you feeling about the rest of this week? Will you be able to submit your English assignments?
Student: I’ve submitted them. Usually, English isn’t very difficult. I just struggle to remember about it.
HomeworkBot: Great. How do you feel you did on those? Feeling good?
Student: I think I did ok, but I submitted them late so I have some credit off

These are good examples of the types of challenges an educational planning copilot could help students with. Even something as simple as a daily check-in can serve as an important reminder of what logistics need to be handled that day.

Same-day or night-before reminders could allow HomeworkBot to help students prevent late or missing assignments. We will need to be careful about “reminder overwhelm” – HomeworkBot can’t simply send messages about all work to turn in without becoming annoying! Selecting the correct reminders and overall nudge frequency from HomeworkBot will be critical to making it a welcome helper for students.
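
As a starting point, reminder selection can be a simple filter plus a cap on daily message volume before any LLM is involved. The sketch below is one heuristic of our own devising, not a tuned policy: only remind about unsubmitted work due within a short window, most imminent first, capped at a few items.

```python
# A sketch of a simple reminder-selection heuristic: unsubmitted work due soon,
# capped at a few items per day to avoid "reminder overwhelm".
from datetime import datetime, timedelta, timezone


def select_reminders(
    assignments: list[dict], window_hours: int = 36, max_reminders: int = 3
) -> list[dict]:
    """Pick at most `max_reminders` unsubmitted assignments due within the window.

    Each assignment dict is assumed to have 'name', 'due_at' (an aware datetime),
    and 'submitted' (bool) fields; this schema is made up for illustration.
    """
    now = datetime.now(timezone.utc)
    horizon = now + timedelta(hours=window_hours)
    candidates = [
        a for a in assignments
        if not a["submitted"] and now <= a["due_at"] <= horizon
    ]
    candidates.sort(key=lambda a: a["due_at"])  # most imminent deadlines first
    return candidates[:max_reminders]
```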

Conclusions

So far, we’ve studied the value of an “organizational and planning copilot” for students through early experiments, namely through daily check-ins that keep students up to date with their performance in courses and remind them to complete their assignments. The initial daily check-in was too detailed, and students never played through the full check-in routine. However, our iterative testing showed clear value from easily accessing course updates and grades via a chat agent. 

Our overall conclusion for this experiment: There is value in building an automated HomeworkBot to (1) send (short) daily customized reminders about homework, upcoming deadlines, and grades and (2) provide a chat interface where students can ask questions about work and grades instead of trying to find information directly on Canvas. Our experiment showed consistent daily engagement and positive feedback from students using HomeworkBot.

With some initial feedback from user testing, we have identified a first set of LLM agent capabilities (customized daily reminders) that seems achievable to build from the course platform data.

A daily reminder MVP

As we iterated with students in WoZ testing mode, it became clear that a customized daily reminder message has a lot of impact on student engagement. We were able to use LLMs to generate daily reminder messages surprisingly quickly! It was just a matter of some prompting and validation to generate rich daily reminders using the data extracted from a student's course information platform (see previous blog post on understanding the data).

Here is an example of an LLM-generated daily reminder message:

Let’s look ahead to the next few days

Today (Oct 9):
📚 2.2.1 Quick Check (Geometry) – Due by 3:00pm
📐 DeltaMath: Pythagorean Theorem Practice – Due by 3:00pm
🔺 DeltaMath: Triangle Theorems Practice – Due by 3:00pm

Tomorrow (Oct 10):
🌞 Sun Reading & Questions (Earth & Space Science) – Due by 8:00am

Friday (Oct 11):
📝 Quiz: How the Sun Works – Due by 8:00am
🎨 Art Journal Entries – Due by 2:35pm
🧮 Geometry Quick Check & DeltaMath – Due by 3:00pm

Reminder:
🚨 No school on Monday, Oct 18!
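
For illustration, a message like the one above can be produced by a single templated prompt over the structured assignment data. The sketch below shows one way to build such a prompt; `call_llm` is a stand-in for whichever model client is used, and the formatting instructions are our own, not a description of the production prompt.

```python
# A sketch of prompting an LLM to write the daily reminder from structured
# assignment data. `call_llm` is a placeholder for a real model client.

def build_reminder_prompt(assignments: list[dict], notes: list[str]) -> str:
    """`assignments` are dicts with 'course', 'name', and 'due_at' (datetime)."""
    assignment_lines = [
        f"- {a['name']} ({a['course']}), due {a['due_at']:%a %b %d at %I:%M %p}"
        for a in assignments
    ]
    note_lines = [f"- {n}" for n in notes]
    return (
        "You are HomeworkBot, a friendly homework organizer for a student.\n"
        "Write a short daily reminder grouped by day (Today, Tomorrow, then later "
        "days), one line per assignment with a fitting emoji.\n"
        "Only mention items from the lists below; do not invent anything.\n"
        "\n"
        "Upcoming assignments:\n"
        + "\n".join(assignment_lines)
        + "\n\nOther notes:\n"
        + "\n".join(note_lines)
    )


def call_llm(prompt: str) -> str:
    """Placeholder for the model call (your preferred chat completion client)."""
    raise NotImplementedError("plug in your LLM client here")
```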


To try this automated approach, we continued WoZ testing and provided the LLM-generated reminder message to the WoZ (human) agent. Providing a rich suggested message makes WoZ testing faster and easier for the agent, while allowing us to test whether the human WoZ agent finds issues with the LLM-generated reminder message. This quickly converged to a “good enough” LLM-generated daily message we can use with no human agent in the loop – a true minimum viable product for a simple homework tutoring agent.
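
The “validation” half can start as a simple check of the generated message against the source data, essentially automating what the human WoZ agent was doing by hand. The rough sketch below only checks that no upcoming assignment was omitted; catching the reverse (hallucinated items) is harder with free text, which is one reason to generate the message from structured data in the first place. The function names here are our own.

```python
# A rough sketch of validating an LLM-generated reminder against the source
# data: flag any upcoming assignment the message fails to mention and route
# those drafts back to a human for review. Matching is deliberately naive.

def missing_assignments(message: str, assignments: list[dict]) -> list[str]:
    """Names of assignments the reminder should mention but doesn't (substring check)."""
    text = message.lower()
    return [a["name"] for a in assignments if a["name"].lower() not in text]


def needs_human_review(message: str, assignments: list[dict]) -> bool:
    """Send the draft to a human instead of the student if anything looks off."""
    return bool(missing_assignments(message, assignments))
```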

We believe that LLM-based agents are "products" in their own right, and by applying Lean Product thinking it's possible to iteratively build agents that are helpful and engaging for users, while avoiding the shortcomings of current LLM and language processing technologies.

Excited about education or building something similar? We’d love to hear from you. Reach out to the Pointable team.

© Pointable 2024. All rights reserved.