November 1, 2024

A new series – developing an LLM agent to support students

The Pointable engineering team: Mike Wu, Scott Wey, Brian Wu, Jerry Wang, Justin Sung, Tyler Duong, and Andrew Maas

Introduction

We’re starting a blog post series to provide real examples of developing a full LLM agent product experience. There are many technical, product design, and user testing facets to iteratively building, launching, and improving real agents. This series aims to provide perspective on what issues really matter when trying to develop real-world, complex agents. This is the first post and there are more to come every few weeks!

Building HomeworkBot: An agent for student organization and coursework tracking

Developing useful agents with Large Language Models (LLMs) comes with a host of challenges, from conversation experience design to data quality to reliability issues when connecting LLMs to external APIs or tools. This blog post series will cover the development of an agent that helps students develop organizational skills and keep track of assignments and coursework, to improve grades and build good long-term habits. Educational agents are a great example of the potential of LLM-based systems because digital learning helpers can be always available and infinitely patient. Let's dive in!

Introduction: Agents for student support

Since the rise of LLMs, significant attention has been devoted to applying generative models to education at scale. While application directions have been broad - examples include Turnitin's plagiarism detection [1], automated feedback on student work [2], and even generating full courses [3] - a significant number of projects are working on “tutoring co-pilots” to assist students in learning and coursework. One of the most prominent examples is Khanmigo [4], where students can chat with an LLM to get tutoring on new concepts they are learning in school.

When we think of an agent “tutor,” we might initially focus on answering questions about course content, or coming up with quiz questions to help students review a particular topic. This is broadly known as content tutoring because the tutor assists students with particular topics, courses, or content areas. A content tutor must be knowledgeable about the topics it tutors. Of course, there is huge variety in the course topics students might be working on, and even within a topic there are many levels of detail and different learning progressions. Many variations of content tutoring systems are under development, and often the most important design choice is which information sources or training data are used, because that determines the content breadth and expertise of the resulting system.

Our new agent will help with a different set of skills: study time management, project/work planning, and minimizing missed work. Starting around age 13, students have an increasing number of assignments to complete at home, or assignments that require breaking down high-level goals into smaller chunks of work. Developing executive function skills to plan and track such work is critical not just for academic work; these same skills translate into managing time and complex projects throughout life.

Organizational tutor agent overview

For a lucky fraction of students, parents, mentors, or teachers help to track work assignment due dates, encourage students to work on things with enough advance time, and remind students to submit completed work. Over time, external reminders from these people are internalized by the student into effective planning, tracking, and time management skills and habits. 

However, not all students have enough support in building these skills. Challenges with organization, and lower grades due to missing or incomplete work, can lead to compounding challenges and frustration with academic work. The problem becomes more acute as a student matures and is expected to manage their time and class load independently (as a college undergraduate, for example).

Our hypothesis for designing and building this agent: given the scalability of LLMs, an LLM agent connected to coursework data can aid students in developing strong planning and time management skills, without the content expertise we see in systems like Khanmigo. We believe this sort of tutoring does not require any domain-specific content knowledge and can help students improve grades while building long-term skills.

In recent years, US public secondary schools and universities have quickly adopted online platforms for course grades, assignments, and sharing course resources with students. These platforms (e.g. Canvas, Blackboard, and PowerSchool) track assignments, grades, course schedules, and other information for students and parents. Parents and mentors can perform organizational tutoring with students by reviewing these online platforms together, so it’s possible with a human expert! Now let’s get working with some real course data to see what’s possible with LLMs connected to this type of data.

Working with real course data

To get started building an agent, it's always good to look at your data first! Just as in data science work, agents often need to connect to and parse messy, incomplete, or inconsistent data sources. Getting your eyes on a few examples of such data can help shape what you think an agent can do.

In this phase, we'll work with some real data and use basic LLM prompting to see what an LLM can do with the data in a lightly processed initial form. This will help us brainstorm about what is feasible for an agent where the online course management platform is the sole data source informing students about assignments, exams, grades, etc.

First, our team built a small tool to access data from a popular Learning Management System (LMS), Canvas. Our tool simply fetches the available data from a student's Canvas account, aggregates it, and applies some basic cleanup. We built a “quick and dirty” first attempt at fetching and compiling the raw data into a format we can use as context for an LLM prompt. This first step is akin to the “data ingestion and wrangling” step in data science projects - a famously critical step for avoiding garbage data, and sometimes frustrating to get right.
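
For a concrete picture, here is a minimal sketch of what such a fetcher could look like against Canvas's REST API. The institution URL, token, and exact tag layout are placeholders for illustration; our internal tool does more aggregation and cleanup than shown here.

```python
# A sketch of a "quick and dirty" Canvas fetcher. Endpoint paths follow Canvas's public
# REST API; the tagged output format loosely mirrors the excerpts shown later in this post.
import requests

BASE_URL = "https://YOUR_SCHOOL.instructure.com/api/v1"  # placeholder institution URL
HEADERS = {"Authorization": "Bearer CANVAS_API_TOKEN"}   # placeholder access token

def get_json(path, **params):
    """Fetch one page of results from the Canvas API (pagination omitted for brevity)."""
    resp = requests.get(f"{BASE_URL}/{path}", headers=HEADERS,
                        params={"per_page": 100, **params})
    resp.raise_for_status()
    return resp.json()

def format_assignment(a):
    """Serialize one assignment into a tagged text block for use as LLM context."""
    return "\n".join([
        "    <assignment>",
        f"        Assignment: {a.get('name', 'no name')}",
        f"        Due: {a.get('due_at') or 'no due date'}",
        f"        Points Possible: {a.get('points_possible', 'n/a')}",
        f"        Submission Type: {', '.join(a.get('submission_types', []))}",
        "    </assignment>",
    ])

def compile_context():
    """Aggregate active courses and their assignments into a single context string."""
    chunks = []
    for course in get_json("courses", enrollment_state="active"):
        assignments = get_json(f"courses/{course['id']}/assignments")
        body = "\n".join(format_assignment(a) for a in assignments)
        chunks.append(f"<course>\n  <name>{course['name']}</name>\n{body}\n</course>")
    return "\n\n".join(chunks)

if __name__ == "__main__":
    print(compile_context()[:2000])  # peek at the start of the compiled data
```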

Sample data

Here are short, anonymized excerpts of coursework data for a 9th grade student at a US public high school, pulled from the Canvas LMS platform the student's school uses. The bulk of the LMS data is simply a set of courses, with each course containing rich, nested information. Here is how the data starts,

<course>
      <name>Art I: Found of Studio</name>
      <workload>
        <this-week>
          <assignments-due>
    <assignment>
        Assignment: Q1 Journal Entries
        Due: 11/11/2024 01:35 PM (Later this Week, Friday)
        Locked Date no lock date
        Unlocked Dateno unlock date
        Points Possible: 50.00
        Submission Type: on_paper
        Group Assignment: This is not a group assignment.
        Peer Reviews: false
        Description: (... full data not shown ...)

LLMs are great tools for processing this sort of tagged, semi-structured text data. However, useful information like the most recent homework grades is buried within this potentially complex structure. That could be an issue for an LLM trying to use the data as-is to respond to complex queries.

The data contains information in nested, tagged sections. These include assignments (which link assignment handouts and attach metadata), online quizzes (which this course leaves empty), announcements, and course content modules, which can nest other data or contain links, text, or other information about the course and weekly topics.

(... full data not shown ...)
<assignments>    
<assignment>
        Assignment: Wounded Knee and Battle of Little Big Horn Source Activity 
        Due: no due date
        Locked Date no lock date
        Unlocked Dateno unlock date
        Points Possible: 10.00
        Submission Type: on_paper
        Group Assignment: This is not a group assignment.
        Peer Reviews: false
        Description: no description
        Locked: false
        Grading Type: points
        Submitted: submitted
        Grade: No grade yet
    </assignment>
    
    <assignment>
        Assignment: Warm-ups August  and September 
        Due: 11/09/2024 10:59 PM
        Locked Date no lock date
        Unlocked Dateno unlock date
        Points Possible: 10.00
        Submission Type: online_text_entry, online_url, online_upload
        Group Assignment: This is not a group assignment.
        Peer Reviews: false
        Description: no description
        Locked: false
        Grading Type: pass_fail
        Submitted: submitted
        Grade: 10.00/10.00 (100.00%)
    </assignment>
    
      </assignments>
      <quizzes>
        
      </quizzes>
      <announcements>
        
      </announcements>
      <modules>
        
    <module>
        Name: October 7-11
        Workflow State: no workflow state
        Position: 41
        Unlocked At: no unlock at
        
    </module>
    
    <module>
        Name: September 30-October 4th
        Workflow State: no workflow state
        Position: 42
        Unlocked At: no unlock at
        
    </module>

We can see this data is granular, sometimes missing, and overall a bit messy. An LMS offers many potential data fields, but different schools, courses, and instructors set up and use the platform in very different ways.

We could choose to work on data cleaning and normalization to try and improve this data now. However, early in our agent development lifecycle we like to take a “learn fast” approach and build toward a partly working system first; once we see those results, we can go back and improve data cleaning if that seems like a promising way to improve our overall results.

LLM Question Answering on Student Data

Modern LLMs support increasingly large context windows, and we can fit the full course data for this student into a single LLM prompt! So for a very quick sense of what an LLM can do with this data, we will provide all the data as context along with some basic data handling prompt instructions, then evaluate how well an LLM can answer some basic questions a student might ask.
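
As a quick sanity check before prompting, a rough token count tells us whether a student's compiled data really fits in the context window. The sketch below uses tiktoken's cl100k_base encoding purely as an estimate; the file name is a placeholder and your model's tokenizer may count differently.

```python
# Rough estimate of how many tokens the compiled course context will consume.
# cl100k_base is used only as an approximation; swap in your model's tokenizer if known.
import tiktoken

def estimate_tokens(text: str) -> int:
    enc = tiktoken.get_encoding("cl100k_base")
    return len(enc.encode(text))

course_context = open("student_course_data.txt").read()  # compiled tagged data
print(f"~{estimate_tokens(course_context)} tokens of course context")
```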

Aside: Think twice about data security when prototyping with an LLM in this way – you are sending potentially sensitive data to an external tool/API. In our case, we are using a secure LLM service that we host internally and that does not use data we send for later model training.
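
Here is a minimal sketch of the prompting setup, pointing an OpenAI-compatible client at an internally hosted endpoint. The base URL, model name, and system instructions are illustrative placeholders rather than our production prompt.

```python
# Provide the full compiled course data as context, plus basic data handling instructions,
# then ask a single student question. Endpoint and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://llm.internal.example/v1",  # internally hosted service
                api_key="INTERNAL_API_KEY")

SYSTEM_PROMPT = (
    "You are a friendly organizational tutor for a student. Use only the course data "
    "provided below to answer questions about assignments, due dates, and grades. "
    "If the data does not contain the answer, say so rather than guessing.\n\n"
    "COURSE DATA:\n{course_data}"
)

def answer(question: str, course_data: str) -> str:
    """Answer one student question with the full compiled course data as context."""
    response = client.chat.completions.create(
        model="internal-chat-model",  # placeholder model name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT.format(course_data=course_data)},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

# e.g. print(answer("Do I have anything due for my Spanish class?", course_context))
```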

Here are some example questions and responses for that same high school student,

Input 1: Do I have anything due for my Spanish class?

LLM Response:
Yes, you have a couple of things coming up for Spanish this week:

1. **Examencito - vocab. "Los Días Inolvidables"** is due tomorrow, Wednesday. Make sure you're studying the vocabulary for that quiz!
2. **Conjuguemos - "Pretérito Regular"** is due Thursday. You'll need to practice the regular preterite tense three times on Conjuguemos and upload your scores to Canvas.

Is there anything specific you'd like to focus on for your Spanish assignments?

Input 2: What assignments do I have due for this week?

LLM Response:

Here's a list of your assignments due this week:

1. **Geometry:**
   - DeltaMath: 2.2.0 Prerequisite (Pythag Thm Practice) - Due Wednesday
   - DeltaMath: 2.2.1 Triangles Theorems Practice - Due Wednesday
   - 2.2.1 Quick Check - Due Wednesday
   - DeltaMath: 2.2.2 Pythagorean Thm & Its Converse Practice - Due Friday
   - 2.2.2 Quick Check - Due Friday
2. **Spanish III:**
   - Examencito - vocab. "Los Días Inolvidables" - Due Wednesday
   - Conjuguemos - "Pretérito Regular" - Due Thursday
3. **US History:**
   - Tactics of Business and Labor - Due Wednesday
4. **English 9:**
   - Class Discussion - Central Themes in The Other Wes Moore - Due Wednesday
   - Close Reading & Annotations - The Other Wes Moore: Part One Prison Interview (p. 1-4) - Due Friday
5. **Earth & Space Systems Sci G/T:**
   - HW LS2 #1: The Sun Reading & Questions - Due Thursday
   - Quiz: LS2: How the Sun Works - Due Friday
   - Sunspot Activity - Due Friday
6. **Art I:**
   - Skull 1: Contour Line Sketches - Due Friday
   - Q1 Journal Entries - Due Friday

That's quite a bit! Let me know if you need help prioritizing or organizing your tasks.

Input 3: I’m going to work for 90 minutes now. I want to work on whatever is due soonest — what should I work on?

LLM Response:
Great plan! The assignment that's due soonest is the **"Tactics of Business and Labor"** for your US History class, which is due tomorrow. Focusing on that first would be a good idea. After that, you might want to tackle the **"Examencito - vocab. 'Los Días Inolvidables'"** for Spanish III, also due tomorrow. How does that sound for a plan?

Input 4: Which upcoming quiz should I study for first?

LLM Response:
Let's take a look at your schedule. You have a quiz in Spanish III on "Los Días Inolvidables" vocabulary due tomorrow, October 9th. You also have a US History assignment on "Tactics of Business and Labor" due tomorrow. Given that the Spanish quiz is specifically a test and might require more focused study on vocabulary, it might be a good idea to prioritize studying for that first. How do you feel about your preparation for these?

These initial question answers look fairly good! The LLM was able to parse through the somewhat messy raw data, answer factual questions about due dates, and compile information into succinct lists (e.g. upcoming assignments).

Fact check! Remember that LLM responses can sometimes contain hallucinations (information that is not factual relative to the provided source material). As part of evaluating these responses, we manually compared each LLM response to the source data. This is a critical part of early testing, because we want to understand whether the LLM can reliably provide correct information given the data we supply.
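
We did this comparison by hand, but a small helper can speed it up by pulling (assignment, due date) pairs out of the tagged context for a side-by-side check against each response. This is just a sketch; the regex assumes the exact field layout shown in the excerpts above.

```python
# Extract (assignment name, due date) pairs from the tagged course context so they can be
# compared against LLM answers. Assumes the "Assignment:" / "Due:" layout shown above.
import re

ASSIGNMENT_RE = re.compile(r"Assignment:\s*(?P<name>.+?)\s*\n\s*Due:\s*(?P<due>.+)")

def due_dates(course_context: str):
    """Return a list of (assignment name, due date string) pairs found in the context."""
    return [(m.group("name").strip(), m.group("due").strip())
            for m in ASSIGNMENT_RE.finditer(course_context)]

# for name, due in due_dates(course_context):
#     print(f"{due:45s} {name}")
```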

Our fact-checking review uncovered that the LLM has some issues with correctness, but they are subtle. If you look at the responses for inputs 2 and 3 above, you can see the LLM is inconsistent about what is a homework assignment with a deadline versus what is an in-class quiz: the Spanish vocab quiz shows up as an item with a due date, but it is actually an in-class quiz, so it is a bit different from the Geometry work to be completed at home.

Evaluating additional student datasets

Given that each set of courses might have slightly different data, and different schools set up LMS platforms with different data fields or workflows, it's important to do our early testing on more than just a single account. Below we show some inputs and LLM responses for a different set of student data, from a final-year undergraduate at a large public university. Let's see how similar questions about deadlines, grades, and study strategies look against this data,

Input 1: What assignments do I have due for this week?

LLM Response:

For this week, you have a couple of things coming up:

1. **US History, 1877-Present**:
   - **Week 3: Writing Assignment--Gilded Age** is due by Monday at 01:59 AM.
   - **Quiz: Week 3, Lecture 1 (Industrialization: Capital)** is due by Monday at 01:59 AM.
   - **Quiz: Week 3, Lecture 2 (Workers and the Labor Movement)** is also due by Monday at 01:59 AM.
2. **Computer Architecture**:
   - **Assignment: Addition, Subtraction, Mux, Complete gates** is due on Thursday at 01:30 PM.

Make sure to plan your time accordingly to get these done! Do you have any questions about these assignments or need help with anything specific?

Input 2: Do I have anything due for my Calculus II class this week?

LLM Response:
It looks like there aren't any assignments or quizzes with specific due dates for your Calculus II class this week. However, you do have several quizzes available that you can use to check your understanding of the material. Is there anything specific you'd like to focus on in Calculus II?

Input 3: Are there any extra credit assignments available that I can work on to boost my grade for my Multivariable Calculus Class?

LLM Response:

It looks like there aren't any extra credit assignments listed for your Multivariable Calculus class. You could try asking your teacher about any extra credit assignments they might have available. How does that sound?

Once again, the LLM was able to sift through the raw course data and provide a useful list of assignments, as well as answer specific questions about individual classes! Our fact-checking review of these responses verified that the LLM delivered accurate responses relative to the source data.

This is looking promising! We are able to use the same simple data extraction and preprocessing to answer questions about undergraduate and high school courses from multiple students. We also notice something valuable about the LLM as an organizational tutor: it is readily available for questions, and it can provide suggestions for managing time, staying positive about grades, and staying focused on long-term projects.

Here is a similar helpful output from the LLM,

Input: Can you give me a schedule for allocating time to work on my assignments for this week? I have school from 7am to 3pm and would like to be in bed by 10-11pm. 

LLM Response:

Absolutely, let's create a schedule that fits your school hours and ensures you have enough time for both studying and rest. Here's a suggestion for your weekday schedule: 
**Monday to Friday:** 
- **3:30 PM - 4:00 PM:** Quick break and snack after school. 
- **4:00 PM - 5:00 PM:** Work on overdue assignments, like the "Logic Gates" and "Addition, Subtraction, Mux, Complete gates" for Computer Architecture.
- **5:00 PM - 6:00 PM:** Focus on the "Manifest Destiny" writing assignment for US History.
- **6:00 PM - 7:00 PM:** Dinner and relax.
- **7:00 PM - 8:00 PM:** Choose one of the Calculus II quizzes to review and complete, like the "Polar Curves Quiz" or "Trig Substitution Quiz".
- **8:00 PM - 9:00 PM:** Review notes or readings for upcoming classes. 
- **9:00 PM - 9:30 PM:** Wind down, prepare for the next day. 
- **9:30 PM - 10:00 PM:** Free time or quick review if needed. 
- **10:00 PM:** Start getting ready for bed.

**Saturday and Sunday:**
- Use these days to catch up on any missed assignments or focus on longer study sessions for upcoming tests or projects.

Does this schedule work for you, or would you like to adjust anything?

The LLM combined with source data allows this response to be detailed, customized, and potentially really helpful. This individualized attention turns time and task management into a fun chat with an LLM helper. It's rare for any student to continuously get detailed ideas like this for completing work alongside their other daily routines. When designing agents, we often see that a key source of value is the always-available, customized responses that LLM agents with custom data feeds can provide.

Conclusions

When building data science and machine learning models, it's best to start by exploring the real data to see what's possible. This allows us to account for how data format, missingness, etc. might affect our ability to do what we hope with the data. Building LLM agents is similar: we started by exploring what's possible given the available data, and that informs our vision and requirements for the upcoming work to scope and develop the agent.

An organizational helper must be personalized to the student and understand what courses they are taking, what assignments they have, and what concepts they are learning right now. Our initial experiments support our hypothesis that an LLM provided with data extracted from an LMS platform can create customized, accurate, useful answers about coursework, deadlines, and study scheduling. We achieved these initial results with a simple but fairly effective data transformation, so we expect even more accurate, detailed responses are possible as we improve data extraction and cleaning over time.

What’s Next?

In our next post, we explore more about the possible user experience (UX) for students. Answering student questions alone will not drive the skill building and accountability we’d like to achieve with an organizational tutor agent. Our first big product/UX hypothesis for this agent centers around the idea of HomeworkBot initiating a daily check-in to review upcoming work and prompt the student to strategize about time use, deadlines, and of course answer student questions too.

By combining our UX hypothesis about a daily check-in with what we found possible using an LLM connected with coursework data, we can start developing an LLM action to generate a daily reminder message. Simply by changing the LLM prompt we can use our existing student data extraction to generate a daily reminder like this,

Let’s look ahead to the next few days

Today (Oct 9):
📚 2.2.1 Quick Check (Geometry) – Due by 3:00pm
📐 DeltaMath: Pythagorean Theorem Practice – Due by 3:00pm
🔺 DeltaMath: Triangle Theorems Practice – Due by 3:00pm

Tomorrow (Oct 10):
🌞 Sun Reading & Questions (Earth & Space Science) – Due by 8:00am

Friday (Oct 11):
📝 Quiz: How the Sun Works – Due by 8:00am
🎨 Art Journal Entries – Due by 2:35pm
🧮 Geometry Quick Check & DeltaMath – Due by 3:00pm

Reminder:
🚨 No school on Monday, Oct 18!
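
Concretely, the daily reminder is just a different instruction over the same compiled course data. Here is a minimal sketch, reusing the same OpenAI-compatible client setup as the earlier prompting sketch; the endpoint, model name, and prompt wording are illustrative placeholders.

```python
# Generate the proactive daily check-in from the same compiled course context.
# Client setup mirrors the earlier sketch; all names here are placeholders.
from openai import OpenAI

client = OpenAI(base_url="https://llm.internal.example/v1",  # internally hosted service
                api_key="INTERNAL_API_KEY")

DAILY_CHECKIN_PROMPT = (
    "Using only the course data below, write a short, friendly check-in message for the "
    "student. Group upcoming work by day (today and the next few days), include due "
    "times, and surface any schedule reminders such as no-school days.\n\n"
    "COURSE DATA:\n{course_data}"
)

def daily_checkin(course_data: str) -> str:
    """Generate the daily reminder message from the compiled course context."""
    response = client.chat.completions.create(
        model="internal-chat-model",  # placeholder model name
        messages=[{"role": "user",
                   "content": DAILY_CHECKIN_PROMPT.format(course_data=course_data)}],
    )
    return response.choices[0].message.content
```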

In our next post, we start testing a daily check-in like this with actual students and share some of the early learnings that further refine our hypotheses about what is possible and what is useful for an organizational tutor agent.

Excited about education or building something similar? We’d love to hear from you. Reach out to the Pointable team. 

© Pointable 2024. All rights reserved.