Standing on the Shoulders of Giants

Sir Godfrey Kneller, Sir Isaac Newton, 1689

To understand this blog, it’s best to begin with an explanation of the work of Dr. Justine Cassell, principal investigator of the ArticuLab at Carnegie Mellon University. She has made significant progress in the field of embodied conversational agents: computer programs displayed with a body image that try to imitate human behavior in some way (verbally, nonverbally, or both) in order to help humans, for example by teaching them math or acting as a personal assistant (like Apple’s Siri). The ArticuLab, where I do all the research posted on this blog, studies human interaction in sociocultural contexts and designs computational systems that are sensitive to these contexts. The RAPT project (Rapport Aligning Peer Tutor) in the ArticuLab, headed by Michael Madaio, is researching optimal strategies for teaching math to middle-school-aged children. Recent research in the learning sciences has shown that students learn better when they form a friendly bond with a tutor, and also that virtual tutors can improve student performance by up to two standard deviations. RAPT capitalizes on these two important discoveries by building a virtual tutor that can respond in real time to human friendship-building behavior (also called rapport). Hopefully, some day such a virtual tutor will make human learning not only more efficient, but fun and engaging.

This is where my personal work at the ArticuLab begins (a summary can be found at the end of this post). By standing on the shoulders of giants, we can make important discoveries without reinventing the wheel. To that end, my first week began with a literature review. My goal for this first week is to use the literature review to find research questions for future investigation in one of three categories (as described in the project proposal submitted to CREU):

A. How does a student’s learner profile (e.g. degree of extroversion or conscientiousness) and the design of a virtual tutor (e.g. its ability to manage face) affect the building of rapport between virtual tutor and tutee?

B. How does rapport impact learning (behaviors and outcomes) differently between human-human interactions and human-virtual agent interactions?

C. Does rapport that is built during the tutoring process cause greater student engagement in the tutoring process, and how can we measure that engagement (e.g. by counting explanations or utterances)?

In approximately 9 hours, I explored 11 articles. I began by reading articles authored by ArticuLab members to find out what the ArticuLab’s response to categories A through C is. I specifically sought articles loosely covering the subjects of categories A through C and discarded the rest. Next, I read the articles cited in those articles. Finally, out of the countless fascinating avenues of research in this burgeoning field, I picked the following broad questions for research:

  1. Are certain personalities more likely to insult or disengage from virtual agents?
  2. Are there gender differences in problem solving behavior and to what extent are these important?
  3. How does anxiety predetermine what rapport building strategies a student uses (which a virtual tutor must be sensitive to)?

Below is a review of sources that led me to these questions (1-3), with numbers corresponding to the questions (1-3) above. The complete citations for the works mentioned below can be found in the bibliography at the end of this post.

  1. Question 1 fits into Category A because it seeks a connection between innate characteristics of the student (e.g. personality traits) and rapport-destructive behaviors (e.g. insults, abuse). Noftle and Shaver (2006) described how a human’s personality affects the kinds of relationships that human seeks with other humans; for example, highly anxious people tend to fear attachment (high rapport) with others. Furthermore, Park et al. (2014) demonstrated that anger, frustration, and narcissism are all significant (p < 0.01) positive predictors of rude, rapport-destructive behavior in humans. Due to the persona effect (Johnson et al., 2004), whereby humans treat intelligent virtual agents as social actors, one can assume these observations on human-human relationships carry over to human-virtual agent relationships. Therefore, certain personality traits of the students being tutored, namely neuroticism and narcissism, can destroy rapport with virtual agents. These traits are part of the Big Five personality traits, and we measured them through surveys in prior experiments with human-virtual agent dyads. Putting the articles together, Question 1 arises. Because we already have the data to answer this question, it seems like a promising direction for further research.
  2. Question 2 is derived from the research of leaders in linguistics who use feminist ethnography to rebut old theories of learning based on heterosexist ideologies, which polarize boys’ behavior as solely conflictive and girls’ behavior as solely cooperative. The question of gender differences appears in this blog due to Category B; I am interested in the effects gender differences have on communication styles, and in how a virtual agent insensitive to these differences could have more difficulty building rapport because it appears so inhuman. In a book chapter, Norma Mendoza-Denton (1999) describes the polysemy (roughly, multiple meanings) of the word “no” in the Spanish spoken by Latina high school students. Sometimes it is used as an affirmative word and sometimes as a prelude to a summary of what was said by the previous speaker; neither is the conventional use of “no” for confrontation or negation. Additionally, Ardington (2006) describes how, in a tutoring paradigm, the speech of two high school girls oscillates between teasing wordplay and directions to focus. For example, when completing homework, one girl makes a face-threatening statement in a joking manner (“dingbat!”), and the other girl demonstrates she is not offended by the joke by replying “dingbat!”, followed by “stop it! we’re supposed to be doing homework!”. Ardington (2006) claims this kind of play-talk often goes on in the background of serious activity. Thus, Question 2 arises because it would benefit rapport-building efforts to have a virtual tutor that engages in play-talk, not only with girls but with students of any gender; this makes the agent appear less artificial or stoic.
  3. Question 3 fits into Category A. It asks whether anxiety, as measured by our survey, can affect rapport-building behavior. Kang et al. (2008) compared the personalities of different subjects to their feelings of rapport with a virtual agent that used nonverbal cues (e.g. posture mirroring, head nods). Kang et al. found that each personality corresponded to the pursuit of one of three components of rapport: coordination, mutual attentiveness, or positivity (giving each other happy emotions). For example, highly agreeable people correlated highly with self-rated reports of coordination with the virtual agent, to which they were giving instructions while believing it was human. Overall, agreeableness correlated more significantly with rapport than any other trait in the Big Five personality group. This leads to the question of which personality traits do and do not influence rapport. More interestingly, Question 3 asks what the effect of anxiety is, because Kang et al. found the conflicting result that high-anxiety people do not benefit any more than low-anxiety people from building rapport with an agent (which they think is human). These questions fit into Category A because the learner profile in Category A is derived from our surveys that measured the Big Five personality traits in all our subjects.

Once a week, I will update this blog with challenges I faced in my research. This week was not too tough, but I faced one unexpected challenge (mentioned later). Luckily, I’ve already done research in the learning sciences, so my research for this literature review did not have to be so extensive. I knew what topics are best suited for our lab, and what kinds of rabbit holes to avoid (such as topics in the learning sciences that are too broad or too narrow). One big challenge was not being able to access my previous research for this lab, because the administration needed time to renew my access to its server. This prevented me from building off my previous work. However, given my experience at this lab, it was not a big setback.

The surprising challenge I faced was the controversy behind my Question 2. After an hour-long meeting with my research mentor, where we discussed our plans for this semester, we came to Question 2. It became immediately apparent that, in asking Question 2, we are not attempting to make a hypothesis that stereotypes a gender as being limited to one learning behavior over another. Clearly, Question 2 is controversial due to the involvement of gender. We decided that we would carefully explore the subject if it ever arises; personally, I would like to modify the question to address speech patterns in middle school students in general, not just girls, and avoid heading in the direction of separating one gender’s behavior from another.

SUMMARY: I read 11 articles in 9 hours, created 3 research questions, and met with my research mentor for another 2 hours to get tips on research, make concept maps, and discuss progress. We agreed that we would meet this Thursday to narrow down the research questions even more, and then hopefully move on to the next stage, which is developing a hypothesis to answer the question.


This website is dedicated to publishing my research in learning sciences, particularly on the topic of how virtual agents can most optimally interact (socially and pedagogically) with children to teach them math. This blog was created in agreement with the CREU program, and I’m grateful for the opportunity to research such a fascinating field, and to be guided by the expertise of the researchers at the renowned ArticuLab at Carnegie Mellon University, particularly my research mentor, Michael Madaio. Thanks also to Dr. Justine Cassell for her guidance and providing the necessary equipment and lab space, as well as for hiring great people! The next post will describe the purpose of this blog in detail, for newcomers to this blog.



Last Update

Hello All,


I am posting this as a reminder that I am no longer involved in the CREU project; my literature review can be found in my previous post. This literature review, with accompanying hypotheses, was the culmination of a semester’s worth of research and experimental data review.


Thank you,


Alvaro Granados

Annotation Hell


I hope you like my happy title for this post. After developing annotation manuals that formally explain how to code the behaviors we observe in our experimental data, the next task is to annotate the data according to those manuals. For this particular experiment, we decided to code the behavior in Microsoft Excel, though there also exists specialized software, called ELAN, that synchronizes each code to a specific time in the video and audio. I mention ELAN because that’s what we typically use, but below I have included a diagram of an annotated Excel spreadsheet to demonstrate what is meant by annotation in this particular round.


The Q’s and R’s above represent “questions” and “responses”. These were annotated using my intelligent colleague Michelina Astle’s annotation manual. If we want to determine whether we are observing a real phenomenon, after having the same file coded by two different people, we delete the utterances and timestamps, replace all the codes with numbers (either 1 or 2), and replace blank spaces with 0’s. Then we submit this to a website, which uses statistics (Krippendorff’s alpha) to determine whether the phenomenon is real and we’re not just annotating random noise. After we’re sure the phenomenon is real, we go and annotate many, many files. As you can see, each annotation of a file takes time, typically at least an hour, because we have to read and think about each utterance and sometimes go back to the manual to determine the right code. It’s tedious, but it’s worth it. The product of this hard work is a coded file that is easy to import into the statistical software R Studio, where it can be analyzed in a few minutes to determine correlations and whatnot.
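For the curious, here is a minimal Python sketch of the statistic involved: Krippendorff’s alpha for two coders with nominal codes and no missing data. This is my own illustration of the math; the website we submit to performs the actual computation.

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_nominal(coder1, coder2):
    """Krippendorff's alpha for two coders, nominal codes, no missing data."""
    units = list(zip(coder1, coder2))
    n = 2 * len(units)  # total number of pairable values
    # Coincidence counts: each unit contributes both ordered pairs (a, b) and (b, a)
    o = Counter()
    marginals = Counter()
    for a, b in units:
        o[(a, b)] += 1
        o[(b, a)] += 1
        marginals[a] += 1
        marginals[b] += 1
    # Observed disagreement: proportion of off-diagonal coincidences
    observed = sum(count for (a, b), count in o.items() if a != b) / n
    # Expected disagreement if codes were assigned by chance
    expected = sum(marginals[a] * marginals[b]
                   for a, b in permutations(marginals, 2)) / (n * (n - 1))
    return 1.0 if expected == 0 else 1 - observed / expected
```

Values near 1 mean the coders agree far more than chance would predict; values near 0 mean the codes might as well be random, i.e. we would indeed be annotating noise.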

I’m describing the annotation process because this is what I’ve spent nearly all my hours on last week. I hope I was clear that the process takes longer than it may at first seem, but is rewarding.

Putting Theory into Practice

Hello all,

My team and I have decided to put our annotation manuals into practice. My manual that dictates how to code for meta-cognitive behaviors can be found at this link: 

To recap, my meta-cognitive behavior manual was written based on prior research suggesting that students who recognize they are in a Stuck state are able to avoid negative emotions (negative affect), such as frustration, and to develop more intrinsic motivation, because they develop a strategy to break out of the Stuck state. Evidence of this is the correlation between students who realize they are stuck and those who exert more effort on the problem at hand.

For the past week, I spent a few hours continuing to revise my manual, making it neater by adding a table and more sources. I also spent another couple of hours using my colleague’s well-written manual to code for the behaviors she found interesting. The behaviors she focused on were question-answer pairs, because she is studying reciprocity. I recommend reading her blog on reciprocity, created for the CREU program, because it is expertly crafted. Anyway, after coding for both my behaviors and Mimi’s behaviors of interest, my team met for another two hours this week to discuss how we can improve our manuals.

One major challenge was that Mimi’s definition of question-answer pairs was a little vague. For example, though she wanted pairs of questions and answers to be pedagogically related to the problem being tutored, some questions were in a grey zone where they didn’t have enough content or were ambiguous. These edge cases are important to discuss in any application of computer science, and I’ve noticed how important it is to review definitions I’ve made to ensure they are actually applicable to the real world. I think this experience with CREU is helping me understand the difference between a theory created in my head and the real world, namely that the real world contains many exceptions and edge cases that even the best theories sometimes cannot account for.


Making Annotation Schemes


Hello readers!

Firstly, a quick logistics note. I will be away the rest of this week for Thanksgiving break, so my next blog entry will not contain anything new (perhaps not anything at all). Happy holidays, by the way!

Secondly, I will speak about my work this past week. Last week I said I had finalized my research questions, so this week my focus was on operationalizing them to create a manual called an “annotation scheme”. The manual contains precise definitions of the behaviors my research questions are about (i.e. meta-cognitive reflection and self-efficacy). These precise definitions are given to diligent “coders” who tirelessly review video and verbal transcripts of conversational data, making a label in the corresponding transcript file each time a behavior of interest pops up. Later, these transcript files are exported to R Studio (a statistical software package) for statistical analyses.
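As a small illustration of what happens after coding, here is a Python sketch that tallies code frequencies from a coded transcript file. The CSV layout and the “code” column name are hypothetical (our actual analyses happen in R Studio), but the idea is the same: once behaviors are labeled, counting them is trivial.

```python
import csv
from collections import Counter

def code_frequencies(transcript_csv):
    """Tally how often each annotation code appears in a coded transcript.
    Assumes a CSV with a 'code' column (a hypothetical format); uncoded
    rows (empty code cells) are skipped."""
    with open(transcript_csv, newline="") as f:
        return Counter(row["code"] for row in csv.DictReader(f) if row["code"])
```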

I spent about two hours thinking first about what kinds of codes would apply to my research question. My mentor had suggested I code meta-cognitive reflection (MCR), because that appears in my research questions, and this is a good idea, but my codes would need to be more nuanced than just MCR because the whole point of doing research in MCR is to shed light on a previously unexamined trait of MCR. To this end, I considered splitting MCR into three types, meta-cognitive procedural (MCP), meta-cognitive knowledge based (MCK), and meta-affective (MA). Each of these codes is based on the idea that when a student reflects enough on his work in order to become aware that he is in a state of Stuck (no productivity, no learning, waste of energy), he can break out of it. If he is encouraged enough, he may be able to develop his own unique internal mechanism to break out of Stuck. My research question (RQ1a) asked whether rapport could encourage this kind of self-reflection.

Today, I spent an hour and a half in a meeting with my mentor and my CREU comrade. We discussed whether our codes were unambiguous, discrete categories that serve to answer our research questions. This meeting persuaded me to get rid of my MA code, because affective state was not explicitly mentioned in my research question. Additionally, my mentor suggested I reorganize my manual to make it clearer, and provide more elaborate explanations of the motivation behind each code (how it answers my research questions).

I will put up a Google Docs link for my annotation manual ASAP; currently, Google Docs is not allowing me to edit an uploaded Microsoft Word document. But stay tuned!

UPDATE: here is the link:

Finalizing my research questions

So last week I was busy finalizing my literature review. After speaking with my mentor for an hour and meeting with my CREU colleague Michelina Astle, who writes great literature reviews, I realized that I was using the wrong name to refer to certain variables. For example, I was using the term Self Disclosure to refer to a student who admits his own misunderstanding of material, but this is really a metacognitive reflection (a reflection he makes on his own performance that is not simple logic, but metacognitive). After clearing this up and reworking my literature review, I finalized my research questions using the new terminology.

I have enjoyed exploring the field of virtual agent interaction with humans and the rapport maintenance mechanisms involved in these interactions. Though this topic is more psychological than computational, I find it fascinating, and I never forget that the ultimate goal of this project, even beyond the CREU program, is to develop a socially sensitive virtual agent. Specifically, the questions I have generated each week reflect this reality; I cannot explore psychological topics that are too complex, because our lab would never be able to program them into the virtual agent. For example, our virtual agent will never be a robot; it will always be an image on a screen, so it makes no sense to research how interpersonal distance interacts with rapport. One always has to keep such practical limitations in mind when researching, and I think this is a valuable lesson to carry with me into graduate school. This lesson is more specific than just “you can’t do everything at once”. For example, there may be many, many variables that are very much relevant to building rapport, but if your agent has limitations, like the inability to make hundreds of facial expressions, then researching those variables is simply a distraction. I have learned that even topics that seem relevant may be distractions.

On another note, I also learned about technical difficulties in collecting data from social interactions. As Michelene (Micki) Chi noted in her work on analyzing verbal data, conversation and social interaction data tend to be voluminous, containing many utterances; just one hour of data can take many more hours to parse. One can also have technical difficulties, as our lab did, where the camera frame rate dropped sporadically and ruined the video data we collected.

Perhaps the most important use of my time was the meeting with my CREU colleague. She is very adept at locating the main point of my literature review and seeing the moments when I veer off track. Her feedback helped me ask myself more often, “Why am I claiming this? On what basis?”

To summarize my most recent (and final) literature review iteration: I found literature supporting the idea that metacognitive self-reflection, or self-awareness, is important to motivating students to work through a task. This partly answers the question you may remember I used to have: “when does rapport become too abundant to be productive?” Motivated students can stay on task longer. In fact, students go through a cycle from exploration, to grappling with confusion, to finally solving a problem and wanting to solve more, and this cycle is interrupted when a student gets “stuck” in a failure state (a state of confusion and frustration). If the student becomes self-aware that he is in this state, he may be able to motivate himself to get out of it. This is the idea behind affect-adaptive virtual agents, which try to detect this failure state via proxy measures like skin conductance (measuring excitement about the topic) and smiling behavior (measuring enjoyment).

However, I hypothesize that another way to nurture metacognitive self-reflective behavior in students, to make them aware that they are in this state, is to create an agent that not only responds to frustration and curiosity, but also builds rapport with the user. Note that many behaviors associated with rapport maintenance, such as referring to a shared experience (“Are you bored? Yes? I’m bored too”), are the same behaviors the affect-adaptive agents are using, but the developers of these agents have not delved into the “relational effect” (Burleson and Picard, 2004) of having a virtual tutor interact with a human. In other words, they have not explicitly tried to employ rapport-building strategies to encourage metacognitive self-reflection. Thus, I ask in my review: are metacognitive self-reflective behaviors increased by rapport? Does this interaction differ between virtual agent-human and human-human dyads?

My literature review is here:

Works Cited:

Burleson, W., & Picard, R. (2004). Affective Agents: Sustaining Motivation to Learn Through Failure and a State of “Stuck”. In Workshop on Social and Emotional Intelligence in Learning Environments, held in conjunction with the 7th International Conference on Intelligent Tutoring Systems.

Finding my research questions

Hello readers,

If you’ve been reading my last three or four posts, you’ll know I’ve been slightly changing research topics in the hopes of ascertaining (nice word, huh?) a solid foundation for a literature review that is robust enough to generate research questions. I have good news for the eager reader: I have come up with research questions. The literature review that led to these questions needs some more work, however, as I edit it to express my ideas more clearly. Still, the research questions and corresponding hypotheses are of good quality. The entire document of my new lit review draft can be found at this link:

Now I will describe my latest week of research. I spent approximately 4 hours on video thin slicing, which is important because the way our lab does rapport ratings on videos is by slicing the video files into 30-second segments, then randomizing these segments and presenting them out of order to a rapport rater. It is important that the videos be out of order because that prevents a rapport rater from subliminally assuming an automatic increase in rapport as time goes on, rather than solely rating the most immediate rapport that can be inferred. To do this, a Python script was used that contained a loop iterating through the file names of videos in a folder. The purpose of this script was to generate a second script, a batch file, that actually does the thin slicing. Before running the batch file, I still need to install the program it uses, ffmpeg. Luckily, it’s a free program! What I found most difficult was installing ffmpeg, which became tricky because the installation file was in a folder that was zipped multiple times. Surprisingly, understanding the Python script was not difficult, especially because I’ve taken a course in C programming (although admittedly, Python is a much higher-level language). I understood the syntax and function definitions of the Python script relatively well (relative, say, to the nonexistent installation instructions for ffmpeg).
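To give a flavor of what such a script does, here is a minimal sketch of the idea: iterate over the video files in a folder and write out a batch file of ffmpeg commands that cut each video into 30-second segments using ffmpeg’s segment muxer. This is my own reconstruction, not the lab’s actual script, and the file names are hypothetical.

```python
import os

def write_slicing_batch(video_dir, segment_seconds=30, script_name="slice_videos.bat"):
    """Write a batch file of ffmpeg commands, one per video, that cut each
    video into fixed-length segments (e.g. clip_000.mp4, clip_001.mp4, ...)."""
    commands = []
    for name in sorted(os.listdir(video_dir)):
        if not name.lower().endswith((".mp4", ".avi", ".mov")):
            continue
        base = os.path.splitext(name)[0]
        # -c copy avoids re-encoding; the segment muxer splits on -segment_time
        commands.append(
            f'ffmpeg -i "{name}" -c copy -f segment '
            f'-segment_time {segment_seconds} -reset_timestamps 1 "{base}_%03d.mp4"'
        )
    with open(os.path.join(video_dir, script_name), "w") as f:
        f.write("\n".join(commands))
    return commands
```

The 30-second clips produced this way would then be shuffled before being shown to a rapport rater.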

Aside from this, I spent an hour meeting with my research mentor, who helped me very much in crafting research questions. Fortunately, I now feel these research questions have solidified and are very close to being finalized, thanks to my mentor’s help. Again, the link to my latest lit review is in the first paragraph of this post.


Alvaro Granados


Woah, We’re Halfway There: More Research and Writing

My past week has been split between writing a second draft of my literature review and meeting with my research mentor. First, I’ll describe my meeting because it gives some clues as to the direction of this project. During my meeting, my mentor and I (and the other CREU student) discussed what information we most need from the newly collected experimental data (called the WoZ 2017 data) and how we should process it. The processing is somewhat involved because we have to program a computer to segment hours of video into short slices, each about 30 seconds long. These slices will then be rated for rapport by my partner and me (using a 1 to 7 Likert scale), and we will also test our inter-rater agreement. This will culminate in our ability to use the rapport-rated data to test our hypotheses.

That being said, I will now turn to my new literature review draft and new hypotheses. Last week my goal was to connect the variables self disclosure, on-task behavior, and rapport, which you readers have heard about a few times now. Well, I hope I have done that in the last paragraph of this new literature review draft. While it may be ineffective for me to try to measure the self efficacy of a student directly, our data contains the number of attempts a student used to answer a question and the duration of time between these attempts, which can be used as measures of on-task behavior. My new hypotheses try to connect self disclosure, which is essential for productive learning interactions between a tutor and tutee, to the amount of on-task behavior. If my hypotheses are not rejected, it would help clarify yet more aspects of rapport that motivate on-task behavior.
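To make those proxies concrete, here is a minimal Python sketch (my own illustration; the function and its input format are hypothetical, since our data processing is not yet settled) that turns a student’s attempt timestamps into the two measures mentioned above:

```python
def on_task_measures(attempt_times):
    """Given a student's attempt timestamps in seconds, return the number
    of attempts and the mean gap between consecutive attempts (None if
    there are fewer than two attempts)."""
    n = len(attempt_times)
    if n < 2:
        return n, None
    gaps = [b - a for a, b in zip(attempt_times, attempt_times[1:])]
    return n, sum(gaps) / len(gaps)
```

Many attempts with short gaps would suggest sustained on-task behavior; long gaps might indicate off-task stretches (or, of course, productive thinking, which is why these are only proxies).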

This is my new literature review:

“Are we there yet?” Progress on my third Lit Review Draft

This past week I spoke to my research mentor Michael Madaio, and he suggested that my latest lit review draft is getting more on topic. What exactly does on topic mean? Well, we are focusing on four variables: self disclosure, rapport, problem solving efficiency, and self efficacy. I have to connect these in my new literature review draft, and that is what I have tried to do. The draft is at this link (keep in mind this is a very rough draft that only serves for me to connect my ideas loosely; more reading and meeting with my team is needed):

For those wondering how this connects to the original CREU project proposal, the connection between self efficacy, problem solving, and rapport may answer more questions as to how personal measures of students (e.g. shyness, extroversion) influence the rapport the student creates with the virtual agent (research proposal goal number 1). Thus, I am on topic and focused (although my head is spinning from so much reading and thinking about gaps).

What follows is a summary of my literature review draft, and connections I’m trying to isolate.

My main goal is to elucidate at what point rapport becomes more destructive to learning gains than beneficial. For example, in our data we have observed that friends with too high a rapport can sometimes have lower learning gains because they don’t focus on the problems. So what is the rapport behavior that most directly leads to learning gains? And which behaviors can lead students astray?

To this end, I believe there is a connection between self efficacy and self disclosure. Research shows that the act of disclosing personal information not only deepens a bond between friends (thereby increasing intimacy and encouraging fearless questioning behavior that leads to learning gains), but the very act of publicly admitting one’s limits (e.g. “I can’t understand this problem at all, I’m very afraid to answer.”) reduces anxiety. By anxiety, I don’t mean clinically relevant (severe) trait-based anxiety; I mean the small anxieties psychologically healthy people experience when choosing between one answer and another during a test or tutoring session. If this is true, then there is some connection between self disclosure and confidence in learning and being questioned, and confidence (self efficacy) in learning is one of the best predictors of learning gains. Thus, self disclosure may be very important to increasing learning gains. However, my mentor brought up the problem that our measures of self efficacy are insufficient to describe variations in self efficacy across an entire session. Thus, I need to look at self efficacy using indirect measures, and I may have to do the same with self disclosure.




Patience is a virtue: Re-reworking my literature review


The process of creating a good literature review is iterative. We write a review of prior research in order to find gaps in the field; then we may find we need more literature to link our ideas clearly, or that we need to change our research direction slightly. To that end, I slightly changed my research direction, which was originally about anxiety during tutoring, and will now investigate self disclosure and its effects on self efficacy. Though I’m changing directions, my research is moving forward, because my research on anxiety inspired me to choose my new direction.

To summarize last week’s efforts:

  • Changed research direction, re-wrote literature review to make a second rough draft, and did relevant research
  • Reviewed more videos to remind myself of what kinds of behaviors we observed (to keep my questions well informed by our data)
  • Tried (and failed) to fix a bug on our data-collecting computer (Windows OS on Mac hardware). The bug had resulted in severely diminished frame rates for the videos we recorded. We may get new software for our future data collection.
  • Met with research mentor to discuss why my literature review rough draft (second attempt, after changing my research direction) contains ideas that are poorly connected, and why my research gap that I identified was really a menagerie of many, many smaller research gaps (and poorly organized in paragraph form).

Because my most recent literature review draft is not focused enough (its ideas are not well connected) for me to consider it worthy of posting here, I will instead summarize it briefly and post the fixed version next week (this is just a “quick” summary!). The summary is at this link: