Last Update

Hello All,


I am posting this as a reminder that I am no longer involved in the CREU project; my literature review can be found in my previous post. That literature review, with its accompanying hypotheses, was the culmination of a semester’s worth of research and experimental data review.


Thank you,


Alvaro Granados

Annotation Hell


I hope you like my happy title for this post. After developing annotation manuals that formally explain how to code the behaviors we observe in our experimental data, the next task is to annotate the data according to those manuals. For this particular experiment, we decided to code the behavior in Microsoft Excel, though there is also specialized software, called ELAN, that synchronizes each code to a specific point in the video and audio. I mention ELAN because it is what we typically use, but below I have included a diagram of an annotated Excel spreadsheet to demonstrate what is meant by annotation in this particular round.


The Q’s and R’s above represent “questions” and “responses”. These were annotated using my intelligent colleague Michelina Astle’s annotation manual. If we want to determine whether we are observing a real phenomenon, we have the same file coded by two different people, delete the utterances and timestamps, replace all the codes with numbers (either 1 or 2), and replace blank spaces with 0’s. Then we submit this to a website, which uses statistics (Krippendorff’s alpha) to determine whether the phenomenon is real and we’re not just annotating random noise. Once we’re sure the phenomenon is real, we go and annotate many, many files. As you can see, each annotation of a file takes time, typically at least an hour, because we have to read and think about each utterance and sometimes go back to the manual to determine the right code. It’s tedious, but it’s worth it. The product of this hard work is a coded file that is easy to import into the statistical software RStudio, where it can be analyzed in a few minutes to determine correlations and whatnot.
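For the curious, the agreement statistic can also be computed by hand. Here is a minimal sketch of Krippendorff’s alpha for nominal data with two coders (the function name and the example label arrays are my own, invented for illustration; in practice we just paste our 0/1/2 columns into the website):

```python
from collections import Counter

def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal categories.

    coder_a, coder_b: equal-length lists of labels per utterance,
    e.g. 0 = no code, 1 = question, 2 = response.
    """
    assert len(coder_a) == len(coder_b)
    # Coincidence matrix: each unit contributes both ordered pairs.
    o = Counter()
    for a, b in zip(coder_a, coder_b):
        o[(a, b)] += 1
        o[(b, a)] += 1
    # Marginal totals per category and total pairable values.
    n_c = Counter()
    for (c, _), count in o.items():
        n_c[c] += count
    n = sum(n_c.values())  # two pairable values per unit
    observed = sum(count for (c, k), count in o.items() if c != k)
    expected = sum(n_c[c] * n_c[k] for c in n_c for k in n_c if c != k)
    if expected == 0:  # only one category ever used: trivially agree
        return 1.0
    return 1.0 - (n - 1) * observed / expected
```

Perfect agreement yields 1.0, chance-level agreement hovers near 0, and systematic disagreement goes negative, which is why we only trust a coding scheme once alpha is comfortably high.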

I’m describing the annotation process because it is what I spent nearly all my hours on last week. I hope I was clear that the process takes longer than it may at first seem, but is rewarding.

Putting Theory into Practice

Hello all,

My team and I have decided to put our annotation manuals into practice. My manual that dictates how to code for meta-cognitive behaviors can be found at this link: 

To recap, my meta-cognitive behavior manual was written based on prior research suggesting that students who recognize that they are in a Stuck state can avoid negative emotions (negative affect), such as frustration, and can develop more intrinsic motivation because they develop a strategy to break out of the Stuck state. Evidence for this is the correlation between students who realize they are stuck and those who exert more effort on the problem at hand.

For the past week, I spent a few hours continuing to revise my manual and made it neater by adding a table and more sources. I also spent another couple of hours using my colleague’s well-written manual to code for the behaviors she found interesting. The behaviors she was focusing on were question-answer pairs, because she is studying reciprocity. I recommend reading her blog about reciprocity, created for the CREU program, because it is expertly crafted. Anyway, after coding for both my behaviors and Mimi’s behaviors of interest, my team met for another two hours this week to discuss how we can improve our manuals.

One major challenge was that Mimi’s definition of question-answer pairs was a little vague. For example, though she wanted pairs of questions and answers to be pedagogically related to the problem being tutored, some questions fell into a grey zone where they didn’t have enough content or were ambiguous. These edge cases are important to discuss in any application of computer science, and I’ve noticed how important it is to review the definitions I’ve made to ensure they are actually applicable to the real world. I think this experience with CREU is helping me understand the difference between a theory created in my head and the real world, namely that the real world contains many exceptions and edge cases that even the best theories sometimes cannot account for.


Making Annotation Schemes


Hello readers!

Firstly, a quick logistics note. I will be away the rest of this week for Thanksgiving break, so my next blog entry will not contain anything new (perhaps not anything at all). Happy holidays, by the way!

Secondly, I will speak about my work this past week. Last week I said I had finalized my research questions, so this week my focus was on operationalizing those questions by creating a manual called an “annotation scheme”. The manual contains precise definitions of the behaviors my research questions are about (i.e., meta-cognitive reflection and self-efficacy). These precise definitions are given to diligent “coders” who tirelessly review video and verbal transcripts of conversational data, making a label in the corresponding transcript file each time the behavior of interest pops up. Later, these transcript files are exported to RStudio (a statistical software environment) for statistical analyses.
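Concretely, a coded transcript might look like the following miniature (the column names, codes, and utterances here are invented for illustration, not taken from our actual data); a short script can then tally codes before the file goes off for statistical analysis:

```python
import csv
import io
from collections import Counter

# Hypothetical miniature of a coded transcript file. Real files hold
# timestamps, speakers, and utterances from the tutoring videos, with
# a blank code column where no behavior of interest occurred.
coded = """time,speaker,utterance,code
00:12,tutee,I keep messing up this step,MCP
00:25,tutor,What do you think went wrong?,
00:41,tutee,I realize I never learned this rule,MCK
"""

# Tally only the non-blank codes.
counts = Counter(row["code"]
                 for row in csv.DictReader(io.StringIO(coded))
                 if row["code"])
```

Keeping the transcripts in a plain tabular format like this is what makes the later import into statistical software painless.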

I spent about two hours thinking first about what kinds of codes would apply to my research question. My mentor had suggested I code meta-cognitive reflection (MCR), because that appears in my research questions, and this is a good idea, but my codes would need to be more nuanced than just MCR because the whole point of doing research in MCR is to shed light on a previously unexamined trait of MCR. To this end, I considered splitting MCR into three types, meta-cognitive procedural (MCP), meta-cognitive knowledge based (MCK), and meta-affective (MA). Each of these codes is based on the idea that when a student reflects enough on his work in order to become aware that he is in a state of Stuck (no productivity, no learning, waste of energy), he can break out of it. If he is encouraged enough, he may be able to develop his own unique internal mechanism to break out of Stuck. My research question (RQ1a) asked whether rapport could encourage this kind of self-reflection.

Today, I spent an hour and a half in a meeting with my mentor and my CREU comrade. We discussed whether our codes were unambiguous, discrete categories that serve to answer our research questions. This meeting persuaded me to get rid of my MA code, because affective state was not explicitly stated in my research question. Additionally, my mentor suggested I reorganize my manual to make it clearer and provide more elaborate explanations of the motivation behind each code (how it answers my research questions).

I will post a Google Docs link for my annotation manual a.s.a.p.; currently, Google Docs is not allowing me to edit an uploaded Microsoft Word document. But stay tuned!

UPDATE: here is the link:

Finalizing my research questions

So last week I was busy finalizing my literature review. After speaking with my mentor for an hour and meeting with my CREU colleague Michelina Astle, who writes great literature reviews, I realized that I was using the wrong name for certain variables. For example, I was using the term self-disclosure to refer to a student admitting his own misunderstanding of material, but this is really metacognitive reflection (a reflection he makes on his own performance that is not simple logic, but metacognitive). After clearing this up and reworking my literature review, I finalized my research questions using the new terminology.

I have enjoyed exploring the field of virtual agent interaction with humans and the rapport-maintenance mechanisms involved in these interactions. Though this topic is more psychological than computational, I find it fascinating, and I never forget that the ultimate goal of this project, even beyond the CREU program, is to develop a socially sensitive virtual agent. Specifically, the questions I have generated each week reflect this reality; I cannot explore psychological topics that are too complex, because our lab would never be able to program those into the virtual agent. For example, our virtual agent will never be a robot, only an image on a screen, so it makes no sense to research how interpersonal distance interacts with rapport. One always has to keep such practical limitations in mind when researching, and I think this is a valuable lesson to carry with me to graduate school. This lesson is more specific than just “you can’t do everything at once”. There may be many, many variables that are very relevant to building rapport, but if your agent has limitations, like the inability to make hundreds of facial expressions, then researching those variables is simply a distraction. I have learned that even topics that seem relevant may be distractions.

On another note, I also learned about technical difficulties in collecting data from social interactions. As Michelene Chi once noted in “Analyzing Verbal Data”, conversation and social interaction data tend to be voluminous, containing many utterances; just one hour of data can take many more hours to parse. Additionally, one can run into technical difficulties, as our lab did, where the camera frame rate dropped sporadically and ruined the video data we collected.

Perhaps the most important use of my time was the meeting I had with my CREU colleague. She is very adept at locating the main point of my literature review and seeing the moments when I veer off track. Her feedback helped me see the moments when I needed to ask myself more often “why am I claiming this? With what basis?”

To summarize my most recent (and final) literature review iteration: I essentially found literature supporting the idea that metacognitive self-reflection, or self-awareness, is important to motivating students to work through a task. This somewhat answers the question you may remember I used to have, “when does rapport become too abundant to be productive?” Motivated students can stay on task longer. In fact, there is a cycle students go through, from exploration, to grappling with confusion, to finally solving a problem and wanting to solve more, that becomes interrupted when a student gets “stuck” in a failure state (a state of confusion and frustration). If the student becomes self-aware that he is in this state, he may be able to motivate himself to get out of it. This is the idea behind affect-adaptive virtual agents, which try to detect this failure state via proxy measures like skin conductance (measuring excitement about the topic) and smiling behavior (measuring enjoyment).

However, I hypothesize that another way to nurture metacognitive self-reflective behavior in students, to make them aware that they are in this state, is not just to create an agent that can respond to frustration and curiosity, but one that builds rapport with the user. Note that many behaviors associated with rapport maintenance, such as referring to a shared experience (“Are you bored? Yes? I’m bored too”), are the same behaviors the affect-adaptive agents are using, but the developers of these agents have not delved into the “relational effect” (Burleson and Picard 2004) of having a virtual tutor interact with a human. In other words, they have not explicitly tried to employ rapport-building strategies to encourage metacognitive self-reflection. Thus, I ask in my review: are metacognitive self-reflective behaviors increased by rapport? Does this interaction differ between a virtual agent-human dyad and a human-human dyad?

My literature review is here:

Works Cited:

Burleson, W., & Picard, R. W. (2004). Affective agents: Sustaining motivation to learn through failure and a state of “Stuck”. In Workshop on Social and Emotional Intelligence in Learning Environments, held in conjunction with the 7th International Conference on Intelligent Tutoring Systems.

Finding my research questions

Hello readers,

If you’ve been reading my last three or four posts, you’ll know I’ve been slightly changing research topics in the hopes of ascertaining (nice word, huh?) a solid foundation for a literature review, one robust enough to generate research questions. I have good news for the eager reader: I have come up with research questions. The literature review that led to these questions needs some more work, however, as I attempt to edit it to express my ideas more clearly. The research questions and corresponding hypotheses, though, are of good quality. The entire document of my new lit review draft can be found at this link:

Now I will describe my latest week of research. I spent approximately four hours trying to do video thin-slicing, which matters because the way our lab does rapport ratings on videos is by slicing the video files into 30-second segments, then randomizing these segments and presenting them out of order to a rapport rater. It is important that the videos be out of order, because that prevents a rater from subliminally assuming an automatic increase in rapport as time goes on, rather than rating only the most immediate rapport that can be inferred. To do this, we used a Python script containing a loop that iterated through the file names of the videos in a folder. The purpose of this script was to generate another script, a batch file, that actually does the thin-slicing. I still need to install a program called ffmpeg before running the batch file, which relies on it. Luckily, it’s a free program! What I found most difficult was installing ffmpeg; this became tricky because the installation file was in a folder that was zipped multiple times. Surprisingly, understanding the Python script was not difficult, especially because I’ve taken a course in C programming (although admittedly, Python is a much higher-level language). I understood the syntax and function definitions of the Python script relatively well (relative, say, to the nonexistent installation instructions for ffmpeg).
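The script-that-writes-a-script idea can be sketched as follows. This is not our lab’s actual script; it is a minimal, hypothetical version (the function name, folder layout, and fixed total duration are my assumptions — a real version would probe each video’s length with ffprobe) showing how a Python loop can emit one ffmpeg command per 30-second slice into a batch file:

```python
import os

def write_slicing_batch(video_dir, batch_path,
                        segment_seconds=30, total_seconds=3600):
    """Write a batch file of ffmpeg commands that cut each video in
    video_dir into fixed-length slices, one output file per slice.

    total_seconds is an assumed upper bound on video length; a real
    script would query each file's duration instead.
    """
    with open(batch_path, "w") as batch:
        for name in sorted(os.listdir(video_dir)):
            if not name.lower().endswith((".mp4", ".avi", ".mov")):
                continue  # skip non-video files (including the batch file)
            base = os.path.splitext(name)[0]
            src = os.path.join(video_dir, name)
            for i, start in enumerate(range(0, total_seconds, segment_seconds)):
                dst = os.path.join(video_dir, f"{base}_slice{i:03d}.mp4")
                # -ss: start offset, -t: slice duration, -c copy: no re-encode
                batch.write(
                    f'ffmpeg -ss {start} -t {segment_seconds} '
                    f'-i "{src}" -c copy "{dst}"\n'
                )
```

The resulting slice files can then be shuffled before being shown to the rapport raters, so no rater ever sees a session in chronological order.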

Aside from this, I spent an hour meeting with my research mentor who helped me very much in crafting research questions. Fortunately for me, I now feel these research questions have been solidified and are very very close to being finalized, thanks to my mentor’s help.  Again, the link to my latest lit review is in the first paragraph of this post.


Alvaro Granados


Woah, We’re Halfway There: More Research and Writing

My past week has been split between writing a second draft of my literature review and meeting with my research mentor. First, I’ll describe my meeting, because it gives some clues as to the direction of this project. During the meeting, my mentor and I (and the other CREU student) discussed what information we most need from the newly collected experimental data (called the WoZ 2017 data) and how we should process it. The processing is somewhat involved because we have to program a computer to segment hours of video into short slices, each about 30 seconds long. These slices will then be rated for rapport by my partner and me (using a 1 to 7 Likert scale), and we will also test our inter-rater agreement. This will culminate in our ability to use the rapport-rated data to test our hypotheses.

That being said, I will now turn to my new literature review draft and new hypotheses. Last week my goal was to connect the variables self-disclosure, on-task behavior, and rapport, which you readers have heard about a few times now. Well, I hope I have done that in the last paragraph of this new literature review draft. While it may be impractical for me to measure a student’s self-efficacy directly, our data contain the number of attempts a student made to answer a question and the duration of time between those attempts, which can be used as a measure of on-task behavior. My new hypotheses try to connect self-disclosure, which is essential for productive learning interactions between a tutor and tutee, to the amount of on-task behavior. If my hypotheses are not rejected, it would help clarify yet more aspects of rapport that motivate on-task behavior.
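As a toy illustration of turning attempt logs into an on-task measure (the function name and the numbers are invented; our real data and exact operationalization differ), attempt counts and inter-attempt gaps can be summarized like this:

```python
from statistics import mean

def attempt_stats(attempt_times):
    """Summarize on-task behavior from one student's attempt timestamps.

    attempt_times: sorted timestamps (in seconds) of answer attempts
    on a question. Returns the attempt count and the mean gap between
    consecutive attempts; shorter gaps suggest more sustained effort.
    """
    gaps = [b - a for a, b in zip(attempt_times, attempt_times[1:])]
    return {"attempts": len(attempt_times),
            "mean_gap": mean(gaps) if gaps else None}
```

Per-student summaries like these are the kind of indirect, behavioral proxy that can stand in for a construct (here, on-task behavior) that is hard to measure directly.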

This is my new literature review:

“Are we there yet?” Progress on my third Lit Review Draft

This past week I spoke to my research mentor Michael Madaio, and he suggested that I am getting more on topic with my latest lit review draft. What exactly does “on topic” mean? Well, we are focusing on four variables: self-disclosure, rapport, problem-solving efficiency, and self-efficacy. I have to connect these in my new literature review draft, and that is what I have tried to do. The draft is at this link (keep in mind this is a very rough draft that only serves to connect my ideas loosely; more reading and meeting with my team is needed):

For those wondering how this connects to the original CREU project proposal: the connection between self-efficacy, problem solving, and rapport may answer more questions as to how personal measures of students (e.g., shyness, extraversion) influence the rapport the student creates with the virtual agent (research proposal goal number 1). Thus, I am on topic and focused (although my head is spinning from so much reading and thinking about gaps).

What follows is a summary of my literature review draft, and connections I’m trying to isolate.

My main goal is to elucidate at what point rapport becomes more destructive to learning gains than beneficial. For example, in our data we have observed that friends with too high a rapport can sometimes have lower learning gains because they don’t focus on the problems. So which rapport behaviors most directly lead to learning gains? And which behaviors can lead students astray?

To this end, I believe there is a connection between self-efficacy and self-disclosure. Research shows that the act of disclosing personal information not only deepens a bond between friends (thereby increasing intimacy and encouraging fearless questioning behavior that leads to learning gains), but the very act of publicly admitting one’s limits (e.g., “I can’t understand this problem at all; I’m very afraid to answer.”) reduces anxiety. When I say anxiety, I don’t mean clinically relevant (severe) trait-based anxiety; I mean the small anxieties psychologically healthy people experience when choosing between one answer and another during a test (or a tutoring session). If this is true, then there is some connection between self-disclosure and confidence in learning and being questioned, and confidence (self-efficacy) in learning is one of the best predictors of learning gains. Thus, self-disclosure may be very important to increasing learning gains. However, my mentor raised the problem that our measures of self-efficacy are insufficient to describe self-efficacy variations across an entire session. Thus, I need to look at self-efficacy using indirect measures, and I may have to do the same with self-disclosure.




Patience is a virtue: Re-reworking my literature review


The process of creating a good literature review is iterative. We write a review of prior research in order to find gaps in the field; then we may find we need more literature to link our ideas clearly, or that we need to change our research direction slightly. To that end, I slightly changed my research direction, which was originally about anxiety during tutoring; I will now investigate self-disclosure and its effects on self-efficacy. Though I’m changing directions, my research is moving forward, because my research on anxiety inspired me to choose the new direction.

To summarize last week’s efforts:

  • Changed research direction, re-wrote literature review to make a second rough draft, and did relevant research
  • Reviewed more videos to remind myself of what kinds of behaviors we observed (to keep my questions well informed by our data)
  • Tried (and failed) to fix bug on our data collecting computer (Windows OS on Mac hardware). The bug had resulted in severely diminished frame rates for the videos we recorded. We may get new software for our future data collection.
  • Met with research mentor to discuss why my literature review rough draft (second attempt, after changing my research direction) contains ideas that are poorly connected, and why my research gap that I identified was really a menagerie of many, many smaller research gaps (and poorly organized in paragraph form).

Because my most recent literature review draft is not focused enough (its ideas are not well connected) for me to consider it worthy of posting here, I will instead summarize it briefly and post the fixed version next week (this is just a “quick” summary!!!). The summary is at this link: