D4 - Remote Behavioral Assessment and Job Coaching via Video and Motion Technology

Task Leaders: Michael McCue, Ph.D. and Jessica Hodgins, Ph.D.

Co-Investigators: Edmund LoPresti, Ph.D., Adam Bargteil, Ph.D. and Andrea Fairman, OTR/L

Other participants: Robotics Institute, Carnegie Mellon University, PA Office of Vocational Rehabilitation: Hiram G. Andrews Rehabilitation Center, AT Sciences, LLC

Objectives of task

Background / Rationale

Individuals with cognitive disability resulting from conditions such as brain injury and autism often present unique challenges in education and vocational rehabilitation. Persons with these cognitive disabilities experience a complex array of functional limitations that impact their ability to perform effectively in education, vocational training and employment settings.

Traditional approaches to cognitive rehabilitation adopted "brain-train" paradigms. Research results indicate that while producing positive results in the clinic or laboratory, these approaches did not generalize to the working and learning settings of the individual, and were therefore ineffective in addressing the impact of the cognitive disability. Effective rehabilitation of persons with cognitive disability involves the application of specific strategies, accommodations and supports that are delivered in the natural environment where consumers encounter obstacles in functioning. This "in vivo" cognitive rehabilitation intervention has been effectively applied in supported employment, a vocational rehabilitation approach that provides specific supports to an individual with a disability in the workplace through a "job coach." The job coach provides direct interventions that address work problems including difficulty recalling task instructions and sequencing problems. Job coaches address these problems by monitoring client performance and providing cueing responses to address initiation and inattention problems and specific task instruction to remediate problems in memory, learning and sequencing.

In order for supported employment approaches to be effective, the job coach must provide direct service to the consumer at the work-site. Unfortunately, "in vivo" intervention is costly and often requires a full-time, indefinite period of one-to-one intervention. As a result, staffing is expensive and services are often not readily available to vocational rehabilitation consumers. In addition, support is typically needed on an episodic, as-needed basis rather than at regularly scheduled, extended times. This makes it difficult to be responsive to the needs of the consumer without having the job coach present or locally "on call." Moreover, placing the job coach in the work environment is often problematic because it draws attention to the consumer (and their disability) and may not be enthusiastically accepted by other individuals in the environment (for example, an employer or co-worker).

Project Description

A system is being developed to provide guidance in vocational tasks through motion-based activity recognition. The system is trained to identify individual actions in a job-related activity based on motion-sensing technology (accelerometers). The automated system recognizes whether the actions are accurate and performed in the correct order. Because the system is trained to recognize the component actions of a task, it can recognize the individual actions even if a consumer performs them in the wrong order, and can then compare them to the correct task order. The hypothesis that we intend to test is that remote observation of activity performance using accelerometers will provide information about the performance of individuals with brain injuries that is comparable to the direct observation generally provided in traditional job coaching services.
If successful, this motion-based activity recognition could contribute to a model of telerehabilitation-based "in vivo" supports for consumers who have cognitive disabilities that present obstacles to functioning in a work environment. The data will be used to assess the subject's performance and the frequency and duration of symptomatic behavior (e.g., inattention, repetitive behaviors, and task sequence problems). Once the reliability and validity of the assessment phase are determined, we will be able to automatically deliver specific task guidance and cues/prompts to the subject in response to the occurrence of specific problem behaviors (e.g., inattention -> auditory prompt via headphones to attend, or task sequence error -> task instruction via headphones to repeat the step in proper sequence). The system could also maintain a record of a client's task performance for later and/or remote review by a live job coach, informing future live interventions. The setup for such a system is shown in the following figure:

D4 graphic 1
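
The frequency and duration measures mentioned above can be illustrated with a minimal sketch. It assumes the recognizer emits one behavior label per fixed-length time window; the label names, the 0.5-second window, and the function name `summarize_episodes` are hypothetical and are not taken from the project's software.

```python
from itertools import groupby

def summarize_episodes(labels, window_seconds):
    """Return {label: (episode_count, total_seconds)} from per-window labels.

    Consecutive windows with the same label are treated as one episode.
    """
    summary = {}
    for label, run in groupby(labels):
        run_length = sum(1 for _ in run)
        count, seconds = summary.get(label, (0, 0.0))
        summary[label] = (count + 1, seconds + run_length * window_seconds)
    return summary

# Hypothetical label stream: one label per 0.5 s window.
stream = ["on_task", "on_task", "inattention", "inattention", "inattention",
          "on_task", "inattention", "on_task"]
report = summarize_episodes(stream, window_seconds=0.5)
```

Here `report["inattention"]` holds the number of separate inattention episodes and their combined duration in seconds, which is the kind of summary a job coach could review remotely.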

The application of the technology was informed by a series of focus group sessions with job coaches, employers, and consumers representing a variety of work settings, along with observations at industrial and food service work sites. Based on these observations, we identified food preparation as a domain in which many people requiring job coaching are employed and which has tasks for which this technology might be appropriate. More specifically, we identified the task of grilling hamburgers in a fast food setting as an initial application. In employment settings this task requires a long series of actions which must be performed in sequence to create a high quality product.

In prior work, members of our team used a motion database to train learning algorithms to reproduce high quality motion from low quality, inexpensive, and easily installed sensors. We built on this insight to develop a mechanism for collecting qualitative data on task performances of consumers at work. We initially implemented motion tracking using video. Simple vision processing provides partial information about the user's movements and domain knowledge from a previously captured motion database supplements the information from the interface to allow plausible movement to be inferred. The process can be performed interactively, with less than a second of delay between the capture of the video and the rendering of the animated motion.
This hamburger grilling task was replicated in a motion capture lab at Carnegie Mellon University. Video of able-bodied individuals was captured while they performed the hamburger grilling task. A machine learning technique (AdaBoost) was then used to identify features in the person's movement pattern by which the system could automatically recognize components of the task (e.g. flipping burgers, salting burgers, placing burgers on or taking burgers off the grill). We then replicated the model using data from accelerometers instead of cameras and found similar accuracy in our activity recognition data. Given that accelerometers are potentially more portable and raise fewer privacy concerns than cameras, accelerometers were the focus of further work in this project.
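
To make the role of AdaBoost more concrete, the following is a minimal sketch of the general technique using decision stumps, not the project's actual model. It assumes each time window of accelerometer data has already been reduced to a small feature vector (here a hypothetical mean and variance of acceleration magnitude) and that the task is simplified to a two-class problem ("flip" = +1, any other action = -1). The toy data values are invented for illustration.

```python
import math

def train_stump(X, y, w):
    """Find the (error, feature, threshold, polarity) stump with lowest weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                preds = [pol if row[f] >= thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best

def stump_predict(stump, row):
    _, f, thr, pol = stump
    return pol if row[f] >= thr else -pol

def adaboost(X, y, rounds=5):
    """Train a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified windows, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

# Invented toy features per window: [mean magnitude, variance of magnitude].
X = [[1.0, 0.9], [1.1, 0.8], [0.9, 1.0], [1.0, 0.1], [1.2, 0.2], [0.8, 0.15]]
y = [1, 1, 1, -1, -1, -1]  # +1 = "flip", -1 = other action
model = adaboost(X, y)
```

On this toy data a single stump on the variance feature already separates the two classes, so the later boosting rounds add nothing. Real accelerometer data would need many more features and rounds, and a multi-class scheme (for example one classifier per action) to cover all component actions.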


The initial work has yielded a model which is able to correctly identify component tasks for three members of our research team without cognitive disabilities.

D4 graphic 2

D4 graphic 3

Recent activities include:

  1. Testing the model's ability to recognize movement patterns for additional members of the research team without disabilities;
  2. Testing the model's ability to recognize movement patterns for individuals with cognitive impairments;
  3. Processing the activity recognition output to determine when the task is performed correctly and when errors are being made (e.g. the user pressed the hamburger when he should be salting, which should be recognized as an error).
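
The third activity, checking recognized actions against the correct task order, can be sketched as follows. This is an illustrative sketch only: the step names and their order in `EXPECTED_STEPS` are assumptions made for the example, and `check_sequence` is a hypothetical helper, not the project's code.

```python
# Hypothetical task order; the real step list is not given in the text.
EXPECTED_STEPS = ["place", "salt", "flip", "press", "remove"]

def check_sequence(recognized, expected=EXPECTED_STEPS):
    """Compare recognized actions to the expected order.

    Returns (errors, missing): errors is a list of
    (position, observed_action, expected_action) tuples, and missing is
    the list of expected steps that were never reached.
    """
    errors = []
    next_index = 0
    for position, action in enumerate(recognized):
        if next_index < len(expected) and action == expected[next_index]:
            next_index += 1  # correct step, advance to the next one
        else:
            wanted = expected[next_index] if next_index < len(expected) else None
            errors.append((position, action, wanted))
    missing = expected[next_index:]
    return errors, missing

# The user presses the hamburger when salting was expected.
errors, missing = check_sequence(
    ["place", "press", "salt", "flip", "press", "remove"])
```

Each entry in `errors` identifies where the out-of-order action occurred and which step was expected instead, which is the information needed to choose a task instruction prompt.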

Adding more data to the model should increase the accuracy of the system. However, the more variety there is in the movement patterns used to perform the task, the more data from a variety of people will be needed to develop the model. Testing the model on additional able-bodied subjects has initially resulted in a decrease in the accuracy with which the system recognizes task actions. We have observed that people tend to have various "styles" of performing the actions, yet are still able to complete the task and achieve the desired outcome.

D4 graphic 4

Testing of this system has also been completed with three persons diagnosed with traumatic brain injury. Motion patterns involved in completing the individual actions in the task of hamburger grilling (placing, flipping, salting, etc.) vary widely from person to person but typically do not vary much within an individual. Therefore, when using a model developed for other people with different movement patterns, actions may be incorrectly recognized. For example, an action (e.g. salting) may go unrecognized, or an action (e.g. flip) may be mistaken for a different action (e.g. picking up the hamburger).
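
The two kinds of recognition error described above, an action going unrecognized versus an action being mistaken for another, can be tallied separately. The sketch below assumes each performed action is paired with the recognizer's output, with `None` standing for "nothing recognized"; the action names and the example pairs are hypothetical and do not represent study data.

```python
from collections import Counter

def tally_errors(pairs):
    """Split recognition results into correct, missed, and confused actions.

    pairs: list of (true_action, predicted_action_or_None).
    Returns (accuracy, missed, confused), where missed counts unrecognized
    actions and confused counts (true, predicted) mix-ups.
    """
    missed = Counter()
    confused = Counter()
    correct = 0
    for true_action, predicted in pairs:
        if predicted is None:
            missed[true_action] += 1
        elif predicted != true_action:
            confused[(true_action, predicted)] += 1
        else:
            correct += 1
    accuracy = correct / len(pairs) if pairs else 0.0
    return accuracy, missed, confused

# Hypothetical results for one person, not measured data.
pairs = [("place", "place"), ("salt", None),
         ("flip", "pick_up"), ("flip", "flip")]
accuracy, missed, confused = tally_errors(pairs)
```

Keeping the two counts apart shows whether a given person's movement style causes actions to be missed entirely or confused with a specific other action.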

It appears that the actions of hamburger grilling are not distinct enough for recognition to be robust across variations in individual performance. Future applications of this technology may be more successful if used to monitor tasks that require larger, gross movement patterns. We also recognize that motion sensing technology alone would be inadequate for the task of hamburger grilling even if we were able to detect movement patterns with 100% accuracy, because the task involves safety concerns that cannot be monitored through motion recognition alone. We are planning to develop a more transparent and portable interface for job coaching. Remote motion recognition is being considered for other telerehabilitation applications.

