The goal is to engineer a system that can guide users through unfamiliar tasks. To achieve this, I have developed models that, given a procedure such as a recipe or medical protocol, interpret human behavior from an egocentric perspective by identifying objects and deducing which step is being performed.
Given a recipe, augmented reality allows us to sense objects in specific states, infer user actions, and provide critical guidance to perform a task.
On this project I have collaborated with Raytheon/BBN, which uses my multimodal models to infer steps in distinct medical procedures by integrating video, audio, and object detections. The system has been evaluated by MIT Lincoln Labs and can guide paramedics through medical tasks such as applying a tourniquet. Together with BBN, we have optimized the model's integration into an AR system that offers users real-time feedback on potential errors.
My multimodal deep learning model for step detection in medical procedures. It uses audio-visual representations of actions, objects, and sounds to infer user progress through a task.
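To make the idea of combining modalities concrete, here is a minimal sketch of late fusion followed by simple forward-only step tracking. It is an illustration under stated assumptions, not the deployed model: the function names (`fuse_step_scores`, `track_progress`), the equal weighting of modalities, the rule that a user only stays on a step or advances by one, and the toy probabilities are all hypothetical.

```python
import math


def fuse_step_scores(modality_probs, weights=None):
    """Fuse per-modality step probabilities into one distribution.

    modality_probs maps a modality name (e.g. "video") to a list of
    probabilities, one per step. Fusion is a weighted average of
    log-probabilities, renormalized with a softmax (an assumed scheme).
    """
    names = list(modality_probs)
    if weights is None:
        weights = {name: 1.0 for name in names}  # assume equal trust
    n_steps = len(modality_probs[names[0]])
    total_weight = sum(weights[name] for name in names)

    log_scores = []
    for step in range(n_steps):
        acc = 0.0
        for name in names:
            # Clamp to avoid log(0) when a modality rules a step out.
            acc += weights[name] * math.log(max(modality_probs[name][step], 1e-9))
        log_scores.append(acc / total_weight)

    # Numerically stable softmax over the fused log-scores.
    peak = max(log_scores)
    exps = [math.exp(score - peak) for score in log_scores]
    norm = sum(exps)
    return [value / norm for value in exps]


def track_progress(frames, n_steps):
    """Return the inferred step index for each frame.

    Assumes the user either stays on the current step or moves to the
    next one, so the estimate never jumps backwards or skips ahead.
    """
    current = 0
    history = []
    for probs in frames:
        fused = fuse_step_scores(probs)
        nxt = min(current + 1, n_steps - 1)
        if fused[nxt] > fused[current]:
            current = nxt
        history.append(current)
    return history


# Toy inputs for a three-step procedure (made-up numbers).
frames = [
    {"video": [0.7, 0.2, 0.1], "audio": [0.6, 0.3, 0.1], "objects": [0.8, 0.1, 0.1]},
    {"video": [0.2, 0.7, 0.1], "audio": [0.3, 0.6, 0.1], "objects": [0.1, 0.8, 0.1]},
    {"video": [0.1, 0.2, 0.7], "audio": [0.1, 0.3, 0.6], "objects": [0.1, 0.1, 0.8]},
]

print(track_progress(frames, 3))  # [0, 1, 2]
```

Averaging in log space means a step scores highly only when all modalities agree on it, which is one plausible way to keep a single noisy sensor from triggering a false step change.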
© 2024 Iran R. Roman