Progress is imminent when we share with each other.
I investigate the intricacies of human perception and action and translate these insights into machine perception models. As a postdoctoral scholar at New York University’s Tandon School of Engineering and the Music and Audio Research Laboratory, I explore the nexus between human behavior and AI. With a foundation in biology, signal processing, and linguistics, I challenge conventional views of human behavior to uncover mechanisms that can inform novel AI models.
My research has focused in particular on how humans perceive and act in time with rhythmic events. Traditional Bayesian models fall short in capturing this behavior, leading me to propose models anchored in “strong anticipation”. These non-linear oscillator-based models illuminate the behavioral differences between musicians and non-musicians, a contribution recognized by Stanford’s Human-Centered Artificial Intelligence award.
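A minimal sketch of what such an oscillator model can look like, using an adaptive-frequency Hopf oscillator entraining to a periodic pulse train (the parameters and setup here are illustrative assumptions, not my published model):

```python
import numpy as np

# Minimal sketch: a Hopf oscillator whose frequency adapts toward a periodic
# stimulus, one common formalization of "strong anticipation". Parameter
# values are illustrative, not taken from any published model.

fs = 1000                        # integration rate (Hz)
t = np.arange(0, 10.0, 1 / fs)   # 10 s of simulation

stim_hz = 2.0                    # stimulus tempo: a 2 Hz pulse train
stim = (np.sin(2 * np.pi * stim_hz * t) > 0.99).astype(float)

z = 0.1 + 0j                     # complex oscillator state
omega = 2 * np.pi * 1.5          # initial frequency guess: 1.5 Hz (detuned)
alpha, beta = 1.0, -1.0          # Hopf terms: stable limit cycle at |z| = 1
k, k_omega = 1.0, 2.0            # input coupling and adaptation gains

for x in stim:
    # Hopf dynamics plus stimulus coupling (Euler integration).
    dz = z * (alpha + 1j * omega + beta * abs(z) ** 2) + k * x
    # Adapt frequency when a pulse arrives, pulling phase toward the stimulus.
    domega = -k_omega * x * z.imag / max(abs(z), 1e-9)
    z += dz / fs
    omega += domega / fs

# The estimate should drift from 1.5 Hz toward the 2 Hz stimulus tempo.
print(f"adapted frequency: {omega / (2 * np.pi):.2f} Hz")
```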
In the realm of machine learning, I aim to enhance models for spatial perception by harnessing acoustic imaging and computer vision. I’ve advanced models for sound source distance and direction of arrival estimation. More recently, I have been developing new models for multimodal machine perception that can recognize and track object states in dynamic scenes. My contributions extend to open-source projects like librosa, soundata, mirdata, and micarraylib. Parallel to academia, my research has found applications in product development at companies such as Apple, Tesla, Raytheon/BBN, and Plantronics.
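For a flavor of the spatial-audio side, here is a minimal sketch of GCC-PHAT time-delay estimation between two microphones, a textbook front end for direction-of-arrival estimation (a classical baseline rather than the models described above; the signals and values are illustrative):

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the time delay (s) of `sig` relative to `ref` via GCC-PHAT."""
    n = len(sig) + len(ref)
    # Phase transform: whiten the cross-spectrum, keeping only phase.
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cc = np.fft.irfft(R / (np.abs(R) + 1e-12), n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Two-microphone demo: recover a known 5 ms inter-channel delay.
fs = 16000
src = np.random.default_rng(0).standard_normal(fs)  # 1 s broadband source
delay = int(0.005 * fs)                             # 80 samples = 5 ms
mic1 = src
mic2 = np.concatenate((np.zeros(delay), src[:-delay]))

tau = gcc_phat(mic2, mic1, fs)
print(f"estimated delay: {tau * 1000:.2f} ms")      # ~5.00 ms
# With mic spacing d and speed of sound c, the arrival angle follows
# theta = arcsin(c * tau / d) for a far-field source.
```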