
Figure: Overview of the visual modalities per actor. Top tile: talking-head videos; bottom tiles (left to right): personalized animated avatars, full-body videos, and real-time reconstructable volumetric avatars.
Abstract
The Audiovisual Multimodal Interaction Suite (AMIS) is an open-source dataset and accompanying Unity-based demo implementation designed to aid research on immersive media communication and social XR environments. AMIS features synchronized audiovisual recordings of three actors performing monologues and participating in dyadic conversations across four modalities: talking-head videos, full-body videos, volumetric avatars, and personalized animated avatars. These recordings can be used to simulate scenarios such as traditional video conferences or XR meetings with 3D avatars in controlled and replicable environments.
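To illustrate how recordings spanning the four modalities might be enumerated programmatically, the following C# sketch walks a hypothetical directory layout of the form root/actor/modality. The folder names and structure here are assumptions chosen for illustration, not the actual layout of the AMIS release.

using System;
using System.IO;

// Minimal sketch: list all recordings for one actor across the four
// AMIS modalities, assuming a hypothetical root/actor/modality layout.
public static class AmisBrowser
{
    // Modality folder names are assumptions, not the official AMIS naming.
    private static readonly string[] Modalities =
    {
        "talking_head_video", "full_body_video",
        "volumetric_avatar", "animated_avatar"
    };

    public static void ListRecordings(string root, string actor)
    {
        foreach (string modality in Modalities)
        {
            string dir = Path.Combine(root, actor, modality);
            if (!Directory.Exists(dir))
                continue;
            foreach (string file in Directory.GetFiles(dir))
                Console.WriteLine($"{modality}: {Path.GetFileName(file)}");
        }
    }
}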
AMIS addresses limitations of existing datasets, including a restricted number of audiovisual formats, a narrow application focus, and limited coverage of verbal and non-verbal cues. With AMIS Studio, a Unity-based demonstrator, researchers can explore the recordings and compare the different audiovisual formats in VR scenes. This paper outlines the creation of AMIS, its design considerations, and its potential applications in interdisciplinary domains, including cognitive psychology, audiovisual quality assessment, and social XR research.
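As a rough sketch of how such a format comparison could be wired up in a Unity scene, the component below toggles between per-modality scene objects (a video plane, an avatar rig, a volumetric renderer) at runtime. The class name, serialized fields, and key binding are assumptions for illustration and do not reflect the actual AMIS Studio implementation.

using UnityEngine;

// Minimal sketch of a modality switcher for comparing audiovisual formats
// in a VR scene: one prepared GameObject per format is enabled at a time.
public class ModalitySwitcher : MonoBehaviour
{
    // One root object per audiovisual format, assigned in the Inspector
    // (names are hypothetical, not the AMIS Studio scene hierarchy).
    [SerializeField] private GameObject talkingHeadVideo;
    [SerializeField] private GameObject fullBodyVideo;
    [SerializeField] private GameObject volumetricAvatar;
    [SerializeField] private GameObject animatedAvatar;

    private GameObject[] modalities;
    private int current;

    private void Awake()
    {
        modalities = new[] { talkingHeadVideo, fullBodyVideo, volumetricAvatar, animatedAvatar };
        Show(0);
    }

    private void Update()
    {
        // Cycle through the formats with the Tab key (an arbitrary binding).
        if (Input.GetKeyDown(KeyCode.Tab))
            Show((current + 1) % modalities.Length);
    }

    private void Show(int index)
    {
        // Enable exactly one modality representation at a time.
        for (int i = 0; i < modalities.Length; i++)
            modalities[i].SetActive(i == index);
        current = index;
    }
}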
Publication
Bhattacharya, A., de Souza Cardoso, L. F., Schleising, A., Rendle, G., Kreskowski, A., Immohr, F., Broll, W., Ramachandra Rao, R. R., Raake, A.
AMIS: An Audiovisual Dataset for Multimodal XR Research
To be presented at the 2025 ACM Multimedia Systems Conference (MMSys'25), Stellenbosch, South Africa. DOI: 10.1145/3712676.3718344.
[preprint]