Details on Evaluation in the Engagement Task
The test set evaluation phase is approaching! We would like to draw your attention to the following points regarding the
Multi-domain Engagement Estimation task.
(1) We have clarified the precise computation rule for the score that is used in the overall ranking. In particular, the
overall performance of a team will be evaluated by a weighted average of its performance across the test datasets. PInSoRo
will receive a weight of 1/3 (as it contains both child-child and child-robot interactions), and the four other datasets
a weight of 1/6 each, so the weights sum to 1.
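The weighting rule in (1) can be sketched as follows. This is only an illustration, not the official evaluation code: the names `WEIGHTS` and `overall_score` and the four placeholder dataset keys are assumptions made for this example, and the sketch takes a single already-computed PInSoRo score as input, since how that score is aggregated internally is not specified here.

```python
from fractions import Fraction

# Weights as stated in point (1): PInSoRo 1/3, each of the four other datasets 1/6.
# The keys "dataset_a" ... "dataset_d" are placeholders, not the real dataset names.
WEIGHTS = {
    "PInSoRo": Fraction(1, 3),
    "dataset_a": Fraction(1, 6),
    "dataset_b": Fraction(1, 6),
    "dataset_c": Fraction(1, 6),
    "dataset_d": Fraction(1, 6),
}


def overall_score(per_dataset_scores):
    """Return the weighted average of per-dataset scores using WEIGHTS."""
    missing = set(WEIGHTS) - set(per_dataset_scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(float(weight) * per_dataset_scores[name]
               for name, weight in WEIGHTS.items())


if __name__ == "__main__":
    # Made-up scores, purely to show the arithmetic.
    example = {
        "PInSoRo": 0.6,
        "dataset_a": 0.3,
        "dataset_b": 0.3,
        "dataset_c": 0.3,
        "dataset_d": 0.3,
    }
    print(overall_score(example))
```

With these made-up numbers the result is 0.6/3 + 4 × 0.3/6 = 0.4 (up to floating-point rounding). For scores that are meant to be comparable with other teams, the evaluation code provided by the organizers remains the reference.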
(2) Please report extensive ablation experiments on the provided validation sets, as the number of test set evaluations
is limited.
(3) In addition to the overall score, please report the separate scores for the individual datasets. In the case of PInSoRo,
please report separate numbers for child-child and child-robot interactions, as well as for social and task engagement.
(4) For comparable validation set evaluations, please use the evaluation code provided at
https://github.com/hcmlab/MultiMediate26.