These are the prompts and qrels used for the experiments in Thomas et al., "System Comparison using Automated Generation of Relevance Judgements in Multiple Languages", SIGIR 2025. Github for Data ...