For me, a dataset is a common name for data that come from the same origin (the same file, the same database, etc.), while a data set is a more general set of data. 'Dataset' designates the common source of the data.
I am unsure about the noun 'dataset': when should we use the preposition 'in' and when 'on'? Or are 'in' and 'on' interchangeable, with no essential difference? For example, we can say: 1. We run a comparative experiment on the whole dataset. 2. We run a comparative experiment in the whole dataset.