Output comparisons between different model versions are ubiquitous during climate model development, model tuning and numerical experimentation. Statistical tests of significance are therefore fundamental tools for judging the merits of a given model modification. Although RCM studies have become a mature branch of climate modelling, surprisingly little has been done to define the most appropriate battery of such tests. From this point of view, the main issue that distinguishes RCM modelling from GCM modelling is that in the former the time variance cannot generally be used as a substitute for the ensemble variance: when RCM simulations are driven by the same atmospheric lateral and ocean surface boundary conditions, the time variance severely overestimates the ensemble variance. Nevertheless, several publications do not take this into account, thereby artificially reducing the power of their statistical tests and making the detection of differences much more difficult. Reformulating a statistical test to account for this issue is not the only problem to solve. Estimating the ensemble variance entails the considerable computational cost of generating multiple runs, and hence the merit of producing additional simulation members must itself be assessed.
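The overestimation can be illustrated with a synthetic sketch (all numbers hypothetical, not from any actual RCM output): ensemble members share a large boundary-forced signal, so the variance along the time axis of a single run mixes that forced signal into the estimate, while the variance across members at a fixed time isolates the internal variability.

```python
import numpy as np

rng = np.random.default_rng(0)
n_members, n_times = 20, 365
t = np.arange(n_times)

# Signal common to all members (same lateral boundary conditions):
# a seasonal-like cycle of amplitude 5, in arbitrary units.
forced = 5.0 * np.sin(2 * np.pi * t / n_times)

# Internal variability: independent across members, std = 1.
internal = rng.normal(0.0, 1.0, size=(n_members, n_times))

runs = forced[None, :] + internal   # shape (members, times)

# Time variance of a single run: roughly var(forced) + var(internal),
# because the forced cycle contaminates the estimate.
time_var = runs[0].var(ddof=1)

# Ensemble variance across members at each time, averaged over times:
# close to the internal variance alone.
ens_var = runs.var(axis=0, ddof=1).mean()

print(f"time variance:     {time_var:.2f}")
print(f"ensemble variance: {ens_var:.2f}")
```

In this toy setting the time variance exceeds the ensemble variance by an order of magnitude; plugging it into a test statistic would inflate the denominator and mask genuine differences between model versions.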
The objective of this project is to formalize this problem and to create a toolbox of RCM-centred statistical tests. The lack of formal literature on this issue is an obstacle for our community, and this project aims to help fill this gap. Previous work, such as that of Separovic et al. (2012a), illustrates the need for this development. We expect the developed tests to be useful in two situations: (1) in a detection problem (e.g. a sensitivity test), when the optimal experimental setup is sought, and (2) for a given experimental setup, when the suite of statistical tests that extracts the most from the available information is to be derived.
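As a minimal sketch of the second situation, a test can treat ensemble members, not time steps, as the independent replicates. The example below is entirely synthetic (the 0.3-unit effect and all parameters are hypothetical): two small ensembles share the same boundary forcing, one carries a modest model change, and a Welch-type t statistic is formed from the ensemble spread of each member's time-mean statistic.

```python
import numpy as np

rng = np.random.default_rng(1)
n_members, n_times = 10, 365
t = np.arange(n_times)
forced = 5.0 * np.sin(2 * np.pi * t / n_times)  # shared boundary-driven signal

def ensemble(effect):
    """Members share the forcing; internal noise and the model change differ."""
    noise = rng.normal(0.0, 1.0, size=(n_members, n_times))
    return forced + effect + noise

control = ensemble(0.0)
modified = ensemble(0.3)  # hypothetical effect of a model modification

# One climate statistic per member (its time mean): members are the
# independent replicates, so the ensemble spread sets the test variance.
ctl = control.mean(axis=1)
mod = modified.mean(axis=1)

# Welch-type t statistic computed across ensemble members.
t_stat = (mod.mean() - ctl.mean()) / np.sqrt(
    ctl.var(ddof=1) / n_members + mod.var(ddof=1) / n_members
)
print(f"t statistic across members: {t_stat:.1f}")
```

Here the small effect is easily detected because the member-to-member spread of the time means is tiny; substituting the single-run time variance (which includes the forced cycle) into the denominator would drown the same signal, which is exactly the loss of power the project aims to eliminate.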