To the extent that IA models have a forecasting dimension, one would want to evaluate them as forecast models are traditionally evaluated: by comparing model simulations run over some historical period with historical data. Since IA models are typically run a century or two into the future, one might wish to evaluate their outputs for simulations starting a century or two ago and continuing to the present. Part of the problem in this exercise is the paucity of data available to document and understand the large changes that have taken place in the biosphere and in social systems over the past century or so (Turner et al., 1990).
Additional problems arise because integrated assessment models cannot be expected to simulate all the nuances of the evolution of natural and social systems over past centuries. For instance, we would not expect an IA model to internally generate the stock market crash of the 1930s, world and regional wars, famines, colonial and imperialist campaigns, the oil shock of the 1970s, or the fall of the Berlin Wall, simply because the real world is not a closed system. Yet our global and regional development trajectories seem to depend on just these kinds of events. To close the model, some boundary conditions need to be specified for model variables or conditions. However, the more detailed the boundary conditions, the less skill the model can demonstrate in simulating historical trajectories. Since evaluation of model trajectories is rendered less meaningful by specifying the right outcomes in advance through the boundary conditions, it makes more sense to focus on individual details and particular responses in the model in order to test its skill. For example, once an oil shock event is specified, how does the model respond to it? This, however, takes on more the character of an evaluation of processes and insights than of a forecast evaluation.
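The trade-off described above can be made concrete with a toy numerical sketch. Everything below is invented for illustration and corresponds to no actual IA model or historical data set: a simple exponential-growth "model" of some index is scored against a synthetic "observed" series, once as a free run and once with a 1970s-style shock imposed as a boundary condition. The constrained run scores better precisely because part of the right answer has been supplied in advance, which is the point at issue.

```python
# Hypothetical sketch only: the "model", the "observed" series, and the
# 1973 shock are all invented for illustration; none of this comes from
# any actual IA model or historical data set.

def hindcast(years, growth=0.02, shock_year=None, shock_factor=0.9):
    """Toy exponential-growth 'model' of some index (base year = 1.0).
    If shock_year is given as a boundary condition, output is scaled
    down from that year onward (a crude oil-shock analogue)."""
    out, level = [], 1.0
    for y in years:
        level *= (1.0 + growth)
        scale = shock_factor if shock_year is not None and y >= shock_year else 1.0
        out.append(level * scale)
    return out

def rmse(sim, obs):
    """Root-mean-square error between simulated and observed series."""
    return (sum((s - o) ** 2 for s, o in zip(sim, obs)) / len(sim)) ** 0.5

years = list(range(1900, 2001))
# Invented 'historical' record: slightly faster growth, downturn after 1973.
observed = hindcast(years, growth=0.0205, shock_year=1973, shock_factor=0.88)

free_run = hindcast(years)                      # no shock specified
constrained = hindcast(years, shock_year=1973)  # shock imposed as boundary condition

print("RMSE, free run:   ", round(rmse(free_run, observed), 3))
print("RMSE, constrained:", round(rmse(constrained, observed), 3))
```

The constrained run matches the record more closely, but that improvement reflects the specified boundary condition rather than model skill; what remains informative is the model's internal response once the shock is imposed.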
To be sure, evaluation of insights or forecasts in IA models is difficult and yields at best only partial confirmation. It is confounded by inherent limitations, conceptual difficulties, and a lack of data. This may be one reason the IA community has difficulty resolving the tension between the drive toward model development and the need to evaluate models. The current balance is grossly lopsided in favor of model development. The only major community effort relevant to evaluation at present is the intercomparison of integrated assessment models with one another carried out in the Energy Modelling Forum (EMF) exercises. While this is laudable and necessary, it is not an adequate substitute for evaluating models against real-world data and understanding. It is an open question whether even a relatively small effort to construct good data sets and use them to evaluate model insights would lead to greater understanding of natural and social system interactions than the grand sum of the next round of ``don't look back'' IA model integrations.