Why Is Interobserver Agreement Important?

We examined the effects of variations in response rate on four reliability indices: total, interval, exact-match, and proportional. Trained observers recorded computer-generated data that appeared on a computer screen. In Study 1, target responses occurred in separate sessions at low, moderate, and high rates, allowing a comparison of the reliability results produced by the four calculations over a range of values. Total reliability was consistently high, interval reliability was spuriously high for high-rate responding, proportional reliability was slightly lower for high-rate responding, and exact-match reliability was the lowest of the four measures, especially for high-rate responding. In Study 2, we examined the separate effects of response rate itself, bursting, and late-interval responding. The results showed that exact-match and proportional reliability for low-rate responding were similarly high (Ms = 78.3% and 85.3%, respectively). However, exact-match reliability was considerably lower than proportional reliability for moderate-rate responding (Ms = 59.5% and 76.8%, respectively) and for high-rate responding (Ms = 50.3% and 88%, respectively). These results suggest that reliability calculations are affected by response rate, but they did not determine whether the lower exact-match results were a function of response rate itself or of another characteristic of high-rate responding, such as periodic bursting.

Finally, it has not been investigated how well these results generalize to the observation of human behaviour in in vivo sessions. Such sessions may involve finer discriminations, detection of responses through more than one sensory modality (i.e. assessment based on visual or auditory response characteristics), more distractions (i.e. the presence of non-target behaviours and other people in the session environment), and possibly more target responses.
However, it is important to note that the computer-generated data in this study allowed for the isolation and precise control of individual variables, ensuring uniformity of all response dimensions under all conditions.
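
To make the four indices concrete, the following is a minimal sketch of how they are conventionally computed from two observers' interval-by-interval response counts. The text above does not spell out the formulas, so the definitions used here are an assumption based on common practice: total reliability as the smaller session total divided by the larger, interval reliability as agreement on occurrence versus nonoccurrence, exact-match reliability as identical counts, and proportional reliability as the mean of the per-interval smaller-to-larger ratios. The function names and the counts in the example are hypothetical and are not data from the studies.

```python
from typing import Sequence


def total_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Smaller session total divided by larger session total, as a percentage."""
    total_a, total_b = sum(a), sum(b)
    if total_a == total_b:  # also covers the case where both totals are zero
        return 100.0
    return 100.0 * min(total_a, total_b) / max(total_a, total_b)


def interval_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Percentage of intervals scored alike for occurrence vs. nonoccurrence."""
    agreements = sum((x > 0) == (y > 0) for x, y in zip(a, b))
    return 100.0 * agreements / len(a)


def exact_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Percentage of intervals in which both observers recorded the same count."""
    agreements = sum(x == y for x, y in zip(a, b))
    return 100.0 * agreements / len(a)


def proportional_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean per-interval ratio of smaller count to larger count, as a percentage.

    Intervals with identical counts (including 0 and 0) score 1.0.
    """
    ratios = [1.0 if x == y else min(x, y) / max(x, y) for x, y in zip(a, b)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical counts per interval for two observers (illustration only).
observer_1 = [3, 0, 5, 2, 4]
observer_2 = [2, 0, 5, 3, 4]

print(round(total_agreement(observer_1, observer_2), 1))         # 100.0
print(round(interval_agreement(observer_1, observer_2), 1))      # 100.0
print(round(exact_agreement(observer_1, observer_2), 1))         # 60.0
print(round(proportional_agreement(observer_1, observer_2), 1))  # 86.7
```

In this made-up session the two observers disagree by one response in two of five intervals. Total and interval reliability both come out at 100% because the session totals match and every interval agrees on occurrence, whereas exact-match reliability drops to 60% and proportional reliability falls in between.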

Future studies could examine the generality of the current findings by comparing reliability scores between video sessions and computer-generated data that match the video sessions in the rate and distribution of target responses. Variations in response rate had different effects on four common methods of calculating reliability (Study 1). Unlike total and interval reliability, proportional reliability was sensitive to response rate, but it was not affected as negatively as exact-match reliability.