Model predictive controllers (MPC) use a model of the process to optimize the predicted future trajectory with respect to an objective function, yielding a plan of control moves. Every new MPC implementation therefore requires model identification, and the quality of the identified model depends on the information content of the data. Performing step tests to obtain informative data is time-consuming and may not be economical. Because industrial process data are routinely archived over long periods, these historical data can be used for identification instead. However, historical data contain informative segments scattered among regions of insignificant variation, long-term disturbance effects, process interruptions, and so on. The informative data required for identification can be mined from these records using appropriate machine learning techniques. This paper focuses on extracting high-quality data segments from historical records that can be used to identify reliable process models for any model-based controller, such as MPC. An interval-halving-based hierarchical classification method is proposed to identify segments and label them according to their information content and the presence of disturbances. The key distinction between the proposed method and those in the literature is its ability to identify process models from historical records that may contain regions of low-quality data, may be beset by intermittent disturbance effects, and have not been annotated with respect to these characteristics. The proposed algorithm was tested on simulated systems, and it identified process models from historical data with little to no annotation. © 2019 American Chemical Society.
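The abstract does not detail the classifier. As a rough illustration of the interval-halving idea only, the sketch below recursively halves a record of an input signal and keeps the intervals in which the input variance exceeds a threshold. The names `find_excited_segments`, `tol`, and `min_len`, the variance criterion, and the rule for a change that straddles a midpoint are all assumptions made for illustration. The sketch does not reproduce the paper's hierarchical classification or its disturbance labeling.

```python
from statistics import pvariance


def _excited(u, start, end, tol):
    """Hypothetical criterion: input variance on u[start:end] exceeds tol."""
    return end - start > 1 and pvariance(u[start:end]) > tol


def _halve(u, start, end, tol, min_len, segments):
    """Recursively halve [start, end) and collect excited leaf intervals."""
    if not _excited(u, start, end, tol):
        return  # quiet interval: discard it
    if end - start < 2 * min_len:
        segments.append((start, end))  # cannot split further: keep it
        return
    mid = (start + end) // 2
    left = _excited(u, start, mid, tol)
    right = _excited(u, mid, end, tol)
    if not left and not right:
        # The change straddles the midpoint (e.g. a step exactly at mid),
        # so splitting would hide it: keep the parent interval whole.
        segments.append((start, end))
        return
    _halve(u, start, mid, tol, min_len, segments)
    _halve(u, mid, end, tol, min_len, segments)


def _merge(segments):
    """Join leaf intervals that touch end-to-start (they arrive in order)."""
    merged = []
    for start, end in segments:
        if merged and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def find_excited_segments(u, tol, min_len):
    """Return (start, end) index pairs of input-excited regions of u."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    segments = []
    _halve(u, 0, len(u), tol, min_len, segments)
    return _merge(segments)


if __name__ == "__main__":
    # Quiet record with an alternating (PRBS-like) burst at indices 64..127.
    u = [0.0] * 64 + [1.0, -1.0] * 32 + [0.0] * 128
    print(find_excited_segments(u, tol=0.01, min_len=8))
```

For this record the sketch prints `[(64, 128)]`. It only localizes input excitation. A usable identification segment would also need to cover the output's settling after the excitation, and flagging disturbances would require additional criteria (for example, output variation not explained by the input), which this sketch does not attempt.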