Abstract: In response to the lack of data samples on highway pavement distresses both in China and abroad, this paper established a distress dataset containing 2D images and 3D morphologies of highways. It also set out the principles of data collection, the data processing methods, the sample description, data quality control, and validation. The dataset contained 11 types of highway pavement distresses, including cracks, block cracks, longitudinal cracks, transverse cracks, subsidence, rutting, corrugation and shoving, and potholes. These were spread across 576 subdivided scenarios in total, which differ in lane, lighting background, and road structure. The dataset also comprised six types of traffic signs: warning signs, prohibition signs, directional signs, guide signs, tourist area signs, and road construction safety signs. The dataset can supply a large number of training samples for neural network models that detect defects on various highways. It was used for small-sample training of a YOLOv7 model, which verified its effectiveness. The results of the study show that building an image dataset based on highway scenarios can deepen the understanding of highway distresses and pavement scenarios and provide an intelligent, information-based solution for highway pavement distress detection. It also lays a solid foundation for training related algorithmic models and for subsequent construction of the dataset.