Abstract

This paper proposes a 3D autonomous navigation line extraction method for field roads in hilly regions based on a low-cost binocular vision system. Accurate guide path detection on field roads is a prerequisite for the automatic driving of agricultural machines. First, considering the lack of lane lines, blurred boundaries, and complex surroundings of field roads in hilly regions, a modified image processing method was established to strengthen shadow identification and information fusion and thus better distinguish the road area from its surroundings. Second, because field roads show no obvious shape characteristics and only small differences in gray values inside the image, the centroid points of the road area were extracted as its statistical feature, smoothed, and then used as the geometric primitives for stereo matching. Finally, an epipolar constraint and a homography matrix were applied for accurate matching and 3D reconstruction to obtain the autonomous navigation line of the field roads. Experiments on the automatic driving of a carrier on field roads showed that on straight roads, multicurvature complex roads, and undulating roads, the mean deviations between the actual midline of the road and the automatically traveled trajectory were 0.031 m, 0.069 m, and 0.105 m, respectively, with maximum deviations of 0.133 m, 0.195 m, and 0.216 m, respectively. These test results demonstrate that the proposed method is feasible for road identification and 3D navigation line acquisition.

1. Introduction

Cultivated land in hilly regions accounts for 63.2% of the total cultivated area in China and is an important agricultural production base for various crops, such as grain, oil plants, and tobacco [1]. The transportation of agricultural materials and products on field roads, which accounts for 20% of the total workforce of agricultural production, is one of the most important tasks in agricultural production in hilly regions. Autonomous transportation machines are urgently needed in hilly regions due to the severe shortage of human labour and the pressing need to improve productivity. With the development of rural construction, a large number of cement-paved field roads with widths of 1.2 m to 2.5 m have been built in the hilly areas of China, thus providing basic conditions for agricultural mechanization. In fact, field roads in hilly regions are often twisting, winding, and rolling; these characteristics, coupled with occlusion by different types of crops along both sides, make obtaining an accurate guide path extremely difficult. As a result, the development of automated transport machines for field roads in hilly regions has been limited to date.

Obtaining a navigation line for a field road is a prerequisite for transport machines to drive automatically on the road. To solve this task, an autonomous transport machine must be equipped with a set of sensors that allow it to accurately determine its position relative to the surrounding limits. Currently, the most commonly used navigation systems for agricultural machines are the Global Navigation Satellite System (GNSS), machine vision navigation systems, Light Detection and Ranging (LIDAR), and combined navigation systems composed of two or more subsystems [2–7]. GNSS, the most affordable sensor for direct position measurement, does not by itself reach the accuracy required for such navigation [8]. Furthermore, GNSS suffers from occasional outages in position due to communication link failures and loss of satellite lock due to occlusion by obstacles such as trees [9, 10]. Laser scanners (LIDAR), or laser rangefinders, are commonly employed to obtain three-dimensional point clouds of an area for off-road navigation [11], urban search and rescue, or agricultural applications [12, 13]. Laser-based sensors can directly measure distances and require less computer processing than vision-based techniques [14]. However, a drawback of this kind of sensor is its expense. The high cost of real-time kinematic navigation sensors has thus limited the commercialization of autonomously guided agricultural machines [15].

Machine vision systems are well suited to wide perception of the environment and are increasingly used as a lower-cost alternative to LIDAR. Cameras are inexpensive; for example, the camera used in our prototype field road carrier costs 252 RMB in commercial shops. Another interesting point is that images convey a huge amount of information. In particular, binocular vision has good environmental perception ability [8, 16]. The guide path for autonomous transport machines can be extracted by recognizing the driving range, road conditions, and surroundings through binocular vision. Therefore, binocular vision can serve as one of the main methods of navigation line detection for autonomous field transportation machines in hilly regions.

The field roads in hilly regions are typically unstructured roads. For applications in vision navigation, the navigation line of unstructured roads is usually acquired by analyzing the differences in textures, colors, edges, and other characteristics between the road and its surroundings, based on the assumption that the road surface is planar or on some other idealized treatment [17–19]. Based on this idealized approach, the feasible guide path for vehicles can be identified. Liu et al. [20] proposed an online classifier based on the Support Vector Machine (SVM) to classify road scenes under different weather conditions in different seasons and presented an accurate road border model for the autonomous path detection of Unmanned Ground Vehicles (UGVs) by using the AdaBoost algorithm and the random sample consensus (RANSAC) spline fitting algorithm. Wang et al. [21] proposed an unstructured road detection method based on improved region growth with the Principal Component Analysis-SVM (PCA-SVM) method. A priori knowledge, such as the location of the road, the initial cell, and the characteristics of the road boundary cells, was used to improve the region growth method, and the classifier was used to select the cell growth method to eliminate miscalculated areas. Liu et al. [22] proposed an unstructured road detection approach based on a color Gaussian mixture model and a parabolic model. First, through a combination of averaging filtering and subsampling, the color image was reduced from high resolution to low resolution and given illumination compensation. Then, a Gaussian mixture model was formulated based on the K-means algorithm to obtain the optimized clustering centers of the road area and other areas, and the parameters of the right and left road parabolic models were solved using the Least-Squares Method (LSM). Finally, the road information was extracted after fitting the boundary of the road.

Methods for acquiring unstructured road navigation lines based on the plane assumption or idealized treatment have limited the adaptability and effectiveness of automatic driving machines in actual environments to some extent, and the requirements of road models under complex conditions have not yet been met. Multidimensional road perception models have been studied, but these models remain mainly in the theoretical analysis stage. Jiang [23] proposed horizontal and vertical methods of modelling the road surface. In the horizontal direction, a 3D-parameterized free-shape lane model was established according to the relationships between the 3D geometric points of the double boundaries of the lane. In the vertical direction, 3D information on the road surface was obtained using scale-invariant features. Wang [24] used two vertical omnidirectional cameras to capture 3D information from road images, establish a road space model, and calculate the road width. Byun et al. [25] proposed a novel method for road recognition using 3D point clouds based on a Markov Random Field (MRF) framework in unstructured and complex road environments. This method transformed the road recognition problem into a classification problem based on MRF modelling and presented guidelines for the optimal selection of the gradient value, the average height, the normal vectors, and the intensity value. Jia et al. [26] addressed the road reconstruction problem of on-road vehicles with shadows. To deal with the effects of shadows, images were transformed to a proposed illuminant-invariant color space and fused with the raw images, and the road region was reconstructed from a geometric point of view. Deng et al. [27] proposed a binocular vision-based, real-time solution for detecting the traversable region outdoors. An appearance model based on a multivariate Gaussian was constructed from a sample region in the left image, and a fast, self-supervised segmentation scheme was proposed to classify the traversable and nontraversable regions.

In view of the characteristics of field roads in hilly regions, such as the lack of lane lines, blurred boundaries, and complex backgrounds, this paper proposes a new method of 3D navigation line extraction for field roads to obtain key information (i.e., the autonomous guide line and slope gradient) based on a low-cost binocular vision system. The modified methods of image processing, statistical feature extraction, and 3D reconstruction were studied in detail. The novel features and contributions of this paper include the following: (i) the problem of recognizing field road images with shadows was studied; (ii) given that field roads are characterized by nonobvious features, the centroid points of the road area were used as matching primitives; and (iii) the fitting curve of the continuous centroid points was used as the navigation line for unmanned agricultural machinery on field roads.

The objective was to obtain a navigation line with 3D coordinate information. First, after obtaining the road area by threshold segmentation and shadow recognition, the centroids of the road area were extracted as its statistical feature and then smoothed to serve as the geometric primitives of stereo matching. Then, the homography matrix was solved through Speeded-Up Robust Features (SURF) detection based on the RANSAC algorithm, and the epipolar constraint was applied to achieve accurate feature matching. Furthermore, the 3D information of the navigation line was extracted from the matched centroid points. Finally, an automatic driving test of an autonomous carrier was conducted to verify the proposed method.

2. Image Processing

2.1. Image Processing Method Architecture

The objective of image processing is to distinguish the road area from its surroundings. The proposed image processing procedure consists of three main linked phases: (i) image segmentation, (ii) identification of the shadow areas, and (iii) the integration operation. Figure 1 shows the full structure of the proposed procedure as a flowchart.

Field roads in hilly regions are irregular and have blurred boundaries. These characteristics, coupled with the complex surface status, the surroundings, such as trees and crops, covering the two sides of the road, and various water stains and shadows smearing the surface, make acquiring information on field roads from original images extremely difficult. Therefore, multistage processing of the original images is required to recognize field roads against their surroundings. First, the V component in the HSV color space is separated to perform Otsu threshold segmentation and postprocessing, yielding the obvious road area and nonroad area. Then, by selecting appropriate parameters, the S and V components are each subjected to point calculation and then weighted and merged according to different weights to extract the shadow features. Finally, the shadow area and the nonshadow road area are combined and postprocessed again to obtain the complete road area in a binary image.

2.2. Segmentation

Hundreds of field road images were captured in Chongqing, China, which is a typical hilly area. The Otsu threshold segmentation effects of these images in the RGB, Lab, HSV, and HSI color spaces were compared. The results showed that the V component in the HSV color space adapts better to the influence of water stains and weeds on roads, while the S component is insensitive to shadows on the road. Therefore, Otsu threshold segmentation based on the V component in the HSV color space was adopted to detect the road area. Because the target of image segmentation is the road scope, which is relatively large, and there is no detailed requirement for small parts, morphological operations and connected-region area treatment are gradually introduced to segment a road from its surroundings. A minimal sketch of this segmentation step is given below.
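As an illustration, the following Python/OpenCV sketch performs the V-component Otsu segmentation described above. The file name is a placeholder, and whether the road appears as foreground (white) in the mask depends on the scene; this is a minimal sketch, not the paper's exact implementation.

```python
import cv2

# Read the left-camera image (placeholder file name).
img = cv2.imread("left_road_image.png")            # BGR image
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)         # convert to the HSV color space
h, s, v = cv2.split(hsv)                           # separate the H, S, and V components

# Otsu threshold segmentation on the V component: the threshold is chosen
# automatically by maximizing the between-class variance.
_, road_mask = cv2.threshold(v, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```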

In the morphological operations, opening and majority operations (with an appropriately sized structuring element) are applied to remove insignificant small patches and spurious pixels from the binary image. Then, connected domains are labeled with the 4-adjacent seed-filling method, and their areas are calculated. Connected domains with small areas are discarded, and the contour curves of the connected domains with larger areas are redrawn with the polygon fitting method. The obvious road area and nonroad area are thereby obtained. A sketch of this chain follows.
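The sketch below assumes `road_mask` comes from the previous step; the kernel size, area threshold, and polygon tolerance are placeholder values, and `medianBlur` stands in for the majority operation, which OpenCV does not provide directly.

```python
import cv2
import numpy as np

kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
opened = cv2.morphologyEx(road_mask, cv2.MORPH_OPEN, kernel)   # remove small patches
med = cv2.medianBlur(opened, 5)          # majority-style vote to drop spurious pixels

# Label 4-connected domains (equivalent to 4-adjacent seed filling)
# and keep only the large ones.
n, labels, stats, _ = cv2.connectedComponentsWithStats(med, connectivity=4)
clean = np.zeros_like(med)
for i in range(1, n):                                  # label 0 is the background
    if stats[i, cv2.CC_STAT_AREA] > 2000:              # discard small-area domains
        clean[labels == i] = 255

# Redraw the contours of the remaining domains with polygon fitting.
contours, _ = cv2.findContours(clean, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
polys = [cv2.approxPolyDP(c, 5.0, True) for c in contours]
```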

2.3. Shadow Processing

Usually, crops or trees along both sides of the road cast shadows of various shapes on the surface during different periods, which hinders distinguishing the road. The V component in the HSV color space has good adaptability for recognizing the road areas inside the image but is not effective in identifying shadows, which are often classified as part of the background. This inability directly affects the integrity of the road information; thus, recovery of the road area covered with shadows is particularly important. In this paper, the characteristics of the S component are utilized because the S component in the HSV color space is not sensitive to shadows. By selecting appropriate parameters, the S and V components are each subjected to point calculation and then weighted and merged according to different weights to extract the shadow features.

The image display effect can be changed by point calculation. Define $f(x, y)$ as the input image and $g(x, y)$ as the output image; the point calculation is then
$$g(x, y) = a \cdot f(x, y) + b, \quad (1)$$
where $a$ is the coefficient, $b$ is the intercept, and $(x, y)$ are the pixel coordinates.

This paper chooses the straightforward method of Weighted Averaging (WA) to fuse the S and V components. Although this method weakens the details of the image to a certain extent, it is easy to implement, fast, and can improve the signal-to-noise ratio of the fused image. Let the image of the V component after the point operation be $F_V$, the image of the S component after the point operation be $F_S$, and the weighted and fused image be $F$; then, the mathematical relationship between the images is
$$F(i) = \omega_1 F_V(i) + \omega_2 F_S(i), \quad (2)$$
where $i$ is the index value of the multidimensional array element, $\omega_1$ is the weight of the $F_V$ matrix element, and $\omega_2$ is the weight of the $F_S$ matrix element.
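The point operation of equation (1) and the fusion of equation (2) map directly onto OpenCV primitives, as sketched below. The `s` and `v` channels are assumed to come from the earlier segmentation step; the coefficients and weights here are placeholders, not the experimentally selected values discussed next.

```python
import cv2

a_v, b_v = 1.5, 0.0     # point-operation coefficients for the V component (assumed)
a_s, b_s = 2.0, 0.0     # point-operation coefficients for the S component (assumed)
w1, w2 = 0.5, 0.5       # fusion weights (assumed)

f_v = cv2.convertScaleAbs(v, alpha=a_v, beta=b_v)   # g = a*f + b, saturated to 8 bits
f_s = cv2.convertScaleAbs(s, alpha=a_s, beta=b_s)
fused = cv2.addWeighted(f_v, w1, f_s, w2, 0)        # F(i) = w1*Fv(i) + w2*Fs(i)

# Threshold the fused image to extract the shadow area of the road.
_, shadow_mask = cv2.threshold(fused, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```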

In order to better determine the coefficients of the S and V component point operations and the weights $\omega_1$ and $\omega_2$ of the weighted fusion, Table 1 was designed to perform point operations and weighted fusion under different values of $a$, $\omega_1$, and $\omega_2$. Then, the threshold segmentation processing results were evaluated and rated on a scale of 1 to 10, where 1 indicates the worst effect and 10 indicates the best effect. The appropriate $a$, $\omega_1$, and $\omega_2$ values were selected by analyzing and comparing the processing results.

After a large number of experiments, suitable values of the V component point operation slope, the S component point operation coefficient, and the fusion weights $\omega_1$ and $\omega_2$ were finally selected for shadow processing of the road. The road shadow detection results are shown in Figure 2.

It can be seen from Figure 2 that for the road shadows of different depths and areas, the weighted fusion shadow processing algorithm can extract the road shadows effectively and accurately and obtain a complete shadow area. In addition, the algorithm is simple to use without particular limitations on the scene composition of the original image, and thus has a broad application scope.

2.4. Image Merging

At this stage, the road area segmented by the V component and the area recovered from shadow recognition are merged through logic integrated operations and morphological operations. Then, the complete road area is distinguished from the nonroad area and presented as a binary image.

The results of the shadow recognition and image merging are shown in Figure 3.

3. 3D Navigation Line Extraction

3.1. Extraction of Statistical Features

Rural field roads have no obvious features and exhibit little variation in gray value. Under such conditions, the centroids of the road area are extracted as the road's statistical feature points and used as the stereo matching primitives in this paper. Moreover, these centroids are smoothed by the LSM to eliminate disturbance factors in road area recognition.

The binocular camera used in this study was mounted on the front of a field transportation machine. Two digital images of the road and its surroundings were captured by the left and right cameras, respectively. In fact, only a reduced area inside the image is of interest, either for applying site-specific treatments or as a reference for guiding, namely, the region of interest (ROI). The lower two-thirds of the binary image is specified as the ROI based on the camera installation and shooting angle. The ROI is divided into 12 equidistant segments by horizontal lines along the vertical direction of the image, and the pixel coordinates of the road area in each segment are extracted. Then, the centroid of each segment is calculated. Let $S$ be the target (road) area of a segment inside the binary image. The coordinates $(x_c, y_c)$ of the centroid of area $S$ are calculated as
$$x_c = \frac{1}{N} \sum_{(x, y) \in S} x, \qquad y_c = \frac{1}{N} \sum_{(x, y) \in S} y, \quad (3)$$
where $N$ is the total number of pixels in area $S$, and $x$ and $y$ are the pixel coordinates.
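The band-wise centroid computation of equation (3) can be sketched as follows; `road_bin` (the final binary road image from Section 2) and the helper name are assumptions for illustration.

```python
import numpy as np

def segment_centroids(road_bin, n_segments=12):
    """Split the ROI into equal horizontal bands and compute the road
    centroid of each band per equation (3)."""
    h = road_bin.shape[0]
    roi = road_bin[h // 3:, :]                  # lower two-thirds of the image
    band_h = roi.shape[0] // n_segments
    centroids = []
    for k in range(n_segments):
        band = roi[k * band_h:(k + 1) * band_h, :]
        ys, xs = np.nonzero(band)               # pixel coordinates of road area S
        if len(xs) == 0:                        # no road pixels in this band
            continue
        xc = xs.mean()                          # x_c = (1/N) * sum of x
        yc = ys.mean() + k * band_h + h // 3    # y_c in full-image coordinates
        centroids.append((xc, yc))
    return np.array(centroids)
```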

The extracted centroids are shown overlaid on the original RGB image in Figure 4(a).

As shown in Figure 4(a), the centroids of the extracted road area can accurately express the direction and, to some extent, the midline of the road if connected continuously. However, due to the influence of irregular factors, such as weeds and water stains, the distinguished field road area may be inaccurate, causing the road area centroids extracted through the above method to deviate from the actual centerline. Since the path of the actual field road is continuous, the line connecting all centroids should be smooth. Therefore, the extracted centroids are smoothed through the following phases: (i) least-squares curve fitting of the centroid points, (ii) obtaining the fitting function, and (iii) recalculating the new abscissa value corresponding to the original ordinate of each centroid using the fitting function. This method ensures the continuity of the path and eliminates the impact of incorrect path information. Figure 4(b) shows the reacquired centroids of the original images in Figure 4(a). The reacquired centroids are then taken as the statistical feature points of the road areas.
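A minimal sketch of this smoothing step uses a least-squares polynomial fit of the abscissa as a function of the ordinate; the polynomial degree is an assumed parameter, as the paper does not state the form of the fitting function.

```python
import numpy as np

def smooth_centroids(centroids, degree=2):
    """Least-squares smoothing: fit x = f(y), then recompute each abscissa
    from the fitted curve while keeping the original ordinates."""
    xs, ys = centroids[:, 0], centroids[:, 1]
    coeffs = np.polyfit(ys, xs, degree)         # least-squares curve fitting
    xs_new = np.polyval(coeffs, ys)             # recalculated abscissa values
    return np.column_stack([xs_new, ys])
```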

3.2. Characteristics Matching

Stereo matching is a critical step in the three-dimensional navigation information extraction of field roads. Based on the image preprocessing, five sequential processes are carried out for characteristics matching: (i) process the left image, extracting and smoothing the centroid points of the road area; (ii) use SURF to automatically match multiple sets of corresponding points in the left and right images and find the homography matrix; (iii) use the homography matrix and the centroid points extracted from the left image to find the corresponding points in the right image; (iv) perform the epipolar constraint test; and (v) use the obtained pairs of corresponding centroid points in the right and left images to perform 3D reconstruction.

For a binocular camera, each actual centroid of the road corresponds to two related pixels, namely, one inside the left image and the other inside the right image. The relationship between the two pixels is described by the homography matrix. The matching relationship of the road area centroids in the left image and those in the right image can be obtained by solving the homography matrix.

Suppose that $p = (u, v, 1)^T$ is the homogeneous coordinate of the projection of a 3D point $P$ in an image, and $p' = (u', v', 1)^T$ is the homogeneous coordinate of the corresponding point of $P$ in the matching image. The transformation from point $p$ to its corresponding point $p'$ can be obtained through the homography matrix $H$ [28, 29]:
$$p' = Hp. \quad (4)$$

The homography matrix describes the transformation relationships of an actual point between two images, namely, translation, rotation, and scaling. To obtain the relationships between the statistical features of two images of a field road more precisely, Speeded-Up Robust Features (SURF) detection based on the RANSAC algorithm [30] is used to match the corresponding feature points. The homography matrix is then calculated from the relationships between multiple pairs of matching points. The procedure is as follows:
(a) The Hessian matrix of each pixel is constructed, and each pixel point processed by the Hessian matrix is compared with points in the neighborhoods of the 2D image's spatial and scale spaces to initially locate the key points. Then, key points with weak energy or faulty localization are removed, and the final stable feature points are filtered out.
(b) The main direction of each feature point is selected according to the Haar wavelet feature of the circular neighborhood of the feature point, and the descriptor is determined.
(c) The feature points are matched by the Euclidean distance and the Hessian matrix trace between two feature points, and the RANSAC algorithm is used to remove pseudomatched points to ensure the effectiveness of the match. The matching results of an image in Figure 3(b) with its corresponding image captured by the other camera of the binocular system are shown in Figure 5.
(d) The homography matrix $H$ is calculated using the findHomography function in the OpenCV visual library.
(e) The unique matching point in the right image corresponding to each statistical feature point of the road area (reacquired centroid) in the left image is calculated according to equation (4). The matching results of the road centroids in Figure 4(b) are shown in Figure 4(c).
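Steps (b) through (e) can be sketched with OpenCV as follows. Here `gray_left`, `gray_right`, and `smoothed_centroids` are assumed inputs, SURF requires an opencv-contrib (nonfree) build, and the Hessian threshold and RANSAC tolerance are placeholder values.

```python
import cv2
import numpy as np

# SURF feature detection and description in both images.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
kp_l, des_l = surf.detectAndCompute(gray_left, None)
kp_r, des_r = surf.detectAndCompute(gray_right, None)

# Match descriptors by Euclidean (L2) distance with cross-checking.
matches = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True).match(des_l, des_r)
src = np.float32([kp_l[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp_r[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# RANSAC removes pseudomatched points while estimating H (equation (4)).
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

# Map each smoothed centroid from the left image into the right image: p' = Hp.
pts = smoothed_centroids.reshape(-1, 1, 2).astype(np.float32)
matched_right = cv2.perspectiveTransform(pts, H).reshape(-1, 2)
```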

Usually, a homography is estimated between points that belong to the same plane. This paper applies the SURF algorithm for feature matching and the RANSAC method to remove mismatched points over the whole image plane based on the following considerations. (1) At present, China's hilly field roads have basically been cement hardened, and the differences in gray scale and texture of the road surface are small, with no obvious structural features; if the homography estimation were limited to the road plane, the matching accuracy might be reduced. (2) It is difficult to recognize and mark the boundaries of field roads because the boundaries are nebulous; if the acquisition of the homography matrix were limited to the road plane, additional image processing steps, such as delineating the road boundaries, would be needed, increasing the processing time.

3.3. Validation of Matching Pairs

In an unknown environment, disturbances are complex and changeable, and a single constraint may not match feature points accurately. Therefore, the epipolar constraint is introduced to further validate the matching pairs.

The epipolar constraint describes the constraint of a point to a line between two images, thus reducing the search for the corresponding matching point from the entire image to a line [31, 32]. Figure 6 shows two pinhole cameras, their projection centers $O_l$ and $O_r$, and image planes $\pi_l$ and $\pi_r$. The vectors $p_l$ and $p_r$ refer to the projections of a 3D point $P$ onto the left and right image planes, respectively, and are expressed in the corresponding reference frames. The line connecting the projection centers of the two cameras is the baseline. The plane defined by $P$, $O_l$, and $O_r$ is called the epipolar plane. The intersection points $e_l$ and $e_r$ of the baseline with the two camera planes are the epipoles. The intersection lines between the epipolar plane and the two camera planes are the epipolar lines, defined as $l_l$ and $l_r$, respectively.

Consider the triplet $P$, $p_l$, and $p_r$: if $p_l$ is given, $P$ can lie anywhere on the ray from $O_l$ through $p_l$. However, since the image of this ray in the right image is the epipolar line through the corresponding point $p_r$, the correct match must lie on the epipolar line [32]. Lines $l_l$ and $l_r$ are called a pair of epipolar lines and constitute the epipolar constraint of the matching points. The epipolar constraint between two images can be described by the fundamental matrix $F$.

Define $p_l$ and $p_r$ as the points in pixel coordinates corresponding to the camera-frame projections of $P$ in the left and right images. According to epipolar geometry, for point $p_l$ on the left image, the corresponding epipolar line $l_r$ on the right image can be expressed as follows:
$$l_r = F p_l. \quad (5)$$

Correspondingly, for point $p_r$ on the right image, the corresponding epipolar line $l_l$ on the left image can be expressed as follows:
$$l_l = F^T p_r. \quad (6)$$

If the point corresponding to $p_l$ in the left image is $p_r$ in the right image, point $p_r$ must be on line $l_r$ and satisfy the following condition:
$$p_r^T F p_l = 0. \quad (7)$$

The key to obtaining the epipolar lines is the calculation of the fundamental matrix $F$. The fundamental matrix is a $3 \times 3$ matrix that represents the correspondence between the matching points and encodes the camera's internal and external parameters. The matrix $F$ forms the foundation for the camera's matching, tracking, and three-dimensional reconstruction.

Suppose that $(u_l, v_l)$ and $(u_r, v_r)$ are the coordinates of $p_l$ and $p_r$, respectively, which can be written as $(u_l, v_l, 1)$ and $(u_r, v_r, 1)$ in the homogeneous reference frame. Then, according to equation (7), we have
$$u_r u_l f_{11} + u_r v_l f_{12} + u_r f_{13} + v_r u_l f_{21} + v_r v_l f_{22} + v_r f_{23} + u_l f_{31} + v_l f_{32} + f_{33} = 0, \quad (8)$$
where $f_{ij}$ is the element in the $i$th row and $j$th column of $F$.

Rewriting the elements of the fundamental matrix as a column vector $f = (f_{11}, f_{12}, f_{13}, f_{21}, f_{22}, f_{23}, f_{31}, f_{32}, f_{33})^T$, equation (8) becomes
$$(u_r u_l, u_r v_l, u_r, v_r u_l, v_r v_l, v_r, u_l, v_l, 1) f = 0. \quad (9)$$

Let $A$ be the coefficient matrix of equation (9) assembled from multiple matching point pairs; then,
$$A f = 0. \quad (10)$$

Thus, the fundamental matrix $F$ can be obtained through the eight-point algorithm [32] based on equation (10). By utilizing the multiple correspondence points obtained from the SURF detection based on RANSAC, the fundamental matrix $F$ is obtained through the findFundamentalMat function in the OpenCV visual library. Then, according to equation (5), the corresponding epipolar line in the right image of any point in the left image is obtained, and the search range of its matching point is reduced to a line.
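A sketch of this computation, reusing the SURF correspondences `src` and `dst` from the homography step; the RANSAC threshold and confidence are placeholder values.

```python
import cv2
import numpy as np

# Estimate F from the matched points (eight-point algorithm with RANSAC).
F, mask = cv2.findFundamentalMat(src, dst, cv2.FM_RANSAC, 3.0, 0.99)

# For each left-image centroid, compute its epipolar line l_r = F * p_l in
# the right image; each line is returned as (a, b, c) with ax + by + c = 0.
lines_r = cv2.computeCorrespondEpilines(
    smoothed_centroids.reshape(-1, 1, 2).astype(np.float32), 1, F
).reshape(-1, 3)
```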

After obtaining the epipolar line, an additional step is applied: the unique matching point obtained by the homography matrix processing is extended to a rectangle, and the positional relationship between the rectangle and the epipolar line is then estimated. As shown in Figure 7, if the epipolar line intersects the rectangle, the matching point is retained; if it does not, the point is eliminated because of its large matching error. In this way, the matching pairs obtained by the homography matrix processing are validated through the epipolar line constraint. A sketch of this check follows.
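One way to implement the rectangle test is to compare the signs of the line equation $ax + by + c$ at the four rectangle corners: the line crosses the rectangle exactly when the corners do not all lie on the same side. The half-size `r` is an assumed tolerance, not a value from the paper.

```python
import numpy as np

def passes_epipolar_check(point, line, r=5.0):
    """Return True if the epipolar line ax + by + c = 0 intersects the
    rectangle of half-size r centered on the matched point."""
    (x, y), (a, b, c) = point, line
    corners = [(x - r, y - r), (x + r, y - r), (x - r, y + r), (x + r, y + r)]
    signs = [np.sign(a * cx + b * cy + c) for cx, cy in corners]
    return min(signs) <= 0 <= max(signs)        # a sign change means intersection

validated = [p for p, l in zip(matched_right, lines_r)
             if passes_epipolar_check(p, l)]
```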

After homography matrix processing and epipolar line validation, the matching results of the field road’s statistical feature points of images in Figure 4(c) are as shown in Figure 8.

The matching results of the images in Figure 8 are evaluated by the matching error and the running time of the program. The matching error includes two parts: (i) the horizontal matching error, the ratio of the pixel difference to the total number of pixels in the horizontal direction, and (ii) the vertical matching error, the ratio of the pixel difference to the total number of pixels in the vertical direction. The pixel difference is defined as the difference between the matching point and the precise position. The evaluation results are shown in Table 2.

Figure 8 and Table 2 show that the proposed matching method based on the homography matrix and the epipolar line constraint achieves good matching accuracy, a good matching effect, and a fast matching speed owing to the small number of matching primitives. Furthermore, processing results on additional images demonstrate that this method performs well in suppressing noise, resists interference, and is robust to image transformations.

3.4. 3D Reconstruction

Most roads in hilly regions fluctuate due to the rugged terrain. The 3D information of the navigation line not only provides the changes in the direction of a field road but also offers the gradient variation, which has a considerable influence on the control of an autonomous transportation machine. According to the principle of binocular vision [33], the three-dimensional coordinate information of the road's statistical features can be extracted by least-squares processing, using the intrinsic and extrinsic parameters obtained by calibration together with the coordinates of the road's statistical feature points obtained from stereo matching. This process is also called 3D reconstruction in binocular vision [34].

As previously described, the pixel positions of point $P$ in the left and right cameras are $p_l$ and $p_r$, which can be obtained through characteristic matching. The projection matrices for the left and right cameras, namely, $M_l$ and $M_r$, can be obtained by camera calibration.

The relation between the pixel coordinate and the world coordinate for the left camera image can be expressed as follows:
$$Z_{cl} \begin{pmatrix} u_l \\ v_l \\ 1 \end{pmatrix} = M_l \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}. \quad (11)$$

Similarly, the corresponding relationship for the right camera image can be expressed as follows:
$$Z_{cr} \begin{pmatrix} u_r \\ v_r \\ 1 \end{pmatrix} = M_r \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}, \quad (12)$$
where $Z_{cl}$ and $Z_{cr}$ are the coordinate values of $P$ on the two respective optical axes; $(u_l, v_l, 1)$ and $(u_r, v_r, 1)$ are the homogeneous coordinates of $p_l$ and $p_r$ in the image reference frame; $(X_w, Y_w, Z_w, 1)$ is the homogeneous coordinate of $P$ in the world reference frame; and $m_{ij}^l$ (resp. $m_{ij}^r$) is the $i$th-row, $j$th-column element of $M_l$ (resp. $M_r$).

From equations (11) and (12), four linear equations in $X_w$, $Y_w$, and $Z_w$ are obtained:
$$\begin{aligned} (u_l m_{31}^l - m_{11}^l) X_w + (u_l m_{32}^l - m_{12}^l) Y_w + (u_l m_{33}^l - m_{13}^l) Z_w &= m_{14}^l - u_l m_{34}^l, \\ (v_l m_{31}^l - m_{21}^l) X_w + (v_l m_{32}^l - m_{22}^l) Y_w + (v_l m_{33}^l - m_{23}^l) Z_w &= m_{24}^l - v_l m_{34}^l, \\ (u_r m_{31}^r - m_{11}^r) X_w + (u_r m_{32}^r - m_{12}^r) Y_w + (u_r m_{33}^r - m_{13}^r) Z_w &= m_{14}^r - u_r m_{34}^r, \\ (v_r m_{31}^r - m_{21}^r) X_w + (v_r m_{32}^r - m_{22}^r) Y_w + (v_r m_{33}^r - m_{23}^r) Z_w &= m_{24}^r - v_r m_{34}^r. \end{aligned} \quad (13)$$

The coordinate values of $P$ can be obtained from equation (13) because the 3D point $P$ is the intersection of the rays $O_l p_l$ and $O_r p_r$ (see Figure 6). In order to reduce the influence of data noise, the LSM is applied. Equation (13) can be rewritten as follows:
$$K X = U, \quad (14)$$
where $X = (X_w, Y_w, Z_w)^T$,
$$K = \begin{pmatrix} u_l m_{31}^l - m_{11}^l & u_l m_{32}^l - m_{12}^l & u_l m_{33}^l - m_{13}^l \\ v_l m_{31}^l - m_{21}^l & v_l m_{32}^l - m_{22}^l & v_l m_{33}^l - m_{23}^l \\ u_r m_{31}^r - m_{11}^r & u_r m_{32}^r - m_{12}^r & u_r m_{33}^r - m_{13}^r \\ v_r m_{31}^r - m_{21}^r & v_r m_{32}^r - m_{22}^r & v_r m_{33}^r - m_{23}^r \end{pmatrix}, \quad (15)$$
$$U = \begin{pmatrix} m_{14}^l - u_l m_{34}^l \\ m_{24}^l - v_l m_{34}^l \\ m_{14}^r - u_r m_{34}^r \\ m_{24}^r - v_r m_{34}^r \end{pmatrix}. \quad (16)$$

According to the LSM, the following equation can be obtained:
$$X = (K^T K)^{-1} K^T U. \quad (17)$$

According to equation (17), the 3D coordinates of the extracted road area centroids can be solved. Thus, the line that continuously connects all the extracted centroids can be used as the navigation line of the field road.
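Equations (13)-(17) amount to solving a small linear least-squares problem per matched centroid pair. The following sketch builds $K$ and $U$ from assumed $3 \times 4$ projection matrices `Ml` and `Mr` (NumPy arrays from calibration) and solves for $X$.

```python
import numpy as np

def triangulate(Ml, Mr, pl, pr):
    """Solve equations (13)-(17): least-squares 3D point from one pixel pair."""
    (ul, vl), (ur, vr) = pl, pr
    rows, rhs = [], []
    for M, u, v in ((Ml, ul, vl), (Mr, ur, vr)):
        # (u*m31 - m11)Xw + (u*m32 - m12)Yw + (u*m33 - m13)Zw = m14 - u*m34
        rows.append(u * M[2, :3] - M[0, :3]); rhs.append(M[0, 3] - u * M[2, 3])
        rows.append(v * M[2, :3] - M[1, :3]); rhs.append(M[1, 3] - v * M[2, 3])
    K, U = np.array(rows), np.array(rhs)
    X, *_ = np.linalg.lstsq(K, U, rcond=None)   # least-squares solution of KX = U
    return X                                    # (Xw, Yw, Zw)
```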

Furthermore, the slope gradient of the road can be calculated using the 3D coordinate information of the extracted road area centroids. The three-dimensional information can provide the vehicle with the slope change of the road, which has a great influence on the vehicle control of the carrier. Figure 9 shows a slope model in a vehicle reference frame. If calculated along the column indicated by the dotted line on the image, the obtained relative variation of the height coordinate is the slope along the intersection of the corresponding vertical plane with the ground, which represents the fluctuation of the road ahead of the vehicle.

From the geometric relationship, the slope component along the $X$ direction is
$$\theta_X = \arctan \frac{\Delta Z}{\Delta X}. \quad (18)$$

The slope component along the $Y$ direction is
$$\theta_Y = \arctan \frac{\Delta Z}{\Delta Y}. \quad (19)$$

The horizontal distance between the two spatial points $P_1$ and $P_2$ is $d = \sqrt{\Delta X^2 + \Delta Y^2}$; the slope can then be calculated as
$$\theta = \arctan \frac{\Delta Z}{\sqrt{\Delta X^2 + \Delta Y^2}}, \quad (20)$$
where $\Delta X$, $\Delta Y$, and $\Delta Z$ are the coordinate differences between two road centroids in the vehicle reference frame, in which $Y$ is the direction of motion.
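A sketch of equation (20) for two reconstructed centroids; the axis convention ($Y$ forward, $Z$ vertical) follows the text's description of the vehicle reference frame and is otherwise an assumption.

```python
import numpy as np

def slope_deg(P1, P2):
    """Slope angle (degrees) between two reconstructed road centroids,
    per equation (20): theta = arctan(dZ / sqrt(dX^2 + dY^2))."""
    dX, dY, dZ = np.asarray(P2) - np.asarray(P1)
    d = np.hypot(dX, dY)                        # horizontal distance
    return np.degrees(np.arctan2(dZ, d))
```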

Using the three-dimensional coordinates of the road’s centroid point, the fluctuations of the road can be clearly obtained, providing data support for the subsequent vehicle control of the carrier. Figure 10 shows the 3D coordinates of the road area centroids and their connecting line extracted from the images in Figure 4. Table 3 shows the calculated slope gradient of the roads in Figure 4 and its error with the actual slope gradient of the road.

As shown in Figure 10 and Table 3, the three-dimensional coordinates of the extracted road area centroids clearly describe the fluctuations of the road, with the reconstruction error of the slope gradient remaining below 10%. In fact, many factors affect the accuracy of 3D reconstruction, including the accuracy of the intrinsic parameters of the camera, changes in the calibration environment, the collocation and position of the cameras, the three-dimensional model of the camera, the matching accuracy, and the target size [35].

4. Experimental Results and Discussion

4.1. Methods

To verify the feasibility and accuracy of the proposed navigation line extraction method, an autonomous field road carrier with a binocular vision navigation system was built, as shown in Figure 11. The length of the field road carrier is 1.13 m, with a wheelbase of 0.76 m, a tread of 0.45 m, and a maximum load of 150 kg. The experimental field road is 1.2 m wide with significant changes in altitude and curvature. The speed of the carrier is 2 m/s.

A low-cost binocular vision system, model RER-720P2CAM-90 by RERVISION Technology Co., Ltd. (Shenzhen, China), was mounted on the front of the carrier at a height of 0.8 m from the ground, with the optical axis inclined at 25° with respect to the ground and without lateral displacement. This device was equipped with two 720P cameras with a center spacing of 62 mm. The images were processed with OpenCV in the Microsoft Visual Studio (2010) integrated development environment (IDE). A high-accuracy real-time kinematic global positioning system (RTK-GPS), which included a fixed base station and a rover on the carrier to reduce the carrier's position error, was used to collect real-time location coordinates. The positioning accuracy of the RTK-GPS is 2 cm.

The images were obtained with the two cameras of the binocular vision system. A personal computer (PC) was used for image processing, road feature extraction, stereo matching, and 3D reconstruction. On this basis, the extracted 3D navigation line of the road was applied as the reference for path tracking while the carrier automatically drove on the field road. Using a USB2UIS adapter board, the PC sent the navigation information to the carrier controller by RS-232 serial communication. The carrier controller directly controlled the steering servo motor and the drive motor of the carrier to realize automatic driving. A fuzzy neural network control algorithm was adopted to realize path tracking [36]. During visual navigation driving, an image frame was captured every 0.2 s, and the navigation line was then extracted. Based on the navigation line, the lateral deviation, heading deviation, and path curvature were calculated and taken as the input parameters of the neural network controller. The output parameter of the controller was the turning angle of the carrier. Figure 12 shows a flowchart of the entire process; a high-level sketch of the loop is given below.
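The following skeleton illustrates this loop under stated assumptions: the serial port name, baud rate, and message format are illustrative, and `extract_navigation_line`, `compute_deviations`, and `fnn_controller` are hypothetical placeholders standing in for the processing of Sections 2-3 and the fuzzy neural network controller of [36].

```python
import time
import cv2
import serial  # pyserial

port = serial.Serial("COM3", 115200, timeout=0.1)   # RS-232 link (assumed port)
cap_l, cap_r = cv2.VideoCapture(0), cv2.VideoCapture(1)

while True:
    ok_l, left = cap_l.read()
    ok_r, right = cap_r.read()
    if not (ok_l and ok_r):
        break
    # 1. Image processing and 3D navigation line extraction (Sections 2-3).
    nav_line = extract_navigation_line(left, right)             # hypothetical
    # 2. Controller inputs derived from the navigation line.
    lateral, heading, curvature = compute_deviations(nav_line)  # hypothetical
    # 3. Fuzzy neural network controller outputs the turning angle [36].
    angle = fnn_controller(lateral, heading, curvature)         # hypothetical
    port.write(f"{angle:.2f}\n".encode())           # command to carrier controller
    time.sleep(0.2)                                 # one frame every 0.2 s
```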

To study the deviation between the carrier’s automatic travel trajectory and the actual midline of the road under various conditions in hilly regions, three types of field roads—straight, complex multicurvature, and fluctuating roads—were selected as test roads, as shown in Figure 13. The carrier drove twice on the same road.

In the first instance, the carrier drove accurately along the actual midline of the road under manual operation, and the trajectory and coordinate values of the carrier were measured by the RTK-GPS; these values were taken as the midline of the road. To do so, first, we marked the actual midpoint every 10 cm on the road, using a ruler to measure the width of the road; second, we drove the carrier along these midpoints at a low speed of 2 m/s. A rod was installed in front of the carrier head to mark the central position of the carrier. As the carrier drove, the steering was accurately controlled so that the marking rod passed through the midpoints of the road. In this way, the carrier was driven along the middle of the road with minor deviations. Accordingly, the collected coordinates could be taken as the ground truth and used to validate the visual navigation approach.

In the second instance, the carrier drove automatically along the road under the guidance of the extracted 3D navigation line, and the travel trajectory and coordinate values of the center point of the carrier were measured by the RTK-GPS; these values were taken as the automatic travel trajectory of the carrier. The coordinate deviations between the midline of the road and the real-time traveling trajectory of the carrier were recorded and compared using MATLAB.

4.2. Results and Discussion

The midline of the road, the automatic travel trajectory of the carrier, and the deviation between the midline and the travel trajectory for the straight road condition, the complex multicurvature road condition, and the fluctuating road condition are shown in Figures 14–16, respectively. The deviation includes a left deviation and a right deviation; the left deviation value is positive, and the right deviation value is negative.

Figure 14 shows that under the straight road condition, due to the regular road conditions and the absence of other unfavorable factors, the automatic travel trajectory and the midline largely overlap, with a maximum deviation of 0.133 m and an average deviation of 0.031 m. This indicates that on a straight path, the carrier can drive automatically under the guidance of the extracted navigation line with only a small deviation from the midline of the road.

Figure 15 shows that under complex multicurvature road conditions, the maximum deviation between the automatic travel trajectory and the road midline is 0.195 m, and the average deviation is 0.069 m. Compared with the straight road, the deviation of the automatic travel trajectory on the complicated multicurvature road increased due to the influence of unfavorable factors such as curves, shadows, and water stains, which disturb the extraction of the navigation line. However, in the real test, the carrier kept running along the midline of the road and met the requirement of driving automatically on the field road without going off the road.

The fluctuating road that was tested is composed of multiple complex segments, including straight and multicurvature sections with shadows and water stains covering the surface and weeds or crops covering the two edges. Figure 16 shows that on the fluctuating road, the maximum deviation between the automatic travel trajectory and the road midline is 0.216 m, and the average deviation is 0.105 m. The carrier could still automatically travel along the midline of the road without straying off the road.

The test results on the various roads indicate that the main contributors to the deviation between the automatic travel trajectory and the midline include the following: (i) weeds or crops on the edges of the road that are classified as nonroad areas after image processing, which results in an extracted navigation line that differs from the actual midline of the road; (ii) real-time changes in the carrier posture, which lead to frequent changes in the extracted navigation line, resulting in a tracking deviation; (iii) the intrinsic error of the RTK-GPS, which is at least 2 cm; (iv) the measurement method for the actual midline of the road, which is imprecise under manual operation; (v) the measurement accuracy of the midline coordinates of the road and the real-time position coordinates of the carrier, which can be disturbed by the jittering of the cameras and the carrier; and (vi) the error of the automatic steering control based on the extracted navigation line.

In fact, the intrinsic errors of the RTK-GPS and real-time position measurement accuracy cannot be eliminated, but they have no effect on the autonomous driving of the carrier. In addition, test results have shown that the utilized fuzzy neural network control algorithm gives satisfactory results for automatic steering control [36] and the improvement obtained by changing the control parameters is small. The influences of the carrier posture change and the jittering of the cameras, which are related to the robustness of image capture, can be reduced by using image mosaics [37] and adopting an optimal cohesion algorithm for two adjacent images. Therefore, the extraction of the midline of field roads under various situations is the most critical factor responsible for deviations on the road during autonomous navigation driving.

5. Conclusions

This paper proposed a 3D autonomous navigation line extraction method for field roads in hilly regions based on a low-cost binocular vision system. A modified image processing method was presented to strengthen shadow identification. The centroid points of the road area were extracted as its statistical feature, smoothed, and then used as the geometric primitives of stereo matching. The epipolar constraint and homography matrix were applied for accurate matching and 3D reconstruction to obtain the autonomous navigation line of the field roads. Finally, an automatic driving test of a carrier in hilly regions was carried out to verify the proposed method. The experimental results indicate the following findings:
(a) On the straight road, the average deviation between the actual midline of the road and the automatic travel trajectory is 0.031 m, with a maximum deviation of 0.133 m. On the complex multicurvature road, the average deviation is 0.069 m, with a maximum deviation of 0.195 m. On the undulating road, the average deviation is 0.105 m, with a maximum deviation of 0.216 m. The carrier can travel automatically along the midline of the road without straying off the road.
(b) The proposed 3D autonomous navigation line extraction method for field roads can realize road recognition and 3D coordinate information acquisition and can meet the requirements for a carrier to drive automatically on a field road. To some extent, this method can also be applied to the automatic driving of other agricultural machines on field roads.

Data Availability

The experimental result data used to support the findings of this study are included within the article, and the source code data are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Acknowledgments

This research was funded by the Fundamental Research Funds for the Central Universities (grant nos. XDJK2014C031 and XDJK2017C079).