Monocular visual odometry 구현 (OpenCV)

티스토리 뷰

연구 일지/구현

Monocular visual odometry 구현 (OpenCV)

감자보이 2021. 8. 25. 15:35

자율주행 및 로봇 관련 기술 중 SLAM (Simultaneous localization and mapping) 이 존재한다.

차량이나 로봇의 현재 위치 및 주변환경을 동시에 탐색하는 기술이다. 이때 카메라를 이용한 SLAM이 Visual SLAM이다.

Visual SLAM의 Front end 부분인 odometry를 수행하는 부분을 구현해보려고 한다.

쉽게 접근할 수 있는 단안 카메라를 이용해서 visual odometry 를 구현해보려고 한다.

구현한 내용은 깃허브에서 확인 가능하다.

https://github.com/Voilier-bsc/mono_vis_odom_python

Monocular visual odometry는 다양한 방법으로 구현하지만 다음과 같은 알고리즘으로 구현해보려 한다.

monocular Visual Odometry

visual odometry는 연속된 이미지를 이용해 카메라의 움직임(R과 t)을 output으로 얻는 작업을 의미한다.

하지만 monocular 특성상 깊이 값을 알 수 없기 때문에 GPS나 속도계등을 이용해 scale을 얻어야 한다.

Image preprocess

왜곡 보정 : 카메라 이미지에는 왜곡이 존재하기 때문에 이미지 전처리 시 왜곡 보정을 진행해야한다. 현재는 kitty dataset을 이용하므로 calibration은 따로 진행하지 않는다.
Gaussian Smoothing : 이미지에는 각종 노이즈가 존재하므로 smoothing을 진행해야 feature extraction 진행 시 좋은 품질의 feature를 뽑을 수 있게 된다. openCV에서는 다음과 같이 Gaussian Smoothing을 할 수 있다.

blur_img = cv2.GaussianBlur(Image,(3,3),0) #3X3 filter, sigma = 0

Feature extraction

영상처리에서 feature는 이미지 속에서 구별하기 쉬운 부분을 의미한다. 따라서 보통 corner (이미지 속에서 gradient의 변화가 크다)를 찾는 문제가 된다.

가장 유명하고 효율적인 feature extraction 알고리즘으로 알려져 있는 ORB(Oriented FAST and Rotated BRIEF)를 이용한다.

orb = cv2.cv2.ORB_create(
                        nfeatures=5000,
                        scaleFactor=1.2,
                        nlevels=8,
                        edgeThreshold=31,
                        firstLevel=0,
                        WTA_K=2,
                        scoreType=cv2.ORB_FAST_SCORE,
                        patchSize=31,
                        fastThreshold=25,
                        ) ##orb 선언

keypoint, descriptor = orb.detectAndCompute(Image,None) #Image에서 keypoint와 descriptor 추출

Feature matching

orb 알고리즘을 이용해 뽑은 key point의 descriptor(기술자)는 matching을 진행할 때 brute force matching 기술을 사용한다.

bf matching은 각 query 기술자와 train 기술자의 모든 거리를 계산해 거리가 작은 기술자 끼리 matching 한다.

orb의 descriptor는 이진 기술자이므로 hamming distance를 사용한다.

bf = cv2.BFMatcher(cv2.NORM_HAMMING) #brute force matcher 선언, 거리 판단 시 Hamming norm으로 판단

##matching 진행 후 거리가 짧은 순서대로 sorting 진행 후 상위 1000개의 matching만 선별
matches = bf.match(des1,des2) 
matches = sorted(matches, key = lambda x:x.distance)
idx = matches[0:1000]

Essential Matrix, Recover Pose

Epipolar geometry란 두 연속된 이미지 또는 stereo 이미지 상에서의 geometry를 의미하며 이를 이용해 Essential Matrix를 구한 후 R과 t를 추정할 수 있다. 아래 사진에서 자세하게 식을 유도했다.

5 point 또는 8 point 알고리즘과 Essential matrix를 이용해 R과 t 행렬을 구한다.

#openCV에서 구현된 Essential matrix를 구하는 함수
#camera intrinsic parameter를 알아야하며 RANSAC을 이용해 outlier를 제거한다.
E, mask = cv2.findEssentialMat(pts1,pts2,focal, principal point, method=cv2.RANSAC, prob = 0.999, threshold=1.0)

#구한 Essential matrix를 이용해 R과 t를 구한다. 
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, focal = focal, pp = pp)

Update trajectory

R과 t를 업데이트 하기 전, 움직인 실제 scale을 얻어야한다. kitti dataset에서는 poses.txt를 이용해 scale을 구할 수 있다.

poses.txt의 한 줄 구조는 r11 r12 r13 tx r21 r22 r23 ty r31 r32 r33 tz 이므로 tx, ty, tz를 따로 파징하고 scale을 구한다.

def getScale(NumFrame, t_gt):

    txt_file = open('./dataset/poses/00.txt')
    
	# 이전 t 저장
    x_prev = float(t_gt[0])
    y_prev = float(t_gt[1])
    z_prev = float(t_gt[2])
		
	# txt파일을 읽은 후 띄어쓰기로 split
    line = txt_file.readlines()
    line_sp = line[NumFrame].split(' ')
		
	# tx, ty, tz 가져오기
    x = float(line_sp[3])
    y = float(line_sp[7])
    z = float(line_sp[11])

    t_gt[0] = x
    t_gt[1] = y
    t_gt[2] = z

    txt_file.close()
		
	# scale 계산
    scale = math.sqrt((x-x_prev)**2 + (y-y_prev)**2 + (z-z_prev)**2)
    return scale, t_gt

Calculate Root Mean Square Error

odometry의 정확도를 측정하기 위해 에러를 계산하는 코드를 작성했다.

# caculate Error
error = map(operator.sub,t_gt,t_f)
error_sum_square = sum(map(lambda x:x*x,error))
rmse = math.sqrt(error_sum_square/3)
rmse_total = rmse_total + rmse

print("rmse     = ",rmse_total/numFrame)

Result

결과는 다음과 같다. 초록색 선은 kitti dataset에서 제공한 ground truth 값이고, 빨간 색 선은 visual odometry를 이용한 결과이다.

초반에는 에러가 적지만 점점 draft가 쌓이면서 에러가 증가하게 된다.

0번째 seq에서는 중간에 회전에 대한 estimation에 대한 에러로 인해 에러가 급격히 증가한 반면,

sequence 01, 02에서는 dynamic object가 많아서 R과 t를 추정하는데에 방해가 되기 때문이다.

확실히 visual odomety를 이용해서 localization을 진행하게되면 에러도 많이 쌓이고 robust하지도 않다.

SLAM에서는 Backend 부분에서 loop closing 이후 전체 trajectory의 에러를 수정한다.

현재 어떤 방식으로 odometry시 error를 줄일지 고민중에 있다. 기회가 된다면 그 연구 내용을 게시할 예정이다.

공지사항

최근에 올라온 글

최근에 달린 댓글

Total

Today

Yesterday

링크

github

« 2025/06 »
일	월	화	수	목	금	토
1	2	3	4	5	6	7
8	9	10	11	12	13	14
15	16	17	18	19	20	21
22	23	24	25	26	27	28
29	30

글 보관함

감자보이로그

티스토리 뷰

Monocular visual odometry 구현 (OpenCV)

monocular Visual Odometry

Image preprocess

Feature extraction

Feature matching

Essential Matrix, Recover Pose

Update trajectory

Calculate Root Mean Square Error

Result

티스토리툴바