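Augmented reality is currently one of the most active research topics in computer vision and robotics. Its most elementary problem is estimating the camera pose with respect to an object: in computer vision the pose is needed for subsequent 3D rendering, and in robotics it gives the object pose needed for grasping and manipulation. This is not a trivial problem, because a recurring difficulty in image processing is the computational cost of applying many algorithms and mathematical operations to solve a task that is basic and immediate for humans.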

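Goal

This tutorial explains how to build a real-time application that estimates the camera pose in order to track a textured object with six degrees of freedom, given a 2D image and the object's textured 3D model.

The application consists of the following parts:

Read the 3D textured object model and the object mesh.
Take input from camera or video.
Extract ORB features and descriptors from the scene.
Match scene descriptors with model descriptors using the Flann matcher.
Pose estimation using PnP + Ransac.
Linear Kalman Filter for bad pose rejection.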

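Theory

In computer vision, estimating the camera pose from n 3D-to-2D point correspondences is a fundamental and well understood problem. The most general version of the problem requires estimating the six degrees of freedom of the pose and five calibration parameters: focal length, principal point, aspect ratio and skew. It can be solved from a minimum of 6 correspondences using the well known Direct Linear Transform (DLT) algorithm. There are, however, several simplifications of the problem, each of which leads to a family of algorithms that improve on the accuracy of the DLT.

The most common simplification is to assume known calibration parameters, which is the so-called Perspective-*n*-Point problem.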

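Problem formulation: given a set of correspondences between 3D points \(p_i\) expressed in a world reference frame and their 2D projections \(u_i\) onto the image, we seek the pose (\(R\) and \(t\)) of the camera with respect to the world. The calibration parameters, including the focal length, are taken as known.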

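OpenCV provides four different approaches to solve the Perspective-*n*-Point problem, all of which return \(R\) and \(t\). Using the following formula, 3D points can then be projected onto the image plane: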

\[s\ \left [ \begin{matrix} u \\ v \\ 1 \end{matrix} \right ] = \left [ \begin{matrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{matrix} \right ] \left [ \begin{matrix} r_{11} & r_{12} & r_{13} & t_1 \\ r_{21} & r_{22} & r_{23} & t_2 \\ r_{31} & r_{32} & r_{33} & t_3 \end{matrix} \right ] \left [ \begin{matrix} X \\ Y \\ Z\\ 1 \end{matrix} \right ]\]

The complete documentation on how to work with these equations is in the 3d module.
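The projection formula can be written out without any library as a minimal sketch. The types `Intrinsics`, `Pixel`, `Mat3` and `Vec3` and the function `projectPoint` are hypothetical names introduced only for this illustration; lens distortion is assumed to be zero.

```cpp
#include <array>
#include <cassert>

// Hypothetical helper types for this sketch (not part of OpenCV).
struct Intrinsics { double fx, fy, cx, cy; };
struct Pixel { double u, v; };
using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;

// Projects a world point X into the image:
//   s * [u v 1]^T = K * [R | t] * [X Y Z 1]^T
Pixel projectPoint(const Intrinsics& K, const Mat3& R, const Vec3& t, const Vec3& X)
{
    // Camera-frame coordinates: Xc = R * X + t
    Vec3 Xc{};
    for (int i = 0; i < 3; ++i)
        Xc[i] = R[i][0] * X[0] + R[i][1] * X[1] + R[i][2] * X[2] + t[i];

    // The scale factor s equals the depth Xc[2]; the point must lie in front of the camera.
    assert(Xc[2] > 0.0);

    // Divide by the depth, then apply focal lengths and principal point.
    return { K.fx * Xc[0] / Xc[2] + K.cx, K.fy * Xc[1] / Xc[2] + K.cy };
}
```

With an identity rotation and zero translation, a point on the optical axis lands on the principal point \((c_x, c_y)\).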

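Source code

The source code for this tutorial is in the samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/ folder of the OpenCV source library.

The tutorial consists of two main programs:

Model registration

This application is for users who do not have a 3D textured model of the object to be detected. You can use this program to create your own textured 3D model. It only works for planar objects, so to model an object with a complex shape you should use more sophisticated software.

The application needs an input image of the object to be registered and its 3D mesh. The intrinsic parameters of the camera with which the input image was taken must also be provided. All files must be specified using an absolute path or a path relative to the application's working directory. If no files are specified, the program tries to open the provided default parameters.

The application starts by extracting the ORB features and descriptors from the input image, and then uses the mesh together with the Möller–Trumbore intersection algorithm to compute the 3D coordinates of the found features. Finally, the 3D points and the descriptors are stored in separate lists in a YAML file, where each row is a different point. The technical background on how the files are stored can be found in the File Input and Output using XML / YAML / JSON files tutorial.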
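The Möller–Trumbore test mentioned above can be sketched in plain C++. This is an illustrative version, not the code of the registration program; `intersectRayTriangle` and the small vector helpers are hypothetical names, and the tolerance `eps` is an assumed value.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vec3 = std::array<double, 3>;

static Vec3 sub(const Vec3& a, const Vec3& b) { return {a[0] - b[0], a[1] - b[1], a[2] - b[2]}; }
static double dot(const Vec3& a, const Vec3& b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
static Vec3 cross(const Vec3& a, const Vec3& b)
{
    return {a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0]};
}

// Returns true if the ray origin + t * dir (t > 0) hits the triangle (v0, v1, v2).
// On success, t receives the ray parameter of the hit point.
bool intersectRayTriangle(const Vec3& origin, const Vec3& dir,
                          const Vec3& v0, const Vec3& v1, const Vec3& v2, double& t)
{
    assert(dot(dir, dir) > 0.0);             // the direction must not be the zero vector
    const double eps = 1e-9;                 // assumed tolerance for this sketch
    const Vec3 e1 = sub(v1, v0), e2 = sub(v2, v0);
    const Vec3 p = cross(dir, e2);
    const double det = dot(e1, p);
    if (std::fabs(det) < eps) return false;  // ray is parallel to the triangle plane
    const double inv = 1.0 / det;
    const Vec3 s = sub(origin, v0);
    const double u = dot(s, p) * inv;        // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return false;
    const Vec3 q = cross(s, e1);
    const double v = dot(dir, q) * inv;      // second barycentric coordinate
    if (v < 0.0 || u + v > 1.0) return false;
    const double hit = dot(e2, q) * inv;
    if (hit <= eps) return false;            // triangle lies behind the ray origin
    t = hit;
    return true;
}
```

For registration, the ray would start at the camera centre and pass through a detected 2D feature; the hit point `origin + t * dir` is then the 3D coordinate of that feature on the mesh.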

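Model detection

The aim of this application is to estimate the object pose in real time given its 3D textured model.

The application starts by loading the 3D textured model from a YAML file with the same structure described for the model registration program. ORB features and descriptors are detected and extracted from the scene. Next, cv::FlannBasedMatcher with cv::flann::GenericIndex is used to match the scene descriptors against the model descriptors. From the matches found, the cv::solvePnPRansac function computes the R and t of the camera. Finally, a KalmanFilter is applied to reject bad poses.

If you compiled OpenCV with the samples, you can find the program at opencv/build/bin/cpp-tutorial-pnp_detection. You can then run the application and change some parameters: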
This program shows how to detect an object given its 3D textured model. You can choose to use a recorded video or the webcam.

Usage:

./cpp-tutorial-pnp_detection -help

Keys:

'esc' - to quit.

--------------------------------------------------------------------------

Usage: cpp-tutorial-pnp_detection [params]

-c, --confidence (value:0.95)

RANSAC confidence

-e, --error (value:2.0)

RANSAC reprojection error

-f, --fast (value:true)

use of robust fast match

-h, --help (value:true)

print this message

--in, --inliers (value:30)

minimum inliers for Kalman update

--it, --iterations (value:500)

RANSAC maximum iterations count

-k, --keypoints (value:2000)

number of keypoints to detect

--mesh

path to ply mesh

--method, --pnp (value:0)

PnP method: (0) ITERATIVE - (1) EPNP - (2) P3P - (3) DLS

--model

path to yml model

-r, --ratio (value:0.7)

threshold for ratio test

-v, --video

path to recorded video

For example, you can run the application with a different PnP method:
./cpp-tutorial-pnp_detection --method=2

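Explanation

Here the code of the real-time application is explained in detail.

Read 3D textured object model and object mesh

To load the textured model, the class Model was implemented. Its function load() opens a YAML file and reads the stored 3D points with their corresponding descriptors. An example of a 3D textured model is in samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/cookies_ORB.yml.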

void Model::load(const std::string &path)

{

cv::Mat points3d_mat;

cv::FileStorage storage(path, cv::FileStorage::READ);

storage["points_3d"] >> points3d_mat;

storage["descriptors"] >> descriptors_;

if (!storage["keypoints"].empty())

{

storage["keypoints"] >> list_keypoints_;

}

if (!storage["training_image_path"].empty())

{

storage["training_image_path"] >> training_img_path_;

}

points3d_mat.copyTo(list_points3d_in_);

storage.release();

}

In the main program the model is loaded as follows:
Model model; // instantiate Model object

model.load(yml_read_path); // load a 3D textured object model

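To read the model mesh, the class Mesh was implemented. Its function load() opens a *.ply file and stores the 3D points of the object and the triangles that compose it. An example of a model mesh is in samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.ply.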

void Mesh::load(const std::string& path)

{

cv::loadMesh(path, list_vertex_, list_triangles_);

// Update mesh attributes

num_vertices_ = (int)list_vertex_.size();

num_triangles_ = (int)list_triangles_.size();

}

In the main program the mesh is loaded as follows:
Mesh mesh; // instantiate Mesh object

mesh.load(ply_read_path); // load an object mesh

You can also load a different model and mesh:
./cpp-tutorial-pnp_detection --mesh=/absolute_path_to_your_mesh.ply --model=/absolute_path_to_your_model.yml
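
Take input from camera or video

Capturing video is required for detection. This is done by loading a recorded video, passing the absolute path where it is located on your machine. To test the application, a recorded video is available at samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/Data/box.mp4.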
cv::VideoCapture cap; // instantiate VideoCapture

cap.open(video_read_path); // open a recorded video

if(!cap.isOpened()) // check if we succeeded

{

std::cout << "Could not open the camera device" << std::endl;

return -1;

}

The algorithm is then computed frame by frame:
cv::Mat frame, frame_vis;

while(cap.read(frame) && cv::waitKey(30) != 27) // capture frame until ESC is pressed

{

frame_vis = frame.clone(); // refresh visualisation frame

// MAIN ALGORITHM

}

You can also load a different recorded video:
./cpp-tutorial-pnp_detection --video=/absolute_path_to_your_video.mp4
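
Extract ORB features and descriptors from the scene

The next step is to detect the scene features and extract their descriptors. For this task the class RobustMatcher was implemented, which has functions for keypoint detection and descriptor extraction. It is in samples/cpp/tutorial_code/calib3d/real_time_pose_estimation/src/RobustMatcher.cpp. Any OpenCV 2D feature detector can be used in a RobustMatcher object. Here cv::ORB features are used: ORB builds on cv::FAST to detect keypoints and on cv::xfeatures2d::BriefDescriptorExtractor to extract descriptors, which makes it fast and robust to rotation. More detailed information about ORB is in the documentation.

The following code shows how to instantiate and set the feature detector and the descriptor extractor: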

RobustMatcher rmatcher; // instantiate RobustMatcher

Ptr<FeatureDetector> detector, descriptor;

createFeatures(featureName, numKeyPoints, detector, descriptor);

rmatcher.setFeatureDetector(detector); // set feature detector

rmatcher.setDescriptorExtractor(descriptor); // set descriptor extractor

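The features and descriptors are computed by the RobustMatcher inside the matching function.

Match scene descriptors with model descriptors using the Flann matcher

This is the first step of the detection algorithm. The main idea is to match the scene descriptors with the model descriptors in order to know the 3D coordinates of the features found in the current scene.

First, the matcher must be chosen. Here the cv::FlannBasedMatcher is used, which becomes faster than the cv::BFMatcher in terms of computational cost as the trained collection of features grows. Because ORB descriptors are binary, the index created for the FlannBased matcher is Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search.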
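Binary descriptors such as ORB are compared by Hamming distance, the number of differing bits, which is why a hashing index such as LSH is used for them. As a minimal sketch, assuming each descriptor is stored as a byte vector (the function name `hammingDistance` is hypothetical and not an OpenCV call):

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of differing bits between two binary descriptors of equal length.
int hammingDistance(const std::vector<std::uint8_t>& a, const std::vector<std::uint8_t>& b)
{
    assert(a.size() == b.size());
    int distance = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        // XOR leaves a 1 wherever the two bytes disagree; count those bits.
        distance += static_cast<int>(std::bitset<8>(a[i] ^ b[i]).count());
    }
    return distance;
}
```

A smaller distance means more similar descriptors; identical descriptors have distance 0.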

The LSH and search parameters can be tuned to improve the matching efficiency:
cv::Ptr<cv::flann::IndexParams> indexParams = cv::makePtr<cv::flann::LshIndexParams>(6, 12, 1); // instantiate LSH index parameters

cv::Ptr<cv::flann::SearchParams> searchParams = cv::makePtr<cv::flann::SearchParams>(50); // instantiate flann search parameters

cv::Ptr<cv::DescriptorMatcher> matcher = cv::makePtr<cv::FlannBasedMatcher>(indexParams, searchParams); // instantiate FlannBased matcher

rmatcher.setDescriptorMatcher(matcher); // set matcher

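Next, the matcher is called through the robustMatch() or fastRobustMatch() function. The two differ in computational cost. The first is slower but filters good matches more robustly, because it applies two ratio tests and a symmetry test. The second is faster but less robust, because it applies only a single ratio test to the matches.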
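The ratio test used by both functions can be illustrated with a simplified sketch. It is not the RobustMatcher code: `TwoNearest` and `filterByRatio` are hypothetical names, and the two distances stand in for the two nearest neighbours returned for each query descriptor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two nearest neighbours of one query descriptor.
struct TwoNearest
{
    float bestDistance;        // distance to the nearest model descriptor
    float secondBestDistance;  // distance to the second nearest model descriptor
};

// Returns the indices of the queries that pass the ratio test.
std::vector<std::size_t> filterByRatio(const std::vector<TwoNearest>& candidates, float ratio)
{
    assert(ratio > 0.0f && ratio <= 1.0f);
    std::vector<std::size_t> kept;
    for (std::size_t i = 0; i < candidates.size(); ++i)
    {
        const TwoNearest& c = candidates[i];
        // A match is ambiguous when best / secondBest exceeds the ratio; keep the others.
        if (c.bestDistance <= ratio * c.secondBestDistance)
            kept.push_back(i);
    }
    return kept;
}
```

With a ratio of 0.7, a query whose two neighbours are at distances 10 and 20 is kept, while one at distances 18 and 20 is rejected as ambiguous.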

The following code gets the model 3D points and their descriptors and then calls the matcher in the main program:
// Get the MODEL INFO

std::vector<cv::Point3f> list_points3d_model = model.get_points3d(); // list with model 3D coordinates

cv::Mat descriptors_model = model.get_descriptors(); // list with descriptors of each 3D coordinate

vector<DMatch> good_matches; // to obtain the 3D points of the model

vector<KeyPoint> keypoints_scene; // to obtain the 2D points of the scene

if(fast_match)

{

rmatcher.fastRobustMatch(frame, good_matches, keypoints_scene, descriptors_model, keypoints_model);

}

else

{

rmatcher.robustMatch(frame, good_matches, keypoints_scene, descriptors_model, keypoints_model);

}

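The following code corresponds to the robustMatch() function of the RobustMatcher class. The function uses the given image to detect keypoints and extract descriptors, then matches the extracted descriptors with the given model descriptors in both directions using the two nearest neighbours. A ratio test is then applied to the matches in each direction to remove those whose distance ratio between the best and the second best match is greater than a given threshold. Finally, a symmetry test is applied to remove non-symmetrical matches.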

void RobustMatcher::robustMatch( const cv::Mat& frame, std::vector<cv::DMatch>& good_matches,

std::vector<cv::KeyPoint>& keypoints_frame, const cv::Mat& descriptors_model,

const std::vector<cv::KeyPoint>& keypoints_model)

{

// 1a. Detection of the ORB features

this->computeKeyPoints(frame, keypoints_frame);

// 1b. Extraction of the ORB descriptors

cv::Mat descriptors_frame;

this->computeDescriptors(frame, keypoints_frame, descriptors_frame);

// 2. Match the two image descriptors

std::vector<std::vector<cv::DMatch> > matches12, matches21;

// 2a. From image 1 to image 2

matcher_->knnMatch(descriptors_frame, descriptors_model, matches12, 2); // return 2 nearest neighbours

// 2b. From image 2 to image 1

matcher_->knnMatch(descriptors_model, descriptors_frame, matches21, 2); // return 2 nearest neighbours

// 3. Remove matches for which NN ratio is > than threshold

// clean image 1 -> image 2 matches

ratioTest(matches12);

// clean image 2 -> image 1 matches

ratioTest(matches21);

// 4. Remove non-symmetrical matches

symmetryTest(matches12, matches21, good_matches);

if (!training_img_.empty() && !keypoints_model.empty())

{

cv::drawMatches(frame, keypoints_frame, training_img_, keypoints_model, good_matches, img_matching_);

}

}
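The idea behind the symmetry test can be sketched independently of OpenCV. The version below is a simplified illustration, not the sample's symmetryTest(): matches are reduced to index pairs, and `keepSymmetric` is a hypothetical name.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// (queryIdx, trainIdx) of one match; a hypothetical simplification of cv::DMatch.
using IndexPair = std::pair<int, int>;

// Keeps a frame-to-model match only if the model-to-frame matching agrees with it.
std::vector<IndexPair> keepSymmetric(const std::vector<IndexPair>& frameToModel,
                                     const std::vector<IndexPair>& modelToFrame)
{
    std::vector<IndexPair> symmetric;
    for (const IndexPair& fm : frameToModel)
    {
        for (const IndexPair& mf : modelToFrame)
        {
            // Symmetric: frame point i chose model point j, and j chose i back.
            if (fm.first == mf.second && fm.second == mf.first)
            {
                symmetric.push_back(fm);
                break;
            }
        }
    }
    return symmetric;
}
```

The nested loop is quadratic in the number of matches, which is one reason the full robust matching is slower than the single ratio test.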

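After filtering the matches, the 2D and 3D correspondences must be extracted from the found scene keypoints and the 3D model using the resulting vector of DMatch objects. For more information about cv::DMatch, see the documentation.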

vector<Point3f> list_points3d_model_match; // container for the model 3D coordinates found in the scene

vector<Point2f> list_points2d_scene_match; // container for the model 2D coordinates found in the scene

for(unsigned int match_index = 0; match_index < good_matches.size(); ++match_index)

{

Point3f point3d_model = list_points3d_model[ good_matches[match_index].trainIdx ]; // 3D point from model

Point2f point2d_scene = keypoints_scene[ good_matches[match_index].queryIdx ].pt; // 2D point from the scene

list_points3d_model_match.push_back(point3d_model); // add 3D point

list_points2d_scene_match.push_back(point2d_scene); // add 2D point

}

You can also change the ratio test threshold and the number of keypoints to detect, as well as whether to use the robust matcher:
./cpp-tutorial-pnp_detection --ratio=0.8 --keypoints=1000 --fast=false
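
Pose estimation using PnP + Ransac

Once the 2D and 3D correspondences are available, a PnP algorithm is applied to estimate the camera pose. cv::solvePnPRansac must be used instead of cv::solvePnP because not all correspondences found by the matching are correct; some are likely to be false correspondences, also called outliers. Random Sample Consensus, or Ransac, is a non-deterministic iterative method that estimates the parameters of a mathematical model from observed data, and the probability that it produces a reasonable result grows with the number of iterations. After Ransac, the correspondences that do not agree with the best model are discarded as outliers and the camera pose is estimated from the remaining inliers, so that a good solution is obtained with a certain probability.

For the camera pose estimation the class PnPProblem was implemented. The class has 4 attributes: a given calibration matrix, the rotation matrix, the translation matrix and the rotation-translation matrix. The intrinsic calibration parameters of the camera used for pose estimation are required. To obtain them, see the tutorials Camera calibration with square chessboard and Camera calibration With OpenCV.

The following code shows how to declare the PnPProblem class in the main program: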

// Intrinsic camera parameters: UVC WEBCAM

double f = 55; // focal length in mm

double sx = 22.3, sy = 14.9; // sensor size

double width = 640, height = 480; // image size

double params_WEBCAM[] = { width*f/sx, // fx

height*f/sy, // fy

width/2, // cx

height/2}; // cy

PnPProblem pnp_detection(params_WEBCAM); // instantiate PnPProblem class
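The conversion from a focal length in millimetres to focal lengths in pixels used above can be checked with a small stand-alone sketch. `PixelFocal` and `focalInPixels` are hypothetical names; the formula is the one in the snippet (image size times focal length divided by sensor size).

```cpp
#include <cassert>

// Focal lengths expressed in pixels (hypothetical helper type for this sketch).
struct PixelFocal { double fx, fy; };

// Converts a focal length in millimetres to pixels, given sensor and image size.
PixelFocal focalInPixels(double focalMm, double sensorWidthMm, double sensorHeightMm,
                         double imageWidthPx, double imageHeightPx)
{
    assert(sensorWidthMm > 0.0 && sensorHeightMm > 0.0);
    return { imageWidthPx * focalMm / sensorWidthMm,
             imageHeightPx * focalMm / sensorHeightMm };
}
```

For the values above (f = 55, sensor 22.3 x 14.9, image 640 x 480) this gives fx of about 1578.5 and fy of about 1771.8 pixels.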

The following code shows how the PnPProblem class initialises its attributes:

PnPProblem::PnPProblem(const double params[])

{

A_matrix_ = cv::Mat::zeros(3, 3, CV_64FC1); // intrinsic camera parameters

A_matrix_.at<double>(0, 0) = params[0]; // [ fx 0 cx ]

A_matrix_.at<double>(1, 1) = params[1]; // [ 0 fy cy ]

A_matrix_.at<double>(0, 2) = params[2]; // [ 0 0 1 ]

A_matrix_.at<double>(1, 2) = params[3];

A_matrix_.at<double>(2, 2) = 1;

R_matrix_ = cv::Mat::zeros(3, 3, CV_64FC1); // rotation matrix

t_matrix_ = cv::Mat::zeros(3, 1, CV_64FC1); // translation matrix

P_matrix_ = cv::Mat::zeros(3, 4, CV_64FC1); // rotation-translation matrix

}

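OpenCV provides four PnP methods: ITERATIVE, EPNP, P3P and DLS. Which one to use depends on the type of application. For a real-time application, EPNP and P3P are better suited because they find a solution faster than ITERATIVE and DLS. However, EPNP and P3P are not especially robust with planar surfaces, and the estimated pose sometimes appears mirrored. This tutorial therefore uses the ITERATIVE method, because the object to be detected has planar surfaces.

The OpenCV RANSAC implementation requires three parameters: 1) the maximum number of iterations before the algorithm stops, 2) the maximum allowed distance between the observed and the computed point projections for a point to be considered an inlier, and 3) the confidence of obtaining a good result. These parameters can be tuned to improve the performance of the algorithm. Increasing the number of iterations gives a more accurate solution but takes more time. Increasing the reprojection error reduces computation time, but the solution becomes less accurate. Decreasing the confidence makes the algorithm faster, but the solution becomes less accurate.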
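The trade-off between confidence and iteration count follows the standard RANSAC relation \(N = \log(1-p) / \log(1-w^s)\), where \(p\) is the confidence, \(w\) the inlier ratio and \(s\) the number of points per sample. The sketch below only illustrates that relation and makes no claim about how cv::solvePnPRansac computes its stopping criterion internally; `requiredRansacIterations` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>

// Iterations needed so that, with probability `confidence`, at least one
// random sample of `sampleSize` correspondences contains only inliers.
int requiredRansacIterations(double confidence, double inlierRatio, int sampleSize)
{
    assert(confidence > 0.0 && confidence < 1.0);
    assert(inlierRatio > 0.0 && inlierRatio < 1.0);
    assert(sampleSize > 0);
    // Probability that one sample is free of outliers.
    const double allInliers = std::pow(inlierRatio, sampleSize);
    return static_cast<int>(std::ceil(std::log(1.0 - confidence) / std::log(1.0 - allInliers)));
}
```

For example, with half of the matches being inliers and 4 points per sample, a confidence of 0.99 needs 72 iterations, and lowering the confidence reduces that number.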

The following parameters work for this application:
// RANSAC parameters

int iterationsCount = 500; // number of Ransac iterations.

float reprojectionError = 2.0; // maximum allowed distance to consider it an inlier.

float confidence = 0.95; // RANSAC successful confidence.

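The following code corresponds to the estimatePoseRANSAC() function of the PnPProblem class. Given a set of 2D/3D correspondences, the desired PnP method, the output inliers container and the Ransac parameters, it estimates the rotation and translation matrices: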

void PnPProblem::estimatePoseRANSAC( const std::vector<cv::Point3f> &list_points3d, // list with model 3D coordinates

const std::vector<cv::Point2f> &list_points2d, // list with scene 2D coordinates

int flags, cv::Mat &inliers, int iterationsCount, // PnP method; inliers container

float reprojectionError, double confidence ) // Ransac parameters

{

cv::Mat distCoeffs = cv::Mat::zeros(4, 1, CV_64FC1); // vector of distortion coefficients

cv::Mat rvec = cv::Mat::zeros(3, 1, CV_64FC1); // output rotation vector

cv::Mat tvec = cv::Mat::zeros(3, 1, CV_64FC1); // output translation vector

bool useExtrinsicGuess = false; // if true the function uses the provided rvec and tvec values as

// initial approximations of the rotation and translation vectors

cv::solvePnPRansac( list_points3d, list_points2d, A_matrix_, distCoeffs, rvec, tvec,

useExtrinsicGuess, iterationsCount, reprojectionError, confidence,

inliers, flags );

Rodrigues(rvec, R_matrix_); // converts Rotation Vector to Matrix

t_matrix_ = tvec; // set translation matrix

this->set_P_matrix(R_matrix_, t_matrix_); // set rotation-translation matrix

}

The following code contains the 3rd and 4th steps of the main algorithm. The first calls the function above, and the second takes the inliers vector output by RANSAC to get the 2D scene points for drawing. As the code shows, RANSAC must only be applied when there are enough matches; otherwise cv::solvePnPRansac asserts on invalid input (too few points).
if(good_matches.size() >= 4) // OpenCV requires solvePnPRANSAC to minimally have 4 set of points

{

// -- Step 3: Estimate the pose using RANSAC approach

pnp_detection.estimatePoseRANSAC( list_points3d_model_match, list_points2d_scene_match,

pnpMethod, inliers_idx, iterationsCount, reprojectionError, confidence );

// -- Step 4: Catch the inliers keypoints to draw

for(int inliers_index = 0; inliers_index < inliers_idx.rows; ++inliers_index)

{

int n = inliers_idx.at<int>(inliers_index); // i-inlier

cv::Point2f point2d = list_points2d_scene_match[n]; // i-inlier point 2D

list_points2d_inliers.push_back(point2d); // add i-inlier to list

}

}

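Finally, once the camera pose has been estimated, \(R\) and \(t\) can be used with the formula shown in the theory section to compute the 2D projection onto the image of a given 3D point expressed in the world reference frame.

The following code corresponds to the backproject3DPoint() function of the PnPProblem class. The function projects a given 3D point expressed in the world reference frame onto the 2D image: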

cv::Point2f PnPProblem::backproject3DPoint(const cv::Point3f &point3d)

{

// 3D point vector [x y z 1]'

cv::Mat point3d_vec = cv::Mat(4, 1, CV_64FC1);

point3d_vec.at<double>(0) = point3d.x;

point3d_vec.at<double>(1) = point3d.y;

point3d_vec.at<double>(2) = point3d.z;

point3d_vec.at<double>(3) = 1;

// 2D point vector [u v 1]'

cv::Mat point2d_vec = cv::Mat(3, 1, CV_64FC1);

point2d_vec = A_matrix_ * P_matrix_ * point3d_vec;

// Normalization of [u v]'

cv::Point2f point2d;

point2d.x = (float)(point2d_vec.at<double>(0) / point2d_vec.at<double>(2));

point2d.y = (float)(point2d_vec.at<double>(1) / point2d_vec.at<double>(2));

return point2d;

}

The above function is used to project all the 3D points of the object Mesh in order to show the pose of the object.

You can also change the RANSAC parameters and the PnP method:
./cpp-tutorial-pnp_detection --error=0.25 --confidence=0.90 --iterations=250 --method=3
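
Linear Kalman Filter for bad pose rejection

In computer vision and robotics it is common to obtain bad results after applying detection or tracking techniques because of sensor errors. To avoid these bad detections, this tutorial explains how to implement a linear Kalman filter. The Kalman filter is applied once a given number of inliers has been detected.

You can find more information about what a Kalman filter is. This tutorial uses cv::KalmanFilter, the OpenCV implementation, based on Linear Kalman Filter for position and orientation tracking to set the dynamics and measurement models.

First, the state vector must be defined. It has 18 states: the positional data (x,y,z) with its first and second derivatives (velocity and acceleration), followed by the rotation in the form of three Euler angles (roll, pitch, yaw) with their first and second derivatives (angular velocity and angular acceleration).

\[X = (x,y,z,\dot x,\dot y,\dot z,\ddot x,\ddot y,\ddot z,\psi,\theta,\phi,\dot \psi,\dot \theta,\dot \phi,\ddot \psi,\ddot \theta,\ddot \phi)^T\]

Second, the number of measurements must be defined, which is 6: from \(R\) and \(t\) we can extract \((x,y,z)\) and \((\psi,\theta,\phi)\). The number of control actions applied to the system must also be defined, which in this case is zero. Finally, the time step between measurements must be defined, which in this case is \(1/T\), where T is the frame rate of the video.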

KalmanFilter KF; // instantiate Kalman Filter

int nStates = 18; // the number of states

int nMeasurements = 6; // the number of measured states

int nInputs = 0; // the number of control actions

double dt = 0.125; // time between measurements (1/FPS)

initKalmanFilter(KF, nStates, nMeasurements, nInputs, dt); // init function

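The following code corresponds to the Kalman filter initialisation. First the process noise, the measurement noise and the error covariance matrix are set. Then the transition matrix, which is the dynamic model, is set, and finally the measurement matrix, which is the measurement model.

The process and measurement noise can be tuned to improve the Kalman filter performance. The smaller the measurement noise, the faster the convergence, but the more sensitive the algorithm becomes to bad measurements.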

void initKalmanFilter(KalmanFilter &KF, int nStates, int nMeasurements, int nInputs, double dt)

{

KF.init(nStates, nMeasurements, nInputs, CV_64F); // init Kalman Filter

setIdentity(KF.processNoiseCov, Scalar::all(1e-5)); // set process noise

setIdentity(KF.measurementNoiseCov, Scalar::all(1e-2)); // set measurement noise

setIdentity(KF.errorCovPost, Scalar::all(1)); // error covariance

// [1 0 0 dt 0 0 dt2 0 0 0 0 0 0 0 0 0 0 0]

// [0 1 0 0 dt 0 0 dt2 0 0 0 0 0 0 0 0 0 0]

// [0 0 1 0 0 dt 0 0 dt2 0 0 0 0 0 0 0 0 0]

// [0 0 0 1 0 0 dt 0 0 0 0 0 0 0 0 0 0 0]

// [0 0 0 0 1 0 0 dt 0 0 0 0 0 0 0 0 0 0]

// [0 0 0 0 0 1 0 0 dt 0 0 0 0 0 0 0 0 0]

// [0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0]

// [0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0]

// [0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0]

// [0 0 0 0 0 0 0 0 0 1 0 0 dt 0 0 dt2 0 0]

// [0 0 0 0 0 0 0 0 0 0 1 0 0 dt 0 0 dt2 0]

// [0 0 0 0 0 0 0 0 0 0 0 1 0 0 dt 0 0 dt2]

// [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 dt 0 0]

// [0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 dt 0]

// [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 dt]

// [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0]

// [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0]

// [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1]

// position

KF.transitionMatrix.at<double>(0,3) = dt;

KF.transitionMatrix.at<double>(1,4) = dt;

KF.transitionMatrix.at<double>(2,5) = dt;

KF.transitionMatrix.at<double>(3,6) = dt;

KF.transitionMatrix.at<double>(4,7) = dt;

KF.transitionMatrix.at<double>(5,8) = dt;

KF.transitionMatrix.at<double>(0,6) = 0.5*std::pow(dt,2);

KF.transitionMatrix.at<double>(1,7) = 0.5*std::pow(dt,2);

KF.transitionMatrix.at<double>(2,8) = 0.5*std::pow(dt,2);

// orientation

KF.transitionMatrix.at<double>(9,12) = dt;

KF.transitionMatrix.at<double>(10,13) = dt;

KF.transitionMatrix.at<double>(11,14) = dt;

KF.transitionMatrix.at<double>(12,15) = dt;

KF.transitionMatrix.at<double>(13,16) = dt;

KF.transitionMatrix.at<double>(14,17) = dt;

KF.transitionMatrix.at<double>(9,15) = 0.5*std::pow(dt,2);

KF.transitionMatrix.at<double>(10,16) = 0.5*std::pow(dt,2);

KF.transitionMatrix.at<double>(11,17) = 0.5*std::pow(dt,2);

// [1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

// [0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

// [0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

// [0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0]

// [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0]

// [0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0]

KF.measurementMatrix.at<double>(0,0) = 1; // x

KF.measurementMatrix.at<double>(1,1) = 1; // y

KF.measurementMatrix.at<double>(2,2) = 1; // z

KF.measurementMatrix.at<double>(3,9) = 1; // roll

KF.measurementMatrix.at<double>(4,10) = 1; // pitch

KF.measurementMatrix.at<double>(5,11) = 1; // yaw

}
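Each block of the transition matrix above encodes a constant-acceleration model for one coordinate. As a minimal sketch of what one prediction step does to a single axis (the type `AxisState` and the function `predictConstantAcceleration` are hypothetical and not part of cv::KalmanFilter):

```cpp
#include <cassert>

// Position, velocity and acceleration of one coordinate (x, y, z or an Euler angle).
struct AxisState { double pos, vel, acc; };

// One prediction step of the constant-acceleration model:
//   pos' = pos + vel*dt + 0.5*acc*dt^2,  vel' = vel + acc*dt,  acc' = acc
AxisState predictConstantAcceleration(const AxisState& s, double dt)
{
    assert(dt > 0.0);
    return { s.pos + s.vel * dt + 0.5 * s.acc * dt * dt,
             s.vel + s.acc * dt,
             s.acc };
}
```

These three update rules correspond to the entries 1, dt and 0.5*dt^2 that the initialisation code writes into each row of the transition matrix.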

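The following code is the 5th step of the main algorithm. When the number of inliers obtained after Ransac reaches the threshold, the measurements vector is filled and the Kalman filter is then updated: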

// GOOD MEASUREMENT

if( inliers_idx.rows >= minInliersKalman )

{

// Get the measured translation

Mat translation_measured = pnp_detection.get_t_matrix();

// Get the measured rotation

Mat rotation_measured = pnp_detection.get_R_matrix();

// fill the measurements vector

fillMeasurements(measurements, translation_measured, rotation_measured);

good_measurement = true;

}

// update the Kalman filter with good measurements, otherwise with previous valid measurements

Mat translation_estimated(3, 1, CV_64FC1);

Mat rotation_estimated(3, 3, CV_64FC1);

updateKalmanFilter( KF, measurements,

translation_estimated, rotation_estimated);

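The following code corresponds to the fillMeasurements() function, which converts the measured rotation matrix to Euler angles and fills the measurements vector together with the measured translation vector: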

void fillMeasurements( Mat &measurements,

const Mat &translation_measured, const Mat &rotation_measured)

{

// Convert rotation matrix to euler angles

Mat measured_eulers(3, 1, CV_64F);

measured_eulers = rot2euler(rotation_measured);

// Set measurement to predict

measurements.at<double>(0) = translation_measured.at<double>(0); // x

measurements.at<double>(1) = translation_measured.at<double>(1); // y

measurements.at<double>(2) = translation_measured.at<double>(2); // z

measurements.at<double>(3) = measured_eulers.at<double>(0); // roll

measurements.at<double>(4) = measured_eulers.at<double>(1); // pitch

measurements.at<double>(5) = measured_eulers.at<double>(2); // yaw

}

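The following code corresponds to the updateKalmanFilter() function, which updates the Kalman filter and sets the estimated rotation matrix and translation vector. The estimated rotation matrix is obtained by converting the estimated Euler angles to a rotation matrix.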

void updateKalmanFilter( KalmanFilter &KF, Mat &measurement,

Mat &translation_estimated, Mat &rotation_estimated )

{

// First predict, to update the internal statePre variable

Mat prediction = KF.predict();

// The "correct" phase that is going to use the predicted value and our measurement

Mat estimated = KF.correct(measurement);

// Estimated translation

translation_estimated.at<double>(0) = estimated.at<double>(0);

translation_estimated.at<double>(1) = estimated.at<double>(1);

translation_estimated.at<double>(2) = estimated.at<double>(2);

// Estimated euler angles

Mat eulers_estimated(3, 1, CV_64F);

eulers_estimated.at<double>(0) = estimated.at<double>(9);

eulers_estimated.at<double>(1) = estimated.at<double>(10);

eulers_estimated.at<double>(2) = estimated.at<double>(11);

// Convert estimated Euler angles to rotation matrix

rotation_estimated = euler2rot(eulers_estimated);

}
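The rot2euler() and euler2rot() helpers are not listed in this tutorial. The sketch below shows one possible pair of conversions, assuming the Z-Y-X convention \(R = R_z(\text{yaw})\,R_y(\text{pitch})\,R_x(\text{roll})\); the sample's helpers may use a different convention, and the names `eulerToRotation` and `rotationToEuler` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat3 = std::array<std::array<double, 3>, 3>;
struct Euler { double roll, pitch, yaw; };  // angles in radians

// Builds R = Rz(yaw) * Ry(pitch) * Rx(roll) (assumed convention for this sketch).
Mat3 eulerToRotation(const Euler& e)
{
    const double cr = std::cos(e.roll),  sr = std::sin(e.roll);
    const double cp = std::cos(e.pitch), sp = std::sin(e.pitch);
    const double cy = std::cos(e.yaw),   sy = std::sin(e.yaw);
    Mat3 R = {{ {{cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr}},
                {{sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr}},
                {{-sp,     cp * sr,                cp * cr}} }};
    return R;
}

// Inverse of eulerToRotation, valid away from gimbal lock (|pitch| < 90 degrees).
Euler rotationToEuler(const Mat3& R)
{
    assert(std::fabs(R[2][0]) < 1.0);
    return { std::atan2(R[2][1], R[2][2]),   // roll
             std::asin(-R[2][0]),            // pitch
             std::atan2(R[1][0], R[0][0]) }; // yaw
}
```

Converting angles to a matrix and back returns the original angles as long as the pitch stays away from plus or minus 90 degrees, where the Euler representation becomes singular.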

The 6th step is to set the estimated rotation-translation matrix:
// -- Step 6: Set estimated projection matrix

pnp_detection_est.set_P_matrix(rotation_estimated, translation_estimated);

The last, optional step is to draw the found pose. For this, a function was implemented that draws all the mesh 3D points and an extra reference axis:

float l = 5;

vector<Point2f> pose_points2d;

if (!good_measurement || displayFilteredPose)

{

drawObjectMesh(frame_vis, &mesh, &pnp_detection_est, yellow); // draw estimated pose

pose_points2d.push_back(pnp_detection_est.backproject3DPoint(Point3f(0,0,0))); // axis center

pose_points2d.push_back(pnp_detection_est.backproject3DPoint(Point3f(l,0,0))); // axis x

pose_points2d.push_back(pnp_detection_est.backproject3DPoint(Point3f(0,l,0))); // axis y

pose_points2d.push_back(pnp_detection_est.backproject3DPoint(Point3f(0,0,l))); // axis z

draw3DCoordinateAxes(frame_vis, pose_points2d); // draw axes

}

else

{

drawObjectMesh(frame_vis, &mesh, &pnp_detection, green); // draw current pose

pose_points2d.push_back(pnp_detection.backproject3DPoint(Point3f(0,0,0))); // axis center

pose_points2d.push_back(pnp_detection.backproject3DPoint(Point3f(l,0,0))); // axis x

pose_points2d.push_back(pnp_detection.backproject3DPoint(Point3f(0,l,0))); // axis y

pose_points2d.push_back(pnp_detection.backproject3DPoint(Point3f(0,0,l))); // axis z

draw3DCoordinateAxes(frame_vis, pose_points2d); // draw axes

}

You can also change the minimum number of inliers needed to update the Kalman filter:
./cpp-tutorial-pnp_detection --inliers=20

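Results

The following video shows the results of real-time pose estimation with the detection algorithm explained above, using the following parameters: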

// Robust Matcher parameters
 
int numKeyPoints = 2000;      // number of detected keypoints
float ratio = 0.70f;          // ratio test
bool fast_match = true;       // fastRobustMatch() or robustMatch()
 
// RANSAC parameters
 
int iterationsCount = 500;    // number of Ransac iterations.
float reprojectionError = 2.0;  // maximum allowed distance to consider it an inlier.
float confidence = 0.95;      // ransac successful confidence.
 
// Kalman Filter parameters
 
int minInliersKalman = 30;    // Kalman threshold updating

You can watch the real-time pose estimation on YouTube here.


Original author	Edgar Riba
Compatibility	OpenCV >= 5.0
