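Introduction

In this tutorial you will learn:

The basics of a sample face beautification algorithm;
How to infer different networks inside a pipeline with G-API;
How to run a G-API pipeline on a video stream.

Prerequisites

This sample requires:

A PC with GNU/Linux or Microsoft Windows (Apple macOS is supported but was not tested);
OpenCV 4.2 or later built with Intel® Distribution of OpenVINO™ Toolkit (building with Intel® TBB is a plus);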
The following topologies from OpenVINO™ Toolkit Open Model Zoo:
- face-detection-adas-0001;
- facial-landmarks-35-adas-0002.

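Face beautification algorithm

We will implement a simple face beautification algorithm that combines modern Deep Learning techniques with traditional Computer Vision. The general idea is to make the facial skin smoother while preserving facial features such as the contrast of the eyes and mouth. The algorithm identifies parts of the face using DNN inference, applies a different filter to each part found, and then combines the filtered images into the final result using basic image arithmetic.

Briefly, the algorithm is as follows:

Input image \(I\) is passed to an unsharp mask and a bilateral filter (producing \(U\) and \(L\), respectively);
Input image \(I\) is passed to an SSD-based face detector;
The SSD result (a \([1 \times 1 \times 200 \times 7]\) blob) is parsed and converted to an array of faces;
Every face is passed to a landmarks detector;
Based on landmarks found for every face, three image masks are generated:
- A background mask \(b\), indicating which areas of the original image should be kept as-is;
- A face part mask \(p\), identifying regions to preserve (sharpen);
- A face skin mask \(s\), identifying regions to blur;
The final result \(O\) is a composition of the features above, calculated as \(O = b*I + p*U + s*L\).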
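
The composition formula can be illustrated per pixel with a minimal sketch. The helper name compose() and the use of flat single-channel buffers with 0/1 masks are assumptions made for illustration only; the sample itself works on three-channel images with 0..255 masks inside a G-API graph.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Image = std::vector<std::uint8_t>;

// Hypothetical helper (not part of the sample): computes O = b*I + p*U + s*L
// per pixel, where every mask value is 0 or 1. The sum is saturated at 255
// in case the masks overlap.
Image compose(const Image& I, const Image& U, const Image& L,
              const Image& b, const Image& p, const Image& s)
{
    assert(I.size() == U.size() && I.size() == L.size());
    assert(I.size() == b.size() && I.size() == p.size() && I.size() == s.size());
    Image O(I.size());
    for (std::size_t i = 0; i < I.size(); ++i) {
        const int v = b[i] * I[i] + p[i] * U[i] + s[i] * L[i];
        O[i] = static_cast<std::uint8_t>(v > 255 ? 255 : v);
    }
    return O;
}
```

With non-overlapping masks, every output pixel comes from exactly one of the three images: the original, the sharpened one, or the smoothed one.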

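Generating face element masks from a limited set of features (just 35 per face, including all its parts) is not trivial and is described in the sections below.

Constructing a G-API pipeline

Declaring Deep Learning topologies

This sample uses two DNN detectors. Each network takes one input and produces one output. In G-API, networks are defined with the macro G_API_NET():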

G_API_NET(FaceDetector, <cv::GMat(cv::GMat)>, "face_detector");

G_API_NET(LandmDetector, <cv::GMat(cv::GMat)>, "landm_detector");

To get more information, see Declaring Deep Learning topologies in the "Face Analytics pipeline" tutorial.

Describing the processing graph

The code below generates the graph for the algorithm above:

cv::GComputation pipeline([=]()
{
    cv::GMat  gimgIn;  // input

    cv::GMat  faceOut  = cv::gapi::infer<custom::FaceDetector>(gimgIn);
    GArrayROI garRects = custom::GFacePostProc::on(faceOut, gimgIn, config::kConfThresh);  // post-proc

    cv::GArray<cv::GMat>  landmOut = cv::gapi::infer<custom::LandmDetector>(garRects, gimgIn);
    cv::GArray<Landmarks> garElems;  // |
    cv::GArray<Contour>   garJaws;   // | output arrays
    std::tie(garElems, garJaws) = custom::GLandmPostProc::on(landmOut, garRects);          // post-proc
    cv::GArray<Contour> garElsConts;   // face elements
    cv::GArray<Contour> garFaceConts;  // whole faces
    std::tie(garElsConts, garFaceConts) = custom::GGetContours::on(garElems, garJaws);     // interpolation

    // Draw masks:
    cv::GMat mskSharp        = custom::GFillPolyGContours::on(gimgIn, garElsConts);
    cv::GMat mskSharpG       = cv::gapi::gaussianBlur(mskSharp, config::kGKernelSize,
                                                      config::kGSigma);
    cv::GMat mskBlur         = custom::GFillPolyGContours::on(gimgIn, garFaceConts);
    cv::GMat mskBlurG        = cv::gapi::gaussianBlur(mskBlur, config::kGKernelSize,
                                                      config::kGSigma);
    // The first argument of mask() is mskBlurG because the masked result
    // is subtracted from mskBlurG in the same expression:
    cv::GMat mskBlurFinal    = mskBlurG - cv::gapi::mask(mskBlurG, mskSharpG);
    cv::GMat mskFacesGaussed = mskBlurFinal + mskSharpG;
    cv::GMat mskFacesWhite   = cv::gapi::threshold(mskFacesGaussed, 0, 255, cv::THRESH_BINARY);
    cv::GMat mskNoFaces      = cv::gapi::bitwise_not(mskFacesWhite);

    cv::GMat gimgBilat       = custom::GBilatFilter::on(gimgIn, config::kBSize,
                                                        config::kBSigmaCol, config::kBSigmaSp);
    cv::GMat gimgSharp       = custom::unsharpMask(gimgIn, config::kUnshSigma,
                                                   config::kUnshStrength);
    // Applying the masks
    // Custom function mask3C() should be used instead of just gapi::mask()
    //  as mask() supports a CV_8UC1 source only (and we have CV_8UC3)
    cv::GMat gimgBilatMasked = custom::mask3C(gimgBilat, mskBlurFinal);
    cv::GMat gimgSharpMasked = custom::mask3C(gimgSharp, mskSharpG);
    cv::GMat gimgInMasked    = custom::mask3C(gimgIn,    mskNoFaces);
    cv::GMat gimgBeautif     = gimgBilatMasked + gimgSharpMasked + gimgInMasked;
    return cv::GComputation(cv::GIn(gimgIn), cv::GOut(gimgBeautif,
                                                      cv::gapi::copy(gimgIn),
                                                      garFaceConts,
                                                      garElsConts,
                                                      garRects));
});

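The resulting graph is a mixture of G-API's standard operations, user-defined operations (namespace custom::), and DNN inference. The generic function cv::gapi::infer<>() triggers inference within the pipeline; the network to infer is specified as a template parameter. The sample code uses two versions of cv::gapi::infer<>():

A frame-oriented one is used to detect faces on the input frame.
An ROI-list-oriented one is used to run landmarks inference on a list of faces; this version produces an array of landmarks per face.

More on this in "Face Analytics pipeline" (Building a GComputation section).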

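Unsharp mask in G-API

The unsharp mask \(U\) for image \(I\) is defined as:

\[U = I - s * L(M(I)),\]

where \(M()\) is a median filter, \(L()\) is the Laplace operator, and \(s\) is a strength coefficient (in this formula \(L\) and \(s\) are local symbols, unrelated to the bilateral filter output \(L\) and the skin mask \(s\) above). While G-API doesn't provide this function out of the box, it is expressed naturally with the existing G-API operations: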

inline cv::GMat custom::unsharpMask(const cv::GMat &src,
                                    const int       sigma,
                                    const float     strength)
{
    cv::GMat blurred   = cv::gapi::medianBlur(src, sigma);
    cv::GMat laplacian = custom::GLaplacian::on(blurred, CV_8U);
    return (src - (laplacian * strength));
}

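Note that the code snippet above is a regular C++ function defined with G-API types. Users can write functions like this to simplify graph construction; when called, such a function just adds the relevant nodes to the pipeline it is used in.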
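
To see what the formula does numerically, here is a minimal one-dimensional sketch outside of G-API. All helper names (at, median3, laplacian1d, unsharpMask1d), the 3-tap filters, and the replicated borders are assumptions chosen for illustration; the sample applies the two-dimensional median and Laplacian operations to images.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-D helpers for illustration; none of them exist in G-API.

// Reads v[i] with replicated borders (out-of-range indices are clamped).
static float at(const std::vector<float>& v, long i)
{
    const long last = static_cast<long>(v.size()) - 1;
    return v[static_cast<std::size_t>(std::max(0L, std::min(i, last)))];
}

// M(): 3-tap median filter.
std::vector<float> median3(const std::vector<float>& v)
{
    std::vector<float> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        const long k = static_cast<long>(i);
        const float a = at(v, k - 1), b = v[i], c = at(v, k + 1);
        out[i] = std::max(std::min(a, b), std::min(std::max(a, b), c));
    }
    return out;
}

// L(): discrete 1-D Laplacian, v[i-1] - 2*v[i] + v[i+1].
std::vector<float> laplacian1d(const std::vector<float>& v)
{
    std::vector<float> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i) {
        const long k = static_cast<long>(i);
        out[i] = at(v, k - 1) - 2.f * v[i] + at(v, k + 1);
    }
    return out;
}

// U = I - s * L(M(I)), clamped to the 8-bit range [0, 255].
std::vector<float> unsharpMask1d(const std::vector<float>& src, float strength)
{
    assert(!src.empty());
    const std::vector<float> lap = laplacian1d(median3(src));
    std::vector<float> out(src.size());
    for (std::size_t i = 0; i < src.size(); ++i)
        out[i] = std::max(0.f, std::min(255.f, src[i] - strength * lap[i]));
    return out;
}
```

On a step edge such as 20, 20, 20, 60, 60, 60 with strength 0.25, the flat regions stay unchanged while the two samples next to the edge move apart (to 10 and 70), which is the contrast boost that sharpening relies on.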

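Custom operations

The face beautification graph uses custom operations extensively. This chapter focuses on the most interesting kernels; refer to G-API Kernel API for general information on defining operations and implementing kernels in G-API.

Face detector post-processing

The face detector output is converted to an array of faces with the following kernel: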

using VectorROI = std::vector<cv::Rect>;

GAPI_OCV_KERNEL(GCPUFacePostProc, GFacePostProc)
{
    static void run(const cv::Mat &inDetectResult,
                    const cv::Mat &inFrame,
                    const float    faceConfThreshold,
                    VectorROI     &outFaces)
    {
        const int kObjectSize = 7;
        const int imgCols = inFrame.size().width;
        const int imgRows = inFrame.size().height;
        const cv::Rect borders({0, 0}, inFrame.size());
        outFaces.clear();
        const int    numOfDetections = inDetectResult.size[2];
        const float *data            = inDetectResult.ptr<float>();
        for (int i = 0; i < numOfDetections; i++)
        {
            const float faceId = data[i * kObjectSize + 0];
            if (faceId < 0.f)  // indicates the end of detections
            {
                break;
            }
            const float faceConfidence = data[i * kObjectSize + 2];
            // We can cut detections by the `conf` field
            //  to avoid mistakes of the detector.
            if (faceConfidence > faceConfThreshold)
            {
                const float left   = data[i * kObjectSize + 3];
                const float top    = data[i * kObjectSize + 4];
                const float right  = data[i * kObjectSize + 5];
                const float bottom = data[i * kObjectSize + 6];
                // These are normalized coordinates and are between 0 and 1;
                //  to get the real pixel coordinates we should multiply them by
                //  the image width or height, depending on the direction:
                cv::Point tl(toIntRounded(left   * imgCols),
                             toIntRounded(top    * imgRows));
                cv::Point br(toIntRounded(right  * imgCols),
                             toIntRounded(bottom * imgRows));
                outFaces.push_back(cv::Rect(tl, br) & borders);
            }
        }
    }
};
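
The same parsing logic can be tried without OpenCV. The sketch below assumes the detection blob has been flattened into a std::vector<float> of 7-value records, as the kernel reads them; the Box struct and parseDetections() are hypothetical stand-ins for cv::Rect and the kernel body, and clipping is done by clamping coordinates rather than by rectangle intersection.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for cv::Rect.
struct Box { int x, y, width, height; };

// Converts flattened SSD records [id, label, conf, left, top, right, bottom]
// with normalized coordinates into pixel boxes clipped to the image.
std::vector<Box> parseDetections(const std::vector<float>& data,
                                 int imgCols, int imgRows, float confThreshold)
{
    const std::size_t kObjectSize = 7;
    assert(data.size() % kObjectSize == 0);

    // Scales a normalized coordinate to pixels and clamps it to [0, size].
    auto toPixel = [](float v, int size) {
        const int p = static_cast<int>(std::lround(v * size));
        return std::max(0, std::min(p, size));
    };

    std::vector<Box> faces;
    for (std::size_t i = 0; i < data.size(); i += kObjectSize) {
        if (data[i] < 0.f) break;                    // negative id ends the list
        if (data[i + 2] <= confThreshold) continue;  // drop weak detections
        const int left   = toPixel(data[i + 3], imgCols);
        const int top    = toPixel(data[i + 4], imgRows);
        const int right  = toPixel(data[i + 5], imgCols);
        const int bottom = toPixel(data[i + 6], imgRows);
        faces.push_back({left, top,
                         std::max(0, right - left), std::max(0, bottom - top)});
    }
    return faces;
}
```

For a 200x100 frame, a record with corners (0.25, 0.25) and (0.75, 0.5) becomes a box at (50, 25) of size 100x25, and a record that extends past the frame is clipped to the frame borders.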

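Facial landmarks post-processing

The algorithm infers the locations of face elements (such as the eyes, the mouth, and the head contour itself) using a generic facial landmarks detector from OpenVINO™ Open Model Zoo (facial-landmarks-35-adas-0002, listed in the prerequisites). However, the detected landmarks as-is are not enough to generate masks: this operation requires regions of interest on the face represented by closed contours, so some interpolation is applied to get them. This landmarks processing and interpolation is performed by the following kernel: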

GAPI_OCV_KERNEL(GCPUGetContours, GGetContours)
{
    static void run(const std::vector<Landmarks> &vctPtsFaceElems,  // 18 landmarks of the facial elements
                    const std::vector<Contour>   &vctCntJaw,        // 17 landmarks of a jaw
                    std::vector<Contour>         &vctElemsContours,
                    std::vector<Contour>         &vctFaceContours)
    {
        size_t numFaces = vctCntJaw.size();
        CV_Assert(numFaces == vctPtsFaceElems.size());
        CV_Assert(vctElemsContours.size() == 0ul);
        CV_Assert(vctFaceContours.size()  == 0ul);
        // vctElemsContours will store all the face elements' contours found
        //  in an input image, namely 4 elements (two eyes, nose, mouth) for every detected face:
        vctElemsContours.reserve(numFaces * 4);
        // vctFaceContours will store all the faces' contours found in an input image:
        vctFaceContours.reserve(numFaces);

        Contour cntFace, cntLeftEye, cntRightEye, cntNose, cntMouth;
        cntNose.reserve(4);

        for (size_t i = 0ul; i < numFaces; i++)
        {
            // The face elements contours

            // A left eye:
            // Approximating the lower eye contour by half-ellipse (using eye points) and storing in cntLeftEye:
            cntLeftEye = getEyeEllipse(vctPtsFaceElems[i][1], vctPtsFaceElems[i][0]);
            // Pushing the left eyebrow clock-wise:
            cntLeftEye.insert(cntLeftEye.end(), {vctPtsFaceElems[i][12], vctPtsFaceElems[i][13],
                                                 vctPtsFaceElems[i][14]});

            // A right eye:
            // Approximating the lower eye contour by half-ellipse (using eye points) and storing in cntRightEye:
            cntRightEye = getEyeEllipse(vctPtsFaceElems[i][2], vctPtsFaceElems[i][3]);
            // Pushing the right eyebrow clock-wise:
            cntRightEye.insert(cntRightEye.end(), {vctPtsFaceElems[i][15], vctPtsFaceElems[i][16],
                                                   vctPtsFaceElems[i][17]});

            // A nose:
            // Storing the nose points clock-wise
            cntNose.clear();
            cntNose.insert(cntNose.end(), {vctPtsFaceElems[i][4], vctPtsFaceElems[i][7],
                                           vctPtsFaceElems[i][5], vctPtsFaceElems[i][6]});

            // A mouth:
            // Approximating the mouth contour by two half-ellipses (using mouth points) and storing in cntMouth:
            cntMouth = getPatchedEllipse(vctPtsFaceElems[i][8], vctPtsFaceElems[i][9],
                                         vctPtsFaceElems[i][10], vctPtsFaceElems[i][11]);

            // Storing all the elements in a vector:
            vctElemsContours.insert(vctElemsContours.end(), {cntLeftEye, cntRightEye, cntNose, cntMouth});

            // The face contour:
            // Approximating the forehead contour by half-ellipse (using jaw points) and storing in cntFace:
            cntFace = getForeheadEllipse(vctCntJaw[i][0], vctCntJaw[i][16], vctCntJaw[i][8]);
            // The ellipse is drawn clock-wise, but the jaw contour points go in the opposite direction,
            //  so it's necessary to push vctCntJaw[i] from the end to the beginning using a reverse iterator:
            std::copy(vctCntJaw[i].crbegin(), vctCntJaw[i].crend(), std::back_inserter(cntFace));
            // Storing the face contour in another vector:
            vctFaceContours.push_back(cntFace);
        }
    }
};

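The kernel takes two arrays of denormalized landmark coordinates and returns an array of closed contours of face elements and an array of closed contours of faces. In other words, the first output is an array of contours of image areas to be sharpened, and the second is an array of contours of areas to be smoothed.

Here and below, Contour is a vector of points.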
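
The way the kernel closes a face contour (a clockwise forehead arc followed by the jaw points in reverse order) can be shown in isolation. In this sketch Pt and closeFaceContour() are hypothetical names, and Contour is redefined over Pt so that the example does not depend on OpenCV.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical stand-ins for cv::Point and the sample's Contour alias.
struct Pt { int x, y; };
using Contour = std::vector<Pt>;

// Appends the jaw points in reverse order to the forehead arc, so that the
// resulting polygon is traversed in a single direction.
Contour closeFaceContour(Contour forehead, const Contour& jaw)
{
    assert(!forehead.empty() && !jaw.empty());
    std::copy(jaw.crbegin(), jaw.crend(), std::back_inserter(forehead));
    return forehead;
}
```

If the forehead arc runs left to right over the top and the jaw points are stored left to right along the bottom, the reversed jaw continues from the right end back to the left, which closes the loop without self-crossing.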

Getting an eye contour

Eye contours are estimated with the following functions:

inline int custom::getLineInclinationAngleDegrees(const cv::Point &ptLeft, const cv::Point &ptRight)
{
    const cv::Point residual = ptRight - ptLeft;
    if (residual.y == 0 && residual.x == 0)
        return 0;
    else
        return toIntRounded(atan2(toDouble(residual.y), toDouble(residual.x)) * 180.0 / CV_PI);
}

inline Contour custom::getEyeEllipse(const cv::Point &ptLeft, const cv::Point &ptRight)
{
    Contour cntEyeBottom;
    const cv::Point ptEyeCenter((ptRight + ptLeft) / 2);
    const int angle = getLineInclinationAngleDegrees(ptLeft, ptRight);
    const int axisX = toIntRounded(cv::norm(ptRight - ptLeft) / 2.0);
    // According to research, on average a Y axis of an eye is approximately
    //  1/3 of an X one.
    const int axisY = axisX / 3;
    // We need the lower part of an ellipse:
    static constexpr int kAngEyeStart = 0;
    static constexpr int kAngEyeEnd   = 180;
    cv::ellipse2Poly(ptEyeCenter, cv::Size(axisX, axisY), angle, kAngEyeStart, kAngEyeEnd, config::kAngDelta,
                     cntEyeBottom);
    return cntEyeBottom;
}

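Briefly, this function restores the bottom side of an eye by a half-ellipse based on two points at the left and right eye corners. In fact, cv::ellipse2Poly() is used to approximate the eye region, and the function only defines the ellipse parameters from just two points:

the ellipse center and the \(X\) half-axis are calculated from the two eye points;
the \(Y\) half-axis is calculated under the assumption that the average eye height is about \(1/3\) of its width;
the start and end angles are 0 and 180, respectively (refer to the cv::ellipse() documentation);
the angle delta: the angular step between consecutive polyline vertices, which controls how many points the contour gets;
the inclination angle of the axes.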
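
The parameters listed above can be turned into contour points with a few lines of trigonometry. The following is a minimal sketch, not the sample's implementation: lowerHalfEllipse() and Pt are hypothetical names, and coordinates are kept as doubles, whereas the sample rounds to integers and delegates point generation to cv::ellipse2Poly().

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for a 2-D point with floating-point coordinates.
struct Pt { double x, y; };

// Samples the lower half (0..180 degrees, image Y axis pointing down) of an
// ellipse spanned by two eye corners, with the Y half-axis set to 1/3 of X.
std::vector<Pt> lowerHalfEllipse(Pt left, Pt right, int deltaDeg)
{
    assert(deltaDeg > 0);
    const double kPi   = std::acos(-1.0);
    const Pt center{(left.x + right.x) / 2.0, (left.y + right.y) / 2.0};
    const double axisX = std::hypot(right.x - left.x, right.y - left.y) / 2.0;
    const double axisY = axisX / 3.0;
    const double theta = std::atan2(right.y - left.y, right.x - left.x);

    std::vector<Pt> pts;
    for (int t = 0; t <= 180; t += deltaDeg) {
        const double r = t * kPi / 180.0;
        // Point on the axis-aligned ellipse, rotated by theta around the center.
        pts.push_back({center.x + axisX * std::cos(r) * std::cos(theta)
                                - axisY * std::sin(r) * std::sin(theta),
                       center.y + axisX * std::cos(r) * std::sin(theta)
                                + axisY * std::sin(r) * std::cos(theta)});
    }
    return pts;
}
```

For eye corners at (0, 0) and (6, 0) and a 90-degree step, the three generated points are the right corner, the lowest point (3, 1), and the left corner; a smaller step yields a smoother contour with more points.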

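Using atan2() instead of plain atan() in custom::getLineInclinationAngleDegrees() is essential: atan2() takes the signs of both x and y into account and returns an angle in the full range from -180 to 180 degrees, so the right angle is obtained even when the face is upside down (provided the points are passed in the right order, of course).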
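
The difference between the two functions is easy to check numerically. In the sketch below, inclinationDegrees() mirrors the sample's function on plain doubles, while naiveInclinationDegrees() is a hypothetical atan()-based variant added only for comparison.

```cpp
#include <cassert>
#include <cmath>

// atan2-based inclination of the vector (dx, dy), in whole degrees.
int inclinationDegrees(double dx, double dy)
{
    if (dx == 0.0 && dy == 0.0) return 0;
    const double kPi = std::acos(-1.0);
    return static_cast<int>(std::lround(std::atan2(dy, dx) * 180.0 / kPi));
}

// Hypothetical atan-based variant: it only sees the ratio dy/dx, so vectors
// pointing in opposite directions produce the same angle.
int naiveInclinationDegrees(double dx, double dy)
{
    assert(dx != 0.0);
    const double kPi = std::acos(-1.0);
    return static_cast<int>(std::lround(std::atan(dy / dx) * 180.0 / kPi));
}
```

For the vector (-1, -1) the atan2() version reports -135 degrees, whereas the atan() version reports 45 degrees, exactly as for (1, 1); with an upside-down face this would rotate the eye ellipse the wrong way.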

Getting a forehead contour

The function approximates the forehead contour:

inline Contour custom::getForeheadEllipse(const cv::Point &ptJawLeft,
                                          const cv::Point &ptJawRight,
                                          const cv::Point &ptJawLower)
{
    Contour cntForehead;
    // The point amid the top two points of a jaw:
    const cv::Point ptFaceCenter((ptJawLeft + ptJawRight) / 2);
    // This will be the center of the ellipse.

    // The inclination angle of the line through the top two jaw points
    //  (measured against the horizontal axis):
    const int angFace = getLineInclinationAngleDegrees(ptJawLeft, ptJawRight);
    // This will be the inclination of the ellipse

    // Computing the half-axes of the ellipse:
    const double jawWidth  = cv::norm(ptJawLeft - ptJawRight);
    // A forehead width equals the jaw width, and we need a half-axis:
    const int axisX        = toIntRounded(jawWidth / 2.0);

    const double jawHeight = cv::norm(ptFaceCenter - ptJawLower);
    // According to research, on average a forehead is approximately 2/3 of
    //  a jaw:
    const int axisY        = toIntRounded(jawHeight * 2 / 3.0);

    // We need the upper part of an ellipse:
    static constexpr int kAngForeheadStart = 180;
    static constexpr int kAngForeheadEnd   = 360;
    cv::ellipse2Poly(ptFaceCenter, cv::Size(axisX, axisY), angFace, kAngForeheadStart, kAngForeheadEnd,
                     config::kAngDelta, cntForehead);
    return cntForehead;
}

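Since the detected landmarks contain only jaw points, we have to derive a half-ellipse from three jaw points: the left-most, the right-most, and the lowest one. The jaw width is assumed to be equal to the forehead width, and the latter is calculated from the left and right points. As for the \(Y\) axis, there are no points to get it from directly, so we assume instead that the forehead height is about \(2/3\) of the jaw height, which is measured from the face center (the midpoint between the left-most and right-most points) to the lowest jaw point.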
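
As a worked example of these assumptions, the sketch below derives the ellipse parameters from three jaw points. EllipseParams, Pt, and foreheadParams() are hypothetical names, and values are kept as doubles rather than rounded to integers as in the sample.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins used only in this illustration.
struct Pt { double x, y; };
struct EllipseParams { Pt center; double axisX, axisY, angleDeg; };

// Forehead ellipse from the left-most, right-most and lowest jaw points:
// width equals the jaw width, height is 2/3 of the jaw height.
EllipseParams foreheadParams(Pt jawLeft, Pt jawRight, Pt jawLower)
{
    const double kPi = std::acos(-1.0);
    const Pt center{(jawLeft.x + jawRight.x) / 2.0, (jawLeft.y + jawRight.y) / 2.0};
    const double jawWidth  = std::hypot(jawRight.x - jawLeft.x, jawRight.y - jawLeft.y);
    assert(jawWidth > 0.0);  // the two upper jaw points must differ
    const double jawHeight = std::hypot(jawLower.x - center.x, jawLower.y - center.y);
    const double angleDeg  =
        std::atan2(jawRight.y - jawLeft.y, jawRight.x - jawLeft.x) * 180.0 / kPi;
    return {center, jawWidth / 2.0, jawHeight * 2.0 / 3.0, angleDeg};
}
```

For jaw points (0, 0), (10, 0) and (5, 9), the ellipse is centered at (5, 0) with half-axes 5 and 6 and no inclination.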

Drawing masks

When we have all the contours needed, we are able to draw the masks:

    // Draw masks:
    cv::GMat mskSharp        = custom::GFillPolyGContours::on(gimgIn, garElsConts);
    cv::GMat mskSharpG       = cv::gapi::gaussianBlur(mskSharp, config::kGKernelSize,
                                                      config::kGSigma);
    cv::GMat mskBlur         = custom::GFillPolyGContours::on(gimgIn, garFaceConts);
    cv::GMat mskBlurG        = cv::gapi::gaussianBlur(mskBlur, config::kGKernelSize,
                                                      config::kGSigma);
    // The first argument of mask() is mskBlurG because the masked result
    // is subtracted from mskBlurG in the same expression:
    cv::GMat mskBlurFinal    = mskBlurG - cv::gapi::mask(mskBlurG, mskSharpG);
    cv::GMat mskFacesGaussed = mskBlurFinal + mskSharpG;
    cv::GMat mskFacesWhite   = cv::gapi::threshold(mskFacesGaussed, 0, 255, cv::THRESH_BINARY);
    cv::GMat mskNoFaces      = cv::gapi::bitwise_not(mskFacesWhite);

The steps to get the masks are:

the "sharp" mask calculation:
- fill the contours that should be sharpened;
- blur that to get the "sharp" mask (mskSharpG);
the "bilateral" mask calculation:
- fill all the face contours fully;
- blur that;
- subtract the areas that intersect with the "sharp" mask, which gives the "bilateral" mask (mskBlurFinal);
the background mask calculation:
- add the two previous masks;
- set all non-zero pixels of the result to 255 (by cv::gapi::threshold());
- invert the output (by cv::gapi::bitwise_not()) to get the background mask (mskNoFaces).
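
The mask arithmetic in these steps can be traced on plain 8-bit buffers. The sketch below is an illustration under assumptions: FaceMasks and buildMasks() are hypothetical names, the inputs are the already blurred masks, and the masking step is modeled as "keep the source value where the mask is non-zero, otherwise 0".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Mask = std::vector<std::uint8_t>;

// Hypothetical container for the three resulting masks.
struct FaceMasks { Mask sharp, blur, background; };

// Builds the "sharp", "bilateral" and background masks from the blurred
// element mask (sharpG) and the blurred face mask (blurG).
FaceMasks buildMasks(const Mask& sharpG, const Mask& blurG)
{
    assert(sharpG.size() == blurG.size());
    FaceMasks out{sharpG, Mask(blurG.size()), Mask(blurG.size())};
    for (std::size_t i = 0; i < blurG.size(); ++i) {
        // Part of the face mask that intersects with the "sharp" mask:
        const int overlap = sharpG[i] != 0 ? blurG[i] : 0;
        const int blurFinal = blurG[i] - overlap;
        out.blur[i] = static_cast<std::uint8_t>(blurFinal);
        // Saturated sum of both masks, then threshold and invert:
        const int faces = std::min(255, blurFinal + sharpG[i]);
        out.background[i] = faces > 0 ? 0 : 255;
    }
    return out;
}
```

A pixel therefore ends up in exactly one category: covered by the "sharp" mask, covered only by the face mask, or part of the background with value 255.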

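Configuring and running the pipeline

Once the graph is fully expressed, we can finally compile it and run it on real data. G-API graph compilation is the stage where the G-API framework actually learns which kernels and networks to use. This configuration is done via G-API compilation arguments.

DNN parameters

This sample uses the OpenVINO™ Toolkit Inference Engine backend for DL inference, which is configured as follows: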

auto faceParams  = cv::gapi::ie::Params<custom::FaceDetector>
{
    /*std::string*/ faceXmlPath,
    /*std::string*/ faceBinPath,
    /*std::string*/ faceDevice
};
auto landmParams = cv::gapi::ie::Params<custom::LandmDetector>
{
    /*std::string*/ landmXmlPath,
    /*std::string*/ landmBinPath,
    /*std::string*/ landmDevice
};

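Every cv::gapi::ie::Params<> object is related to the network specified in its template argument. The network type defined with G_API_NET() earlier in the tutorial should be passed there.

The network parameters are then wrapped in a network package (cv::gapi::GNetPackage) using cv::gapi::networks():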

auto networks = cv::gapi::networks(faceParams, landmParams);

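More details in "Face Analytics Pipeline" (Configuring the pipeline section).

Kernel packages

In this example we use a lot of custom kernels; in addition, the Fluid backend is used to optimize memory for G-API's standard kernels where applicable. The resulting kernel package is formed like this: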

auto customKernels = cv::gapi::kernels<custom::GCPUBilateralFilter,
                                       custom::GCPULaplacian,
                                       custom::GCPUFillPolyGContours,
                                       custom::GCPUPolyLines,
                                       custom::GCPURectangle,
                                       custom::GCPUFacePostProc,
                                       custom::GCPULandmPostProc,
                                       custom::GCPUGetContours>();
auto kernels       = cv::gapi::combine(cv::gapi::core::fluid::kernels(),
                                       customKernels);

Compiling the streaming pipeline

G-API optimizes execution for video streams when compiled in the "Streaming" mode.

cv::GStreamingCompiled stream = pipeline.compileStreaming(cv::compile_args(kernels, networks));

Read more about this in "Face Analytics Pipeline" (Configuring the pipeline section).

Running the streaming pipeline

To run the G-API streaming pipeline, all we need is to specify the input video source, call cv::GStreamingCompiled::start(), and then fetch the pipeline processing results:

if (parser.has("input"))
{
    stream.setSource(cv::gapi::wip::make_src<cv::gapi::wip::GCaptureSource>(parser.get<cv::String>("input")));
}

auto out_vector = cv::gout(imgBeautif, imgShow, vctFaceConts,
                           vctElsConts, vctRects);
stream.start();
avg.start();
while (stream.running())
{
    if (!stream.try_pull(std::move(out_vector)))
    {
        // Use a try_pull() to obtain data.
        // If there's no data, let UI refresh (and handle keypress)
        if (cv::waitKey(1) >= 0) break;
        else continue;
    }
    frames++;
    // Drawing face boxes and landmarks if necessary:
    if (flgLandmarks == true)
    {
        cv::polylines(imgShow, vctFaceConts, config::kClosedLine,
                      config::kClrYellow);
        cv::polylines(imgShow, vctElsConts, config::kClosedLine,
                      config::kClrYellow);
    }
    if (flgBoxes == true)
        for (auto rect : vctRects)
            cv::rectangle(imgShow, rect, config::kClrGreen);
    cv::imshow(config::kWinInput,              imgShow);
    cv::imshow(config::kWinFaceBeautification, imgBeautif);
}

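Once results are ready and can be pulled from the pipeline, we display them on the screen and handle GUI events.

See the Running the pipeline section in the "Face Analytics Pipeline" tutorial for more details.

Conclusion

The tutorial has two goals: to show the use of new G-API features introduced in OpenCV 4.2, and to give a basic understanding of a sample face beautification algorithm.

The result of applying the algorithm: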

Face Beautification example

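On the test machine (Intel® Core™ i7-8700) the G-API-optimized video pipeline outperforms its serial (non-pipelined) version by a factor of 2.7, meaning that even for such a non-trivial graph, proper pipelining can bring an almost 3x increase in performance.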
