BufferPool for use with CUDA streams.

#include <opencv2/core/cuda.hpp>

Public Member Functions
	BufferPool (Stream &stream)
	Gets the BufferPool for the given stream.

Ptr< GpuMat::Allocator >	getAllocator () const
	Returns the allocator associated with the stream.

GpuMat	getBuffer (int rows, int cols, int type)
	Allocates a new GpuMat of the given size and type.

GpuMat	getBuffer (Size size, int type)
	Allocates a new GpuMat of the given size and type.

Detailed Description

BufferPool for use with CUDA streams.

BufferPool uses the Stream's allocator to create new buffers for GpuMat objects. It is only useful when enabled with setBufferPoolUsage.

setBufferPoolUsage(true);

void cv::cuda::setBufferPoolUsage(bool on): BufferPool management (must be called before Stream creation)

Note: setBufferPoolUsage must be called before any Stream declaration.

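Users may specify a custom allocator for a Stream and may implement their own stream-based functions on top of the same underlying GPU memory management.

If no custom allocator is specified, BufferPool uses StackAllocator by default. StackAllocator allocates a chunk of GPU device memory beforehand, and when a GpuMat is declared later on, it is given memory from that pre-allocated chunk. This strategy reduces the number of calls to memory allocation APIs such as cudaMalloc or cudaMallocPitch.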
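The pre-allocation idea can be illustrated with a toy model in plain C++. This is a minimal sketch, not OpenCV's StackAllocator: `ToyStackArena` and its members are hypothetical names, host memory from `std::vector` stands in for GPU device memory, and the 256-byte alignment is an assumption chosen for illustration. It shows one upfront allocation serving several later buffer requests.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model of a pre-allocated stack arena (hypothetical, not OpenCV code).
// Host memory stands in for GPU device memory.
class ToyStackArena {
public:
    // The single std::vector allocation here plays the role of one cudaMalloc call.
    explicit ToyStackArena(std::size_t capacityBytes)
        : storage_(capacityBytes), top_(0) {}

    // Hands out the next `bytes` from the chunk, or nullptr if they do not fit.
    std::uint8_t* allocate(std::size_t bytes) {
        const std::size_t aligned = alignUp(bytes);
        if (aligned > storage_.size() - top_) return nullptr;  // caller must fall back
        std::uint8_t* ptr = storage_.data() + top_;
        top_ += aligned;  // bump the top-of-stack offset; no new upstream allocation
        return ptr;
    }

    // Number of bytes currently handed out (including alignment padding).
    std::size_t used() const { return top_; }

private:
    static std::size_t alignUp(std::size_t n) {
        const std::size_t alignment = 256;  // assumed alignment, for illustration only
        return (n + alignment - 1) / alignment * alignment;
    }

    std::vector<std::uint8_t> storage_;
    std::size_t top_;
};

// Requests three buffers from one 1 MiB chunk and reports the bytes in use.
std::size_t demoUsedBytes() {
    ToyStackArena arena(1024 * 1024);
    arena.allocate(100 * 1024);
    arena.allocate(200 * 1024);
    arena.allocate(300 * 1024);
    return arena.used();
}
```

In this model every `allocate` call is just an offset update, which is why a stack-style pool can be cheaper than asking the driver for memory each time.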

Below is an example that uses BufferPool with StackAllocator:

#include <opencv2/opencv.hpp>
 
using namespace cv;
using namespace cv::cuda;
 
int main()
{
    setBufferPoolUsage(true);                               // Tell OpenCV that we are going to utilize BufferPool
    setBufferPoolConfig(getDevice(), 1024 * 1024 * 64, 2);  // Allocate 64 MB, 2 stacks (default is 10 MB, 5 stacks)

    Stream stream1, stream2;                                // Each stream uses 1 stack
    BufferPool pool1(stream1), pool2(stream2);

    GpuMat d_src1 = pool1.getBuffer(4096, 4096, CV_8UC1);   // 16MB
    GpuMat d_dst1 = pool1.getBuffer(4096, 4096, CV_8UC3);   // 48MB, pool1 is now full

    GpuMat d_src2 = pool2.getBuffer(1024, 1024, CV_8UC1);   // 1MB
    GpuMat d_dst2 = pool2.getBuffer(1024, 1024, CV_8UC3);   // 3MB

    cvtColor(d_src1, d_dst1, cv::COLOR_GRAY2BGR, 0, stream1);
    cvtColor(d_src2, d_dst2, cv::COLOR_GRAY2BGR, 0, stream2);
}

If we allocate another GpuMat on pool1 in the above example, the allocation is carried out by the DefaultAllocator, since the stack for pool1 is full.

GpuMat d_add1 = pool1.getBuffer(1024, 1024, CV_8UC1); // Stack for pool1 is full, memory is allocated with DefaultAllocator
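The byte counts in the comments of the example follow from rows × cols × channels for 8-bit elements. As a rough check in plain C++ (a sketch: `bufferBytes` and `fitsInStack` are hypothetical helpers, and any row padding or alignment that the real allocator may add is ignored):

```cpp
#include <cstddef>

constexpr std::size_t kMiB = 1024 * 1024;

// Bytes needed for a rows x cols buffer of 8-bit elements with `channels` channels.
// Ignores any row padding or alignment a real GPU allocator may add.
constexpr std::size_t bufferBytes(std::size_t rows, std::size_t cols, std::size_t channels) {
    return rows * cols * channels;
}

// True if `request` more bytes fit into a stack of `capacity` bytes with `used` bytes taken.
// Assumes used <= capacity.
constexpr bool fitsInStack(std::size_t capacity, std::size_t used, std::size_t request) {
    return request <= capacity - used;
}

constexpr std::size_t kStack = 64 * kMiB;                   // stack size passed to setBufferPoolConfig
constexpr std::size_t kSrc1  = bufferBytes(4096, 4096, 1);  // CV_8UC1
constexpr std::size_t kDst1  = bufferBytes(4096, 4096, 3);  // CV_8UC3
constexpr std::size_t kAdd1  = bufferBytes(1024, 1024, 1);  // the extra request on pool1

static_assert(kSrc1 == 16 * kMiB, "d_src1 takes 16 MiB");
static_assert(kDst1 == 48 * kMiB, "d_dst1 takes 48 MiB");
static_assert(!fitsInStack(kStack, kSrc1 + kDst1, kAdd1), "no room left for d_add1");
```

The two pool1 buffers together consume the whole 64 MiB stack, so the additional 1 MiB request cannot be served from it.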

If a third stream is declared in the above example, allocating with getBuffer within that stream is also carried out by the DefaultAllocator, because we have run out of stacks.

Stream stream3;                                         // Only 2 stacks were allocated, we've run out of stacks
BufferPool pool3(stream3);
GpuMat d_src3 = pool3.getBuffer(1024, 1024, CV_8UC1);   // Memory is allocated with DefaultAllocator
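The per-stream stack assignment can be pictured with a small toy model. This is a sketch under assumptions: `ToyStackPool` and `acquireStack` are hypothetical names, and OpenCV's internal bookkeeping may differ. The model only captures that a fixed number of stacks is handed out, one per stream, and that later streams get none.

```cpp
#include <cstddef>

// Toy model of a fixed set of stacks shared among streams (hypothetical, not OpenCV code).
class ToyStackPool {
public:
    explicit ToyStackPool(std::size_t stackCount) : stackCount_(stackCount), next_(0) {}

    // Returns the index of a free stack, or -1 when all stacks are taken,
    // in which case the caller would fall back to a general-purpose allocator.
    int acquireStack() {
        if (next_ >= stackCount_) return -1;
        return static_cast<int>(next_++);
    }

private:
    std::size_t stackCount_;
    std::size_t next_;
};
```

With two stacks configured, the first two calls to `acquireStack` succeed and the third reports that no stack is left, mirroring the stream3 case.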

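Warning: When using StackAllocator, the deallocation order is important.

Just like a stack, deallocation must be done in LIFO order. Below is an example of erroneous usage that violates the LIFO rule. If OpenCV is compiled in Debug mode, this sample code raises a CV_Assert error.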

int main()
{
    setBufferPoolUsage(true);                               // Tell OpenCV that we are going to utilize BufferPool
    Stream stream;                                          // A default size (10 MB) stack is allocated to this stream
    BufferPool pool(stream);

    GpuMat mat1 = pool.getBuffer(1024, 1024, CV_8UC1);      // Allocate mat1 (1MB)
    GpuMat mat2 = pool.getBuffer(1024, 1024, CV_8UC1);      // Allocate mat2 (1MB)

    mat1.release();                                         // erroneous usage : mat2 must be deallocated before mat1
}
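Why the order matters can be seen in a toy model where a release simply moves the top-of-stack offset back. This is a minimal sketch with hypothetical names (`ToyLifoStack`, `allocate`, `release`), not OpenCV's implementation; it returns `false` in the situation where, per the text above, a Debug build of OpenCV raises a CV_Assert error.

```cpp
#include <cstddef>
#include <vector>

// Toy stack allocator that tracks live blocks (hypothetical, not OpenCV code).
class ToyLifoStack {
public:
    explicit ToyLifoStack(std::size_t capacity) : capacity_(capacity), top_(0) {}

    // Returns the offset of the new block, or -1 if it does not fit.
    long allocate(std::size_t bytes) {
        if (bytes > capacity_ - top_) return -1;
        const std::size_t offset = top_;
        blocks_.push_back(Block{offset, bytes});
        top_ += bytes;
        return static_cast<long>(offset);
    }

    // Releases a block. Succeeds only if it is the most recently allocated live block.
    bool release(long offset) {
        if (blocks_.empty() || offset < 0) return false;
        const Block& last = blocks_.back();
        if (last.offset != static_cast<std::size_t>(offset)) return false;  // LIFO rule violated
        top_ = last.offset;  // move the top of the stack back to where this block began
        blocks_.pop_back();
        return true;
    }

private:
    struct Block { std::size_t offset; std::size_t bytes; };

    std::size_t capacity_;
    std::size_t top_;
    std::vector<Block> blocks_;
};
```

Releasing the first block while the second is still live is rejected, because moving the top back would expose memory that the second block still occupies.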

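Since C++ local variables are destroyed in the reverse order of construction, the code sample below satisfies the LIFO rule. Local GpuMat objects are deallocated and the corresponding memory is automatically returned to the pool for later use.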

int main()
{
    setBufferPoolUsage(true);                               // Tell OpenCV that we are going to utilize BufferPool
    setBufferPoolConfig(getDevice(), 1024 * 1024 * 64, 2);  // Allocate 64 MB, 2 stacks (default is 10 MB, 5 stacks)

    Stream stream1, stream2;                                // Each stream uses 1 stack
    BufferPool pool1(stream1), pool2(stream2);

    for (int i = 0; i < 10; i++)
    {
        GpuMat d_src1 = pool1.getBuffer(4096, 4096, CV_8UC1);   // 16MB
        GpuMat d_dst1 = pool1.getBuffer(4096, 4096, CV_8UC3);   // 48MB, pool1 is now full

        GpuMat d_src2 = pool2.getBuffer(1024, 1024, CV_8UC1);   // 1MB
        GpuMat d_dst2 = pool2.getBuffer(1024, 1024, CV_8UC3);   // 3MB

        d_src1.setTo(Scalar(i), stream1);
        d_src2.setTo(Scalar(i), stream2);

        cvtColor(d_src1, d_dst1, cv::COLOR_GRAY2BGR, 0, stream1);
        cvtColor(d_src2, d_dst2, cv::COLOR_GRAY2BGR, 0, stream2);
        // The order of destruction of the local variables is:
        //   d_dst2 => d_src2 => d_dst1 => d_src1
        // LIFO rule is satisfied, this code runs without error
    }
}
}
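The destruction order stated in the comments can be checked without a GPU. The following sketch uses a hypothetical `Tracer` type in place of GpuMat; it only records the order in which locals in a block are destroyed and involves no OpenCV calls.

```cpp
#include <string>
#include <utility>
#include <vector>

// Records its name in a shared log when destroyed (stand-in for a pooled GpuMat).
struct Tracer {
    Tracer(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {}
    ~Tracer() { log_.push_back(name_); }

    std::string name_;
    std::vector<std::string>& log_;
};

// Declares four locals in the same order as the loop body above and
// returns the order in which their destructors ran.
std::vector<std::string> destructionOrder() {
    std::vector<std::string> log;
    {
        Tracer d_src1("d_src1", log);
        Tracer d_dst1("d_dst1", log);
        Tracer d_src2("d_src2", log);
        Tracer d_dst2("d_dst2", log);
    }  // all four go out of scope here, last constructed first
    return log;
}
```

The returned log lists d_dst2, d_src2, d_dst1, d_src1, which is the reverse of the declaration order and therefore compatible with a stack-style allocator.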

Constructor & Destructor Documentation

◆ BufferPool()

cv::cuda::BufferPool::BufferPool ( Stream & stream )

explicit

Python:
	cv.cuda.BufferPool(	stream	) ->	<cuda_BufferPool object>

Gets the BufferPool for the given stream.

Member Function Documentation

◆ getAllocator()

Ptr< GpuMat::Allocator > cv::cuda::BufferPool::getAllocator ( ) const

inline

Python:
	cv.cuda.BufferPool.getAllocator(		) ->	retval

Returns the allocator associated with the stream.

◆ getBuffer() [1/2]

GpuMat cv::cuda::BufferPool::getBuffer	(	int	rows,
		int	cols,
		int	type )

Python:
	cv.cuda.BufferPool.getBuffer(	rows, cols, type	) ->	retval
	cv.cuda.BufferPool.getBuffer(	size, type	) ->	retval

Allocates a new GpuMat of the given size and type.

◆ getBuffer() [2/2]

GpuMat cv::cuda::BufferPool::getBuffer	(	Size	size,
		int	type )

inline

Python:
	cv.cuda.BufferPool.getBuffer(	rows, cols, type	) ->	retval
	cv.cuda.BufferPool.getBuffer(	size, type	) ->	retval

Allocates a new GpuMat of the given size and type.

The documentation for this class was generated from the following file:

opencv2/core/cuda.hpp