Can the NVIDIA Jetson Nano handle a 4K camera? A thorough test of fisheye lens correction on the GPU (CUDA)

Introduction
What you need
Benchmarking the fisheye lens correction
Conclusion
Related articles

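Introduction

Fisheye lenses are used to capture a wide field of view, but video shot through one is distorted toward the four corners. To improve the accuracy of object detection and motion detection, this distortion has to be corrected. In this article, I call this distortion correction "calibration".

Calibration requires a fairly complex matrix calculation for every frame of the video, so the processing load is considerable. Applying it to 4K video is even harder. With the CPU-based methods introduced in "An easy way to correct fisheye lens distortion with GStreamer's cameracalibrate" and "An easy way to correct fisheye lens distortion (calibration) with OpenCV & Python", processing 4K video at 30 FPS was not feasible.

So this time I ran the calibration on the GPU built into the Jetson Nano and tested whether the Jetson Nano can do real-time calibration (4K@30fps). This article summarizes the results.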
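To put the 4K@30fps target into numbers, the following is a minimal sketch of the arithmetic, assuming 4 bytes per pixel (the RGBA format used inside the filter later in this article). The function names are made up for illustration and are not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// Time available to process one frame at the given frame rate, in milliseconds.
double frame_budget_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Number of pixels that have to be corrected every second.
std::uint64_t pixels_per_second(std::uint32_t width, std::uint32_t height,
                                std::uint32_t fps) {
    return static_cast<std::uint64_t>(width) * height * fps;
}

// Amount of pixel data per second for a given pixel size
// (4 bytes for 8-bit RGBA, which is an assumption of this sketch).
std::uint64_t bytes_per_second(std::uint32_t width, std::uint32_t height,
                               std::uint32_t fps, std::uint32_t bytes_per_pixel) {
    return pixels_per_second(width, height, fps) * bytes_per_pixel;
}
```

For example, pixels_per_second(3840, 2160, 30) gives 248,832,000 pixels per second, four times the figure for 1920×1080, and frame_budget_ms(30.0) gives roughly 33.3 ms per frame.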

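What you need

① Board computer

The Jetson Nano is the star of this article. A cheaper 2GB model of the Jetson Nano has been announced, but this time I use the existing 4GB model.

NVIDIA Jetson Nano Developer Kit B01

・The Jetson Nano Developer Kit brings the latest image classification, object detection, segmentation, and speech processing applications to a small, low-power, low-cost form factor. ・The Jetson Nano Developer Kit supports a wide range of sensors through I/O including GPIO and CSI, and can be powered over micro USB. ...

② Camera module with fisheye lens

This is a camera module with a fisheye lens. This time I use the camera below, which can capture up to 180 degrees. It connects over USB and supports video capture up to 4K@30fps (MJPG).

ELP 4K USB webcam mini camera, 170-degree wide-angle fisheye lens, full HD 2160P 30FPS, Sony IMX317, UVC support, plug and play, driver-free, for video streaming, home use, meetings, game streaming, and classes, compatible with Windows/Android/Mac/Linux, camera module (model: ELP-USB4KHDR01-L170-JP)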


For reference, here is the output of "v4l2-ctl --list-formats-ext" for this camera.

root@Jetson:~# v4l2-ctl --list-formats-ext
ioctl: VIDIOC_ENUM_FMT
	Index       : 0
	Type        : Video Capture
	Pixel Format: 'MJPG' (compressed)
	Name        : Motion-JPEG
		Size: Discrete 3840x2160
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.040s (25.000 fps)
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 1920x1080
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.040s (25.000 fps)
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 2592x1944
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.040s (25.000 fps)
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 2048x1536
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.040s (25.000 fps)
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 1600x1200
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.040s (25.000 fps)
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 1280x960
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.040s (25.000 fps)
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 1280x720
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.040s (25.000 fps)
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 1024x768
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.040s (25.000 fps)
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 800x600
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.040s (25.000 fps)
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 640x480
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.040s (25.000 fps)
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 320x240
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.040s (25.000 fps)
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
 
	Index       : 1
	Type        : Video Capture
	Pixel Format: 'YUYV'
	Name        : YUYV 4:2:2
		Size: Discrete 3840x2160
			Interval: Discrete 1.000s (1.000 fps)
		Size: Discrete 1920x1080
			Interval: Discrete 0.333s (3.000 fps)
		Size: Discrete 2592x1944
			Interval: Discrete 1.000s (1.000 fps)
		Size: Discrete 2048x1536
			Interval: Discrete 0.333s (3.000 fps)
		Size: Discrete 1600x1200
			Interval: Discrete 0.333s (3.000 fps)
		Size: Discrete 1280x960
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 1280x720
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 1024x768
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 800x600
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 640x480
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.040s (25.000 fps)
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
		Size: Discrete 320x240
			Interval: Discrete 0.033s (30.000 fps)
			Interval: Discrete 0.040s (25.000 fps)
			Interval: Discrete 0.050s (20.000 fps)
			Interval: Discrete 0.067s (15.000 fps)
			Interval: Discrete 0.100s (10.000 fps)
			Interval: Discrete 0.200s (5.000 fps)
 
root@Jetson:~#

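Benchmarking the fisheye lens correction

Implementing the correction

GStreamer does not provide an element that performs fisheye lens correction on the GPU, so you have to implement the program yourself.

① Checking the environment

To implement the fisheye lens correction, I prepared the following environment. The key point is an OpenCV build with CUDA support. To install a CUDA-enabled OpenCV, follow the shell script available here.

② Designing the GStreamer pipeline

First, design the GStreamer pipeline. The following pipeline applies distortion correction to video captured from the USB camera and saves it in h264 format.

There are two key points. First, the pipeline uses the nvivafilter element, which loads a .so library and runs custom processing. Adding the "cuda-process=true" option to nvivafilter makes CUDA available. Second, for speed, the data is handled in "NVMM", memory on the GPU.

The actual pipeline definition is as follows.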

$ gst-launch-1.0 -e v4l2src device=/dev/video0 io-mode=2 \
! image/jpeg, width=3840, height=2160, framerate=30/1 \
! nvv4l2decoder mjpeg=1 ! 'video/x-raw(memory:NVMM),format=NV12' ! queue \
! nvivafilter customer-lib-name=./libnvsample_cudaprocess.so cuda-process=true \
! queue ! 'video/x-raw(memory:NVMM), format=RGBA' \
! nvvidconv ! 'video/x-raw(memory:NVMM), format=NV12' \
! nvv4l2h264enc ! h264parse ! queue2 ! qtmux ! filesink location=4K_h264HW_calib3.mp4

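Apart from the distortion correction part on lines 4 to 6, this pipeline has already been shown to capture video at 4K@30fps. For details, see "Can the NVIDIA Jetson Nano handle a 4K camera? A thorough test of encoding performance".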

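③ Implementing the distortion correction

Next, write the program that performs the distortion correction, as shown below. It is based on "nvsample_cudaprocess.cu", which is included in "nvsample_cudaprocess_src.tbz2" provided by NVIDIA. The program and Makefile I wrote are also available here.

Download cuda_calibration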

#include <stdio.h>
#include <stdlib.h>
#include <cuda.h>
#include "customer_functions.h"
#include "cudaEGL.h"
#include "iva_metadata.h"
#include "opencv2/core.hpp"
#include "opencv2/calib3d.hpp"
#include "opencv2/cudawarping.hpp"
#include <opencv2/cudaarithm.hpp>

// Resolution for which the remap tables are computed.
const int max_width = 3840;
const int max_height = 2160;

// Remap tables, computed once in init() and kept in GPU memory.
static cv::cuda::GpuMat gpu_xmap, gpu_ymap;
cv::cuda::Stream stream[1];

// No CPU-side work is needed before or after the GPU step.
static void pre_process (void **sBaseAddr,unsigned int *smemsize,unsigned int *swidth,unsigned int *sheight,unsigned int *spitch,ColorFormat  *sformat,unsigned int nsurfcount, void ** usrptr){}

static void post_process (void **sBaseAddr,unsigned int *smemsize,unsigned int *swidth,unsigned int *sheight,unsigned int *spitch,ColorFormat  *sformat,unsigned int nsurfcount,void ** usrptr){}

/**
  * Apply the fisheye lens correction to one RGBA frame in GPU memory.
  */
static void cv_process_RGBA(void *pdata, int32_t width, int32_t height)
{
    // Wrap the frame buffer without copying (this assumes tightly packed rows).
    cv::cuda::GpuMat d_Mat_RGBA(height, width, CV_8UC4, pdata);
    cv::cuda::GpuMat d_Mat_RGBA_Src;

    // The source must not alias the destination, so copy the frame first.
    d_Mat_RGBA.copyTo(d_Mat_RGBA_Src,stream[0]);

    cv::cuda::remap(d_Mat_RGBA_Src, d_Mat_RGBA, gpu_xmap, gpu_ymap, cv::INTER_NEAREST, cv::BORDER_CONSTANT, cv::Scalar(0.f, 0.f, 0.f, 0.f),stream[0]);

    // Wait for the queued copy and remap before the frame is handed back.
    stream[0].waitForCompletion();
}

/**
  * Performs CUDA Operations on egl image.
  * @param image : EGL image
  */
static void gpu_process (EGLImageKHR image, void ** usrptr)
{
  CUresult status;
  CUeglFrame eglFrame;
  CUgraphicsResource pResource = NULL;

  // Make sure a CUDA context exists before using the driver API.
  cudaFree(0);
  status = cuGraphicsEGLRegisterImage(&pResource, image, CU_GRAPHICS_MAP_RESOURCE_FLAGS_NONE);
  if (status != CUDA_SUCCESS) {
    printf("cuGraphicsEGLRegisterImage failed : %d \n", status);
    return;
  }

  status = cuGraphicsResourceGetMappedEglFrame( &eglFrame, pResource, 0, 0);
  if (status != CUDA_SUCCESS) {
    // eglFrame is not valid here, so release the resource and give up.
    printf ("cuGraphicsResourceGetMappedEglFrame failed : %d \n", status);
    cuGraphicsUnregisterResource(pResource);
    return;
  }

  //printf ("Starting fisheye correction! \n");

  if (eglFrame.frameType == CU_EGL_FRAME_TYPE_PITCH) {
    if (eglFrame.eglColorFormat == CU_EGL_COLOR_FORMAT_ABGR) {
        cv_process_RGBA(eglFrame.frame.pPitch[0], eglFrame.width, eglFrame.height);
    } else
        printf ("Invalid eglcolorformat %d\n", eglFrame.eglColorFormat);
  }

  status = cuGraphicsUnregisterResource(pResource);
  if (status != CUDA_SUCCESS) {
    printf("cuGraphicsUnregisterResource failed: %d \n", status);
  }
}

extern "C" void
init (CustomerFunction * pFuncs)
{
  printf ("Fisheye correction element loaded! \n");

  pFuncs->fPreProcess = pre_process;
  pFuncs->fGPUProcess = gpu_process;
  pFuncs->fPostProcess = post_process;

  cv::Mat xmap(max_height, max_width, CV_32FC1);
  cv::Mat ymap(max_height, max_width, CV_32FC1);
  cv::Mat cam(3, 3, cv::DataType<float>::type);
  cv::Mat dist(4, 1, cv::DataType<float>::type);

  // Camera matrix (values from camera.csv).
  cam.at<float>(0, 0) = 1569.08571848262500f;
  cam.at<float>(0, 1) = 0.0f;
  cam.at<float>(0, 2) = 1897.44471769631582f;
  cam.at<float>(1, 0) = 0.0f;
  cam.at<float>(1, 1) = 1590.39243583626353f;
  cam.at<float>(1, 2) = 1063.73079452522120f;
  cam.at<float>(2, 0) = 0.0f;
  cam.at<float>(2, 1) = 0.0f;
  cam.at<float>(2, 2) = 1.0f;

  // Distortion coefficients (values from dist.csv).
  dist.at<float>(0, 0) = -0.20963013646547f;
  dist.at<float>(1, 0) = 0.03491832650144f;
  dist.at<float>(2, 0) = 0.00160953572073f;
  dist.at<float>(3, 0) = -0.00896893124032f;
  // dist.csv lists a fifth value (-0.00194072642273), but dist is 4x1 and
  // cv::fisheye takes four coefficients, so writing dist(4, 0) would overrun it.

  // Compute the lookup tables once; they do not change from frame to frame.
  cv::fisheye::initUndistortRectifyMap(cam, dist, cv::Mat(), cam, cv::Size(max_width, max_height), CV_32FC1, xmap, ymap);

  /* upload to GpuMats */
  gpu_xmap.upload(xmap);
  gpu_ymap.upload(ymap);
}

extern "C" void
deinit (void)
{
  /* deinitialization: free the remap tables */
  gpu_xmap.release();
  gpu_ymap.release();
}

int main(int argc, char* argv[]) {}

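Here are the key points. First, the "init()" function performs the initial setup for the correction. It fills the cam and dist matrices with the camera parameters needed for the correction, using the values from camera.csv and dist.csv. It then calls "cv::fisheye::initUndistortRectifyMap()" to compute the xmap and ymap tables needed for the distortion calculation, and uploads them to GPU memory. For how to measure the camera parameters (camera.csv and dist.csv), see "An easy way to correct fisheye lens distortion (calibration) with OpenCV & Python".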
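To make the role of xmap and ymap more concrete, here is a minimal CPU sketch of how one table entry could be computed. It follows the fisheye camera model as documented by OpenCV, assuming identity rectification, the same camera matrix for input and output (as in the init() call above), and no skew. The struct and function names are hypothetical; this is not OpenCV's implementation. Here fx, fy, cx, cy correspond to the entries of cam, and k[0] to k[3] to the four entries of dist.

```cpp
#include <cassert>
#include <cmath>

// Focal lengths and principal point, as stored in the 3x3 camera matrix.
struct Intrinsics { double fx, fy, cx, cy; };
struct Point2 { double x, y; };

// For pixel (u, v) of the corrected image, return the position in the
// distorted fisheye image to sample from. k holds the coefficients k1..k4.
Point2 fisheye_source_pixel(double u, double v, const Intrinsics& cam,
                            const double k[4]) {
    assert(cam.fx != 0.0 && cam.fy != 0.0);
    // Back-project the pixel to normalized image coordinates.
    const double x = (u - cam.cx) / cam.fx;
    const double y = (v - cam.cy) / cam.fy;
    const double r = std::sqrt(x * x + y * y);
    // Angle between the viewing ray and the optical axis.
    const double theta = std::atan(r);
    const double t2 = theta * theta;
    // Distorted angle: theta * (1 + k1*theta^2 + k2*theta^4 + k3*theta^6 + k4*theta^8).
    const double theta_d = theta * (1.0 + k[0] * t2 + k[1] * t2 * t2 +
                                    k[2] * t2 * t2 * t2 + k[3] * t2 * t2 * t2 * t2);
    // On the optical axis (r == 0) the point maps to itself.
    const double scale = (r > 1e-12) ? theta_d / r : 1.0;
    return { cam.fx * x * scale + cam.cx, cam.fy * y * scale + cam.cy };
}
```

Evaluating this for every (u, v) of the output image would give one xmap value (the returned x) and one ymap value (the returned y) per pixel. Because the tables depend only on the camera parameters, they can be computed once at start-up rather than per frame.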

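The "gpu_process()" function is the one nvivafilter calls for each frame. Inside it, the RGBA frame is obtained from the EGLImage and passed to the "cv_process_RGBA()" function described next.

The "cv_process_RGBA()" function performs the distortion correction itself. After setting up the two GpuMat objects in GPU memory (d_Mat_RGBA and d_Mat_RGBA_Src), it first makes a copy of the original image with "copyTo()". It then corrects the distortion with the "cv::cuda::remap()" function. Using "cv::cuda::remap()" lets the GPU do the work, which makes the processing fast.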
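As a rough illustration of what a remap with nearest-neighbour interpolation and a constant border does, here is a simplified single-channel CPU sketch. It is not the OpenCV implementation: the function name is hypothetical, the output is assumed to have the same size as the input, and the rounding convention is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Nearest-neighbour remap of a single-channel image stored row by row.
// For each output pixel, xmap/ymap give the source position to read from;
// positions outside the source image produce the border value.
std::vector<std::uint8_t> remap_nearest(const std::vector<std::uint8_t>& src,
                                        int width, int height,
                                        const std::vector<float>& xmap,
                                        const std::vector<float>& ymap,
                                        std::uint8_t border = 0) {
    const std::size_t n = static_cast<std::size_t>(width) * height;
    assert(src.size() == n && xmap.size() == n && ymap.size() == n);
    std::vector<std::uint8_t> dst(n, border);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * width + x;
            // Round the fractional source position to the nearest pixel.
            const long sx = std::lround(xmap[i]);
            const long sy = std::lround(ymap[i]);
            if (sx >= 0 && sx < width && sy >= 0 && sy < height) {
                dst[i] = src[static_cast<std::size_t>(sy) * width + sx];
            }
        }
    }
    return dst;
}
```

Each output pixel reads from an arbitrary source position, which is why the sketch writes into a separate dst buffer: overwriting the source while it is still being read would corrupt later lookups. Every output pixel is independent of the others, so the work parallelizes well on a GPU.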

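④ Compiling

Once the program is written, compile it to create the library file "libnvsample_cudaprocess.so". The easiest way is to use the Makefile from the download above, but typing the commands by hand works too. The key point is to add links to the OpenCV and CUDA libraries (-lopencv_core -lopencv_calib3d -lopencv_cudawarping).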

test@Jetson:~/nvsample_cudaprocess$ make
/usr/local/cuda/bin/nvcc -ccbin g++ -I./  --shared  -Xcompiler -fPIC  -gencode arch=compute_30,code=sm_30 -gencode arch=compute_32,code=sm_32 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_50,code=compute_50 -gencode arch=compute_72,code=compute_72 -o nvsample_cudaprocess.o -c nvsample_cudaprocess.cu
/usr/local/cuda/bin/nvcc -ccbin g++   --shared  -Xcompiler -fPIC  -Xlinker --dynamic-linker=/lib/ld-linux-aarch64.so.1  -gencode arch=compute_30,code=sm_30 -gencode arch=compute_32,code=sm_32 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_53,code=sm_53 -gencode arch=compute_62,code=sm_62 -gencode arch=compute_72,code=sm_72 -gencode arch=compute_50,code=compute_50 -gencode arch=compute_72,code=compute_72 -L/usr/local/lib -lopencv_core -lopencv_calib3d -lopencv_cudawarping -lopencv_cudaarithm -o libnvsample_cudaprocess.so nvsample_cudaprocess.o -L/usr/lib/aarch64-linux-gnu -lEGL -lGLESv2 -L/usr/lib/aarch64-linux-gnu/tegra -lcuda -lrt

test@Jetson:~/nvsample_cudaprocess$ ls -l *.so
-rwxrwxr-x 1 test test 588848 10月 13 21:44 libnvsample_cudaprocess.so

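That completes the implementation of the fisheye lens correction.

Benchmarking the correction

Now for the main topic: benchmarking the fisheye lens correction on the GPU (CUDA). The method is to run the pipeline with the correction above, save the video in h264 format, and check the FPS of the saved video with the mediainfo command. CPU and GPU usage while the pipeline runs is checked with the jtop command.

① 800×600

As a warm-up, here is the result at 800×600: 29.046 FPS. It is fair to say this reaches 30 fps! The jtop output also shows that the correction is running on the GPU through CUDA.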

$ gst-launch-1.0 -e v4l2src device=/dev/video0 io-mode=2 ! image/jpeg, width=800, height=600, framerate=30/1 ! nvv4l2decoder mjpeg=1 ! 'video/x-raw(memory:NVMM),format=NV12' ! queue ! nvivafilter customer-lib-name=./libnvsample_cudaprocess.so cuda-process=true ! queue ! 'video/x-raw(memory:NVMM), format=RGBA' ! nvvidconv ! 'video/x-raw(memory:NVMM), format=NV12' ! nvv4l2h264enc ! h264parse ! queue2 ! qtmux ! filesink location=800_h264HW_calib.mp4

$ mediainfo 800_h264HW_calib.mp4
... (omitted) ...
Frame rate                               : 29.046 FPS
Minimum frame rate                       : 2.016 FPS
Maximum frame rate                       : 31.579 FPS

② 1600×1200

Next is the result at 1600×1200: 29.458 FPS. Still no problem. Compared with 800×600, the GPU frequency has risen to 230 MHz.

$ gst-launch-1.0 -e v4l2src device=/dev/video0 io-mode=2 ! image/jpeg, width=1600, height=1200, framerate=30/1 ! nvv4l2decoder mjpeg=1 ! 'video/x-raw(memory:NVMM),format=NV12' ! queue ! nvivafilter customer-lib-name=./libnvsample_cudaprocess.so cuda-process=true ! queue ! 'video/x-raw(memory:NVMM), format=RGBA' ! nvvidconv ! 'video/x-raw(memory:NVMM), format=NV12' ! nvv4l2h264enc ! h264parse ! queue2 ! qtmux ! filesink location=1600_h264HW_calib.mp4

$ mediainfo 1600_h264HW_calib.mp4
... (omitted) ...
Frame rate                               : 29.458 FPS
Minimum frame rate                       : 2.016 FPS
Maximum frame rate                       : 31.579 FPS

③ 1920×1080 (Full HD)

Next is the result at 1920×1080 (Full HD): 28.927 FPS! Even at Full HD, the correction runs almost in real time!

$ gst-launch-1.0 -e v4l2src device=/dev/video0 io-mode=2 ! image/jpeg, width=1920, height=1080, framerate=30/1 ! nvv4l2decoder mjpeg=1 ! 'video/x-raw(memory:NVMM),format=NV12' ! queue ! nvivafilter customer-lib-name=./libnvsample_cudaprocess.so cuda-process=true ! queue ! 'video/x-raw(memory:NVMM), format=RGBA' ! nvvidconv ! 'video/x-raw(memory:NVMM), format=NV12' ! nvv4l2h264enc ! h264parse ! queue2 ! qtmux ! filesink location=1920_h264HW_calib.mp4

$ mediainfo 1920_h264HW_calib.mp4
... (omitted) ...
Frame rate                               : 28.927 FPS
Minimum frame rate                       : 1.880 FPS
Maximum frame rate                       : 31.579 FPS

④ 3840×2160 (4K)

Finally, the result at 3840×2160 (4K): 19.452 FPS... It does not reach 30 FPS 😭 The GPU frequency is 307 MHz, higher than at Full HD, but 30 FPS appears to be out of reach.

$ gst-launch-1.0 -e v4l2src device=/dev/video0 io-mode=2 ! image/jpeg, width=3840, height=2160, framerate=30/1 ! nvv4l2decoder mjpeg=1 ! 'video/x-raw(memory:NVMM),format=NV12' ! queue ! nvivafilter customer-lib-name=./libnvsample_cudaprocess.so cuda-process=true ! queue ! 'video/x-raw(memory:NVMM), format=RGBA' ! nvvidconv ! 'video/x-raw(memory:NVMM), format=NV12' ! nvv4l2h264enc ! h264parse ! queue2 ! qtmux ! filesink location=3840_h264HW_calib.mp4

$ mediainfo 3840_h264HW_calib.mp4
... (omitted) ...
Frame rate                               : 19.452 FPS
Minimum frame rate                       : 1.506 FPS
Maximum frame rate                       : 31.250 FPS

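Investigating the bottleneck

To find out why 3840×2160 (4K) does not reach 30 FPS, I examined GPU usage in detail with NVIDIA Visual Profiler.

① Collecting GPU usage data

To visualize GPU usage in NVIDIA Visual Profiler, the data has to be collected with the nvprof command while the application runs.

As shown below, save the GStreamer pipeline to test.sh first, then run test.sh through the nvprof command. The custom processing called from nvivafilter runs as a child process, so do not forget the "--profile-child-processes" option!

Run the nvprof command, let it record video for a while, and interrupt it with "Ctrl+C". An nvvp file is then created as shown below. The GPU usage data is recorded in this file.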

$ echo "gst-launch-1.0 -e v4l2src device=/dev/video0 io-mode=2 ! image/jpeg, width=3840, height=2160,\
 framerate=30/1 ! nvv4l2decoder mjpeg=1 ! 'video/x-raw(memory:NVMM),format=NV12' ! queue \
! nvivafilter customer-lib-name=./libnvsample_cudaprocess.so cuda-process=true ! queue \
! 'video/x-raw(memory:NVMM), format=RGBA' ! nvvidconv ! 'video/x-raw(memory:NVMM), format=NV12' \
! nvv4l2h264enc ! h264parse ! queue2 ! qtmux ! filesink location=3840_h264HW_calib.mp4" > test.sh

# /usr/local/cuda-10.2/bin/nvprof -o ./pid%p.nvvp --profile-child-processes sh test.sh
... (omitted) ...
==7892== Generated result file: /home/test/nvsample_cudaprocess/pid7892.nvvp

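② Analyzing the GPU processing data

Open the nvvp file in NVIDIA Visual Profiler. As in the image below, it visualizes what the GPU is doing along the time axis.

The orange bars that appear at regular intervals in the Runtime API row are the periods in which the distortion correction runs on each frame of the video. Within each one, you can see the time taken by memory allocation (cudaMallocPitch), by the image copy shown as a brown bar (the "copyTo()" call in the source code), and by the correction shown as a green bar (the "cv::cuda::remap()" call in the source code).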
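As background on the cudaMallocPitch call seen in the profiler: it allocates 2D memory whose rows may be padded, so the row stride (pitch) can be larger than width × bytes per pixel. The following is a minimal sketch of that addressing scheme. The function names are hypothetical, and the actual alignment is chosen by the CUDA driver for the device; the 512 bytes used in the example below is purely an illustrative assumption.

```cpp
#include <cassert>
#include <cstddef>

// Round a row length in bytes up to the next multiple of `alignment`.
std::size_t padded_pitch(std::size_t row_bytes, std::size_t alignment) {
    assert(alignment > 0);
    return (row_bytes + alignment - 1) / alignment * alignment;
}

// Byte offset of pixel (x, y) in a pitched buffer: rows are `pitch` bytes
// apart, which may be more than the bytes actually used by one row.
std::size_t pixel_offset(std::size_t x, std::size_t y, std::size_t pitch,
                         std::size_t bytes_per_pixel) {
    assert(x * bytes_per_pixel < pitch);
    return y * pitch + x * bytes_per_pixel;
}
```

With a hypothetical 512-byte alignment, an 800-pixel RGBA row (3200 bytes) would be padded to 3584 bytes, while a 3840-pixel RGBA row (15360 bytes) would need no padding.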

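From this result, correcting one frame takes 52 ms. Since 1000 ms / 52 ms ≈ 19.23, the pipeline ends up at only about 19 FPS 💦 To reach 30 FPS, the time per frame has to come down to about 33 ms (1000 ms / 30). Memory allocation takes up much of the time and I would like to shorten it, but I do not know how... If you know a way, please let me know.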
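One idea that might be worth trying, untested on the Jetson Nano, is to keep the working buffer alive between frames instead of creating it inside the per-frame function. The sketch below illustrates only the general pattern in plain C++, with a hypothetical ScratchBuffer class standing in for a GPU buffer and counting how often it allocates. In OpenCV terms this could correspond to declaring d_Mat_RGBA_Src as static, since GpuMat::create is documented to skip allocation when the matrix already has the requested size and type. Whether that actually removes the cudaMallocPitch cost seen in the profiler is an assumption that would need to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a GPU buffer that counts how often storage is (re)allocated.
class ScratchBuffer {
public:
    // Make sure the buffer holds `bytes`; allocate only if the size differs.
    void ensure(std::size_t bytes) {
        if (storage_.size() != bytes) {
            storage_.assign(bytes, 0);
            ++allocations_;
        }
    }
    int allocations() const { return allocations_; }

private:
    std::vector<unsigned char> storage_;
    int allocations_ = 0;
};

// Buffer created inside the per-frame step: one allocation per frame.
int process_frames_local(int frames, std::size_t bytes) {
    assert(frames >= 0);
    int total = 0;
    for (int i = 0; i < frames; ++i) {
        ScratchBuffer scratch;  // destroyed again at the end of each frame
        scratch.ensure(bytes);
        total += scratch.allocations();
    }
    return total;
}

// Buffer that outlives the frames: allocated once, then reused.
int process_frames_persistent(int frames, std::size_t bytes) {
    assert(frames >= 0);
    ScratchBuffer scratch;
    for (int i = 0; i < frames; ++i) {
        scratch.ensure(bytes);
    }
    return scratch.allocations();
}
```

For 30 frames of the same size, the local version allocates 30 times and the persistent version once. If the frame size changes, the persistent buffer still reallocates, so the behaviour stays correct when the resolution is switched.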