Extracting text from images
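I just wanted to give OCR a quick try. To get the result out of the way first: the accuracy was not as good as I had hoped, and extracting Japanese text did not work at all. (Maybe I was doing something wrong...)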
Setup and installation
This machine has a GeForce GTX 1050 installed, so I use eselect to switch OpenCL and OpenGL to the implementations that actually work.
ugui7 ~ # eselect opencl list
Available OpenCL implementations:
[1] mesa
[2] nvidia *
ugui7 ~ # eselect opengl list
Available OpenGL implementations:
[1] nvidia *
[2] xorg-x11
ugui7 ~ #
Incidentally, if the OpenCL implementation is switched to mesa, the call into /usr/lib64/libtesseract.so.3 crashes with a SEGV. Most likely no OpenCL device is being returned.
#0 0x00007fffefd624b6 in strlen () from /lib64/libc.so.6
#1 0x00007fffee5ab5bf in OpenclDevice::getDeviceSelection() () from /usr/lib64/libtesseract.so.3
#2 0x00007fffee5ad1f8 in OpenclDevice::InitOpenclRunEnv_DeviceSelection(int) () from /usr/lib64/libtesseract.so.3
#3 0x00007fffee5ad25b in OpenclDevice::InitEnv() () from /usr/lib64/libtesseract.so.3
#4 0x00007fffee3b2e6a in tesseract::TessBaseAPI::Init(char const*, char const*, tesseract::OcrEngineMode, char**, int, GenericVector<STRING> const*, GenericVector<STRING> const*, bool) () from /usr/lib64/libtesseract.so.3
#5 0x00007ffff38ae61e in cv::text::OCRTesseract::create(char const*, char const*, char const*, int, int) () from /usr/lib64/libopencv_text.so.3.4
#6 0x000055555555785d in main () at ocrtesseract.cpp:16
Back to the main topic: the packages look like this.
ugui7 ~ # emerge -pv app-text/tesseract media-libs/opencv
* IMPORTANT: config file '/etc/portage/package.keywords' needs updating.
* See the CONFIGURATION FILES and CONFIGURATION FILES UPDATE TOOLS
* sections of the emerge man page to learn how to update config files.
These are the packages that would be merged, in order:
Calculating dependencies... done!
[ebuild R ] app-text/tesseract-3.05.01::gentoo USE="doc jpeg opencl png tiff -examples -math -osd -scrollview -static-libs -training -webp" L10N="ja -ar -bg -ca -chr -cs -da -de -el -es -fi -fr -he -hi -hu -id -it -ko -lt -lv -nl -no -pl -pt -ro -ru -sk -sl -sr -sv -th -tl -tr -uk -vi -zh-CN -zh-TW" 0 KiB
[ebuild R ~] media-libs/opencv-3.4.1-r2:0/3.4.1::gentoo USE="contrib contrib_dnn eigen ffmpeg gtk ieee1394 jpeg jpeg2k opencl opengl openmp png python tesseract threads tiff -contrib_cvv -contrib_hdf -contrib_sfm -contrib_xfeatures2d -cuda -debug -dnn_samples -examples -gdal -gflags -glog -gphoto2 -gstreamer (-ipp) -java -lapack -libav -openexr -pch -qt5 -testprograms -v4l -vaapi -vtk -webp -xine" ABI_X86="32 (64) (-x32)" CPU_FLAGS_X86="sse sse2 -avx -avx2 -fma3 -popcnt -sse3 -sse4_1 -sse4_2 -ssse3" PYTHON_TARGETS="python2_7 python3_6 -python3_4 -python3_5" 0 KiB
Total: 2 packages (2 reinstalls), Size of downloads: 0 KiB
* IMPORTANT: 36 news items need reading for repository 'gentoo'.
* Use eselect news read to view new items.
ugui7 ~ #
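Trying the extraction
Here is the image.
Changing eng to jpn in the call to OCRTesseract::create should apparently enable Japanese recognition, but the extraction did not work well for me, so I verified the behavior with the English data instead.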
#include <cstdio>
#include <iostream>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>
#include <opencv2/text.hpp>
using namespace std;
int main(void)
{
    auto image = cv::imread("test.jpg");
    if (image.empty()) {
        // imread returns an empty Mat when the file cannot be read
        fprintf(stderr, "Failed to read test.jpg\n");
        return 1;
    }
    cv::Mat gray;
    // imread loads color images in BGR channel order, so convert from BGR
    cv::cvtColor(image, gray, cv::COLOR_BGR2GRAY);
    string result;
    vector<cv::Rect> boxes;
    vector<string> words;
    vector<float> confidences;
    printf("Initialize OCRTesseract...\n");
    // Arguments: tessdata path, language, character whitelist (none), engine mode, page segmentation mode
    auto ocr = cv::text::OCRTesseract::create("/usr/share/tessdata", "eng", NULL, cv::text::OEM_DEFAULT, cv::text::PSM_AUTO);
    // Recognize the text and collect per-word boxes and confidences
    ocr->run(gray, result, &boxes, &words, &confidences);
    cout << " String | Posistion | Size | confidences" << endl;
    cout << "---------------------+------------+------------+------------" << endl;
    for (size_t i = 0; i < boxes.size(); i++) {
        printf("%-20s | (%3d, %3d) | (%3d, %3d) | %f\n",
               words[i].c_str(),
               boxes[i].x, boxes[i].y,
               boxes[i].width, boxes[i].height,
               confidences[i]);
    }
    cout << endl << "Result:\n-------" << endl;
    cout << result;
    return 0;
}
Then build it and run it. The result:
cuomo@ugui7 ~/opencv $ gcc -g -O0 `pkg-config opencv --cflags --libs` -lstdc++ ocrtesseract.cpp
cuomo@ugui7 ~/opencv $ ./a.out
Initialize OCRTesseract...
[DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 1:GeForce GTX 1050 score is 0.087704
[DS] Device[2] 0:(null) score is 0.372587
[DS] Selected Device[1]: "GeForce GTX 1050" (OpenCL)
String | Posistion | Size | confidences
---------------------+------------+------------+------------
The | ( 39, 54) | ( 37, 17) | 87.536118
first | ( 91, 54) | ( 63, 17) | 71.520576
step | (169, 56) | ( 50, 19) | 85.904022
is | (234, 54) | ( 23, 17) | 73.084877
always | (273, 54) | ( 75, 21) | 78.761841
the | (364, 54) | ( 37, 17) | 88.950020
hardest | (417, 54) | ( 97, 17) | 46.006390
There | ( 39, 80) | ( 63, 17) | 85.972023
is | (117, 80) | ( 23, 17) | 73.084877
no | (157, 85) | ( 23, 12) | 85.654243
royal | (196, 80) | ( 62, 21) | 79.272545
road | (274, 80) | ( 48, 17) | 85.654243
to | (338, 82) | ( 24, 15) | 85.466003
learning | (377, 80) | (111, 21) | 46.006390
It s | ( 39, 105) | ( 49, 18) | 26.470589
never | (105, 111) | ( 61, 12) | 85.544647
too | (182, 108) | ( 37, 15) | 85.466003
late | (234, 106) | ( 50, 17) | 78.761841
to | (299, 108) | ( 24, 15) | 85.466003
learn | (338, 106) | ( 72, 17) | 46.006390
Kuso | ( 39, 134) | ( 50, 15) | 85.498596
ka2 | (104, 132) | ( 36, 17) | 72.482178
Dnaeha | (156, 132) | (111, 17) | 46.006390
Result:
-------
The first step is always the hardest
There is no royal road to learning
It s never too late to learn
Kuso ka2 Dnaeha
Ordinary English text was extracted fine, but when the text contains something like "?", the output seems to get garbled.
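Since the run above reports a confidence for every word, one way to deal with the garbled parts might be to drop low-confidence words before using the result. The following is only a minimal sketch: filterByConfidence is a hypothetical helper of my own naming (it is not part of OpenCV or Tesseract), and the idea of a fixed cutoff is an assumption. It takes the same kind of parallel vectors that ocr->run() fills in.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not an OpenCV or Tesseract API): return the words whose
// confidence is at least `threshold`. `words` and `confidences` are assumed to
// be parallel vectors, like the ones filled in by OCRTesseract::run().
std::vector<std::string> filterByConfidence(const std::vector<std::string>& words,
                                            const std::vector<float>& confidences,
                                            float threshold)
{
    std::vector<std::string> kept;
    // Only walk the common prefix in case the two vectors differ in length.
    const std::size_t n = std::min(words.size(), confidences.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (confidences[i] >= threshold) {
            kept.push_back(words[i]);
        }
    }
    return kept;
}
```

Calling filterByConfidence(words, confidences, 50.0f) on the table above would drop "It s" (26.47) and "Dnaeha" (46.01), but it would also drop correctly read words such as "hardest" and "learning" (both 46.01), so any threshold is a trade-off that would need tuning per image.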
Is it really using the GPU?
The x11-drivers/nvidia-drivers-396.24 package ships a command called nvidia-smi, so I used that to check.
cuomo@ugui7 ~ $ nvidia-smi -l 1 -i 0
Sat Jun 2 10:00:37 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.24 Driver Version: 396.24 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 1050 Off | 00000000:01:00.0 On | N/A |
| 35% 35C P0 N/A / 75W | 244MiB / 1999MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 445 G /usr/bin/X 189MiB |
| 0 23555 C ./a.out 43MiB |
+-----------------------------------------------------------------------------+
...
The type is C, so a.out does appear to be running as a compute process. The OCR side of things, however, is not as good as I had expected.
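If you wanted to do the same check from a program rather than by eye, the process rows could be split on whitespace. This is a rough sketch under the assumption that the rows look exactly like the ones in the log above (five fields between the "|" borders and no spaces in the process name). GpuProcess, parseProcessRow and usesCompute are hypothetical names of mine, not anything provided by nvidia-smi.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One row of the "Processes" table, as printed in the log above.
struct GpuProcess {
    int gpu = 0;
    int pid = 0;
    std::string type;    // "C" (compute), "G" (graphics) or "C+G"
    std::string name;
    std::string memory;  // kept as text, e.g. "43MiB"
};

// Convert a short string of decimal digits to int; reject anything else.
static bool toInt(const std::string& s, int& value)
{
    if (s.empty() || s.size() > 9) return false;
    for (char ch : s) {
        if (ch < '0' || ch > '9') return false;
    }
    value = std::stoi(s);
    return true;
}

// Hypothetical helper: parse a row such as "| 0 23555 C ./a.out 43MiB |".
// Returns false for header, separator and other non-process lines.
// Assumption: the process name contains no spaces.
bool parseProcessRow(const std::string& line, GpuProcess& out)
{
    std::istringstream in(line);
    std::vector<std::string> tokens;
    std::string tok;
    while (in >> tok) {
        if (tok != "|") tokens.push_back(tok);  // skip the table borders
    }
    if (tokens.size() != 5) return false;
    GpuProcess p;
    if (!toInt(tokens[0], p.gpu) || !toInt(tokens[1], p.pid)) return false;
    p.type = tokens[2];
    p.name = tokens[3];
    p.memory = tokens[4];
    out = p;
    return true;
}

// A process has a compute context when its type contains 'C' ("C" or "C+G").
bool usesCompute(const GpuProcess& p)
{
    return p.type.find('C') != std::string::npos;
}
```

Feeding it the two process rows from the log, the ./a.out row parses with type "C" and usesCompute returns true, while the /usr/bin/X row has type "G" and returns false.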