Li Yingyu, Xi'an Jiaotong University Faculty Homepage System: Multilingual corpus construction based on printed and handwritten character separation

Li Yingyu

Personal profile

Not yet provided.

Publications

Multilingual corpus construction based on printed and handwritten character separation

Posted: 2025-04-30

Paper title: Multilingual corpus construction based on printed and handwritten character separation

Journal: Multimedia Tools & Applications

Abstract: This paper proposes an effective method for extracting printed and handwritten characters from multilingual document images in order to build a corpus. To isolate the characters, connected component analysis is first used to remove graphics from the document images. Multiple types of features and the AdaBoost algorithm are then introduced to classify printed and handwritten characters in a more versatile and robust way. First, the image content is divided into several text patches, which are then used to distinguish between languages. Second, the multiple types of features and the AdaBoost algorithm are used to train classifiers on the segmented patches. Finally, the trained classifiers separate the printed and handwritten parts of a new image set. The proposed method improves the precision of extracting written material from text images in different languages. Experimental results show that it achieves higher precision and recall than state-of-the-art methods.

DOI: 10.1007/s11042-015-2995-5

Co-authors: Yuping Lin, Yonghong Song, Yingyu Li, et al.

Translated work: No

Publication date: 2015-10-24