Diacritics Recognition Based Urdu Nastalique OCR System
Abstract
Advances in Artificial Intelligence have opened new horizons for machines that originally had limited intelligence. Compared with the human brain, machines already offer better computational speed and storage; however, there is still much room to improve their ability to acquire and process data and to draw conclusions from it on their own. Optical Character Recognition (OCR) deals with the recognition of printed and handwritten text. Considerable progress has been made in OCR for Latin, Asian, Arabic and Western texts. For Urdu, by comparison, the work is almost non-existent. One of the main reasons is the extremely complex character shapes of the Nastalique style used in Urdu. This research work presents a methodology for recognizing and processing the diacritics of Nastalique script. The proposed technique is effective in recognizing cursive text at a fixed font size of 48. Tested on a dataset of 6728 main Urdu Nastalique ligatures, the technique recognizes Nastalique ligatures with an accuracy of 97.40%. The work also aims to improve the existing base mark association process of the Urdu OCR system.