Diacritics Recognition Based Urdu Nastalique OCR System

Authors

  • S. Nazir, Department of Software Engineering, University of Engineering and Technology, Taxila, Pakistan
  • A. Javed, Department of Software Engineering, University of Engineering and Technology, Taxila, Pakistan

Abstract

Improvements and new developments in the field of Artificial Intelligence have opened new horizons for machines that originally had limited intelligence. Compared with the human brain, machines already have better computational speed and storage; however, there is still much room to improve their capability to acquire and process data and draw conclusions from it on their own. Optical Character Recognition (OCR) deals exclusively with text that is printed or handwritten in nature. Considerable progress has been made in OCR for the recognition of Latin, Asian, Arabic and Western texts. For Urdu, by comparison, the work is almost non-existent. One of the main reasons is the extremely complex character shapes of the Nastalique style used in Urdu. This research work presents a methodology for the recognition and processing of the diacritics of Nastalique script. The proposed technique is effective in recognizing cursive text at a fixed font size of 48. Testing on a dataset of 6728 main Urdu Nastalique ligatures shows that the technique recognizes Nastalique ligatures with an accuracy of 97.40%. The work also aims to improve the existing base-mark association process of the Urdu OCR system.
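
The abstract does not say how base-mark association is carried out, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the page has already been split into connected components, each reduced to a bounding box, and that every diacritic (mark) is attached to the main body (base) with which it shares the most horizontal extent, with ties broken by the distance between box centers. The `Box` type, the `associate_marks` function, and the scoring rule are all hypothetical names and choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    left: int
    top: int
    right: int
    bottom: int

    @property
    def center(self):
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def horizontal_overlap(a, b):
    """Width in pixels of the horizontal span shared by two boxes (0 if none)."""
    return max(0, min(a.right, b.right) - max(a.left, b.left))


def associate_marks(bases, marks):
    """Map each mark index to the index of its chosen base.

    Rule (an assumption for illustration, not taken from the paper):
    prefer the base with the largest horizontal overlap, then break
    ties by the smallest squared distance between box centers.
    """
    if not bases:
        raise ValueError("at least one base component is required")
    result = {}
    for m_idx, mark in enumerate(marks):
        def score(b_idx):
            base = bases[b_idx]
            dx = base.center[0] - mark.center[0]
            dy = base.center[1] - mark.center[1]
            # Negated overlap so that min() favors the larger overlap.
            return (-horizontal_overlap(base, mark), dx * dx + dy * dy)
        result[m_idx] = min(range(len(bases)), key=score)
    return result


# Two main bodies side by side, with three diacritics above or below them.
bases = [Box(0, 20, 40, 60), Box(50, 20, 90, 60)]
marks = [Box(10, 5, 18, 12), Box(60, 65, 70, 72), Box(43, 5, 49, 10)]
assignment = associate_marks(bases, marks)
print(assignment)
```

In this toy layout the first mark sits above the first base, the second below the second base, and the third falls in the gap between them and is assigned by center distance. A real Nastalique system would likely need more than bounding boxes, since ligatures slope diagonally and marks can overhang neighboring bodies.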

Published

15-09-2014

How to Cite

[1] S. Nazir and A. Javed, “Diacritics Recognition Based Urdu Nastalique OCR System”, The Nucleus, vol. 51, no. 3, pp. 361–367, Sep. 2014.

Section

Articles