A. Rehman, K. Javed, H. A. Babri


Although text data differs from ordinary non-text datasets in a number of ways, existing algorithms from the machine learning domain have been borrowed for text classification. These algorithms cannot be applied directly to raw text: the text must first be transformed into a suitable representation. This transformation introduces further problems for feature selection and classification algorithms. In this paper we highlight the problems introduced by the transformation of text data. We also show how different feature selection algorithms, including bi-normal separation, information gain, and ROC, are affected by text data.
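
As an illustration only, the following minimal sketch shows one common way raw text can be transformed before a feature selection metric is applied. The abstract does not state which transformation the paper uses. The binary bag-of-words representation, the toy corpus, and the names `docs`, `vocab`, `matrix`, `entropy`, and `information_gain` are all assumptions made for this example. It builds a document-term matrix, measures how sparse it is, and scores terms by information gain.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (text, label) pairs invented for illustration.
docs = [
    ("cheap pills buy now", "spam"),
    ("buy cheap watches", "spam"),
    ("meeting agenda for monday", "ham"),
    ("monday lunch meeting", "ham"),
]

# Vocabulary: one feature per distinct term (assumed whitespace tokenization).
vocab = sorted({term for text, _ in docs for term in text.split()})

# Binary document-term matrix: one row per document, one column per term.
matrix = [[1 if term in text.split() else 0 for term in vocab]
          for text, _ in docs]

# Fraction of zero entries in the matrix.
sparsity = 1 - sum(map(sum, matrix)) / (len(matrix) * len(vocab))


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(term):
    """Reduction in class entropy from knowing whether `term` is present."""
    labels = [label for _, label in docs]
    present = [label for text, label in docs if term in text.split()]
    absent = [label for text, label in docs if term not in text.split()]
    n = len(labels)
    return (entropy(labels)
            - (len(present) / n) * entropy(present)
            - (len(absent) / n) * entropy(absent))


if __name__ == "__main__":
    print(f"{len(vocab)} features, sparsity {sparsity:.2f}")
    for term in ("cheap", "pills"):
        print(term, round(information_gain(term), 4))
```

Even in this four-document corpus, most matrix entries are zero, and a term that occurs in a single document receives a much lower score than one that occurs in every document of a class.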
