SMGCD: METRICS FOR BIOLOGICAL SEQUENCE DATA

M. Idrees, M. U. G. Khan

Abstract


In bioinformatics, a key challenge is to manage, store and retrieve biological data efficiently. Such data can be classified into structured, unstructured and semi-structured content. Semi-structured biological data typically consists of biological sequences. Complex biological sequences generate a huge volume of data, which in turn compounds the problems of management, storage and retrieval. This paper proposes seven metrics to address these problems: the symmetry measure, molecular weight measure, similarity or diversity measure, size base measure, size gap measure, complexity measure and size complexity diversity measure. Through mathematical formulations, these metrics measure a sequence's complexity, molecular weight, length with and without gaps, symmetry and similarity. The metrics are demonstrated and validated using a proposed hybrid technique that combines empirical evidence with theoretical formulation. This research opens new horizons for efficient management by measuring the functionality and quality of metadata for single and multiple biological sequences.
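To give a feel for the kind of quantities the abstract names, the following is a minimal Python sketch. It does not reproduce the paper's formulations, which the abstract does not state. All function names, the gap symbols, the use of Shannon entropy as a complexity stand-in, mirror symmetry as a symmetry stand-in, and column identity as a similarity stand-in are assumptions made purely for illustration.

```python
import math
from collections import Counter

# Assumption: these characters mark gaps in an aligned sequence.
GAP_CHARS = "-."


def size_with_gaps(seq: str) -> int:
    """Length of the sequence counting gap symbols."""
    return len(seq)


def size_without_gaps(seq: str) -> int:
    """Length of the sequence ignoring gap symbols."""
    return sum(1 for ch in seq if ch not in GAP_CHARS)


def shannon_entropy(seq: str) -> float:
    """Illustrative complexity stand-in: entropy (bits) of residue frequencies."""
    residues = [ch for ch in seq.upper() if ch not in GAP_CHARS]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_mirror_symmetric(seq: str) -> bool:
    """Illustrative symmetry stand-in: ungapped sequence reads the same reversed."""
    residues = [ch for ch in seq.upper() if ch not in GAP_CHARS]
    return residues == residues[::-1]


def identity(a: str, b: str) -> float:
    """Illustrative similarity stand-in: fraction of identical aligned columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return matches / len(a)
```

For example, under these assumptions `size_with_gaps("AC-GT")` counts five columns while `size_without_gaps("AC-GT")` counts four residues, and `identity("AC-GT", "ACTGT")` compares the two aligned strings column by column.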



