A Comparison of Address Matching Techniques to Improve Geocoding: A Case Study of Islamabad, Pakistan

Authors

  • A.B. Bashir, Applied Geo-Informatics Research Lab, Department of Meteorology, COMSATS University Islamabad (CUI), Park Road, Chak Shahzad, Islamabad 45550, Pakistan
  • M.F. Iqbal, Applied Geo-Informatics Research Lab, Department of Meteorology, COMSATS University Islamabad (CUI), Park Road, Chak Shahzad, Islamabad 45550, Pakistan

Abstract

Geocoding is the process of converting addresses into spatial coordinates. It has become essential in many fields, such as strategy making, disaster management, location-based analysis, and infrastructure planning. Address matching is critical to geocoding and depends on the language, the address format, and the components of the addresses. Various algorithms, tools, and applications have been designed to improve address matching. The goal of this research was to standardize and geocode addresses using different Natural Language Processing (NLP) models. Islamabad was chosen as the study area because it contains both standard and unusual addresses, which allows distinct types of addresses to be standardized and geocoded. Address datasets were obtained from telecommunication companies operating in the study area. Two NLP models, DeepParse (DP) and SpaCy's Named Entity Recognition (NER), were trained and used for address standardization; each parses addresses using a different technique. The SpaCy model performed well, with an accuracy of over 80% for all types of addresses. The DP model outperformed the SpaCy model only in urban areas, where its accuracy was 95%; for the other address types its accuracy was below 80%. This study focused on which types of addresses must be used to improve geocoding. The methods discussed here should help achieve a higher address matching rate in the geocoding process and help in choosing the best technique for matching addresses in Pakistan.
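
As a rough illustration of what address standardization involves, the sketch below splits a free-text address into labeled components and rewrites it in one canonical form. It uses hand-written regular expressions as a simple stand-in for the trained DP and SpaCy NER models, so it is not the method used in the study. The label names, the patterns, the function names, and the sample addresses are all assumptions made for this example.

```python
import re

# Hypothetical component labels and patterns (assumptions for illustration).
# A trained parser learns such patterns from labeled addresses instead.
RULES = [
    ("HOUSE", re.compile(r"\b(?:house|h)\s*(?:no\.?|#)?\s*(\d+[a-z]?)\b", re.I)),
    ("STREET", re.compile(r"\b(?:street|st)\s*(?:no\.?|#)?\s*(\d+)\b", re.I)),
    ("SECTOR", re.compile(r"\b([a-i]-\d{1,2}(?:/\d)?)\b", re.I)),
]

# Output order and display names for the canonical form (also assumptions).
ORDER = [("HOUSE", "House"), ("STREET", "Street"), ("SECTOR", "Sector")]


def parse_address(raw):
    """Return a dict mapping component labels to values found in the text."""
    components = {}
    for label, pattern in RULES:
        match = pattern.search(raw)
        if match:
            components[label] = match.group(1).upper()
    return components


def standardize(components):
    """Join the parsed components into a single canonical address string."""
    return ", ".join(
        f"{name} {components[label]}"
        for label, name in ORDER
        if label in components
    )


# Two differently written (made-up) addresses map to the same standard form.
print(standardize(parse_address("H# 12, St 5, G-9/1 Islamabad")))
print(standardize(parse_address("house no 12 street no. 5 g-9/1")))
```

Both calls print `House 12, Street 5, Sector G-9/1`. Hand-written rules like these break down on unusual addresses, which is the situation where a trained model is expected to help.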

Published

18-08-2023

How to Cite

[1]
A. B. Bashir and M. F. Iqbal, “A Comparison of Address Matching Techniques to Improve Geocoding: A Case Study of Islamabad, Pakistan”, The Nucleus, vol. 60, no. 2, pp. 145–152, Aug. 2023.

Section

Articles