Data clustering is a vital tool for identifying and grouping items with similar characteristics in a data set. Clustering may be performed for understanding or for utility. Clustering for understanding, which is the focus of this work, groups items with common characteristics in order to better understand a dataset and to identify possible sub-groups, or sub-groups of prior interest, that could be formed from the data. HIV prevalence in Nigeria, which is measured bi-annually across the 36 states and the Federal Capital Territory (FCT), grouped into 6 geo-political zones, provides a suitable dataset for this purpose. Cluster analysis was carried out using two general methods: hierarchical clustering (agglomerative nesting) and partitioning (K-Means). These techniques were implemented in R (a statistical computing language) to cluster HIV prevalence rates in Nigeria, in order to identify states that could be considered to belong to the same category and to investigate the concentration of the disease with respect to geo-political zones. Relative validation was used for cluster validation (a mechanism for evaluating the correctness of a clustering).
Keywords: Clustering analysis, HIV, Nigeria, Pregnant Women, data
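
The workflow summarized in the abstract can be sketched in R roughly as follows. This is a minimal illustration and not the authors' actual script: the data frame `rates` holds randomly generated placeholder values (not real prevalence figures), the column names `round1` and `round2`, the choice of k = 3, the average linkage, and the use of the average silhouette width as the relative validation index are all assumptions made for the example.

```r
library(cluster)  # provides agnes() and silhouette()

# Placeholder data: random numbers standing in for prevalence rates.
# These are NOT real HIV statistics; 37 rows mimic 36 states plus the FCT.
set.seed(1)
rates <- data.frame(
  round1 = runif(37, min = 1, max = 12),
  round2 = runif(37, min = 1, max = 12)
)
rownames(rates) <- paste0("State", seq_len(37))

# Standardize the variables and compute Euclidean distances
scaled <- scale(rates)
d <- dist(scaled)

k <- 3  # assumed number of clusters, for illustration only

# Hierarchical clustering (agglomerative nesting), cut into k groups
hc <- agnes(scaled, method = "average")
hc_groups <- cutree(as.hclust(hc), k = k)

# Partitioning clustering (K-Means) with several random starts
km <- kmeans(scaled, centers = k, nstart = 25)

# Relative validation: compare the two solutions on the same index,
# here the average silhouette width (higher means better separated)
sil_hc <- mean(silhouette(hc_groups, d)[, "sil_width"])
sil_km <- mean(silhouette(km$cluster, d)[, "sil_width"])
print(c(agnes = sil_hc, kmeans = sil_km))
```

With real data, `rates` would be replaced by the state-level prevalence table, and the same comparison could be repeated over several values of k to choose among candidate clusterings.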