Application of K-Means Clustering in Mapping of Central Java Crime Area

Crime occurs in many places and causes complex problems with widespread impact on all levels of society. Crime is related to several factors, including the crime index, the ratio of the number of police to the population, population density, and the poverty rate. This study develops an information system that displays and maps crime-prone areas in Central Java. These factors are used to classify the regions of Central Java into four categories: safe, quite vulnerable, vulnerable, and very vulnerable. The K-Means clustering method is well suited to grouping regions into these four categories. The problem addressed is identifying which areas of Central Java are prone to crime. The results show 11 regions in the safe category, 4 in the quite vulnerable category, 13 in the vulnerable category, and 6 in the very vulnerable category.


Introduction
Crime is a complicated problem that has a wide impact on all levels of society, and it is common everywhere. Crimes occur in various places and at different times, which makes it difficult to determine how vulnerable each area is to crime.
Information about the number of crimes is needed by the community and by law enforcement, in this case the police. For the wider community, this information is useful for taking anticipatory action. For the police, it supports decisions about whether an area needs extra supervision. This information is also needed to determine the intensity of crime.
The proposed solution is a mapping application for crime-prone areas. The application provides information about crime-prone areas in Central Java.

K-Means does not require complicated mathematical operations. It minimizes an objective function set in the clustering process, which generally amounts to minimizing the variation within each cluster. Its weakness is that the centroid values chosen at the start can affect the clustering results: different initial values can produce different clusters (the method is sensitive to the initial centroid values) [1]. In this study, the mapping of crime-prone areas in Central Java is based on four criteria: the crime index, the ratio of the number of police to the population, population density, and poverty. The K-Means method is used to cluster the crime-prone areas of Central Java.

2.1. Clustering Method. Clustering can be considered the most important unsupervised learning problem; like every other problem of this kind, it deals with finding structure in a collection of unlabeled data. A loose definition of clustering could be "the process of organizing objects into groups whose members are similar in some way" [5]. A cluster is therefore a collection of objects that are "similar" to one another and "dissimilar" to the objects belonging to other clusters.
For a clustering algorithm to be useful, certain conditions need to be satisfied. In this case the data can readily be divided into 4 clusters, and the similarity criterion is distance: two or more objects belong to the same cluster if they are "close" according to a given distance (in this case geometrical distance). This is called distance-based clustering. Another kind of clustering is conceptual clustering, in which two or more objects belong to the same cluster if the cluster defines a concept common to all those objects. In other words, objects are grouped according to their fit to descriptive concepts, not according to simple similarity measures.
Data clustering is an unsupervised data mining method. K-Means is a non-hierarchical data clustering method that attempts to partition existing data into one or more clusters/groups. The method partitions the data so that data with the same characteristics are grouped into the same cluster and data with different characteristics are grouped into other clusters. Data clustering using the K-Means method is generally done with the following basic algorithm [6,7]:
a. Determine the number of clusters.
b. Allocate data into clusters randomly.
c. Calculate the distance of each data item to each cluster center using formula (1).
If the values of one variable differ greatly in magnitude from those of another, the grouping process can become difficult. One solution that reduces these differences between variables is to normalize the values of each variable using equation (2).
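As a minimal sketch of the distance and normalization steps, assuming formula (1) is the Euclidean distance and equation (2) is min-max scaling to the range [0, 1], the calculation could look like this in Python. Both choices are assumptions, and the function names and sample values are hypothetical.

```python
import math


def min_max_normalize(rows):
    """Rescale every column of `rows` to [0, 1] (assumed form of equation (2))."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled_row = []
        for value, low, high in zip(row, lows, highs):
            # A constant column carries no information, so map it to 0.0.
            scaled_row.append((value - low) / (high - low) if high > low else 0.0)
        normalized.append(scaled_row)
    return normalized


def euclidean(a, b):
    """Euclidean distance between two points (assumed form of formula (1))."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical rows: crime index, police ratio, population density, poverty rate.
regions = [
    [120.0, 0.002, 1500.0, 12.0],
    [80.0, 0.004, 500.0, 8.0],
    [100.0, 0.003, 1000.0, 10.0],
]
scaled = min_max_normalize(regions)
distance_first_to_second = euclidean(scaled[0], scaled[1])
```

Without normalization, the population density column would dominate the distance because its values are several orders of magnitude larger than the police ratio.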

2.2. Crime Index. The crime index is the percentage increase or decrease in crime during the year compared to one particular year (which is used as a base year). The higher the crime index of an area, the lower the level of security in the community of that region.
The crime index in this study compares the 2016 crime rate with the crime rate in 2014, which serves as the reference year [8]. In 2014 the crime rate was relatively high. In that year the circulation of firearms was very high, which increased social conflicts in the community, such as brawls between villages in various regions. Conflicts between the TNI and Polri were recorded seven times, and rates of drug fraud and drug trafficking were also high in 2014.
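The definition above can be illustrated with a short Python sketch. It assumes the crime index is the percentage change of the current year's crime count relative to the base year. That reading of the definition, the function name, and the sample counts are assumptions rather than the study's own formula or data.

```python
def crime_index(current_year_crimes, base_year_crimes):
    """Percentage change in crimes relative to the base year (assumed definition)."""
    if base_year_crimes <= 0:
        raise ValueError("base year crime count must be positive")
    return (current_year_crimes - base_year_crimes) * 100.0 / base_year_crimes


# Hypothetical counts for one district: 500 crimes in 2014 (base year), 600 in 2016.
index_2016 = crime_index(600, 500)
print(index_2016)  # 20.0, i.e. crime rose by 20% relative to the base year
```

Under this reading, a positive value means crime increased relative to 2014 and a negative value means it decreased.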
The crime index is determined using formula (3).
Cluster validity is measured with the silhouette coefficient, which combines two methods, namely the cohesion method, which measures how close the objects within a cluster are to one another, and the separation method, which measures how far a cluster is separated from other clusters [9]. It can be seen in Table 1.
a. Calculate the average distance of an object to all other objects in the same cluster, based on the distance of the i-th data to the r-th data in cluster j, where m_j is the number of data in the j-th cluster.
b. Calculate the average distance of the object to the objects in each other cluster, then take the minimum value.
c. Calculate the Silhouette coefficient value.
The following Table 2 gives a literature review on K-Means and its modifications.
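The silhouette steps can be sketched in Python as follows. The sketch assumes the standard silhouette definition, s = (b - a) / max(a, b), with Euclidean distance, where a is the average distance to the other members of the same cluster and b is the smallest average distance to the members of any other cluster. The function names and sample points are hypothetical, and at least two clusters are required.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_values(points, labels):
    """Return one silhouette value per point (needs at least two clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    values = []
    for i, label in enumerate(labels):
        own = [r for r in clusters[label] if r != i]
        if not own:
            # Common convention: a point alone in its cluster gets 0.
            values.append(0.0)
            continue
        # Step a (cohesion): mean distance to the other members of the same cluster.
        a = sum(euclidean(points[i], points[r]) for r in own) / len(own)
        # Step b (separation): smallest mean distance to any other cluster.
        b = min(
            sum(euclidean(points[i], points[r]) for r in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        # Step c: silhouette value, guarding against a zero denominator.
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Hypothetical data: two well separated groups of two points each.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
labels = [0, 0, 1, 1]
scores = silhouette_values(points, labels)
mean_score = sum(scores) / len(scores)
```

Values close to 1 indicate that a point sits well inside its own cluster, while values near 0 or below indicate that it lies between clusters or may be assigned to the wrong one.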

System Design.
A context diagram shows the input and output relationships that form a unity in a system. In the context diagram, the data is described globally: it illustrates the flow of data that originates from the admin user and is then processed to produce information, as shown in Figure 1.
A Data Flow Diagram (DFD) is a logical model of data or processes that describes where data originates and where it leaves the system, where it is stored, which processes produce it, and how the stored data interacts with the processes applied to it. The DFD showing the relationships between data in the clustering system using the K-Means algorithm is illustrated in Figure 2.
The crime index is the percentage increase or decrease in crime during the year compared to a certain year (which is used as a base year). The crime index for the districts/cities in Central Java is obtained using formula (3). It can be seen in Table 3. The data cover 34 districts/cities in Central Java and consist of the crime index, the ratio of the number of police to the population, population density, and poverty. It can be seen in Table 4. The clustering process using the K-Means method is conducted on these 34 districts/cities in Central Java.
a. Determine the initial centroids randomly.
b. Calculate the distance of each data item to each centroid and take the shortest; it can be seen in Table 5.
In the manually calculated clustering process, the repetition is carried out until the 4th iteration, with the results shown in Table 6. Table 6 is the result of the distance calculation in the 4th iteration. The members of each cluster are then determined by selecting the smallest distance value. Each new cluster center is obtained by summing the values of the members of the cluster and dividing by the number of members in that cluster. The results of these calculations are in Table 7.
In the system implementation, the mapping system imports data from the database into the application, as shown in Figure 3. The clustering results page is used to view the results of clustering and is shown in Figure 4.
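The clustering iteration described above (assign each region to the nearest center, then recompute each center as the mean of its members, and repeat until membership no longer changes) can be sketched in Python as follows. This is an illustrative sketch with Euclidean distance and hypothetical data, not the study's implementation. The function name `k_means` and the choice to keep the old center when a cluster becomes empty are assumptions.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_means(points, initial_centroids, max_iterations=100):
    """Cluster `points` starting from the given initial centroids."""
    centroids = [list(c) for c in initial_centroids]
    assignments = None
    for _ in range(max_iterations):
        # Assign every point to the centroid with the smallest distance.
        new_assignments = [
            min(range(len(centroids)), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # membership stopped changing, so the clustering is stable
        assignments = new_assignments
        # New center = sum of the members' values / number of members.
        for j in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its old center
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


# Hypothetical normalized data with two obvious groups and two initial centers.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
assignments, centroids = k_means(points, [[0.0, 0.0], [10.0, 0.0]])
```

Because the result depends on the initial centers, running the sketch with different starting centroids can yield different clusters, which matches the sensitivity noted for K-Means.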
Black box testing, commonly known as functional testing, is a software testing method used to test software without knowing the internal structure of the code or program.
Tests carried out on this application system are shown in Table 8.
From the results in Table 11, the clustering results were tested by calculating silhouette coefficient validation, giving an SC value for each cluster: there are 5 data with medium structure and 29 data with weak structure.
This mapping application, which uses the K-Means method, can map crime-prone areas in Central Java. Users can find information on the location of crime-prone areas in the system in the form of maps. In black box functional testing, the mapping system is free of syntax errors and functionally displays the expected results. Validity testing using silhouette coefficients produces an SC value for each cluster: there are 5 data with medium structure and 29 data with weak structure.
This system can be developed further by adding or changing the factors that influence the level of crime vulnerability in Central Java. It can also be extended with features for printing the map data and the clustered data, so that the system can provide a data recapitulation.