Multivariate analysis of five chicken breeds in Indonesia based on microsatellite allele frequency

Objective: This study examined several multivariate methods for classifying genetic diversity using microsatellite allele frequency data. Methods: This study used microsatellite allele frequency data from White Leghorn (n = 48), Kampung (n = 48), Pelung (n = 24), Sentul (n = 24), and Black Kedu (n = 25) chickens from the Indonesian Research Institute for Animal Production. Allele frequency data were analyzed by the Neighbor-Joining (NJ) method using the POPTREE2 program. The data were also analyzed by Principal Component Analysis (PCA), Correspondence Analysis (CA), and Hierarchical Clustering on Principal Components (HCPC) using the factoextra and FactoMineR packages in R 4.0.0. Results: In the first dimension, CA (80.7%) explained more variation than PCA (58.9%). However, CA generated results that differed from those of NJ, PCA, and HCPC. NJ, PCA, and HCPC found four chicken clusters, namely cluster 1 (White Leghorn), cluster 2 (Pelung), cluster 3 (Black Kedu), and cluster 4 (Kampung and Sentul). Conclusions: HCPC is a better multivariate method for analyzing allele frequency data than PCA and CA. HCPC performs better than PCA because it combines hierarchical clustering with principal components.


INTRODUCTION
Chickens are the livestock most commonly raised at home. Chicken meat and eggs are the most popular livestock products. The poultry business model is often used in poverty alleviation programs, as practised in Indonesia [1]. Furthermore, Indonesia has many local chicken genetic resources kept for egg production, meat production, and as ornamental animals. According to Nataamijaya [2], Indonesia has 32 native chicken breeds, such as Kampung, Pelung, White Kedu, Black Kedu, Sentul, and Balenggek, with various uses such as fighting, egg production, meat production, ornamental purposes, and traditional medicine; some are also classified as endangered animals. In Indonesia, native chicken breeds are distinguished based on demographic differences, so their genetic information needs to be studied further.
Genetic differences cause variations in phenotypes. Variations in phenotypes need not arise from new mutations but can appear later from alleles that are already segregating in the population [3]. Genetic diversity can be studied using molecular technology. Molecular markers commonly used as indicators of diversity are mitochondrial DNA, the Y chromosome, and microsatellites. Microsatellites are highly polymorphic due to their instability [4]. Because of this polymorphism, microsatellites are excellent markers for determining genetic diversity within and between populations [5].
One way to study genetic diversity is to look at allele frequencies. Allele frequencies are used to calculate the genetic distance between populations. From this genetic distance, a dendrogram is built using methods such as Neighbor-Joining, UPGMA, Minimum Evolution, and Fitch-Margoliash [6]. Multivariate methods offer another way to explore genetic diversity. They can summarize genetic variability without assuming an evolutionary model or the absence of linkage disequilibrium, and they do not rely on Hardy-Weinberg equilibrium [7]. Principal Component Analysis (PCA) is a multivariate method that has been used to analyze microsatellite data on Indonesian animal genetic resources and is able to distinguish between species of cattle [8]. The purpose of this study was to examine several multivariate methods for classifying genetic diversity, especially in Indonesian chickens, and to provide recommendations on multivariate methods that can be used to detect genetic differences based on allele frequency data.

Data analysis
The allele frequency data used were derived from Sartika [9]. The samples were White Leghorn (n = 48), Kampung (n = 48), Pelung (n = 24), Sentul (n = 24), and Black Kedu (n = 25) chickens from the Indonesian Research Institute for Animal Production. The data were analyzed with the POPTREE2 program [10] to create Neighbor-Joining (NJ) trees based on Nei's standard genetic distance (DST). Allele frequency data were also analyzed using three multivariate methods, namely PCA, Correspondence Analysis (CA), and Hierarchical Clustering on Principal Components (HCPC). All three multivariate analyses were carried out using the factoextra [11] and FactoMineR [12] packages in R 4.0.0 [13].
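For illustration, Nei's standard genetic distance between two populations X and Y can be written as D = -ln(J_XY / sqrt(J_X * J_Y)), where J_X and J_Y are the average (over loci) sums of squared allele frequencies within each population and J_XY is the average sum of cross-products. The Python sketch below shows this calculation only; it is not the POPTREE2 implementation, it omits any sample-size bias correction, and the function name and toy frequencies are hypothetical rather than taken from the study data.

```python
import math

def nei_standard_distance(freq_x, freq_y):
    """Nei's standard genetic distance from per-locus allele frequencies.

    freq_x, freq_y: lists with one dict per locus, mapping allele -> frequency.
    Simplified sketch: no sample-size bias correction is applied.
    """
    n_loci = len(freq_x)
    jx = jy = jxy = 0.0
    for locus_x, locus_y in zip(freq_x, freq_y):
        alleles = set(locus_x) | set(locus_y)
        jx += sum(locus_x.get(a, 0.0) ** 2 for a in alleles)
        jy += sum(locus_y.get(a, 0.0) ** 2 for a in alleles)
        jxy += sum(locus_x.get(a, 0.0) * locus_y.get(a, 0.0) for a in alleles)
    # Average over loci, then form the normalized genetic identity.
    identity = (jxy / n_loci) / math.sqrt((jx / n_loci) * (jy / n_loci))
    return -math.log(identity)

# Hypothetical two-locus toy populations (NOT data from this study).
pop_a = [{"A1": 0.5, "A2": 0.5}, {"B1": 1.0}]
pop_b = [{"A1": 0.5, "A2": 0.5}, {"B1": 1.0}]
pop_c = [{"A1": 1.0}, {"B1": 1.0}]

d_ab = nei_standard_distance(pop_a, pop_b)  # identical frequencies, distance near 0
d_ac = nei_standard_distance(pop_a, pop_c)  # differs at the first locus, distance > 0
```

A matrix of such pairwise distances is the kind of input from which an NJ tree is built.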

RESULTS
Allele frequency data from microsatellites can be analyzed with multivariate methods. The results generated by Principal Component Analysis (PCA) and Hierarchical Clustering on Principal Components (HCPC) were the same as those of Neighbor-Joining (NJ) (Figures 1-3). Kampung chickens are closely related to Sentul chickens, while Black Kedu, Pelung, and White Leghorn chickens each form their own cluster. Using Correspondence Analysis (CA), Sentul is more closely related to Black Kedu (Figure 4). Based on the value of dim1 (Dimension 1), CA (80.7%) explains more variation than PCA (58.9%). However, the CA result differs from that generated by NJ, PCA, and HCPC. NJ, PCA, and HCPC found four clusters, namely cluster 1 (White Leghorn), cluster 2 (Pelung), cluster 3 (Black Kedu), and cluster 4 (Kampung and Sentul).

DISCUSSION
Kampung and Sentul chickens are in one cluster. This is because the Sentul chicken is a Kampung chicken from the Ciamis Regency, which is used as a producer of meat and eggs [14]. Black Kedu originated from the district of Temanggung, Central Java; it is one of Indonesia's rare chickens and has a high potential for egg production among native chickens [15]. According to Dharmayanthi et al [16], Black Kedu chickens have a close relationship with Silkie chickens based on the EDN3 gene. Therefore, Black Kedu formed its own cluster. The Pelung chicken was developed in Cianjur [17]. Asmara et al [18] suggested that the Red Junglefowl (Gallus gallus) was the ancestor of Pelung, but there are no data regarding the pedigree of this chicken. From the results obtained in this study, it is suspected that Pelung chickens are the result of cross-breeding. However, to support this hypothesis, comprehensive research must be carried out on the genetic structure of Pelung chickens.
White Leghorn, on the other hand, originated in Tuscany, Italy, and was developed in the United States and Europe. White Leghorns have an eye size and focal length that match their larger body size compared with Red Junglefowl [19]. However, White Leghorns show less intense fear-induced behaviours than their ancestors (Red Junglefowl) [20]. Thus, reduced fear is characteristic of more domesticated chicken breeds.
Neighbor-Joining is a well-known distance-based phylogenetic method that computes a tree metric from dissimilarities derived from biological data [21]. The UPGMA method assumes that all taxa have constant evolutionary rates. To build more accurate phylogenetic trees, the Neighbor-Joining (NJ) method can be used [6]. UPGMA produces a rooted tree, while NJ produces an unrooted tree. In this study, the NJ results agreed with those of the previous study.
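The difference between NJ and a method that simply joins the closest pair can be sketched with the first step of NJ. NJ selects the pair (i, j) that minimizes Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), which corrects each distance for how far the two taxa are from all others. The Python sketch below covers only this first pair selection, not a full tree construction; the function name and the five-taxon distance matrix are hypothetical toy values, not the distances from this study.

```python
from itertools import combinations

def nj_first_pair(labels, dist):
    """Return the pair of taxa that Neighbor-Joining would join first.

    dist maps frozenset({a, b}) -> distance between taxa a and b.
    """
    n = len(labels)
    # Total distance from each taxon to all other taxa.
    row_sum = {a: sum(dist[frozenset((a, b))] for b in labels if b != a)
               for a in labels}

    def q(a, b):
        return (n - 2) * dist[frozenset((a, b))] - row_sum[a] - row_sum[b]

    return min(combinations(labels, 2), key=lambda pair: q(*pair))

# Hypothetical toy distances between five taxa (NOT data from this study).
labels = ["a", "b", "c", "d", "e"]
raw = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
       ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
       ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {frozenset(pair): value for pair, value in raw.items()}

first_joined = nj_first_pair(labels, dist)                            # pair with minimal Q
closest = min(combinations(labels, 2), key=lambda p: dist[frozenset(p)])  # pair with minimal raw distance
```

In this toy matrix the pair with the smallest raw distance is not the pair with the smallest Q value, which illustrates why NJ does not depend on equal evolutionary rates in the way that joining the closest pair does.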
The Principal Component Analysis (PCA) and Hierarchical Clustering on Principal Components (HCPC) results were similar to those of NJ, whereas the Correspondence Analysis (CA) results were not. CA placed Sentul close to Black Kedu, but the other methods (NJ, PCA, and HCPC) placed Sentul closer to Kampung. Based on morphology, Sentul chickens are highly similar to native chickens [22]. This shows that Sentul is a native chicken raised in Ciamis and selected for certain criteria such as feather color.
The Principal Component Analysis (PCA) method is widely used to detect admixture in a population [23]. The advantages of PCA are that it identifies genetic structure in very large datasets in negligible computational time and that it makes no assumptions about the genetic model of the underlying population [24]. PCA has been widely used to analyze microsatellite data, and in this study it was applied to allele frequency data and gave results similar to those of the NJ method. PCA has also been used to analyze chicken genome data to determine genetic structure in populations [25; 26; 27].
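The percentage attached to a PCA dimension is the share of the total variance carried by that component, that is, its eigenvalue divided by the sum of all eigenvalues. The Python sketch below estimates this share for the first component using power iteration on the covariance matrix. It is a simplified illustration under stated assumptions, not the FactoMineR procedure: the variables are centered but not scaled to unit variance, the starting vector is assumed not to be orthogonal to the leading component, and the function name and toy matrix are hypothetical.

```python
import math

def first_pc_share(rows, iterations=200):
    """Share of total variance explained by the first principal component.

    rows: list of observations, each a list of numeric variables.
    Uses the covariance matrix (centered, unscaled) and power iteration.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
           for i in range(p)]

    # Power iteration: repeated multiplication converges to the leading eigenvector.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    # Rayleigh quotient gives the leading eigenvalue; the trace is the total variance.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    return eigenvalue / total_variance

# Hypothetical toy matrix (NOT data from this study): four observations, two variables.
toy = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
share = first_pc_share(toy)
```

For this toy matrix the two variances are 8/3 and 2/3, so the first component carries 80% of the total variance.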
Correspondence Analysis (CA) is a method for visualizing the relationship between observations and variables, revealing associations from which hypotheses can be proposed to guide discovery [28]. Furthermore, CA is used to visualize relationships between genes in order to find genetic links. In this study, the CA results differed from those of NJ, PCA, and HCPC. Hence, CA is not suitable for analyzing allele frequency data. Even though CA explains more variation than PCA in dimension 1, CA is more suitable for categorical data. CA is usually used to analyze two-way data tables containing some measure of association between the rows and columns [29].
Hierarchical Clustering on Principal Components (HCPC) combines hierarchical clustering with the principal components of the PCA model [30]. HCPC applies an objective grouping technique to the results of principal component analysis, which leads to better cluster solutions [29]. Therefore, the HCPC results are similar to those of PCA, with the addition of cluster assignments.
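The idea of clustering on principal components can be sketched as agglomerative clustering applied to the component scores of each population. The Python sketch below assumes Ward's criterion, in which the two clusters whose merger gives the smallest increase in within-cluster inertia are joined at each step. It illustrates the principle only and is not the FactoMineR HCPC routine; the function names, the generic labels P1 to P5, and the coordinates are hypothetical and are not results from this study.

```python
from itertools import combinations

def centroid(points):
    """Mean coordinate vector of a list of points."""
    dims = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dims)]

def ward_cost(points_a, points_b):
    """Increase in within-cluster inertia if the two clusters are merged."""
    ca, cb = centroid(points_a), centroid(points_b)
    squared_gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
    na, nb = len(points_a), len(points_b)
    return na * nb / (na + nb) * squared_gap

def ward_clusters(scores, k):
    """Merge clusters with Ward's criterion until k clusters remain.

    scores: dict mapping label -> tuple of principal component coordinates.
    """
    clusters = [[name] for name in scores]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: ward_cost([scores[n] for n in clusters[ij[0]]],
                                     [scores[n] for n in clusters[ij[1]]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters

# HYPOTHETICAL component scores for five populations (NOT results from this study).
toy_scores = {
    "P1": (4.0, 0.5),
    "P2": (-1.0, 3.0),
    "P3": (-1.5, -3.0),
    "P4": (-0.8, 0.2),
    "P5": (-0.6, -0.1),
}
groups = ward_clusters(toy_scores, k=4)
```

With these toy scores, the two populations that lie closest together on the component plane are merged first, leaving four clusters; this is how a clustering step turns a PCA map into explicit cluster assignments.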

CONCLUSION
Hierarchical Clustering on Principal Components (HCPC) found four clusters, namely cluster 1 (White Leghorn), cluster 2 (Pelung), cluster 3 (Black Kedu), and cluster 4 (Kampung and Sentul). HCPC, PCA, and NJ produced similar results, whereas CA did not. HCPC can be used to analyze allele frequency data better than PCA because HCPC combines hierarchical clustering with principal components.