## Abstract

### Background

Brain Computer Interface (BCI) has been explored as a new technology for communicating with computers over the past few decades. It uses signals collected from the brain to communicate with, control, or instruct computers and electronic devices. Analyzing the collected signals is the most important task: if the data contains overlapping classes, directly applying classification techniques is inefficient.

### Methodology

To examine and analyze the data, clustering can be used to exploit information about the dispersion of different classes. In this paper, we propose an agglomerative method for clustering high-dimensional EEG signal data using a multi-prototype approach. This bi-phase algorithm chooses appropriate representatives in the first phase and combines them in the second. We use squared-error clustering in the first phase to produce multiple prototypes located in highly dense regions. A new combination measure based on fuzzy logic is also proposed, to evaluate the degree to which prototypes can be combined.

### Results

The proposed algorithm has the same run-time complexity as *k*-means and clusters data with complex shapes and varying dispersion. Experiments carried out on synthetic and real datasets demonstrate the performance of the proposed method in terms of accuracy and time.

## 1. Introduction

Brain Computer Interface (BCI) technology is required to obtain and examine signals produced in the brain. Its goal is to act as a direct communication bridge between an external device and a human: the machine perceives, elaborates, and utilizes the neural impulses generated by a person's brain. BCI is therefore considered the most feasible way to enable communication between a human and a system using brain signals, allowing a patient to express their views just by thinking. The Electroencephalogram (EEG) is one such instrument for obtaining signals from the brain. Brain signals are collected using non-invasive, partially invasive, or invasive techniques. In the non-invasive technique, electrodes mounted on the human scalp collect brain signals in microvolts, which are amplified and stored in a computer for further processing. EEG is used to help diagnose tumours, stroke, and other major brain disorders, though with advancements in anatomical imaging techniques such as Computed Tomography (CT) and Magnetic Resonance Imaging (MRI), its use is declining. Analysing EEG signals can reveal the intent or action that the user wants to perform. Clustering is the first step towards extracting information about the dispersion, shape, and size of the data. Existing clustering algorithms like *k*-means^{6} and fuzzy *c*-means^{7} cannot extract such information, whereas hierarchical clustering methods are time-expensive. In this paper, we propose a new prototype-based clustering method to analyze EEG signal data, which takes linear time. Below, we introduce and discuss some existing clustering methods.

Clustering techniques can be categorized as partitioning, density-based, hierarchical, model-based, grid-based, and soft-computing methods. Hierarchical and partitioning methods are related: a hierarchical clustering is a nested sequence of partition-based clusterings, each of which is a hard partition of the dataset into mutually disjoint subsets. A hard partition of a dataset Y = {*y*_{1}, *y*_{2}, …, *y*_{N}}, where each *y*_{p} (*p* = 1, …, *N*) is an *n*-dimensional feature vector, is a collection B = {*B*_{1}, *B*_{2}, …, *B*_{q}} of *q* non-empty clusters (*B*_{i} ≠ *ϕ*) such that *B*_{1} ∪ *B*_{2} ∪ … ∪ *B*_{q} = Y and *B*_{i} ∩ *B*_{j} = *ϕ* for *i* ≠ *j*. Relaxing the mutual-disjunction condition *B*_{i} ∩ *B*_{j} = *ϕ* yields overlapping partitions. Partitions produced by overlapping algorithms are either soft (a particular object belongs completely to one or more clusters) or fuzzy (a particular object belongs to one or more clusters up to a certain degree).

In hierarchical clustering, clusters are constructed by repeatedly combining or partitioning the instances in a bottom-up or top-down fashion, giving two categories: agglomerative and divisive. In the divisive approach, all entities initially belong to a single cluster, which is successively divided into sub-clusters until the desired number of clusters is obtained. In the agglomerative approach, each entity starts as an isolated cluster, and clusters are merged until the required number remains. Hierarchical methods produce a dendrogram, in which objects are grouped in a nested fashion by similarity level; final clusters are formed by cutting the dendrogram at the desired similarity level.
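The bottom-up merging just described can be sketched as follows. This is a generic single-linkage illustration, not the paper's algorithm; the function names and toy data are chosen for the example.

```python
# Agglomerative (bottom-up) clustering sketch: every point starts as its
# own cluster, and the two closest clusters are merged repeatedly until
# the desired number of clusters remains.

def single_linkage(points, num_clusters):
    """Merge clusters by smallest inter-cluster distance (single link)."""
    clusters = [[p] for p in points]          # each entity starts isolated
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    def link(c1, c2):                         # single-link cluster distance
        return min(dist(a, b) for a in c1 for b in c2)
    while len(clusters) > num_clusters:
        # find the closest pair of clusters and merge them
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters.pop(j))
    return clusters

# two well-separated toy blobs
two_blobs = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
result = single_linkage(two_blobs, 2)
```

Cutting a dendrogram at a similarity level corresponds to stopping this loop once the smallest inter-cluster distance exceeds that level.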

A prototype is a representative element that captures the distribution of a set of data points based on their similarity to the prototype. General clustering methods use a single prototype to represent a cluster, but a single prototype may not be sufficient to model a complex dataset, as is evident in Fig. 1; as a result, we adopt a multi-prototype approach in our algorithm for better accuracy. Based on the observations above, in this paper we study a multi-prototype fuzzy clustering algorithm that exploits the advantages of both multi-prototype and fuzzy clustering. Our approach consists of two parts: (1) the first part finds the number of prototypes needed to accurately represent the data distribution; and (2) the second part forms clusters with that number of prototypes and then combines them into the desired clusters.

## 2. Related work

In this section, we review existing work related to the proposed algorithm, focusing on the fuzzy *c*-means algorithm. The special case (*m* = 2) of fuzzy *c*-means was reported by Joe Dunn in 1974; Jim Bezdek later generalized it to any *m* > 1.

### 2.1 Fuzzy C-Means (FCM) algorithm

Fuzzy clustering is an unsupervised approach for soft partitioning of data in which each sample point can belong to every cluster to a certain degree. This degree, called membership, is given by a membership function ranging from zero to one. In certain situations, fuzzy clustering models the data more naturally than hard clustering: samples lying between classes tend to belong to several classes rather than fully to one, and are therefore assigned partial membership degrees between 0 and 1. Fuzzy partitioning thus forms the basis of FCM, in which a data point can belong to all clusters with different memberships, and each sample's memberships must sum to unity. The FCM algorithm^{7} is given in Algorithm 1. Here, *m* is any real number larger than 1, *u*_{ij} is the membership of sample *x*_{i} in cluster *j*, *x*_{i} is the *i*th measured data point, and *c*_{j} is the *d*-dimensional center of cluster *j*. According to Algorithm 1, each data point is assigned to a cluster center based on its distance from the cluster centers; the memberships and cluster centers are updated after each iteration.

Though the algorithm is known to converge, it is computationally intensive and sensitive both to the initialization of its parameters and to noise in the dataset. For outliers, a low or zero membership degree is expected.
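The FCM update loop can be sketched as follows, using the standard Bezdek formulation with fuzzifier *m* > 1 (the paper's Algorithm 1 is not reproduced in this text). The toy data and the deterministic center initialization are illustrative assumptions.

```python
# FCM sketch: alternate between a membership update (each sample's
# memberships across all centers sum to one) and a fuzzy-weighted
# center update, for a fixed number of iterations.

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def fcm(points, c, m=2.0, iters=50):
    # pick two far-apart samples as initial centers (sketch assumes c = 2);
    # random or k-means++-style seeding would be typical in practice
    centers = [points[0], points[-1]]
    u = []
    for _ in range(iters):
        # membership update: u_ij is driven by relative distances to all centers
        u = []
        for x in points:
            d = [max(dist(x, ck), 1e-12) for ck in centers]
            u.append([1.0 / sum((d[j] / d[k]) ** (2.0 / (m - 1.0))
                                for k in range(c))
                      for j in range(c)])
        # center update: fuzzy-weighted mean with weights u_ij^m
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(points))]
            total = sum(w)
            centers[j] = tuple(
                sum(w[i] * points[i][t] for i in range(len(points))) / total
                for t in range(len(points[0]))
            )
    return centers, u

# two well-separated 2-D blobs
data = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
centers, u = fcm(data, c=2)
```

By construction, every row of the membership matrix sums to one, which is the constraint stated above.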

### 2.2 Multi-prototype based clustering algorithm

In pattern space, highly dense continuous regions separated by sparse areas are said to contain natural clusters. A single prototype representing a single cluster often places cluster boundaries inside regions of high density, and it loses information about cluster dispersion and density because it relies on dimensionality reduction. In the multi-prototype clustering algorithm,^{10} sample points belonging to two prototypes are projected onto a line, and a histogram plot smoothed with a Gaussian filter is used to measure the separation between them. In that work, a relatively large number of prototypes was used to form small clusters, which were then combined into the required number of clusters using the proposed separation measure.^{10} Though the algorithm is known to produce good results on synthetic and real datasets, it fails to accurately cluster datasets containing clusters of different dispersion, such as Zahn's Compound dataset.^{11} This dataset contains six clusters, as shown in Fig. 2.

Clusters *C*1 and *C*2 have different dispersion, whereas *C*3 and *C*4 have varying density. By focusing on the information available within a prototype (density or dispersion), two prototypes can be molded into one using different combination measures. For example, if the focus is the density measure of the prototypes, clusters *C*1 and *C*2 can easily be obtained using multiple prototypes. The two-round MST method^{12} produces good results on this data, but constructing an MST for a large or high-dimensional dataset is time-expensive, and the algorithm is not robust to outliers.

## 3. Proposed algorithm

Our proposed algorithm uses hierarchical agglomerative clustering as its basis; standard agglomerative clustering is described in the literature.^{9,10} Initially, we form *p* prototypes for the given dataset such that *p* > *k*; these are then combined iteratively in the combination phase using a combination measure. Since the number of prototypes chosen is comparatively small (*p* ≪ *n*), the combining process takes only linear time. Below we discuss the phases of our method, which include the initialization and combination phases.

### 3.1 Initialization phase

Our algorithm uses a multi-prototype strategy for accurate representation of the data distribution. Initially, *p* prototypes and the corresponding *p* clusters are formed using a squared-error clustering algorithm, with cluster prototypes selected at random. For an accurate representation of the data distribution, the clusters so formed must be compact as well as sufficiently separated from other clusters. To estimate the value of *p*, we use the density of data points. Two cases are possible:

1. If the number of data points in a cluster is less than a threshold value, we remove its prototype and merge the cluster with its nearest neighbor; and
2. If the number of data points in a cluster is greater than a threshold value and its squared error exceeds a threshold value, we split the cluster into two using the squared-error clustering algorithm.
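The two rules above can be sketched as follows. The threshold names (`min_size`, `max_size`, `max_error`) are illustrative, not from the paper, and the split step here uses a simple two-seed partition as a stand-in for re-running squared-error clustering.

```python
# Sketch of the initialization-phase adjustment: merge undersized clusters
# into their nearest neighbor, and split oversized, high-error clusters.
# Mutates its input lists; assumes at least one cluster meets min_size.

def centroid(cluster):
    d = len(cluster[0])
    return tuple(sum(p[t] for p in cluster) / len(cluster) for t in range(d))

def sq_error(cluster):
    c = centroid(cluster)
    return sum(sum((x - y) ** 2 for x, y in zip(p, c)) for p in cluster)

def adjust_prototypes(clusters, min_size, max_size, max_error):
    # Rule 1: merge each undersized cluster into the nearest remaining cluster.
    small = [c for c in clusters if len(c) < min_size]
    clusters = [c for c in clusters if len(c) >= min_size]
    for s in small:
        sc = centroid(s)
        nearest = min(clusters,
                      key=lambda c: sum((x - y) ** 2
                                        for x, y in zip(centroid(c), sc)))
        nearest.extend(s)
    # Rule 2: split an oversized, high-error cluster around its two most
    # distant points (a stand-in for re-running squared-error clustering).
    result = []
    for c in clusters:
        if len(c) > max_size and sq_error(c) > max_error:
            a, b = max(((p, q) for p in c for q in c),
                       key=lambda pq: sum((x - y) ** 2 for x, y in zip(*pq)))
            halves = ([], [])
            for p in c:
                da = sum((x - y) ** 2 for x, y in zip(p, a))
                db = sum((x - y) ** 2 for x, y in zip(p, b))
                halves[0 if da <= db else 1].append(p)
            result.extend(halves)
        else:
            result.append(c)
    return result

# toy input: one oversized cluster, one normal pair, one singleton
clusters = [
    [(0.0, 0.0), (0.2, 0.0), (10.0, 0.0), (10.2, 0.0)],  # split candidate
    [(0.0, 5.0), (0.2, 5.0)],
    [(0.0, 5.4)],                                        # merge candidate
]
result = adjust_prototypes(clusters, min_size=2, max_size=3, max_error=1.0)
```

After the call, the singleton has been absorbed by its nearest cluster and the oversized cluster has been split into its two natural halves.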

We define one more threshold value, with which we control the number of prototypes generated in the initialization phase. The authors of Ref. 10 generate a huge number of prototypes for their method; since its time complexity is *O*(*p*^{2}), using a large value of *p* increases the running cost of the algorithm. In our proposed method, we instead use *K*-Nearest Neighbor (KNN) graphs^{13} so that the combination phase does not depend on the number of prototypes. Hence, the overall time complexity of the proposed method reduces to linear.

The Minimum Spanning Tree (MST) is a popular structure in graph-based clustering; the authors of Ref. 12 use two rounds of MST to cluster the data. The time complexity of constructing an MST is *O*(*n*^{2}), with *n* the total number of points in the dataset. A more efficient way to generate a graph is KNN, which takes only *O*(*v* log *k*) time, where *v* is the number of vertices in the graph and *k* the number of nearest neighbors. That is why we use KNN graphs in the initialization phase to reduce the overall time complexity. The basis for the initialization phase is that multiple prototypes co-existing within a continuous region of relatively high density are grouped to form a cluster, based on the combination measure.

Unlike an MST, a KNN graph may be disconnected and may contain a number of connected components. The prototypes that compose a single dense cluster are much closer to each other than to the remaining prototypes. Fig. 3 shows the KNN graph constructed for the Aggregation dataset^{15} with *k* = 3; note that this graph is constructed after obtaining the *p* prototypes from the initialization phase. As Fig. 3 shows, prototypes that together form a connected component constitute a dense cluster.

We construct the KNN graph *G*(*V*, *E*). Here *N* is the total number of data points in the dataset, *p* is the number of randomly chosen prototypes, *N*_{i} is the number of data points in the *i*th cluster, *E*_{i} is the squared error of the *i*th cluster, *E*_{k} is the squared error of the *k*th cluster, *C*_{i} is the *i*th cluster center (prototype), and *P*_{i} is the *i*th cluster. Note that we use 0–1 normalized densities, since we need to compute the density deviation *Dmax* as a percentage. For constructing the KNN graph we set *k* = 3, i.e., we consider only up to 3 nearest neighbors.

### 3.2 Partitioning phase

In the second phase, we extract each such group of prototypes as a partition. We propose Algorithm 4, which forms these partitions given a KNN graph, using Depth First Search (DFS) traversal. The proposed algorithm processes the given graph to discover connected components and adds all vertices embedded inside a cycle into one partition. It also discovers single links from one cycle to another; for example, the vertex pair (*V*_{67}, *V*_{97}) shares a single linkage in Fig. 3. We call these vertices *join vertices*, since they are the potential candidates for joining two partitions.

If fewer than *C* partitions are discovered by this algorithm, we need to increase the number of prototypes, since at least *C* partitions are required to proceed. If too few prototypes are formed in the initialization phase, a case arises where the resulting KNN graph contains only one connected component; we use the *Dmax* threshold value to control this. To increase the number of prototypes, the upper bound *maxDeviationAllowed* is lowered by a constant value. We ensure that at least *C* partitions are found by the end of this phase.

### 3.3 Combination phase

As discussed in Section 2.2, the combination measure is the prime ingredient in agglomerative methods: the information taken as the foundation for molding two prototypes changes the complete behavior of the algorithm. To handle data containing clusters of different dispersion (for example, Zahn's Compound data^{11}), we need to consider most of the information available with a prototype, and our algorithm captures such information. In this section we propose the combination measure for our algorithm, which is required to group the *P* sub-clusters formed during the initialization phase into the final *C* clusters.

We use the fuzzy *c*-means belongingness measure as the basis for calculating the degree of combination between every two prototypes. We also experimented with commonly known combination measure functions such as the Radial and Fréchet distribution functions; the proposed algorithm includes all three functions, but exactly one of them is used at any time. Comparative results obtained with each function are listed in the experiments section. For a given set of points X, the belongingness of each point to prototype *P*_{i} with mean *C*_{j} is calculated, and we define the combination measure as the ratio between the belongingness of a cluster's data points to itself and their belongingness to the other cluster.

If the number of partitions obtained from the second phase equals *C*, we skip the third phase; otherwise, we calculate the combination measure for the join vertices using Algorithm 5. The partitions sharing join vertices are then combined iteratively, by picking the minimum measure of all, until only *C* partitions are left. These are the final clusters.

### 3.4 Complexity analysis

In calculating the time complexity of our algorithm, we also take into account the time complexity of *k*-means and fuzzy *c*-means. In the initialization phase we use *k*-means clustering,^{16} with *t* the number of iterations needed to find the *P* prototypes; thus the time complexity of initialization is *O*(*PNt*). Calculating the belongingness matrix U takes *O*(*PN*) time, and constructing the KNN graph on *P* vertices takes *O*(*P* log *k*). In the second phase, every vertex of the graph is visited only once, so with *P* prototypes as vertices the second phase takes *O*(*P*) time, and calculating the belongingness measure between join vertices is linear. Therefore, the total time complexity of our algorithm is *O*(*PNt* + *P* log *k*).

## 4. Experimental results

### 4.1 Experimental setup

All experiments were conducted in Matlab R2015a on a system with 4 GB of RAM and an Intel i3 processor running at 3.4 GHz. The proposed method is evaluated on both synthetic and real datasets to assess its performance; the details of the datasets are given in Table 1. The results are compared with the *k*-means and fuzzy *c*-means clustering algorithms in terms of accuracy and time complexity.

Table 1. Details of the synthetic and real datasets used in the experiments.

| Type | Dataset | Dimensionality | # patterns | # clusters |
|---|---|---|---|---|
| Synthetic | Aggregation | 2 | 788 | 7 |
| Synthetic | Compound | 2 | 399 | 6 |
| Synthetic | Chameleon | 2 | 8000 | 6 |
| Real | EEG-Alcohol | 64 | 10960 | 2 |
| Real | EEG-eye-state | 15 | 14980 | 2 |
| Real | Iris | 4 | 150 | 4 |
| Real | Wine quality | 12 | 4898 | 2 |
| Real | Yeast | 8 | 1484 | 2 |

### 4.2 Results on synthetic data sets

The proposed method is evaluated on three synthetic datasets, namely Aggregation,^{15} Compound,^{11} and Chameleon.

For better understanding, we demonstrate the working of our proposed algorithm on the Compound dataset; the results are shown in Fig. 4. We represent clusters as ‘*C*_{i}’ and prototypes as ‘*’. Fig. 4(a) shows the actual clusters in the given dataset, Fig. 4(b) shows the KNN graph constructed for the given prototypes, and Fig. 4(c) shows the result obtained after completion of the second phase. We discovered the join vertices to be {(*V*_{15}, *V*_{49})}; removing this pair results in four partitions, which is the desired number of clusters, so we need not go through the final phase. If more than the desired number of partitions remain even after deleting the first minimum join-vertex pair, we take the second minimum value calculated in the third phase, say *C*_{ij}. This implies that prototypes *i* and *j* are the most similar among all pairs, so we combine prototypes *P*_{i} and *P*_{j} into one. We iteratively follow the same procedure until only *C* partitions remain; these *C* partitions are the final clusters.

Further, we evaluated our proposed algorithm on the Aggregation data.^{15} Fig. 5 shows the clustering results for the Aggregation data with 788 sample points: Fig. 5(a) shows the actual clusters in the dataset, Fig. 5(b) and Fig. 5(c) are the results obtained from fuzzy *c*-means and *k*-means clustering respectively, Fig. 5(d) shows the initial *P* prototypes located by the initialization phase, and Fig. 5(e) is the result obtained by the proposed method. Similarly, Fig. 6 shows the results for the Chameleon dataset with 8000 sample points.

From the above results, it is observed that the proposed algorithm generated better clusters on the synthetic datasets than the *k*-means and fuzzy *c*-means algorithms.

### 4.3 Results on real data sets

Our proposed algorithm is evaluated on two EEG signal datasets, both obtained from the UCI Machine Learning Repository.^{18} The first EEG brain dataset contains measurements from 64 electrodes placed on the scalp. Originally the dataset contains four attributes, but we pre-processed it and extracted the data as 64 attributes. The following paragraph briefly describes the dataset and the extraction of the data.

The dataset contains measurements from 64 electrodes placed on the scalp of a human subject, sampled at 256 Hz. There are two groups of subjects, alcoholic and control, and EEG signals from the 64 electrodes are gathered for both. Each subject, whether alcoholic or control, goes through a number of trials. In a trial, the subject is presented with pictures from the Snodgrass and Vanderwart set, say S1 and S2. The experiment consisted of two conditions: a non-matched condition where S1 differed from S2, and a matched condition where S1 was identical to S2. Each of the 122 subjects completed 120 trials of varying stimuli. Each trial is stored in its own file, containing the type of subject (alcoholic or control), matching condition, trial number, sensor position, sample number (0–255), and sensor value (in microvolts). We extracted the data from each file as (person id, trial number, SN1, SN2, …, SN64, type of subject), where SNi is the average value of the *i*th sensor obtained from its sample values, and consolidated all data into a single file. After dropping missing trials and tuples with missing values, we obtained a total of 10960 records. Since the person id, trial number, and subject-type attributes are uninformative for clustering, we dropped them, leaving the data in a 64-dimensional (SN1, SN2, …, SN64) format. Of the 10960 tuples, 7035 belong to alcoholic subjects and 3925 to control subjects. We balanced the two classes by forming two batches: (1) the first 3518 alcoholic records with the 3925 control records, and (2) the last 3516 alcoholic records with the 3925 control records; for simplicity we aggregated the final results into one. We applied *k*-means, fuzzy *c*-means, and our proposed algorithm to the processed dataset; a comparison of the obtained results with respect to time and efficiency is shown in Table 2.

Table 2. Comparison of different clustering techniques.
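The per-trial feature extraction described above (averaging each sensor's samples into a single value, yielding one 64-dimensional record per trial) can be sketched as follows. The in-memory trial layout here is an assumption for illustration; the original data is stored one trial per file.

```python
# Collapse one trial's raw samples into a single 64-dimensional record:
# each sensor's 256 samples are averaged into one value (SN1 .. SN64).

def trial_to_record(trial_samples):
    """trial_samples: list of 64 lists, each holding one sensor's readings (µV)."""
    return [sum(sensor) / len(sensor) for sensor in trial_samples]

# toy trial: sensor i reads the constant value i at every sample
toy_trial = [[float(i)] * 256 for i in range(64)]
record = trial_to_record(toy_trial)
```

Dropping the identifier columns afterwards (person id, trial number, subject type) then leaves exactly this 64-dimensional vector per trial.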

The obtained results imply that the given dataset contains overlapping clusters, and that the maximum efficiency obtainable is approximately 64%.

Therefore, a classification task on the given EEG dataset cannot achieve more than about 64% accuracy. We attempted classification of the processed dataset using artificial neural networks; a Multi-Layer Feed-Forward Neural Network trained with the Scaled Conjugate Gradient method obtained 64% accuracy, but the resulting network classified every test pattern into the alcoholic class. It is impractical to rely on such a classifier.

Further, the proposed method was applied to the EEG Eye State dataset, 14-dimensional data in which the state of the eye (closed or open) is recorded along with EEG signals from the brain. On this dataset we obtained a maximum accuracy of 44.9%. From this pre-analysis of the data we can conclude that dimensionality reduction techniques^{19} may provide better results than some neural-network-based classifiers.

We also experimented with the popular real datasets Iris, Wine, and Yeast. We assessed the performance of the proposed algorithm on the basis of error rate, the ratio of wrongly clustered points to the total number of points, and time. These results are given in Table 3; it is observed that the proposed method generated better clusters in less time.
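The error-rate evaluation just described can be sketched as follows. Since clusters carry no labels, each discovered cluster is mapped to its majority ground-truth label before counting mistakes; this majority-label mapping is a common convention and an assumption here, as the text does not spell out its mapping.

```python
# Error rate = wrongly clustered points / total points, where a point is
# "wrong" if its label disagrees with its cluster's majority label.
from collections import Counter

def error_rate(cluster_ids, true_labels):
    wrong = 0
    for cid in set(cluster_ids):
        members = [t for c, t in zip(cluster_ids, true_labels) if c == cid]
        majority, count = Counter(members).most_common(1)[0]
        wrong += len(members) - count     # minority members count as errors
    return wrong / len(true_labels)

clusters = [0, 0, 0, 1, 1, 1]
labels   = ['a', 'a', 'b', 'b', 'b', 'b']
rate = error_rate(clusters, labels)   # one 'b' sits in the 'a'-majority cluster
```

Accuracy, as reported in Table 3, is then simply one minus this error rate.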

Table 3. Accuracy and time assessment of *k*-means, fuzzy *c*-means, and the proposed algorithm.

| Dataset | Method | Accuracy (%) | Time (s) |
|---|---|---|---|
| Iris | *k*-means | 89.33 | 0.15 |
| Iris | Fuzzy *c*-means | 89.33 | 1.38 |
| Iris | Proposed method | 89.33 | 0.10 |
| Wine | *k*-means | 51.69 | 0.26 |
| Wine | Fuzzy *c*-means | 68.54 | 1.96 |
| Wine | Proposed method | 72.47 | 0.10 |
| Yeast | *k*-means | 72.84 | 2.10 |
| Yeast | Fuzzy *c*-means | 86.93 | 26.61 |
| Yeast | Proposed method | 89.56 | 1.2 |

## 5. Conclusion

In this paper, a novel clustering algorithm is designed to cluster high-dimensional data. The proposed algorithm consists of three phases, namely initialization, partitioning, and combination. The initialization phase uses a squared-error function to initialize *P* prototypes in high-density regions, the partitioning phase combines the prototypes to form partitions, and the combination phase combines partitions to form the final *C* clusters. Moreover, for effective clustering, a combination measure is proposed to compute the degree to which prototypes should be combined.

The proposed algorithm was evaluated on both synthetic and real datasets of different shapes. Experimental results show that the proposed method clusters data of different dispersion, shape, and size more accurately than *k*-means and fuzzy *c*-means, while requiring less time and space than commonly used hierarchical clustering algorithms.

## Funding

There is no funding source.

## Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

## References

- Brain–computer interfaces: a gentle introduction. In: Brain-Computer Interfaces. Springer; 2009: 1-27.
- Brain-computer interfaces using electrocorticographic signals. *IEEE Rev Biomed Eng.* 2011; 4: 140-154.
- EEG instrumentation. Recommendations for the practice of clinical neurophysiology. *Guidel Int Fed Clin Neurophysiol.* 1999; 52.
- Brain magnetic resonance imaging with contrast dependent on blood oxygenation. *Proc Natl Acad Sci Unit States Am.* 1990; 87: 9868-9872.
- EEG signal classification using principal component analysis with neural network in brain computer interface applications. In: Emerging Trends in Computing, Communication and Nanotechnology (ICE-CCN), 2013 International Conference on. IEEE; 2013: 227-231.
- Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Vol. 1. Oakland, CA, USA; 1967: 281-297.
- FCM: the fuzzy c-means clustering algorithm. *Comput Geosci.* 1984; 10: 191-203.
- Data Mining and Knowledge Discovery Handbook. 2nd ed. Springer; 2005.
- A survey of evolutionary algorithms for clustering. *IEEE Trans Syst Man Cybern C Appl Rev.* 2009; 39: 133-155.
- A multi-prototype clustering algorithm. *Pattern Recogn.* 2009; 42: 689-698.
- Graph-theoretical methods for detecting and describing gestalt clusters. *IEEE Trans Comput.* 1971; 100: 68-86.
- A graph-theoretical clustering method based on two rounds of minimum spanning trees. *Pattern Recogn.* 2010; 43: 752-766.
- Outlier detection using k-nearest neighbour graph. In: Pattern Recognition, ICPR 2004. Proceedings of the 17th International Conference on. Vol. 3. IEEE; 2004: 430-433.
- Influence of graph construction on graph-based clustering measures. In: Advances in Neural Information Processing Systems; 2009: 1025-1032.
- Clustering aggregation. *ACM Trans Knowl Discov Data.* 2007; 1: 4.
- An impossibility theorem for clustering. In: Advances in Neural Information Processing Systems; 2003: 463-470.
- Chameleon: hierarchical clustering using dynamic modeling. *Computer.* 1999; 32: 68-75.
- UCI Machine Learning Repository; 2013.
- A Survey of Dimension Reduction Techniques. Technical report. Lawrence Livermore National Lab., CA (US); 2002.
- Principal component analysis. *Chemometr Intell Lab Syst.* 1987; 2: 37-52.

## Article info

### Publication history

Published online: April 09, 2020

Accepted:
March 24,
2020

Received in revised form:
March 20,
2020

Received:
October 22,
2019

### Copyright

© 2020 INDIACLEN. Published by Elsevier, a division of RELX India, Pvt. Ltd. All rights reserved.