Wednesday, September 9, 2009

Activity 15 - Probabilistic Classification

In the previous activity, red blood cells (RBCs) were classified as normal or crenated using features extracted from the images by image processing. More specifically, minimum distance classification was used: a test object is assigned to the class (normal or crenated) whose mean feature set is closest to the feature set of the object.
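As a reminder of what minimum distance classification does, here is a minimal Python sketch. It assumes Euclidean distance and two features per cell; the function names and the feature values are made up for illustration and are not the data from the activity.

```python
import math

def class_means(training):
    """Return the mean feature vector of each class.

    `training` maps a class label to a list of feature vectors.
    """
    means = {}
    for label, rows in training.items():
        n = len(rows)
        # zip(*rows) walks through the data one feature (column) at a time
        means[label] = [sum(col) / n for col in zip(*rows)]
    return means

def classify_min_distance(x, means):
    """Assign x to the class whose mean is nearest (Euclidean distance)."""
    return min(means, key=lambda label: math.dist(x, means[label]))

# Hypothetical feature values, for illustration only
training = {
    "normal":   [[0.90, 0.10], [0.80, 0.20], [1.00, 0.15]],
    "crenated": [[0.40, 0.60], [0.50, 0.70], [0.45, 0.65]],
}
means = class_means(training)
print(classify_min_distance([0.85, 0.20], means))
print(classify_min_distance([0.50, 0.60], means))
```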
In this activity, a different technique called linear discriminant analysis (LDA) is applied to classify object features while minimizing the classification error. Linear discriminant analysis is based on conditional probability: given a set of measurements, there is a probability that the object is, for example, normal or crenated. This probability cannot be determined directly. However, it has been shown [1] that it can be related to the conditional probability of obtaining that set of measurements given that the object is known to belong to one class or the other (in this case normal or crenated). More specifically, the classification is based on the formula below:

$$f_i = \mu_i C^{-1} x_k^T - \frac{1}{2} \mu_i C^{-1} \mu_i^T + \ln(p_i)$$

where μi is the mean feature set of class i, C is the pooled covariance matrix*, both obtained from the training set, xk is the feature set of the test object to be classified, and pi is the prior probability of class i**. The quantity fi is a discriminant score that increases with the probability that a test object with the given features belongs to class i, so the object is assigned to the class with the largest fi. For example, if the f calculated for normal is larger than that calculated for crenated, the object is classified as normal. Otherwise, it is classified as crenated.
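The decision rule can be sketched in Python with NumPy as follows. This assumes the linear discriminant form given in [1], fi = μi C^-1 xk^T - (1/2) μi C^-1 μi^T + ln(pi); the function names and all numerical values are hypothetical and are not taken from the activity.

```python
import numpy as np

def lda_scores(x, means, pooled_cov, priors):
    """Return f_i = mu_i C^-1 x^T - 0.5 mu_i C^-1 mu_i^T + ln(p_i) per class."""
    x = np.asarray(x, dtype=float)
    scores = {}
    for label, mu in means.items():
        mu = np.asarray(mu, dtype=float)
        # Solve C w = mu rather than inverting C explicitly.
        # C is symmetric, so mu C^-1 equals (C^-1 mu)^T.
        w = np.linalg.solve(pooled_cov, mu)
        scores[label] = w @ x - 0.5 * (w @ mu) + np.log(priors[label])
    return scores

def classify_lda(x, means, pooled_cov, priors):
    """Assign x to the class with the largest discriminant score."""
    scores = lda_scores(x, means, pooled_cov, priors)
    return max(scores, key=scores.get)

# Hypothetical values, for illustration only
means = {"normal": [0.90, 0.15], "crenated": [0.45, 0.65]}
pooled_cov = np.array([[0.010, 0.002],
                       [0.002, 0.010]])
priors = {"normal": 0.5, "crenated": 0.5}

print(classify_lda([0.85, 0.20], means, pooled_cov, priors))
print(classify_lda([0.50, 0.60], means, pooled_cov, priors))
```

With equal priors, the ln(pi) term is the same for every class and the rule reduces to picking the nearest class mean as measured through C. Unequal priors shift the decision boundary toward the less common class.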

*pooled covariance matrix - weighted sum of the per-class covariance matrices of the features, where the weight for each class is the number of objects in that class over the total number of objects in the training set
**prior probability - assumed as the number of objects in that class in the training set over the total number of objects in the training set
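The two footnoted quantities can be computed from a training set roughly as in the sketch below. It assumes that each class covariance is taken about that class's own mean and normalized by the class size; the exact normalization used in the activity is not stated, and the function name and data are hypothetical.

```python
import numpy as np

def pooled_covariance_and_priors(training):
    """Return (pooled covariance matrix, priors) from a training set.

    `training` maps a class label to an (n_i, d) array-like of features.
    """
    total = sum(len(rows) for rows in training.values())
    d = np.asarray(next(iter(training.values()))).shape[1]
    pooled = np.zeros((d, d))
    priors = {}
    for label, rows in training.items():
        X = np.asarray(rows, dtype=float)
        weight = len(X) / total            # n_i / N
        centered = X - X.mean(axis=0)      # subtract the class mean
        # Assumption: class covariance normalized by n_i
        cov_i = centered.T @ centered / len(X)
        pooled += weight * cov_i           # weighted sum over classes
        priors[label] = weight             # prior = class fraction
    return pooled, priors

# Hypothetical feature values, for illustration only
training = {
    "normal":   [[0.90, 0.10], [0.80, 0.20], [1.00, 0.15]],
    "crenated": [[0.40, 0.60], [0.50, 0.70], [0.45, 0.65]],
}
pooled, priors = pooled_covariance_and_priors(training)
print(pooled)
print(priors)
```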

Using the training set features and the test set features obtained from the previous activity, linear discriminant analysis is applied.



From the results presented in the table above, an improvement in classification was observed only for trial 2, specifically in the classification of the crenated RBCs. In the scatter plot of the object features, the crenated RBCs in trial 2 lie very close to the region of the normal RBCs. LDA was able to correctly classify more crenated RBCs in this region, but the improvement was only approximately 2%, which corresponds to just one additional correctly classified crenated RBC. No improvement was obtained for trial 1 or for the normal RBCs in trial 2 because minimum distance classification was already sufficient there: for the correctly classified objects, the class probabilities are already reflected in their distances from the means of the normal and crenated RBCs. Still, LDA can be used as a more stringent classifier than minimum distance classification.
In this activity, I would like to give myself a grade of 10 for successfully applying LDA to the data obtained from the previous activity. I was also able to discuss the process and show the advantage of LDA.
I would like to thank Dr. Gay Jane Perez for guiding us in this activity, and Ms. Jica Monsanto for some discussions.

References
[1] http://people.revoledu.com/kardi/tutorial/LDA/LDA.html#LDA
[2] http://en.wikipedia.org/wiki/Conditional_probability
