Interactive Network Exploration in the KDD Process, Contributions in the Study of Population Variability of a Corn Fijivirus

Mario Alejandro Garcia, Mar

Abstract

The genetic variability of individuals of the same species can be studied through networks that represent the genetic distances between them. We studied the case of Mal de Rio Cuarto virus (MRCV), defining distance measures between the genome profiles of different individuals and building a network of haplotypes. We analyzed the topological properties of the network and examined it along two dimensions, space and time, which together define space-time environments. The examination showed that, in the first crop years tested, the number of haplotypes and the distances between them were greater than in subsequent crops. A variability indicator was calculated for each environment and compared with its expected value. The comparison confirmed the observation made during the examination and led to the conclusion that virus variability decreased after the epidemic of the 1996-97 crop year. An analysis of the variability of MRCV through haplotype networks is presented. We propose the use of this tool, which is unusual in KDD processes, as a new approach that touches on the concepts of knowledge representation, structured data modeling, visualization, exploration and interactive discovery.

The main contribution of this case to the KDD process is the proposal of interactive exploration of networks, which turned out to be intuitive and easy to apply for analysis.
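
The abstract does not specify the distance measure, the network construction rule, or the variability indicator, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that genome profiles can be encoded as equal-length strings, uses Hamming distance as a stand-in distance measure, links two haplotypes when their distance is at most a hypothetical threshold `max_distance`, and uses mean pairwise distance as a stand-in variability indicator. All function names, environment labels and profile values are invented for illustration and are not data from the study.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def haplotype_network(profiles, max_distance=1):
    """Build a simple haplotype network.

    Nodes are the distinct profiles (haplotypes). An edge joins two
    haplotypes whose distance is at most max_distance (an assumed rule,
    not necessarily the one used in the paper).
    """
    haplotypes = sorted(set(profiles))
    edges = {}
    for h1, h2 in combinations(haplotypes, 2):
        d = hamming(h1, h2)
        if d <= max_distance:
            edges[(h1, h2)] = d
    return haplotypes, edges


def mean_pairwise_distance(profiles):
    """Assumed variability indicator: average distance over all pairs."""
    pairs = list(combinations(profiles, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Hypothetical space-time environments: (crop year label, region label).
samples = {
    ("year_1", "region_A"): ["0101", "0111", "1101", "0101"],
    ("year_2", "region_A"): ["0101", "0101", "0111"],
}

for environment, profiles in samples.items():
    nodes, edges = haplotype_network(profiles)
    indicator = mean_pairwise_distance(profiles)
    print(environment, len(nodes), "haplotypes,", len(edges), "edges,",
          "indicator =", round(indicator, 3))
```

Grouping the samples by environment before building each network mirrors the space-time examination described above: the number of haplotypes and the indicator can then be compared from one environment to the next.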
