
Mining Software Engineering Data






Figure 1. Overview of mining SE data

  1. Appreciate the latest advancement and success stories in the field of mining SE data;

  2. Conduct leading-edge research in the field of mining SE data;

  3. Apply data mining techniques on their own SE data using advanced data mining analysis tools and algorithms;

  4. Contrast their results relative to other work within the field;

  5. Recognize open problems and possible research directions within the field.

2. Detailed Overview

The tutorial surveys existing research on mining SE data and categorizes it [9] along three major perspectives: the data sources being mined, the tasks being assisted, and the mining techniques being used. Figure 1 shows this categorization: the bottom part lists the software engineering data being mined, the middle part the mining techniques being used, and the top part the software engineering tasks being assisted.
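The three-perspective categorization can be pictured as a record with one field per perspective. The following is a minimal sketch for illustration only: the class `MiningStudy`, the helper `group_by`, and every category label are assumptions made for this example and are not taken from Figure 1. The sample entries are guessed from the titles of references [3], [4], and [5] alone.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MiningStudy:
    """One piece of research, tagged along the three perspectives (hypothetical type)."""
    name: str
    data_source: str  # which SE data is mined
    technique: str    # which mining technique is used
    task: str         # which SE task is assisted


# Illustrative labels only, inferred from the reference titles; not from Figure 1.
STUDIES = [
    MiningStudy("CVSSearch [3]", "CVS comments", "text search", "code search"),
    MiningStudy("Gall et al. [4]", "release history", "pattern detection",
                "logical coupling detection"),
    MiningStudy("Graves et al. [5]", "change history", "statistical modeling",
                "fault prediction"),
]

PERSPECTIVES = ("data_source", "technique", "task")


def group_by(studies, perspective):
    """Group study names by the value they take on one perspective."""
    if perspective not in PERSPECTIVES:
        raise ValueError(f"unknown perspective: {perspective!r}")
    groups = defaultdict(list)
    for study in studies:
        groups[getattr(study, perspective)].append(study.name)
    return dict(groups)


if __name__ == "__main__":
    for perspective in PERSPECTIVES:
        print(perspective, group_by(STUDIES, perspective))
```

Grouping the same list by `data_source`, `technique`, or `task` gives the three views of the field that the categorization describes; a real catalogue would draw its entries and labels from the bibliography [9] rather than from these guessed values.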

From this categorization, we intend to investigate the following four issues. First, we intend to identify the inherent challenges of mining software engineering data. We shall elaborate on the essential requirements in software engineering and analyze how mining software engineering data differs from mining other types of scientific and engineering data. We shall discuss which types of data mining techniques are desired in software engineering and how they should be customized to fit the requirements and characteristics of SE data.

Second, we intend to understand the current research and development frontier of data mining practice in software engineering. We shall summarize several kinds of data mining problems in software engineering that are under active investigation, organized by the three major perspectives: data sources being mined, tasks being assisted, and mining techniques being used. Through this discussion, researchers can quickly join this active research area and learn which commonly available mining techniques apply to real problems.

Third, we intend to analyze successful cases of mining SE data. We shall review and briefly demonstrate several research prototypes of data mining systems for software engineering. Through these case studies, participants can understand how to build a testbed for research and development.

Finally, we intend to give an overview of commonly used data mining tools. Our overview will help participants better understand the available tools, which they can use to explore their own data and to integrate data mining techniques into their research and day-to-day work.


References

[1] The R Project for Statistical Computing. Available online at http://www.r-project.org/.

[2] Weka 3: Data Mining Software in Java. Available online at http://www.cs.waikato.ac.nz/ml/weka/.

[3] A. Chen, E. Chou, J. Wong, A. Y. Yao, Q. Zhang, S. Zhang, and A. Michail. CVSSearch: Searching through source code using CVS comments. In Proceedings of the 17th International Conference on Software Maintenance, pages 364–374, Florence, Italy, 2001.

[4] H. Gall, K. Hajek, and M. Jazayeri. Detection of logical coupling based on product release history. In Proceedings of the 14th International Conference on Software Maintenance, pages 190–198, Bethesda, Washington D.C., 1998.

[5] T. L. Graves, A. F. Karr, J. S. Marron, and H. Siy. Predicting fault incidence using software change history. IEEE Trans. Softw. Eng., 26(7):653–661, 2000.

[6] A. E. Hassan, A. Mockus, R. C. Holt, and P. M. Johnson. Guest editor’s introduction: Special issue on mining software repositories. IEEE Trans. Softw. Eng., 31(6):426–428, 2005.

[7] M. Mendonca and N. L. Sunderhaft. Mining software engineering data: A survey. A DACS state-of-the-art report, Data & Analysis Center for Software, Rome, NY, 1999.

[8] ... and predicting effort in software projects. In Proceedings of the 25th International Conference on Software Engineering, pages 274–284, Portland, Oregon, 2003.

[9] T. Xie. Bibliography on mining software engineering data. Available online at http://ase.csc.ncsu.edu/dmse/.
