History of software metrics as a subject area
To assess the current status of software metrics, and its successes and failures, we first need to consider its history. Although the first dedicated book on software metrics was not published until 1976 [Gilb 1976], the active use of software metrics dates back to the late 1960s. At that time the Lines of Code measure (LOC, or KLOC for thousands of lines of code) was used routinely as the basis for measuring both programmer productivity (LOC per programmer month) and program quality (defects per KLOC). In other words, LOC was being used as a surrogate measure of different notions of program size. The early resource prediction models (such as those of [Putnam 1978] and [Boehm 1981]) also used LOC, or related metrics like delivered source instructions, as the key size variable. In 1971 Akiyama [Akiyama 1971] published what we believe was the first attempt to use metrics for software quality prediction, when he proposed a crude regression-based model for module defect density (number of defects per KLOC) in terms of the module size measured in KLOC. In other words, he was using KLOC as a surrogate measure for program complexity.
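The two LOC-based ratios mentioned above can be written down as a minimal Python sketch. The function names and the sample figures are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
def productivity(loc: int, programmer_months: float) -> float:
    """Programmer productivity as LOC per programmer month."""
    if programmer_months <= 0:
        raise ValueError("programmer_months must be positive")
    return loc / programmer_months


def defect_density(defects: int, loc: int) -> float:
    """Program quality as defects per KLOC (thousand lines of code)."""
    if loc <= 0:
        raise ValueError("loc must be positive")
    return defects / (loc / 1000)


# Hypothetical module: 12,000 LOC, 8 programmer months, 30 recorded defects.
print(productivity(12_000, 8))     # 1500.0 LOC per programmer month
print(defect_density(30, 12_000))  # 2.5 defects per KLOC
```

Both ratios rest on the same raw LOC count, which is why a single size measure ends up standing in for several different attributes at once.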
The obvious drawbacks of using a measure as crude as LOC as a surrogate for notions as different as effort, functionality, and complexity were recognised in the mid-1970s. The need for more discriminating measures became especially urgent with the increasing diversity of programming languages. After all, a LOC in an assembly language is not comparable in effort, functionality, or complexity to a LOC in a high-level language. Thus, the decade starting from the mid-1970s saw an explosion of interest in measures of software complexity (pioneered by the likes of [Halstead 1977] and [McCabe 1976]) and measures of size (such as function points, pioneered by [Albrecht 1979] and later by [Symons 1991]) which were intended to be independent of programming language.
Work on extending, validating and refining complexity metrics (including applying them to new paradigms such as object-oriented languages [Chidamber and Kemerer 1994]) has been a dominant feature of academic metrics research up to the present day [Fenton 1991, Zuse 1991].
In addition to work on specific metrics and models, much recent work has focused on meta-level activities, the most notable of which are:
- Work on the mechanics of implementing metrics programs. Two pieces of work stand out in this respect:
  - The work of [Grady and Caswell 1987] (later extended in [Grady 1992]), which was the first and most extensive experience report of a company-wide software metrics program. This work contains key guidelines (and lessons learned) which influenced (and inspired) many subsequent metrics programs.
  - The work of Basili, Rombach and colleagues on GQM (Goal-Question-Metric) [Basili and Rombach 1988]. By borrowing some simple ideas from the Total Quality Management field, Basili and his colleagues proposed a simple scheme for ensuring that metrics activities were always goal-driven. A metrics program established without clear and specific goals and objectives is almost certainly doomed to fail [Hall and Fenton 1997]. Basili's high profile in the community and outstanding communication and technology transfer skills ensured that this important message was subsequently widely accepted and applied. That does not mean it is without its criticisms ([Bache and Neil 1995] and [Hetzel 1993] argue that the inherent top-down approach ignores what is feasible to measure at the bottom). However, most metrics programs at least pay lip service to GQM, with the result that such programs should in principle be collecting only those metrics which are relevant to the specific goals.
- The use of metrics in empirical software engineering: specifically, we refer to empirical work concerned with evaluating the effectiveness of specific software engineering methods, tools and technologies. This is a great challenge for the academic/research software metrics community. There is now widespread awareness that we can no longer rely purely on the anecdotal claims of self-appointed experts about which new methods really work and how. Increasingly we are seeing measurement studies that quantify the effectiveness of methods and tools. Basili and his colleagues have again pioneered this work (see, for example, [Basili and Reiter 1981, Basili et al 1986]). Success here is judged by acceptance of empirical results, and the ability to repeat experiments to independently validate results.
- Work on the theoretical underpinnings of software metrics. This work (exemplified by [Briand et al 1996, Fenton 1991, Zuse 1991]) is concerned with improving the level of rigour in the discipline as a whole. For example, there has been considerable work in establishing a measurement theory basis for software metrics activities. The most important success from this work has been the acceptance of the need to consider scale types when defining measures, with corresponding restrictions on the analysis techniques that are valid for a given scale type.
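The point about scale types can be made concrete with a short Python sketch. It assumes the conventional nominal/ordinal/interval/ratio hierarchy from measurement theory and covers only three summary statistics. The names `Scale`, `REQUIRED_SCALE` and `summarise` are hypothetical and introduced here purely for illustration.

```python
from enum import IntEnum
import statistics


class Scale(IntEnum):
    """Scale types ordered from weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Weakest scale type on which each statistic is meaningful.
REQUIRED_SCALE = {
    "mode": Scale.NOMINAL,
    "median": Scale.ORDINAL,
    "mean": Scale.INTERVAL,
}

# median_low always returns an actual data value, so it never averages
# two ordinal categories.
_FUNCTIONS = {
    "mode": statistics.mode,
    "median": statistics.median_low,
    "mean": statistics.mean,
}


def summarise(values, scale, statistic):
    """Compute a statistic only if the data's scale type permits it."""
    if scale < REQUIRED_SCALE[statistic]:
        raise ValueError(
            f"{statistic} is not meaningful for {scale.name.lower()} data"
        )
    return _FUNCTIONS[statistic](values)


# Hypothetical failure severity rankings (1 = minor ... 5 = critical): ordinal.
severities = [1, 2, 2, 3, 5]
print(summarise(severities, Scale.ORDINAL, "median"))  # 2
# summarise(severities, Scale.ORDINAL, "mean") would raise ValueError.
```

Tagging each measure with its scale type at definition time lets an analysis reject a meaningless statistic, such as an average severity ranking, before it is reported.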
Last modified: July 28, 1999.