GQM, CMM, ISO standards and frameworks

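Many software metrics programmes have failed because they had poorly defined, or even non-existent, objectives. To counter this problem, Vic Basili and his colleagues at the University of Maryland developed a rigorous goal-oriented approach to measurement [Basili and Rombach 1988]. Because of its intuitive nature, the approach has gained widespread appeal. The fundamental idea is a simple one; managers proceed according to the following three stages:

1. List the major goals of the development or maintenance project.
2. Derive from each goal the questions that must be answered to determine whether the goal is being met.
3. Decide what must be measured in order to answer the questions adequately.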

Figure 7.1 illustrates how several metrics might be generated from a single goal.

The figure shows that the overall goal is to evaluate the effectiveness of using a coding standard. To decide if the standard is effective, several key questions must be asked. First, it is important to know who is using the standard, so that you can compare the productivity of the coders who use the standard with the productivity of those who do not. Likewise, you probably want to compare the quality of the code produced with the standard with the quality of non-standard code. To address these issues, it is important to ask questions about productivity and quality.

Once these questions are identified, you must analyze each question to determine what must be measured in order to answer it. For example, to understand who is using the standard, it is necessary to know what proportion of coders is using it. However, it is also important to have an experience profile of the coders, describing how long they have worked with the standard, the environment and the language, and any other factors that will help to evaluate the effectiveness of the standard. The productivity question requires a definition of productivity, which is usually some measure of product size divided by some measure of effort. As shown in the figure, size can be expressed in lines of code, function points, or any other metric that will be useful to you. Similarly, quality may be measured in terms of the number of errors found in the code, plus any other quality measures that you would like to use.
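The productivity and quality questions become concrete once each coder's data is recorded. The sketch below is a minimal illustration and not the method behind Figure 7.1. The `Coder` record and its fields are hypothetical. Productivity is taken here as size per unit effort (KLOC per person-month), and its reciprocal, effort per KLOC, would carry the same information. Quality is taken as faults per KLOC.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Coder:
    """Hypothetical per-coder record; field names are illustrative only."""
    name: str
    uses_standard: bool
    kloc: float       # code produced, in thousands of lines
    effort_pm: float  # effort spent, in person-months
    faults: int       # faults later found in that code


def productivity(c: Coder) -> float:
    # Size per unit effort: KLOC per person-month.
    return c.kloc / c.effort_pm


def fault_density(c: Coder) -> float:
    # Faults per KLOC, used here as a simple quality measure.
    return c.faults / c.kloc


def compare(coders: list[Coder]) -> dict[str, dict[str, float]]:
    """Proportion, mean productivity and mean fault density for each group."""
    summary = {}
    for label, flag in (("standard", True), ("non-standard", False)):
        group = [c for c in coders if c.uses_standard == flag]
        if not group:
            continue  # no coders in this group, nothing to average
        summary[label] = {
            "proportion": len(group) / len(coders),
            "productivity": mean(productivity(c) for c in group),
            "fault_density": mean(fault_density(c) for c in group),
        }
    return summary


# Hypothetical data for four coders.
coders = [
    Coder("A", True, 2.0, 4.0, 10),
    Coder("B", True, 3.0, 5.0, 9),
    Coder("C", False, 1.5, 5.0, 12),
    Coder("D", False, 2.0, 4.0, 14),
]
print(compare(coders))
```

Group means hide the experience profile mentioned above. A fuller analysis would take into account how long each coder has used the standard, the environment and the language before attributing any difference to the standard itself.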

In this way, you generate only those measures that are related to the goal. Notice that, in many cases, several measurements may be needed to answer a single question. Likewise, a single measurement may apply to more than one question. The goal provides the purpose for collecting the data, and the questions tell you and your project how to use the data.
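The goal-question-metric structure is a small tree. A tree makes two properties easy to check: only goal-related measures are generated, and a measure may be shared between questions. The following is a minimal Python sketch. The class and method names are hypothetical, and the questions and metrics paraphrase the coding-standard example above rather than reproduce Figure 7.1 exactly.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: list[str] = field(default_factory=list)


@dataclass
class Goal:
    purpose: str
    questions: list[Question] = field(default_factory=list)

    def metrics(self) -> set[str]:
        """All metrics the goal requires; a shared metric appears only once."""
        return {m for q in self.questions for m in q.metrics}

    def questions_using(self, metric: str) -> list[str]:
        """Every question that a given metric helps to answer."""
        return [q.text for q in self.questions if metric in q.metrics]


coding_standard = Goal(
    "Evaluate the effectiveness of the coding standard",
    [
        Question("Who is using the standard?",
                 ["proportion of coders using standard", "coder experience profile"]),
        Question("What is coder productivity?", ["effort", "code size"]),
        Question("What is code quality?", ["errors found", "code size"]),
    ],
)

print(sorted(coding_standard.metrics()))
print(coding_standard.questions_using("code size"))
```

Here "code size" serves both the productivity question and the quality question, because a fault density needs a size. It is nevertheless collected only once.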

*Goal*	*Questions*	*Metrics*
Plan	How much does the inspection process cost?	Average effort per KLOC; percentage of reinspections
	How much calendar time does the inspection process take?	Average effort per KLOC; total KLOC inspected
Monitor and control	What is the quality of the inspected software?	Average faults detected per KLOC; average inspection rate; average preparation rate
	To what degree did the staff conform to the procedures?	Average inspection rate; average preparation rate; average lines of code inspected; percentage of reinspections
	What is the status of the inspection process?	Total KLOC inspected
Improve	How effective is the inspection process?	Defect removal efficiency; average faults detected per KLOC; average inspection rate; average preparation rate; average lines of code inspected
	What is the productivity of the inspection process?	Average effort per fault detected; average inspection rate; average preparation rate; average lines of code inspected
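The metrics in the inspection table above can all be derived from a handful of per-inspection records. The Python sketch below is illustrative only, and it rests on several assumptions. The record fields are hypothetical. All hours are staff-hours. The inspection and preparation rates are lines of code per meeting hour and per preparation hour. Each "average" is computed as a ratio of totals over all inspections. Defect removal efficiency is taken as the share of all known faults in the inspected code that the inspections found.

```python
from dataclasses import dataclass


@dataclass
class Inspection:
    """One inspection; fields and units are assumptions for this sketch."""
    loc: int               # lines of code inspected
    prep_hours: float      # preparation effort, staff-hours
    meeting_hours: float   # meeting effort, staff-hours
    faults_found: int      # faults detected by this inspection
    reinspection: bool = False


def inspection_metrics(records: list[Inspection], faults_found_later: int) -> dict[str, float]:
    """Compute the table's metrics as ratios of totals over all inspections.

    faults_found_later: faults in the same code found after inspection
    (e.g. in testing), needed for defect removal efficiency.
    """
    n = len(records)
    total_loc = sum(r.loc for r in records)
    kloc = total_loc / 1000
    prep = sum(r.prep_hours for r in records)
    meeting = sum(r.meeting_hours for r in records)
    effort = prep + meeting
    faults = sum(r.faults_found for r in records)
    return {
        "total_kloc_inspected": kloc,
        "avg_effort_per_kloc": effort / kloc,
        "pct_reinspections": 100 * sum(r.reinspection for r in records) / n,
        "avg_faults_per_kloc": faults / kloc,
        "avg_inspection_rate": total_loc / meeting,  # LOC per meeting hour
        "avg_preparation_rate": total_loc / prep,    # LOC per preparation hour
        "avg_loc_inspected": total_loc / n,
        "defect_removal_efficiency": 100 * faults / (faults + faults_found_later),
        "avg_effort_per_fault": effort / faults,
    }


# Hypothetical data: two inspections, the second a reinspection.
records = [
    Inspection(400, 8.0, 4.0, 10),
    Inspection(600, 12.0, 6.0, 14, reinspection=True),
]
for name, value in inspection_metrics(records, faults_found_later=6).items():
    print(f"{name}: {value:.2f}")
```

Computing each metric once from the same records also keeps the metrics consistent when one of them answers several questions, as the inspection rate does above.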

GQM is in fact only one of a number of approaches for defining measurable goals that have appeared in the literature; the other best-known approaches are:

Process improvement is an umbrella term for a growing movement underpinned by the notion that all issues of software quality revolve around improving the software development process. Central to this movement has been the work of the Software Engineering Institute (SEI) at Carnegie Mellon University promoting the Capability Maturity Model (CMM). The CMM has its origins in [Humphrey 1989], and the latest version is described in [Paulk et al 1994]. The US Department of Defense (DOD) commissioned the development of the CMM in response to the problems it had experienced in software procurement: it wanted a means of assessing the suitability of potential contractors. The CMM is a five-level model of a software development organisation's process maturity (based very much on TQM concepts), as shown in Figure 1.

By means of an extensive questionnaire, follow-up interviews and collection of evidence, software organisations can be 'graded' into one of the five maturity levels, based primarily on the rigour of their development processes. Except for level 1, each level is characterised by a set of Key Process Areas (KPAs). For example, the KPAs for level 2 are requirements management, project planning, project tracking, subcontract management, quality assurance and configuration management. The KPAs for level 5 are defect prevention, technology change management and process change management.
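The staged logic of the grading can be sketched in a few lines. An organisation sits at the highest level for which it satisfies every KPA of that level and of all lower levels. This is a simplification. Real assessments judge each KPA through the questionnaire, interviews and evidence rather than as a yes/no flag. The level 3 and level 4 KPA names are assumptions taken from CMM version 1.1, not from the text above, and the function name is hypothetical.

```python
# KPAs per maturity level; level 1 has none. Level 2 and 5 names follow the
# text; level 3 and 4 names are assumed from CMM version 1.1.
KPAS: dict[int, set[str]] = {
    2: {"requirements management", "project planning", "project tracking",
        "subcontract management", "quality assurance", "configuration management"},
    3: {"organization process focus", "organization process definition",
        "training program", "integrated software management",
        "software product engineering", "intergroup coordination", "peer reviews"},
    4: {"quantitative process management", "software quality management"},
    5: {"defect prevention", "technology change management", "process change management"},
}


def maturity_level(satisfied: set[str]) -> int:
    """Highest level whose KPAs, and those of every lower level, are all satisfied."""
    level = 1
    for n in sorted(KPAS):
        if KPAS[n] <= satisfied:  # subset test: every KPA at level n is satisfied
            level = n
        else:
            break                 # a gap at level n caps the rating below n
    return level


all_kpas = set().union(*KPAS.values())
print(maturity_level(KPAS[2]))            # 2
print(maturity_level(KPAS[2] | KPAS[5]))  # 2: level 5 KPAs cannot skip levels 3 and 4
print(maturity_level(all_kpas))           # 5
```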

In principle, companies are expected to be at level 3 or higher to win contracts from the DOD. This commercial motivation is why the CMM has such a high profile. Few companies have managed to reach level 3; most are at level 1. Only very recently has there been evidence of any level 5 organisations; the best known is the part of IBM responsible for the software for NASA's space shuttle programme [Keller 1992].

The CMM is having a huge international impact, and this impact has resulted in significantly increased awareness and use of software metrics. The reason for this is that metrics are relevant in KPAs throughout the model. Table 7.2 presents an overview of the types of measurement suggested by each maturity level, where the selection depends on the amount of information visible and available at a given maturity level. Level 1 measurements provide a baseline for comparison as you seek to improve your processes and products. Level 2 measurements focus on project management, while level 3 measures the intermediate and final products produced during development. The measurements at level 4 capture characteristics of the development process itself to allow control of the individual activities of the process. A level 5 process is mature enough and managed carefully enough to allow measurements to provide feedback for dynamically changing the process during a particular project’s development.

*Maturity Level*	*Characteristics*	*Type of Metrics to Use*
5. Optimizing	Improvement fed back to the process	Process plus feedback for changing the process
4. Managed	Measured process	Process plus feedback for control
3. Defined	Process defined and institutionalized	Product
2. Repeatable	Process dependent on individuals	Project management
1. Initial	Ad hoc	Baseline

Despite its international acceptance, the CMM is not without criticism. The most serious accusation concerns the validity of the five-level scale itself: there is, as yet, no convincing evidence that higher-rated companies produce better-quality software. There have also been concerns regarding the questionnaire [Bollinger and McGowan 1991]. A closely related European project, funded under the ESPRIT programme, is Bootstrap [Woda and Schynoll 1992]. The Bootstrap method is also a framework for assessing software process maturity. The key differences are that individual projects (rather than just entire organisations) can be assessed, and that the result of an assessment can be any real number between 1 and 5. Thus, for example, a department could be rated at 2.6, indicating that it is 'better' than CMM level 2 but not yet good enough for CMM level 3.
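The reading of a fractional score given above can be expressed directly. In this reading, 2.6 means that level 2 has been reached and the department is part of the way towards level 3. The sketch below only interprets that example. It does not reproduce Bootstrap's own scoring procedure, which is not described here, and the function name is hypothetical.

```python
import math


def cmm_equivalent(score: float) -> tuple[int, float]:
    """Split a Bootstrap-style score into the CMM level fully reached and the
    fraction of the way towards the next level (an interpretive assumption)."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("Bootstrap scores lie between 1 and 5")
    level = math.floor(score)
    return level, score - level


print(cmm_equivalent(2.6))  # level 2 reached, roughly 0.6 of the way to level 3
```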

The most recent development in the process improvement arena is SPICE (Software Process Improvement and Capability dEtermination). This is an international project [ISO/IEC 1993] whose aim is to develop a standard for software process assessment, building on the best features of the CMM, Bootstrap and ISO 9003 (described below).

There are now literally hundreds of national and international standards that are directly or indirectly concerned with software quality assurance. A general criticism of these standards is that they are overly subjective in nature and that they concentrate almost exclusively on the development processes rather than the products [Fenton et al 1993]. Despite these criticisms, a small number of generic software QA standards, described below, are having a significant impact on software metrics activities for QA.

In Europe, and increasingly in Japan, the pre-eminent quality standard to which people aspire is based around the international standard ISO 9001 [ISO 9001]. This general manufacturing standard specifies a set of 20 requirements for a quality management system, covering policy, organisation, responsibilities and reviews, in addition to the controls that need to be applied to life-cycle activities in order to achieve quality products. ISO 9001 is not specific to any market sector; the software 'version' of the standard is ISO 9003 [ISO 9003]. The ISO 9003 standard is also the basis of the TickIT initiative sponsored by the UK Department of Trade and Industry [TickIT 1992]. Companies apply to become TickIT-certified (most of the key IT companies have already achieved this certification), and they must be fully re-assessed every three years.

Different countries have their own national standards based on the ISO 9000 series. For example, in the UK the equivalent is the BS 5750 series. The EEC equivalent to ISO 9001 is EN 29001.

ISO 9126 Software product evaluation: Quality characteristics and guidelines for their use

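This is the first international standard to attempt to define a framework for evaluating software quality [ISO 9126; Azuma 1993]. The standard defines software quality as:

'The totality of features and characteristics of a software product that bear on its ability to satisfy stated or implied needs.'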

Heavily influenced by the SQM approach described above, ISO 9126 asserts that software quality may be evaluated by six characteristics: functionality, reliability, efficiency, usability, maintainability and portability. Each of these characteristics is defined as a 'set of attributes that bear' on the relevant aspect of software, and can be refined through multiple levels of subcharacteristics. Thus, for example, reliability is defined as:

'A set of attributes that bear on the capability of software to maintain its level of performance under stated conditions for a stated period of time.'

Examples of possible definitions of subcharacteristics at the first level are given, but only in Annex A, which is not part of the International Standard itself. Attributes at the second level of refinement are left completely undefined. Some people have argued that, because the characteristics and subcharacteristics are not properly defined, ISO 9126 does not provide a conceptual framework within which comparable measurements can be made by different parties with different views of software quality (for example users, vendors and regulatory agencies). The definitions of attributes such as reliability also differ from those in other well-established standards. Nevertheless, ISO 9126 is an important milestone in the development of software quality measurement.
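The hierarchical refinement can be pictured as a nested mapping. Such a mapping also shows the gap that critics point to. In this Python sketch, the subcharacteristic names follow the informative Annex A of the 1991 standard and are therefore not normative. The attribute lists are left empty, as the standard leaves them. The single attribute added at the end is a hypothetical example of what one party might choose, and the function name is also hypothetical.

```python
# Characteristic -> subcharacteristic -> attributes. Subcharacteristic names
# follow the informative Annex A; attribute lists start empty because the
# standard leaves the second level of refinement undefined.
ISO9126_MODEL: dict[str, dict[str, list[str]]] = {
    "functionality": {s: [] for s in
                      ("suitability", "accuracy", "interoperability", "compliance", "security")},
    "reliability": {s: [] for s in ("maturity", "fault tolerance", "recoverability")},
    "usability": {s: [] for s in ("understandability", "learnability", "operability")},
    "efficiency": {s: [] for s in ("time behaviour", "resource behaviour")},
    "maintainability": {s: [] for s in
                        ("analysability", "changeability", "stability", "testability")},
    "portability": {s: [] for s in
                    ("adaptability", "installability", "conformance", "replaceability")},
}


def undefined_leaves(model: dict[str, dict[str, list[str]]]) -> list[tuple[str, str]]:
    """(characteristic, subcharacteristic) pairs that have no measurable attribute yet."""
    return [(c, s) for c, subs in model.items() for s, attrs in subs.items() if not attrs]


# Each party must supply its own attributes; this one is hypothetical.
ISO9126_MODEL["reliability"]["maturity"].append("failures per 1000 operating hours")
print(len(undefined_leaves(ISO9126_MODEL)))  # 20 of the 21 subcharacteristics remain undefined
```

Two parties that fill the empty lists differently will produce measurements that cannot be compared. This is exactly the objection raised above.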

IEEE 1061, the IEEE Standard for a Software Quality Metrics Methodology [IEEE 1061], was finalised in 1992. It does not prescribe any product metrics, although an appendix describes the SQM approach. Rather, it provides a methodology for establishing quality requirements and for identifying, analysing and validating software quality metrics.
