Chapter 5
Software Team Organization and Specialization
Introduction
More than almost any other technical or engineering field, software devel-
opment depends upon the human mind, upon human effort, and upon
human organizations. From the day a project starts until it is retired
perhaps 30 years later, human involvement is critical to every step in
development, enhancement, maintenance, and customer support.
Software requirements are derived from human discussions of appli-
cation features. Software architecture depends upon the knowledge of
human specialists. Software design is based on human understanding
augmented by tools that handle some of the mechanical aspects, but
none of the intellectual aspects.
Software code is written line-by-line by craftspeople as custom arti-
facts and involves the highest quantity of human effort of any modern
manufactured product. (Creating sculpture and building special prod-
ucts such as 12-meter racing yachts or custom furniture require similar
amounts of manual effort by skilled artisans, but these are not main-
stream products that are widely utilized by thousands of companies.)
Although automated static analysis tools and some forms of auto-
mated testing exist, the human mind is also a primary tool for finding
bugs and security flaws. Both manual inspections and manual creation
of test plans and test cases are used for over 95 percent of software
applications, and for almost 100 percent of software applications larger
than 1,000 function points in size. Unfortunately, both quality and secu-
rity remain weak links for software.
As the economy sinks into global recession, the high costs and mar-
ginal quality and security of custom software development are going
to attract increasingly critical executive attention. It may well be that
the global recession will provide a strong incentive to begin to migrate
from custom development to construction from standard reusable com-
ponents. The global recession may also provide motivation for designing
more secure software with higher quality, and for moving toward higher
levels of automation in quality control and security control.
Although software has the highest labor content of any manufactured
product, the topic of software team organization structure is not well
covered in the software literature.
There are anecdotal reports on the value of such topics as pair
programming, small self-organizing teams, Agile teams, colocated teams,
matrix versus hierarchical organizations, project offices, and several
others. But these reports lack quantification of results. It is hard to
find empirical data that shows side-by-side results of different kinds of
organizations for the same kinds of applications.
One of the larger collections of team-related information that is avail-
able to the general public is the set of reports and data published by the
International Software Benchmarking Standards Group (ISBSG). For
example, this organization has productivity data and average application
sizes for teams ranging between 1 and 20 personnel. It also has data
on larger teams, although really large teams in excess of 500 people
are seldom reported to any benchmark organization.
Quantifying Organizational Results
This chapter will deal with organizational issues in a somewhat unusual
fashion. As various organization structures and sizes are discussed,
information will be provided that attempts to show in quantified form
a number of important topics:
1. Typical staffing complements in terms of managers, software engi-
neers, and specialists.
2. The largest software projects that a specific organization size and
type can handle.
3. The average size of software projects a specific organization size
and type handles.
4. The average productivity rates observed with specific organization
sizes and types.
5. The average development schedules observed with specific organi-
zation sizes and types.
6. The average quality rates observed with specific organization sizes
and types.
7. Demographics, or the approximate usage of various organization
structures.
8. Demographics in the sense of the kinds of specialists often deployed
under various organizational structures.
Of course, there will be some overlap among various sizes and kinds
of organization structures. The goal of the chapter is to narrow down
the ranges of uncertainty and to show what forms of organization are
best suited to software projects of various sizes and types.
Organizations in this chapter are discussed in terms of typical depart-
mental sizes, starting with one-person projects and working upward
to large, multinational, multidisciplinary teams that may have 1,000
personnel or more.
Observations of various kinds of organization structures are derived
from on-site visits to a number of organizations over a multiyear period.
Examples of some of the organizations visited by the author include
Aetna Insurance, Apple, AT&T, Boeing, Computer Aid Incorporated
(CAI), Electronic Data Systems (EDS), Exxon, Fidelity, Ford Motors,
General Electric, Hartford Insurance, IBM, Microsoft, NASA, NSA,
Sony, Texas Instruments, the U.S. Navy, and more than 100 other
organizations.
Organization structures are important aspects of successful software
projects, and a great deal more empirical study is needed on organiza-
tional topics.
The Separate Worlds of Information
Technology and Systems Software
Many medium and large companies such as banks and insurance com-
panies only have information technology (IT) organizations. While there
are organizational problems and issues within such companies, there
are larger problems and issues within companies such as Apple, Cisco,
Google, IBM, Intel, Lockheed, Microsoft, Motorola, Oracle, Raytheon,
SAP and the like, which develop systems and embedded software as
well as IT software.
Within most companies that build both IT and systems software, the
two organizations are completely different. Normally, the IT organiza-
tion reports to a chief information officer (CIO). The systems software
groups usually report to a chief technology officer (CTO).
The CIO and the CTO are usually at the same level, so neither has
authority over the other. Very seldom do these two disparate software
organizations share much in the way of training, tools, methodologies,
or even programming languages. Often they are located in different
buildings, or even in different countries.
Because the systems software organization tends to operate as a profit
center, while the IT organization tends to operate as a cost center, there
is often friction and even some dislike between the two groups.
The systems software group brings in revenues, but the IT organi-
zation usually does not. The friction is made worse by the fact that
compensation levels are often higher in the systems software domain
than in the IT domain.
While there are significant differences between IT and systems soft-
ware, there are also similarities. As the global recession intensifies and
companies look for ways to save money, sharing information between
IT and systems groups would seem to be advantageous.
Both sides need training in security, in quality assurance, in testing,
and in software reusability. The two sides tend to be on different business
cycles, so it is possible that the systems software side might be growing
while the IT side is downsizing, or vice versa. Coordinating position open-
ings between the two sides would be valuable in a recession.
Also valuable would be shared resources for certain skills that both
sides use. For example, there is a chronic shortage of good technical
writers, and there is no reason why technical communications could not
serve the IT organization and the systems organization concurrently.
Other groups such as testing, database administration, and quality
assurance might also serve both the systems and IT organizations.
So long as the recession is lowering sales volumes and triggering
layoffs, organizations that employ both systems software and IT groups
would find it advantageous to consider cooperation.
Both sides usually have less than optimal quality, although systems
software is usually superior to IT applications in that respect. It is pos-
sible that methods such as PSP, TSP, formal inspections, static analysis,
automated testing, and other sophisticated quality control methods could
be used by both the IT side and the systems side, which would simplify
training and also allow easier transfers of personnel from one side to
the other.
Colocation vs. Distributed Development
The software engineering literature supports a hypothesis that develop-
ment teams that are colocated in the same complex are more productive
than distributed teams of the same size located in different cities or
countries.
Indeed a study carried out by the author that dealt with large soft-
ware applications such as operating systems and telecommunication
systems noted that for each city added to the development of the same
applications, productivity declined by about 5 percent compared with
teams of identical sizes located in a single site.
The same study quantified the costs of travel from city to city. For one
large telecommunications application that was developed jointly between
six cities in Europe and one city in the United States, the actual costs of
airfare and travel were higher than the costs of programming or coding.
The overall team size for this application was about 250, and no fewer
than 30 of these software engineers or specialists were traveling from
country to country every week, and did so for more than three years.
Unfortunately, the fact that colocation is beneficial for software is an
indication that "software engineering" is a craft or art form rather than
an engineering field. For most engineered products such as aircraft,
automobiles, and cruise ships, many components and subcomponents
are built by scores of subcontractors who are widely dispersed geograph-
ically. While these manufactured parts have to be in one location for
final assembly, they do not have to be constructed in the same building
to be cost-effective.
Software engineering lacks sufficient precision in both design and
development to permit construction from parts that can be developed
remotely and then delivered for final construction. Of course, software
does involve both outsourcers and remote development teams, but the
current results indicate lower productivity than for colocated teams.
The author's study of remote development was done in the 1980s,
before the Web and the Internet made communication easy across geo-
graphic boundaries.
Today in 2009, conference calls, webinars, wiki groups, Skype, and
other high-bandwidth communication methods are readily available.
In the future, even more sophisticated communication methods will
become available.
It is possible to envision three separate development teams located
eight hours apart, so that work on large applications could be transmitted
from one time zone to another at the end of every shift. This would
permit 24-hour development by passing the work among three different
countries. Given the sluggish multiyear development schedules of large
software applications, this form of distributed development might cut
schedules by perhaps 60 percent compared with a single colocated team.
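The schedule arithmetic behind that figure can be sketched with a simple model. The 20 percent handoff overhead below is an illustrative assumption, not data from any study:

```python
def follow_the_sun_schedule(baseline_months, sites=3, handoff_overhead=0.20):
    """Estimate a round-the-clock schedule for work passed among sites.

    baseline_months  -- schedule for a single colocated team
    sites            -- teams working staggered 8-hour shifts
    handoff_overhead -- assumed fraction of each shift lost to
                        transferring work between sites (hypothetical)
    """
    effective_speedup = sites * (1.0 - handoff_overhead)
    return baseline_months / effective_speedup

baseline = 36.0  # a sluggish multiyear schedule, in calendar months
compressed = follow_the_sun_schedule(baseline)
reduction = 1.0 - compressed / baseline
```

With three sites and a 20 percent handoff loss, a 36-month schedule compresses to 15 months, a reduction of about 58 percent, in line with the "perhaps 60 percent" estimate.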
For this to happen, it is obvious that software would need to be an engi-
neering discipline rather than a craft or art form, so that the separate
teams could work in concert rather than damaging each other's results.
In particular, the architecture, design, and coding practices would have
to be understood and shared by the teams at all three locations.
What might occur in the future would be a virtual development environ-
ment that was available 24 hours a day. In this environment, avatars of
the development teams could communicate "face to face" by using either
their own images or generic images. Live conversations via Skype or the
equivalent could also be used as well as e-mail and various specialized
tools for activities such as remote design and code inspections.
In addition, suites of design tools and project planning tools would also
be available in the virtual environment so that both technical and busi-
ness discussions could take place without the need for expensive travel.
In fact, a virtual war room with every team's status, bug reports, issues,
schedules, and other project materials could be created that might even
be more effective than today's colocated organizations.
The idea is to allow three separate teams located thousands of miles
apart to operate with the same efficiency as colocated teams. It is also
desirable for quality to be even better than today. Of course, with 24-hour
development, schedules would be much shorter than they are today.
As of 2009, virtual environments are not yet at the level of sophisti-
cation needed to be effective for large system development. But as the
recession lengthens, methods that lower costs (especially travel costs)
need to be reevaluated at frequent intervals.
An even more sophisticated and effective form of software engineer-
ing involving distributed development would be that of just-in-time
software engineering practices similar to those used on the construction
of automobiles, aircraft, and large cruise ships.
In this case, there would need to be standard architectures that sup-
ported construction from reusable components. The components might
either be already in stock, or developed by specialized vendors whose
geographic locations might be anywhere on the planet.
The fundamental idea is that rather than custom design and custom
coding, standard architectures and standard designs would allow con-
struction from standard reusable components.
Of course, this idea involves many software engineering technical
topics that don't fully exist in 2009, such as parts lists, standard inter-
faces, certification protocols for quality and security, and architectural
methods that support reusable construction.
As of 2009, the cost of developing custom-built software applications
ranges between $1,000 per function point and $3,000 per function point.
Software maintenance and enhancements range between about $100
and $500 per function point per year, forever. These high costs make
software among the most expensive business "machines" ever created.
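A rough cost-of-ownership sketch using these cited ranges illustrates the point; the 10,000-function point size and ten-year service life are assumptions for the example:

```python
def cost_of_ownership(function_points, dev_cost_per_fp,
                      maint_cost_per_fp_year, years):
    """Development cost plus cumulative maintenance and enhancement cost."""
    development = function_points * dev_cost_per_fp
    maintenance = function_points * maint_cost_per_fp_year * years
    return development + maintenance

# Low end of the cited ranges: $1,000 per FP to build, $100 per FP per year.
low_end = cost_of_ownership(10_000, 1_000, 100, years=10)   # $20 million
# High end of the cited ranges: $3,000 per FP and $500 per FP per year.
high_end = cost_of_ownership(10_000, 3_000, 500, years=10)  # $80 million
```

Even at the low end, a single large application represents a $20 million commitment over a decade, which is why the text calls software one of the most expensive business "machines" ever created.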
As the recession lengthens, it is obvious that the high costs of custom
software development need to be analyzed and more cost-effective meth-
ods developed. A combination of certified reusable components that could
be assembled by teams that are geographically dispersed could, in theory,
lead to significant cost reductions and schedule reductions also.
A business goal for software engineers would be to bring software
development costs down below $100 per function point, and annual
maintenance and enhancement costs below $50 per function point.
A corollary business goal might be to reduce development schedules
for 10,000-function point applications from today's averages of greater
than 36 calendar months down to 12 calendar months or less.
Defect potentials should be reduced from today's averages of greater
than 5.00 per function point down to less than 2.50 per function point.
At the same time, average levels of defect removal efficiency should
rise from today's average of less than 85 percent up to greater than 95
percent, and ideally greater than 97 percent.
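Delivered defects follow directly from these two numbers: the defect potential multiplied by the fraction of defects that removal misses. A quick check of the targets above:

```python
def delivered_defects_per_fp(defect_potential, removal_efficiency):
    """Defects per function point still present at release."""
    return defect_potential * (1.0 - removal_efficiency)

# Today's averages: 5.00 defects per FP, 85 percent removal efficiency.
today = delivered_defects_per_fp(5.00, 0.85)  # 0.75 delivered defects per FP
# The stated goals: 2.50 defects per FP, 97 percent removal efficiency.
goal = delivered_defects_per_fp(2.50, 0.97)   # 0.075 delivered defects per FP
```

Meeting both targets at once would cut delivered defects by a factor of ten, which is why defect potentials and removal efficiency need to improve together.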
Colocation cannot achieve such major reductions in costs, schedules,
and quality, but a combination of remote development, virtual develop-
ment environments, and standard reusable components might well turn
software engineering into a true engineering field, and also lower both
development and maintenance costs by significant amounts.
The Challenge of Organizing
Software Specialists
In a book that includes "software engineering" in the title, you might
suppose that the primary audience is software engineers working on
development of new applications. While such software engineers are a
major part of the audience, they actually comprise less than one-third
of the personnel who work on software in large corporations.
In today's world of 2009, many companies have more personnel working
on enhancing and modifying legacy applications than on new develop-
ment. Some companies have about as many test personnel as they do
conventional software engineering personnel, and sometimes even more.
Some of the other software occupations are just as important as soft-
ware engineers for leading software projects to a successful outcome.
These other key staff members work side-by-side with software engi-
neers, and major applications cannot be completed without their work.
A few examples of other important and specialized skills employed on
software projects include architects, business analysts, database admin-
istrators, test specialists, technical writers, quality assurance special-
ists, and security specialists.
As discussed in Chapter 4 and elsewhere, the topic of software spe-
cialization is difficult to study because of inconsistencies in job titles,
inconsistencies in job descriptions, and the use of abstract titles such
as "member of the technical staff" that might encompass as many as
20 different jobs and occupations.
In this chapter, we deal with an important issue. In the presence of so
many diverse skills and occupations, all of which are necessary for soft-
ware projects, what is the best way to handle organization structures?
Should these specialists be embedded in hierarchical structures? Should
they be part of matrix software organization structures and report in to
their own chain of command while reporting via "dotted lines" to project
managers? Should they be part of small self-organizing teams?
This topic of organizing specialists is surprisingly ambiguous as of
2009 and has very little solid data based on empirical studies. A few
solid facts are known, however:
1. Quality assurance personnel need to be protected from coercion in
order to maintain a truly objective view of quality and to report
honestly on problems. Therefore, the QA organization needs to be
separate from the development organization all the way up to the
level of a senior vice president of quality.
2. Because the work of maintenance and bug repairs is rather differ-
ent from the work of new development, large corporations that have
extensive portfolios of legacy software applications should consider
using separate maintenance departments for bug repairs.
3. Some specialists such as technical writers would have little oppor-
tunity for promotion or job enrichment if embedded in departments
staffed primarily by software engineers. Therefore, a separate
technical publications organization would provide better career
opportunities.
The fundamental question for specialists is whether they should be
organized in skill-based units with others who share the same skills
and job titles, or embedded in functional departments where they will
actually exercise those skills.
The advantage of skill-based units is that they offer specialists wider
career opportunities and better educational opportunities. Also, in case
of injury or incapacity, the skill-based organizations can usually assign
someone else to take over.
The advantage of the functional organization where specialists are
embedded in larger units with many other kinds of skills is that the
specialists are immediately available for the work of the unit.
In general, if there are a great many of a certain kind of special-
ist (technical writers, testers, quality assurance, etc.), the skill-based
organizations seem advantageous. But for rare skills such as security
and architecture, there may not be enough people in the same occupation
for a skill-based group to even be created.
In this chapter, we will consider various alternative methods for deal-
ing with the organization of key specialists associated with software.
There are more than 120 software-related specialties in all, and for
some of these, there may only be one or two employed even in fairly
large companies.
This chapter concentrates on key specialties whose work is critical
to the success of large applications in large companies. Assume the
software organization in a fairly large company employs a total of 1,000
personnel. In this total of 1,000 people, how many different kinds of spe-
cialists and how many specific individuals are likely to be employed? For
that matter, what are the specialists that are most important to success?
Table 5-1 identifies a number of these important specialists and the
approximate distribution out of a total of 1,000 software personnel.
TABLE 5-1   Distribution of Software Specialists for 1,000 Total Software Staff

                                            Number    Percent
 1. Maintenance specialists                    315     31.50%
 2. Development software engineers             275     27.50%
 3. Testing specialists                        125     12.50%
 4. First-line managers                        120     12.00%
 5. Quality assurance specialists               25      2.50%
 6. Technical writing specialists               23      2.30%
 7. Customer support specialists                20      2.00%
 8. Configuration control specialists           15      1.50%
 9. Second-line managers                         9      0.90%
10. Business analysts                            8      0.80%
11. Scope managers                               7      0.70%
12. Administrative support                       7      0.70%
13. Project librarians                           5      0.50%
14. Project planning specialists                 5      0.50%
15. Architects                                   4      0.40%
16. User interface specialists                   4      0.40%
17. Cost estimating specialists                  3      0.30%
18. Measurement/metric specialists               3      0.30%
19. Database administration specialists          3      0.30%
20. Nationalization specialists                  3      0.30%
21. Graphical artists                            3      0.30%
22. Performance specialists                      3      0.30%
23. Security specialists                         3      0.30%
24. Integration specialists                      3      0.30%
25. Encryption specialists                       2      0.20%
26. Reusability specialists                      2      0.20%
27. Test library control specialists             2      0.20%
28. Risk specialists                             1      0.10%
29. Standards specialists                        1      0.10%
30. Value analysis specialists                   1      0.10%
    TOTAL SOFTWARE EMPLOYMENT                1,000    100.00%
As can be seen from Table 5-1, software engineers do not operate all by
themselves. A variety of other skills are needed in order to develop and
maintain software applications in the modern world. Indeed, as of 2009,
the number and kinds of software specialists are increasing, although
the recession may reduce the absolute number of software personnel if
it lengthens and stays severe.
Software Organization Structures
from Small to Large
The observed sizes of software organization structures range from a low
of one individual up to a high that consists of multidisciplinary teams
of 30 personnel or more.
For historical reasons, the "average" size of software teams tends to be
about eight personnel reporting to a manager or team leader. However,
both smaller and larger teams are quite common.
This section of Chapter 5 examines the sizes and attributes of soft-
ware organization structures from small to large, starting with one-
person projects.
One-Person Software Projects
The most common corporate purpose for one-person projects is that of
carrying out maintenance and small enhancements to legacy software
applications. For new development, building web sites is a typical one-
person activity in a corporate context.
However, a fairly large number of one-person software companies
actually develop small commercial software packages such as iPhone
applications, shareware, freeware, computer games, and other small
applications. In fact, quite a lot of innovative new software and product
ideas originate from one-person companies.
Demographics Because small software maintenance projects are
common, on any given day, probably close to 250,000 one-person projects
are under way in the United States, with the majority being mainte-
nance and enhancements.
In terms of one-person companies that produce small applications, the
author estimates that as of 2009, there are probably more than 10,000 in
the United States. These companies have been a surprisingly fruitful
source of innovation, and are also a significant presence in the open-
source, freeware, and shareware domains.
Project size The average size of new applications done by one-person
projects is about 50 function points, and the maximum size is below 1,000
function points. For maintenance or defect repair work, the average size is
less than 1 function point and seldom tops 5 function points. For enhance-
ment to legacy applications, the average size is about 5 to 10 function
points for each new feature added, and seldom tops 15 function points.
Productivity rates Productivity rates for one-person efforts are usually
quite good, and top 30 function points per staff month. One caveat is
that if the one-person development team also has to write user manuals
and provide customer support, then productivity gets cut approximately
in half.
Another caveat is that many one-person companies are home based.
Therefore unexpected events such as a bout of flu, a new baby, or some
other normal family event such as weddings and funerals can have a
significant impact on the work at hand.
A third caveat is that one-person software projects are very sensitive
to the skill and work practices of specific individuals. Controlled experi-
ments indicate about a 10-to-1 difference between the best and worst
results for tasks such as coding and bug removal. That being said, quite
a few of the people who migrate into one-person positions tend to be at
the high end of the competence and performance scale.
Schedules Development schedules for one-person maintenance and
enhancement projects usually range between a day and a week. For new
development by one person, schedules usually range between about two
months and six months.
Quality The quality levels for one-person applications are not too bad.
Defect potentials run to about 2.5 bugs per function point, and defect
removal efficiency is about 90 percent. Therefore a small iPhone applica-
tion of 25 function points might have a total of about 60 bugs, of which
6 will still be present at release.
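These figures follow directly from the per-function point rates, as the arithmetic below confirms:

```python
def release_defects(function_points, potential_per_fp, removal_efficiency):
    """Total defect potential and the defects remaining at release."""
    total = function_points * potential_per_fp
    remaining = total * (1.0 - removal_efficiency)
    return total, remaining

# The 25-function point iPhone application from the text:
# 2.5 bugs per FP and 90 percent defect removal efficiency.
total_bugs, shipped_bugs = release_defects(25, 2.5, 0.90)
# total_bugs is 62.5 ("about 60"); shipped_bugs is about 6.25 ("6 ... at release")
```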
Specialization You might think that one-person projects would be the
domain of generalists, since it is obvious that special skills such as
testing and documentation all have to be found in the same individual.
However, one of the more surprising results of examining one-person
projects is that many of them are carried out by people who are not
software engineers or programmers at all.
For embedded and systems software, many one-person software
projects are carried out by electrical engineers, telecommunication
engineers, automotive engineers, or some other type of engineer. Even
for business software, some one-person projects may be carried out by
accountants, attorneys, business analysts, and other domain experts who
are also able to program. This is one of the reasons why such a significant
286
Chapter Five
number of inventions and new ideas flow from small companies and
one-person projects.
Cautions and counter indications The major caution about one-person
projects for either development or maintenance is lack of backup in case
of illness or incapacity. If something should happen to that one person,
work will stop completely.
A second caution is if the person developing software is a domain
expert (i.e., accountant, business analyst, statistician, etc.) who is
building an application for personal use in a corporation, there may be
legal questions involving the ownership of the application should the
employee leave the company.
A third caution is that there may be liability issues in case the soft-
ware developed by a knowledge worker contains errors or does some
kind of damage to the company or its clients.
Conclusions One-person projects are the norm and are quite effective
for small enhancement updates and for maintenance changes to legacy
applications.
Although one-person development projects must necessarily be rather
small, a surprising number of innovations and good ideas have origi-
nated from brilliant individual practitioners.
Pair Programming for Software
Development and Maintenance
The idea of pair programming is for two software developers to share one
computer and take turns doing the coding, while the other member of
the team serves as an observer. The roles switch back and forth between
the two at frequent intervals, such as every 30 minutes to an hour. The
team member doing the coding is called the driver, and the other member
is the navigator or observer.
As of 2009, the results of pair programming are ambiguous. Several
studies indicate fewer defects from pair programming, and some
assert that development schedules are improved as well.
However, all of the experiments were fairly small in scale and fairly
narrow in focus. For example, no known study of pair-programming defects
compared the results against an individual programmer who used static
analysis and automatic testing. Neither have studies compared top-gun
individuals against average to mediocre pairs, or vice versa.
There are also no known studies that compare the quality results of
pair programming against proven quality approaches such as formal
design and code inspections, which have almost 50 years of empirical
data available, and which also utilize the services of other people for
finding software defects.
While many of the pair-programming experiments indicate shorter
development schedules, none indicate reduced development effort or costs
from having two people perform work that is normally performed by one
person.
For pair programming to lower development costs, schedules would
have to be reduced by more than 50 percent. However, experiments
and data collected to date indicate schedule reductions of only about
15 percent to 30 percent, which would have the effect of raising develop-
ment costs by roughly 40 to 70 percent compared with a single individual
doing the same work.
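The cost arithmetic is simple enough to make explicit: two people working for some fraction of the solo schedule expend effort proportional to twice that fraction.

```python
def pair_cost_multiplier(schedule_reduction):
    """Effort of a pair relative to one developer doing the same work.

    The pair finishes in (1 - schedule_reduction) of the solo schedule,
    but two people are charged for that entire period.
    """
    return 2.0 * (1.0 - schedule_reduction)

best_case = pair_cost_multiplier(0.30)   # 1.4x solo cost (40 percent more)
worst_case = pair_cost_multiplier(0.15)  # 1.7x solo cost (70 percent more)
break_even = pair_cost_multiplier(0.50)  # 1.0x: pairs break even only at a 50 percent cut
```

The break-even case shows why the observed 15 to 30 percent schedule reductions fall well short of making pairs cost-neutral.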
Pair-programming enthusiasts assert that better quality will com-
pensate for higher development effort and costs, but that claim is not
supported by studies that included static analysis, automatic testing,
formal inspections, and other sophisticated defect removal methods. The
fact that two developers who use manual defect removal methods might
have lower defects than one developer using manual defect removal
methods is interesting but unconvincing.
Pair programming might be an interesting and useful method for
developing reusable components, which need to have very high quality
and reliability, but where development effort and schedules are com-
paratively unimportant. However, Watts Humphrey's Team Software
Process (TSP) is also an excellent choice for reusable components and
has far more historical data available than pair programming does.
Subjectively, the pair-programming concept seems to be enjoyable to
many who have experienced it. The social situation of having another
colleague involved with complicated algorithms and code structures is
perceived as being advantageous.
As the recession of 2009 deepens and layoffs become more numerous,
it is very likely that pair programming will decline, because companies
will be reducing software staffs to minimal levels and can no longer
afford the extra overhead.
Most of the literature on pair programming deals with colocation in
a single office. However, remote pair-programming, where the partners
are in different cities or countries, is occasionally cited.
Pair programming is an interesting form of collaboration, and collabo-
ration is always needed for applications larger than about 100 function
points in size.
In the context of test-driven development, one interesting variation
of pair programming would be for one of the pair to write test cases and
the other to write code, and then to switch roles.
Another area where pair programming has been used successfully
is that of maintenance and bug repairs. One maintenance outsource
company has organized their maintenance teams along the lines of an
urban police station. The reason for this is that bugs come in at random
intervals, and there is always a need to have staff available when a new
bug is reported, especially a new high-severity bug.
In the police model of maintenance, a dispatcher and several pairs of
maintenance programmers work as partners, just as police detectives
work as partners.
During defect analysis, having two team members working side by
side speeds up finding the origins of reported bugs. Having two people
work on the defect repairs as partners also speeds up the repair inter-
vals and reduces bad-fix injections. (Historically, about 7 percent of
attempts to repair a bug accidentally introduce a new bug in the fix
itself. These are called bad fixes.)
In fact, pair programming for bug repairs and maintenance activities
looks as if it may be the most effective use of pairs yet noted.
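The 7 percent bad-fix rate compounds, because fixes to bad fixes can themselves be bad. A minimal sketch of the resulting geometric series (the function name is illustrative; the rate is the historical figure quoted above):

```python
def total_repair_attempts(reported_bugs: float, bad_fix_rate: float = 0.07) -> float:
    """Expected total repair attempts when each fix has a given chance
    of injecting a new bug that must itself be repaired.

    The batches form a geometric series n + n*r + n*r**2 + ...,
    which converges to n / (1 - r).
    """
    return reported_bugs / (1.0 - bad_fix_rate)

# 100 reported bugs ultimately require roughly 107.5 repair attempts.
attempts = total_repair_attempts(100)
```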
Demographics Because pair programming is an experimental approach,
the method is not widely deployed. As the recession lengthens, there may
be even less pair programming. The author estimates that as of 2009,
perhaps 500 to 1,000 pairs are currently active in the United States.
Project size The average size of new applications done by pair-program-
ming teams is about 75 function points, and the maximum size is fewer
than 1,000 function points. For maintenance or defect repair work, the
average size is less than 1 function point. For enhancement to legacy
applications, the average size is about 5 to 10 function points for each
new feature added.
Productivity rates Productivity rates for pair-programming efforts
are usually in the range of 16 to 20 function points per staff month, or
about 30 percent less than the same project done by one person.
Pair-programming software projects are very sensitive to the skill
and work practices of specific individuals. As previously mentioned, con-
trolled experiments indicate about a 10-to-1 range difference between
the best and worst results for tasks such as coding and bug removal by
individual participants in such studies.
Some psychological studies of software personnel indicate a tendency
toward introversion, which may make the pair-programming concept
uncomfortable to some software engineers. The literature on pair pro-
gramming does indicate social satisfaction.
Schedules Development schedules for pair-programming maintenance
and enhancement projects usually range between a day and a week.
For new development by pairs, schedules usually range between about
two months and six months. Schedules tend to be about 10 percent to
30 percent shorter than one-person efforts for the same number of func-
tion points.
Quality The quality levels for pair-programming applications are not
bad. Defect potentials run to about 2.5 bugs per function point, and
defect removal efficiency is about 93 percent. Therefore, a small iPhone
application of 25 function points might have a total of about 60 bugs, of
which 4 will still be present at release. This is perhaps 15 percent better
than individual developers using manual defect removal and testing.
However, there is no current data that compares pair programming with
individual programming efforts where automated static analysis and
automated testing are part of the equation.
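The arithmetic behind these estimates is simple enough to sketch; the figures are the ones quoted above, and the helper function is illustrative:

```python
def release_estimate(size_fp: float, defect_potential: float,
                     removal_efficiency: float) -> tuple[float, float]:
    """Return (total defects, defects still present at release)."""
    total = size_fp * defect_potential
    return total, total * (1.0 - removal_efficiency)

# A 25-function point application at 2.5 bugs per function point
# with 93 percent defect removal efficiency:
total, delivered = release_estimate(25, 2.5, 0.93)
# total is about 62.5 ("a total of about 60 bugs");
# delivered is about 4.4 (the "4 still present at release")
```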
Specialization There are few studies to date on the role of specialization
in a pair-programming context. However, there are reports of interesting
distributions of effort. For example, one of the pair might write test
cases while the other is coding, or one might write user stories while
the other codes.
To date there are no studies of pair programming that concern teams
with notably different backgrounds working on the same application;
that is, a software engineer teamed with an electrical engineer or an
automotive engineer; a software engineer teamed with a medical doctor;
and so forth. The pairing of unlike disciplines would seem to be a topic
that might be worth experimenting with.
Cautions and counter indications The topic of pair programming needs
additional experimentation before it can become a mainstream approach,
if indeed it ever does. The experiments need to include more sophisticated
quality control, and also to compare pairs against top-gun individual
programmers.
The higher costs of pair programming are not likely to gain adherents
during a strong recession.
Conclusions There is scarcely enough empirical data about pair
programming to draw solid conclusions. Experiments and anecdotal results
are generally favorable, but the experiments to date cover only a few
variables and ignore important topics such as the role of static analysis,
automatic testing, inspections, and other quality factors. As the global
recession lengthens and deepens, pair programming may drop from
view due to layoffs and downsizing of software organizations.
Self-Organizing Agile Teams
For several years, as the Agile movement gained adherents, the concept
of small self-organizing teams also gained adherents. The concept of
self-organized teams is that rather than have team members reporting
to a manager or formal team leader, the members of the team would
migrate to roles that they felt most comfortably matched their skills.
In a self-organizing team, every member will be a direct contribu-
tor to the final set of deliverables. In an ordinary department with a
manager, the manager is usually not a direct contributor to the code
or other deliverables that reach end users. Therefore, self-organizing teams
should be slightly more efficient than ordinary departments of the same
size, because they would have one additional worker.
In U.S. businesses, ordinary departments average about eight employ-
ees per manager. The number of employees reporting to a manager is
called the span of control. (The actual observed span of control within
large companies such as IBM has ranged from a low of 2 to a high of
30 employees per manager.)
For self-organizing teams, the nominal range of size is about "7 plus or
minus 2." However, to truly match any given size of software project, team
sizes need to range from a low of two up to a maximum of about 12.
A significant historical problem with software has been that of decom-
posing applications to fit existing organization structures, rather than
decomposing the applications into logical pieces based on the funda-
mental architecture.
The practical effect has been to divide large applications into multiple
segments that can be developed by an eight-person department whether
or not that matches the architecture of the application.
In an Agile context, a user representative may be a member of the
team, providing input on the features that are needed as well as
experiential reports based on running the pieces of the application
as they are finished. The user representative has a special role and
normally does not do any code development, although some test cases
may be created by the embedded user representative. Obviously, the
user will provide inputs in terms of user stories, use cases, and informal
descriptions of the features that are needed.
In theory, self-organizing teams are cross-functional, and everyone
contributes to every deliverable on an as-needed basis. However, it is
not particularly effective for people to depart from their main areas of
competence. Technical writers may not make good programmers. Very
few people are good technical writers. Therefore, the best results tend
to be achieved when team members follow their strengths.
However, in areas where everyone (or no one) is equally skilled, all can
participate. Creating effective test cases may be an example where skills
are somewhat sparse throughout. Dealing with security of code is an
area where so few people are skilled that if it is a serious concern, out-
side expertise will probably have to be imported to support the team.
Another aspect of self-organizing teams is the usage of daily status
meetings, which are called Scrum sessions, using a term derived from
the game of rugby. Typically, Scrum sessions are short and deal with
three key issues: (1) what has been accomplished since the last Scrum
session, (2) what is planned between today and the next Scrum session,
and (3) what problems or obstacles have been encountered.
(Scrum is not the only method of meeting and sharing information.
Phone calls, e-mails, and informal face-to-face meetings occur every day.
There may also be somewhat larger meetings among multiple teams,
on an as-needed basis.)
One of the controversial roles with self-organizing teams is that of
Scrum master. Nominally, the Scrum master is a form of coordinator
for the entire project and is charged with setting expectations for work
that spans multiple team members; that is, the Scrum master is a sort
of coach. This role means that the personality and leadership qualities
of the Scrum master exert a strong influence on the overall team.
Demographics Because Agile has been on a rapid growth path for sev-
eral years, the number of small Agile teams is still increasing. As of
2009, the author estimates that in the United States alone there are
probably 35,000 small self-organizing teams that collectively employ
about 250,000 software engineers and other occupations.
Project size The average size of new applications done by self-organizing
teams with seven members is about 1,500 function points, and the
maximum size is perhaps 3,000 function points. (Beyond 3,000 func-
tion points, teams of teams would be utilized.) Self-organizing teams
are seldom used for maintenance or defect repair work, since a bug's
average size is less than 1 function point and needs only one person. For
enhancements to legacy applications, self-organizing teams might be
used for major enhancements in the 150- to 500-function point range.
For smaller enhancements of 5 to 10 function points, individuals would
probably be used for coding, with perhaps some assistance from testers,
technical writers, and integration specialists.
Although there are methods for scaling up small teams to encom-
pass teams of teams, scaling has been a problem for self-organizing
teams. In fact, the entire Agile philosophy seems better suited to
applications below about 2,500 function points. Very few examples
of large systems greater than 10,000 function points have even been
attempted using Agile or self-organizing teams.
Productivity rates Productivity rates for self-organizing teams on proj-
ects of 1,500 function points are usually in the range of 15 function
points per staff month. They sometimes top 20 function points per staff
month for applications where the team has significant expertise and may
drop below 10 function points per staff month for unusual or complex
projects.
Productivity rates for individual sprints are higher, but that fact is
somewhat irrelevant because the sprints do not include final integration
of all components, system test of the entire application, and the final
user documentation.
Self-organizing team projects tend to minimize the performance
ranges of individuals and may help to bring novices up to speed fairly
quickly. However, if the range of performance on a given team exceeds
about 2-to-1, those at the high end of the performance range will become
dissatisfied with the work of those at the low end of the range.
Schedules Development schedules for new development by self-organizing
teams for typical 1,500-function point projects usually range
between about 9 months and 18 months and would average perhaps
12 calendar months for the entire application.
However, the Agile approach is to divide the entire application into a
set of segments that can be developed independently. These are called
sprints and would typically be of a size that can be completed in perhaps
one to three months. For an application of 1,500 function points, there
might be five sprints of about 300 function points each. The schedule
for each sprint might be around 2.5 calendar months.
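These figures can be cross-checked against the productivity numbers given earlier. A rough sketch in Python, using only values cited in this section; real schedules also include integration and documentation work outside the sprints:

```python
import math

SIZE_FP = 1_500          # application size in function points
PRODUCTIVITY_FP = 15     # function points per staff month (cited above)
TEAM_SIZE = 7            # nominal self-organizing team
SPRINT_FP = 300          # typical sprint size for this application

staff_months = SIZE_FP / PRODUCTIVITY_FP     # 100 staff months of effort
calendar_months = staff_months / TEAM_SIZE   # ~14.3, inside the 9-18 range
sprints = math.ceil(SIZE_FP / SPRINT_FP)     # 5 sprints of ~2.5 months each
```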
Quality The quality levels for self-organizing teams are not bad, but
usually don't achieve the levels of methods such as Team Software
Process (TSP) where quality is a central issue. Typical defect potentials
run to about 4.5 bugs per function point, and defect removal efficiency
is about 92 percent.
Therefore, an application of 1,500 function points developed by a
self-organizing Agile team might have a total of about 6,750 bugs, of
which 540 would still be present at release. Of these, about 80 might
be serious bugs.
However, if tools such as automated static analysis and automated
testing are used, then defect removal efficiency can approach 97 percent.
In this situation, only about 200 bugs might be present at release. Of
these, perhaps 25 might be serious.
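As a sketch of that arithmetic, with the values taken directly from the paragraphs above:

```python
SIZE_FP = 1_500
DEFECT_POTENTIAL = 4.5   # bugs per function point for a typical Agile team

total_bugs = SIZE_FP * DEFECT_POTENTIAL        # 6,750 defects overall

# 92 percent removal efficiency leaves 8 percent at release:
released_baseline = total_bugs * (1 - 0.92)    # 540 bugs at release

# With static analysis and automated testing, ~97 percent removal:
released_with_tools = total_bugs * (1 - 0.97)  # ~203 bugs ("about 200")
```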
Specialization There are few studies to date on the role of specialization
in self-organizing teams. Indeed, some enthusiasts of self-organizing
teams encourage generalists. They tend to view specialization as being
similar to working on an assembly line. However, generalists often have
gaps in their training and experience. The kinds of specialists who might
be useful would be security specialists, test specialists, quality assur-
ance specialists, database specialists, user-interface specialists, network
specialists, performance specialists, and technical writers.
Cautions and counter indications The main caution about self-organizing
teams is that the lack of a standard and well-understood structure opens up
the team to the chance of power struggles and disruptive social conflicts.
A second caution is that scaling Agile up from small applications to
large systems with multiple teams in multiple locations has proven to
be complicated and difficult.
A third caution is that the poor measurement practices associated
with Agile and with many self-organizing teams give the method the
aura of a cult rather than of an engineering discipline. The failure to
measure either productivity or quality, or to report benchmarks using
standard metrics, is a serious deficiency.
Conclusions The literature and evidence for self-organizing Agile teams
is somewhat mixed and ambiguous. For the first five years of the Agile
expansion, self-organizing teams were garnering a majority of favorable
if subjective articles.
Since about the beginning of 2007, on the other hand, an increasing
number of articles and reports have appeared that raise questions about
self-organizing teams and that even suggest that they be abolished due
to confusion as to roles, disruptive power struggles within the teams,
and outright failures of the projects.
This is a typical pattern within the software industry. New develop-
ment methods are initially championed by charismatic individuals and
start out by gaining a significant number of positive articles and positive
books, usually without any empirical data or quantification of results.
After several years, problems begin to be noted, and increasing num-
bers of applications that use the method may fail or be unsuccessful. In
part this may be due to poor training, but the primary reason is that
almost no software development method is fully analyzed or used under
controlled conditions prior to deployment. Poor measurement practices
and a lack of benchmarks are also chronic problems that slow down
evaluation of software methods.
Unfortunately, self-organizing teams originated in the context of Agile
development. Agile has been rather poor in measuring either productiv-
ity or quality, and creates almost no effective benchmarks. When Agile
projects are measured, they tend to use special metrics such as story
points or use-case points, which are not standardized and lack empirical
collections of data and benchmarks.
Team Software Process (TSP) Teams
The concept of Team Software Process (TSP) was developed by Watts
Humphrey based on his experiences at IBM and as the originator of
the capability maturity model (CMM) for the Software Engineering
Institute (SEI).
The TSP concept deals with the roles and responsibilities needed to
achieve successful software development. But TSP is built on individual
skills and responsibilities, so it needs to be considered in context with
the Personal Software Process (PSP). Usually, software engineers and
specialists learn PSP first, and then move to TSP afterwards.
Because of the background of Watts Humphrey with IBM and with
the capability maturity model, the TSP approach is congruent with the
modern capability maturity model integrated (CMMI) and appears to
satisfy many of the criteria for CMMI level 5, which is the top or highest
level of the CMMI structure.
Because TSP teams are self-organizing teams, they have a surface
resemblance to Agile teams, which are also self-organizing. However,
the Agile teams tend to adopt varying free-form structures based on the
skills and preferences of whoever is assigned to the team.
The TSP teams, on the other hand, are built on a solid underpinning
of specific roles and responsibilities that remain constant from project
to project. Therefore, with TSP teams, members are selected based on
specific skill criteria that have been shown to be necessary for successful
software projects. Employees who lack needed skills would probably not
become members of TSP teams, unless training were available.
Also, prior training in PSP is mandatory for TSP teams. Other kinds
of training such as estimating, inspections, and testing may also be used
as precursors.
Another interesting difference between Agile teams and TSP teams is
the starting point of the two approaches. The Agile methods were origi-
nated by practitioners whose main concerns were comparatively small
IT applications of 1,500 or fewer function points. The TSP approach was
originated by practitioners whose main concerns were large systems
software applications of 10,000 or more function points.
The difference in starting points leads to some differences in skill sets
and specialization. Because small applications use few specialists, Agile
teams are often populated by generalists who can handle design, coding,
testing, and even documentation on an as-needed basis.
Because TSP teams are often involved with large applications, they
tend to utilize specialists for topics such as configuration control, inte-
gration, testing, and the like.
While both Agile and TSP share a concern for quality, they tend to go
after quality in very different fashions. Some of the Agile methods are
based on test-driven development, or creating test cases prior to creat-
ing the code. This approach is fairly effective. However, Agile tends to
avoid formal inspections and is somewhat lax on recording defects and
measuring quality.
With TSP, formal inspections of key deliverables are an integral part, as
is formal testing. Another major difference is that TSP is very rigorous in
measuring every single defect encountered from the first day of require-
ments through delivery, while defect measures during Agile projects are
somewhat sparse and usually don't occur before testing.
Both Agile and TSP may utilize automated defect tracking tools, and
both may utilize approaches such as static analysis, automated testing,
and automated test library controls.
Some other differences between Agile and TSP do not necessarily affect
the outcomes of software projects, but they do affect what is known about
those outcomes. Agile tends to be lax on measuring productivity and qual-
ity, while TSP is very rigorous in measuring task hours, earned value,
defect counts, and many other quantified facts.
Therefore, when projects are finished, Agile projects have only vague
and unconvincing data that demonstrates either productivity or qual-
ity results. TSP, on the other hand, has a significant amount of reliable
quantified data available.
TSP can be utilized with both hierarchical and matrix organization
structures, although hierarchical structures are perhaps more common.
Watts Humphrey reports that TSP is used for many different kinds of
software, including defense applications, civilian government applica-
tions, IT applications, commercial software in companies such as Oracle
and Adobe, and even by some of the computer game companies, where
TSP has proven to be useful in eliminating annoying bugs.
Demographics TSP is most widely used by large organizations that
employ between perhaps 1,000 and 50,000 total software personnel.
Because of the synergy between TSP and the CMMI, it is also widely
used by military and defense software organizations. These large organi-
zations tend to have scores of specialized skills and hundreds of projects
going on at the same time.
The author estimates that there are about 500 companies in the
United States now using TSP. While usage may be experimental in some
of these companies, it is growing fairly rapidly due to the success
of the approach. The number of software personnel using TSP in 2009
is perhaps 125,000 in the United States.
Project size The average size of new applications done by TSP teams
with eight employees and a manager is about 2,000 function points.
However, TSP organizations can be scaled up to any arbitrary size, so
even large systems in excess of 100,000 function points can be handled
by TSP teams working in concert. For large applications with multiple
TSP teams, some specialist teams such as testing, configuration control,
and integration also support the general development teams.
Another caveat with multiple teams attempting to cooperate is that
when more than about a dozen teams are involved simultaneously,
some kind of a project office may be needed for overall planning and
coordination.
Productivity rates Productivity rates for TSP departments on projects of
2,000 function points are usually in the range of 14 to 18 function points
per staff month. They sometimes top 22 function points per staff month
for applications where the team has significant expertise, and may drop
below 10 function points per staff month for unusual or complex proj-
ects. Productivity tends to be inversely proportional to application size
and declines as applications grow larger.
Schedules Development schedules for new development by TSP groups
with eight team members working on a 2,000-function point project
usually range between about 12 months and 20 months and would aver-
age perhaps 14 calendar months for the entire application.
Quality The quality levels for TSP organizations are exceptionally good.
Average defect potentials with TSP run to about 4.0 bugs per func-
tion point, and defect removal efficiency is about 97 percent. Delivered
defects would average about 0.12 per function point.
Therefore, an application of 2,000 function points developed by a single
TSP department might have a total of about 8,000 bugs, of which 240 would
still be present at release. Of these, about 25 might be serious bugs.
However, if in addition to pretest inspections, tools such as automated
static analysis and automated testing are used, then defect removal
efficiency can approach 99 percent. In this situation, only about 80 bugs
might be present at release. Of these, perhaps 8 might be serious bugs,
which is a rate of only 0.004 per function point.
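A short sketch of the arithmetic behind these figures, using only the values quoted in this section:

```python
SIZE_FP = 2_000
DEFECT_POTENTIAL = 4.0   # bugs per function point with TSP

total_bugs = SIZE_FP * DEFECT_POTENTIAL   # 8,000 defects overall

released_97 = total_bugs * (1 - 0.97)     # 240 bugs at 97% removal
released_99 = total_bugs * (1 - 0.99)     # 80 bugs at 99% removal

per_fp_97 = released_97 / SIZE_FP         # 0.12 delivered defects per FP

# The text puts serious bugs at roughly a tenth of the released total:
serious_99 = 8
serious_rate = serious_99 / SIZE_FP       # 0.004 serious bugs per FP
```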
Generally, as application sizes increase, defect potentials also increase,
while defect removal efficiency levels decline. Interestingly, with TSP,
this rule may not apply. Some of the larger TSP applications achieve
more or less the same quality as small applications.
Another surprising finding with TSP is that productivity does not
seem to degrade significantly as application size goes up. Normally,
productivity declines with application size, but Watts Humphrey reports
no significant reductions across a wide range of application sizes. This
assertion requires additional study, because that would make TSP
unique among software development methods.
Specialization TSP envisions a wide variety of specialists. Most TSP
teams will have numerous specialists for topics such as architecture,
testing, security, database design, and many others.
Interestingly, the TSP approach does not recommend software quality
assurance (SQA) as being part of a standard TSP team. This is because
of the view that the TSP team itself is so rigorous in quality control that
SQA is not needed.
In companies where SQA groups are responsible for collecting quality
data, TSP teams will provide such data as needed, but it will be collected
by the team's own personnel rather than by an SQA person or staff
assigned to the project.
Cautions and counter indications The main caution about TSP
organizations and projects is that while they measure many important topics,
they do not use standard metrics such as function points. The TSP use
of task hours is more or less unique, and it is difficult to compare task
hours against standard resource metrics.
Another caution is that few if any TSP projects have ever submit-
ted benchmark data to any of the formal software benchmark groups
such as the International Software Benchmarking Standards Group
(ISBSG). As a result, it is almost impossible to compare TSP against
other methods without doing complicated data conversion.
It is technically feasible to calculate function point totals using sev-
eral of the new high-speed function point methods. In fact, quantifying
function points for both new applications and legacy software now takes
only a few minutes. Therefore, reporting on quality and productivity
using function points would not be particularly difficult.
Converting task-hour data into normal workweek and work-month
information would be somewhat more troublesome, but no doubt the
data could be converted using algorithms or some sort of rule-based
expert system.
It would probably be advantageous for both Agile and TSP projects
to adopt high-speed function point methods and to submit benchmark
results to one or more of the benchmark organizations such as ISBSG.
Conclusions The TSP approach tends to achieve a high level of successful
applications and few if any failures. As a result, it deserves to be studied
in depth.
From observations made during litigation for projects that failed or
never operated successfully, TSP has not yet had failures that ended up in
court. This may change as the number of TSP applications grows larger.
TSP emphasizes the competence of the managers and technical staff,
and it emphasizes effective quality control and change management
control. Effective estimating and careful progress tracking also are stan-
dard attributes of TSP projects. The fact that TSP personnel are carefully
trained before starting to use the method, and that experienced mentors
are usually available, explains why TSP is seldom misused.
With Agile, for example, there may be a dozen or more variations of
how development activities are performed, but they still use the name
"Agile" as an umbrella term. TSP activities are more carefully defined
and used, so when the author visited TSP teams in multiple companies,
the same activities carried out the same way were noted.
Because of the emphasis on quality, TSP would be a good choice as the
construction method for standard reusable components. It also seems to
be a good choice for hazardous applications where poor quality might
cause serious problems; that is, in medical systems, weapons systems,
financial applications, and the like.
Conventional Departments with Hierarchical
Organization Structures
The concept of hierarchical organizations is the oldest method for
assigning social roles and responsibilities on the planet. The etymology
of the word "hierarchy" is from the Greek, and the meaning is "rule by
priests." But the concept itself is older than Greece and was also found
in Egypt, Sumer, and most other ancient civilizations.
Many religions are organized in hierarchical fashion, as are military
organizations. Some businesses are hierarchical if they are privately
owned. Public companies with shareholders are usually semi-hierarchical,
in that the operating units report upward level-by-level to the president
or chief executive officer (CEO). The CEO, however, reports to a board
of directors elected by the shareholders, so the very top level of a public
company is not exactly a true hierarchy.
In a hierarchical organization, units of various sizes each have a formal
leader or manager who is appointed to the position by higher authorities.
While the appointing authority is often the leader of the next highest
level of organization in the structure, the actual power to appoint is usu-
ally delegated from the top of the hierarchy. Once appointed, each leader
reports to the next highest leader in the same chain of command.
While appointed leaders or managers at various levels have author-
ity to issue orders and to direct their own units, they are also required
to adhere to directives that descend from higher authorities. Progress
reports flow back up to higher authorities.
In business hierarchies, lower level managers are usually appointed
by the manager of the next highest level. But for executive positions
such as vice presidents the appointments may be made by a committee
of top executives. The purpose of this, at least in theory, is to ensure
the competence of the top executives of the hierarchy. However, the
recent turmoil in the financial sector and the expanding global reces-
sion indicates that top management tends to be a weak link in far too
many companies.
It should be noted that the actual hierarchical structure of an orga-
nization and its power structure may not be identical. For example,
in Japan during the Middle Ages, the emperor was at the top of the
formal government hierarchy, but actual ruling power was vested in a
military organization headed by a commander called the shogun. Only
the emperor could appoint the shogun, but the specific appointment
was dictated by the military leadership, and the emperor had almost
no military or political power.
A longstanding issue with hierarchical organizations is that if the
leader at the top of the pyramid is weak or incompetent, the entire
structure may be at some risk of failing. For hierarchical governments,
weak leadership may lead to revolutions or loss of territory to strong
neighbors.
For hierarchical business organizations, weak leadership at the top
tends to lead to loss of market share and perhaps to failure or bankruptcy.
Indeed, analysis of the recent business failures from Enron
through Lehman Brothers does indicate that the top of these hierarchies did
not have the competence and insight necessary to deal with serious
problems, or even to understand what the problems were.
It is an interesting business phenomenon that the life expectancy of a
hierarchical corporation is approximately equal to the life expectancies
of human beings. Very few companies live to be 100 years old. As the
global recession lengthens and deepens, a great many companies are
likely to expire, although some will expand and grow stronger.
A hierarchical organization has two broad classes of employees. One
of these classes consists of the workers or specialists who actually do
the work of the enterprise. The second class consists of the managers
and executives to whom the workers report. Of course, managers also
report to higher-level managers.
The distinction between technical work and managerial work is so
deeply embedded in hierarchical organizations that it has created two
very distinct career paths: management and technical work.
When starting out their careers, young employees almost always
begin as technical workers. For software, this means starting out as
software engineers, programmers, systems analysts, technical writers,
and the like. After a few years of employment, workers need to make
a career choice and either get promoted into management or stay with
technical work.
The choice is usually determined by personality and personal inter-
ests. Many people like technical work and never want to get into manage-
ment. Other people enjoy planning and coordination of group activities
and opt for a management career.
There is an imbalance in the numbers of managers and technical
workers. In most companies, the managerial community accounts for
about 15 percent of overall employment, while technical workers account
for about 85 percent. Since managers are not usually part of the production
process of the company, it is important not to have an excessive number
of managers and executives. Too many managers and executives tend to
degrade operational performance. This has been noted in both business
and military organizations.
It is interesting that up to a certain point, the compensation levels
of technical workers and managers are approximately the same. For
example, in most corporations, the top technical workers can have com-
pensation that equals that of third-line managers. However, at the very top of
corporations, there is a huge imbalance.
The CEOs of a number of corporations and some executive vice presi-
dents have compensation packages that are worth millions of dollars. In
fact, some executive compensation packages are more than 250 times the
compensation of the average worker within the company. As the global
recession deepens, these enormous executive compensation packages are
being challenged by both shareholders and government regulators.
Another topic that is beginning to be questioned is the span of control,
or the number of technical workers who report to one manager. For his-
torical reasons that are somewhat ambiguous, the average department
in the United States has about eight technical workers reporting to one
manager. The ranges observed run from two employees per manager to
about 30 employees per manager.
Assuming an average of eight technical workers per manager, then
about 12.5 percent of total employment would be in the form of first-line
managers. When higher-level managers are included, the overall total
is about 15 percent.
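The span-of-control arithmetic above can be written out as a short calculation. The eight-to-one ratio and the 12.5 and 15 percent figures come from the text; the function itself and its levels_factor parameter are illustrative assumptions chosen to reproduce those figures, not a published model.

```python
# Sketch of the span-of-control arithmetic described in the text.
# The 8-workers-per-manager figure and the ~15 percent overall
# management share come from the text; the helper and its
# levels_factor parameter are hypothetical illustrations.

def management_share(span_of_control: int, levels_factor: float = 1.2) -> dict:
    """Estimate management share of employment for a given span of control.

    span_of_control: technical workers reporting to one first-line manager.
    levels_factor:   assumed multiplier covering second-line and higher
                     managers (picked so that a span of 8 reproduces the
                     ~15 percent total the text cites).
    """
    first_line = 1.0 / span_of_control      # the text's approximation: 1/8 = 12.5%
    total = first_line * levels_factor      # ~15% once upper management is included
    return {"first_line_pct": round(first_line * 100, 1),
            "total_pct": round(total * 100, 1)}

print(management_share(8))   # {'first_line_pct': 12.5, 'total_pct': 15.0}
print(management_share(12))  # a span of 12 trims the total share to about 10 percent
```

Raising the span of control from 8 to 12, as the text suggests studying, would cut the management share by roughly a third under these assumptions.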
From analyzing appraisal scores and examining complaints against
managers in large corporations, it appears that fewer than 15 percent
of the population is qualified to be effective in management; in fact,
the figure may be closer to 10 percent or less.
That being said, it might be of interest to study raising the average
span of control from 8 workers per manager up to perhaps 12 workers
per manager. Weeding out unqualified managers and restoring them to
technical work might improve overall efficiency and reduce the social
discomfort caused by poor management.
Practicing managers state that increasing the span of control would
lower their ability to control projects and understand the actual work of
their subordinates. However, time and motion studies carried out by the
author in large corporations such as IBM found that software managers
tended to spend more time in meetings with other managers than in dis-
cussions or meetings with their own employees. In fact, a possible law of
business is "managerial meetings are inversely proportional to the span
of control." The more managers on a given project, the more time they
spend with other managers rather than with their own employees.
Another and more controversial aspect of this study had to do with
project failure rates, delays, and other mishaps. For large projects with
multiple managers, the failure rates seem to correlate more closely
to the number of managers involved with the projects than with the
number of software engineers and technical workers.
While the technical workers often managed to do their jobs and get
along with their colleagues in other departments, managerial efforts tended
to be diluted by power struggles and debates with other managers.
This study needs additional research and validation. However, it led
to the conclusion that increasing the span of control and reducing mana-
gerial numbers tends to raise the odds of a successful software project
outcome. This would especially be true if the displaced managers hap-
pened to be those of marginal competence for managerial work.
In many hierarchical departments with generalists, the same people
do both development and maintenance. It should be noted that if the
same software engineers are responsible for both development and
maintenance concurrently, it will be very difficult to estimate their
development work with accuracy. This is because maintenance work
involved with fixing high-severity defects tends to preempt software
development tasks and therefore disrupts development schedules.
Another topic of significance is that when exit interviews are reviewed
for technical workers, two troubling facts are noted: (1) technical work-
ers with the highest appraisal scores tend to leave in the largest num-
bers; and (2) the most common reason cited for leaving a company is
"I don't like working for bad management."
Another interesting phenomenon about management in hierarchical
organizations is termed "the Peter Principle" and needs to be mentioned
briefly. The Peter Principle was created by Dr. Laurence J. Peter and
Raymond Hull in their 1969 book of the same name. In essence, the Peter
Principle holds that in hierarchical organizations, workers and manag-
ers are promoted based on their competence and continue to receive
promotions until they reach a level where they are no longer competent.
As a result, a significant percentage of older employees and managers
occupy jobs for which they are not competent.
The Peter Principle may be amusing (it was first published in a
humorous book), but given the very large number of cancelled software
projects and the even larger number of schedule delays and cost over-
runs, it cannot be ignored or discounted in a software context.
Assuming that the atomic unit of a hierarchical software organization
consists of eight workers who report to one manager, what are their
titles, roles, and responsibilities?
Normally, the hierarchical mode of organization is found in compa-
nies that utilize more generalists than specialists. Because software
specialization tends to increase with company size, the implication is
that hierarchical organizations are most widely deployed for small to
midsize companies with small technical staffs. Most often, hierarchical
organizations are found in companies that employ between about 5 and
50 software personnel.
The primary job title in a hierarchical structure would be programmer
or software engineer, and such personnel would handle both develop-
ment and maintenance work.
However, the hierarchical organization is also found in larger companies
and in companies that do have specialists. In this case, an eight-person
department might have a staffing complement of five software engineers,
two testers, and a technical writer all reporting to the same manager.
Large corporations have multiple business units such as marketing,
sales, finance, human resources, manufacturing, and perhaps research.
Using hierarchical principles, each of these might have its own software
organization dedicated to building the software used by a specific business
unit; that is, financial applications, manufacturing support applications,
and so forth.
But what happens when some kind of a corporate or enterprise appli-
cation is needed that cuts across all business units? Cross-functional
applications turned out to be difficult in traditional hierarchical or
"stovepipe" organizations.
Two alternative approaches were developed to deal with cross-
functional applications. Matrix management was one, and it will be
discussed in the next section of this chapter. The second was enter-
prise resource planning (ERP) packages, which were created by large
software vendors such as SAP and Oracle to handle cross-functional
business applications.
As discussed in the next topic, the matrix-management organization
style is often utilized for software groups with extensive specializa-
tion and a need for cross-functional applications that support multiple
business units.
Demographics In the software world, hierarchical organizations are
found most often in small companies that employ between perhaps
5 and 50 total software personnel. These companies tend to adopt a
generalist philosophy and have few specialists other than some tech-
nical skills such as network administration and technical writing. In
a generalist context, hierarchical organizations of about five to eight
software engineers reporting to a manager handle development, testing,
and maintenance activities concurrently.
The author estimates that there are about 10,000 such small compa-
nies in the United States. The number of software personnel working
under hierarchical organization structures is perhaps 250,000 in the
United States as of 2009.
Hierarchical structures are also found in some large companies, so
perhaps another 500,000 people work in hierarchical structures inside
large companies and government agencies.
Project size The average size of new applications done by hierarchical
teams with eight employees and a manager is about 2,000 function
points. However, one of the characteristics of hierarchical organizations
is that they can cooperate on large projects, so even large systems in
excess of 100,000 function points can be handled by multiple depart-
ments working in concert.
The caveat with multiple departments attempting to cooperate is
that when more than about a dozen are involved simultaneously, some
kind of project office may be utilized for overall planning and coordina-
tion. Some of the departments involved may handle integration, testing,
configuration control, quality assurance, technical writing, and other
specialized topics.
Productivity rates Productivity rates for hierarchical departments on
projects of 2,000 function points are usually in the range of 12 func-
tion points per staff month. They sometimes top 20 function points per
staff month for applications where the team has significant expertise,
and may drop below 10 function points per staff month for unusual
or complex projects. Productivity tends to be inversely proportional to
application size and declines as applications grow larger.
Schedules Development schedules for new development by a single
hierarchical group with eight team members working on a 2,000-
function point project usually range between about 14 months and
24 months and would average perhaps 18 calendar months for the
entire application.
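The productivity and schedule figures above are mutually consistent, as a quick calculation shows. The inputs (2,000 function points, roughly 12 function points per staff month, a department of eight workers plus a manager) are the text's figures; the helper function itself is an illustrative sketch.

```python
# Quick check that the cited productivity and schedule figures agree.
# The inputs come from the text; the helper is illustrative.

def schedule_months(size_fp: float, fp_per_staff_month: float,
                    team_size: int) -> float:
    """Calendar months = total staff months of effort / team size."""
    effort_staff_months = size_fp / fp_per_staff_month
    return effort_staff_months / team_size

# Eight technical workers plus one manager = nine staff.
months = schedule_months(2000, 12, 9)
print(round(months, 1))  # 18.5 -- close to the 18-month average cited
```

At 12 function points per staff month, a 2,000-function point project requires about 167 staff months of effort, which a nine-person department delivers in roughly 18.5 calendar months.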
Quality The quality levels for hierarchical departments are fairly aver-
age. Defect potentials run to about 5.0 bugs per function point, and
defect removal efficiency is about 85 percent. Delivered defects would
average about 0.75 per function point.
Therefore, an application of 2,000 function points developed by a
single hierarchical department would have a total of about 10,000 bugs,
of which 1,500 would still be present at release. Of these, about 225
might be serious bugs.
However, if pretest inspections are used, and if tools such as auto-
mated static analysis and automated testing are used, then defect
removal efficiency can approach 97 percent. In this situation, only
about 300 bugs might be present at release. Of these, perhaps 40 might
be serious.
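The defect arithmetic above can be sketched as follows. The 5.0 defects per function point and the removal-efficiency percentages are the text's figures; the helper itself is a hypothetical illustration.

```python
# The defect arithmetic described in the text, written out.
# The defect potential (5.0 bugs per function point) and the
# removal-efficiency figures are the text's; the helper is a sketch.

def delivered_defects(size_fp: float, potential_per_fp: float = 5.0,
                      removal_efficiency: float = 0.85) -> tuple:
    """Return (total defects created, defects still present at release)."""
    total = size_fp * potential_per_fp
    return total, total * (1.0 - removal_efficiency)

total, shipped = delivered_defects(2000)
print(f"{total:.0f} total, {shipped:.0f} delivered")  # 10000 total, 1500 delivered

_, shipped97 = delivered_defects(2000, removal_efficiency=0.97)
print(f"{shipped97:.0f} delivered at 97% removal")  # 300 delivered at 97% removal
```

The serious-bug counts cited in the text (about 225 at 85 percent removal, perhaps 40 at 97 percent) follow from applying a small serious-defect fraction to these delivered totals.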
Specialization There are few studies to date on the role of specialization
in hierarchical software organization structures. Because of common
gaps in the training and experience of generalists, some kinds of special-
ization are needed for large applications. The kinds of specialists that
might be useful would be security specialists, test specialists, quality
assurance specialists, database specialists, user-interface specialists,
network specialists, performance specialists, and technical writers.
Cautions and counter indications The main caution about hierarchical
organization structures is that software work tends to be artificially
divided to match the abilities of eight-person departments, rather than
segmented based on the architecture and design of the applications
themselves. As a result, some large functions in large systems are arbi-
trarily divided between two or more departments when they should be
handled by a single group.
While communication within a given department is easy and sponta-
neous, communication between departments tends to slow down due to
managers guarding their own territories. Thus, for large projects with
multiple hierarchical departments, there are high probabilities of power
struggles and disruptive social conflicts, primarily among the manage-
ment community.
Conclusions The literature on hierarchical organizations is interesting
but incomplete. Much of the literature is produced by enthusiasts for
alternate forms of organization structures such as matrix management,
Agile teams, pair programming, clean-room development, and the like.
Hierarchical organizations have been in continuous use for software
applications since the industry began. While that fact might seem to
indicate success, it is also true that the software industry has been
characterized by having higher rates of project failures, cost overruns,
and schedule overruns than any other industry. The actual impact of
hierarchical organizations on software success or software failure is still
somewhat ambiguous as of 2009.
Other factors such as methods, employee skills, and management
skills tend to be intertwined with organization structures, and this
makes it hard to identify the effect of the organization itself.
Conventional Departments with Matrix
Organization Structures
The history of matrix management is younger than the history of soft-
ware development itself. The early literature on matrix management
seemed to start around the late 1960s, when it was used within NASA
for dealing with cross-functional projects associated with complex space
programs.
The idea of matrix management soon moved from NASA into the
civilian sector and was eventually picked up by software organizations
for dealing with specialization and cross-functional applications.
In a conventional hierarchical organization, software personnel of
various kinds report to managers within a given business unit. The
technical employees may be generalists, or the departments may include
various specialists too, such as software engineers, testers, and techni-
cal writers. If a particular business unit has ten software departments,
each of these departments might have a number of software engineers,
testers, technical writers, and so forth.
By contrast, in a matrix organization, various occupation groups and
specialists report to a skill or career manager. Thus all technical writers
might report to a technical publications group; all software engineers
might be in a software engineering group; all testers might be in a test
services group; and so forth.
By consolidating various kinds of knowledge workers within skill-
based organizations, greater job enrichment and more career opportu-
nities tend to occur than when specialists are isolated and fragmented
among multiple hierarchical departments.
Under a matrix organization, when specialists are needed for vari-
ous projects, they are assigned to projects and report temporarily to
the project managers for the duration of the projects. This of course
introduces the tricky concept of employees working for two managers
at the same time.
One of the managers (usually the skill manager) has appraisal and
salary authority over specialist employees, while the other (usually
the project manager) uses their services for completing the project.
The project managers may provide inputs to the skill managers about
job performance.
The manager with appraisal and salary authority over employees is
said to have solid line reporting authority. The manager who merely
borrows the specialists for specific tasks or a specific project is said to
have dotted line authority. These two terms reflect the way organization
charts are drawn.
It is an interesting phenomenon that matrix management is new
enough so that early versions of SAP, Oracle, and some other enterprise
resource planning (ERP) applications did not support dotted-line or
matrix organization structures. As of 2009, all ERP packages now sup-
port matrix organization diagrams.
The literature on matrix management circa 2009 is very strongly
polarized between enthusiasts and opponents. About half of the books
and articles regard matrix management as a major business achieve-
ment. The other half of the books and articles regard matrix manage-
ment as confusing, disruptive, and a significant business liability.
A Google search of the phrase "failures of matrix management"
returned 315,000 citations, while a search of the phrase "successes of
matrix management" returned 327,000 citations. As can be seen, this is
a strong polarization of opinion that is almost evenly divided.
Over the years, three forms of matrix organization have surfaced
called weak matrix, strong matrix, and balanced matrix.
The original form of matrix organization has now been classified
as a weak matrix. In this form of organization, the employees report
primarily to a skill manager and are borrowed by project managers on
an as-needed basis. The project managers have no appraisal author-
ity or salary authority over the employees and therefore depend upon
voluntary cooperation to get work accomplished. If there are conflicts
between the project managers and the skill managers in terms of
resource allocations, the project managers lack the authority to acquire
the skills their projects may need.
Because weak matrix organizations proved to be troublesome, the
strong matrix variation soon appeared. In a strong matrix, the special-
ists may still report to a skill manager, but once assigned to a project,
the needs of the project take precedence. In fact, the specialists may
even be formally assigned to the project manager for the duration of
the project and receive appraisals and salary reviews.
In a balanced matrix, responsibility and authority are nominally
equally shared between the skill manager and the project manager. While
this sounds like a good idea, it has proven to be difficult to accomplish.
As a result, the strong matrix form seems to be dominant circa 2009.
Demographics In the software world, matrix organizations are found
most often in large companies that employ between perhaps 1,000 and
50,000 total software personnel. These large companies tend to have
scores of specialized skills and hundreds of projects going on at the
same time.
The author estimates that there are about 250 such large companies
in the United States with primarily matrix organization. The number
of software personnel working under matrix organization structures is
perhaps 1 million in the United States as of 2009.
Project size The average size of new applications done by matrix teams
with eight employees and a manager is about 2,000 function points.
However, matrix organizations can be scaled up to any arbitrary size, so
even large systems in excess of 100,000 function points can be handled
by multiple matrix departments working in concert.
The caveat with multiple departments attempting to cooperate is that
when more than about a dozen are involved simultaneously, some kind
of a project office may be needed for overall planning and coordination.
With really large applications in excess of 25,000 function points,
some of the departments may be fully staffed by specialists who handle
topics such as integration, testing, configuration control, quality assur-
ance, technical writing, and other specialized topics.
Productivity rates Productivity rates for matrix departments on projects
of 2,000 function points are usually in the range of 10 function points
per staff month. They sometimes top 16 function points per staff month
for applications where the team has significant expertise, and may drop
below 6 function points per staff month for unusual or complex projects.
Productivity tends to be inversely proportional to application size and
declines as applications grow larger.
Schedules Development schedules for new development by a single
matrix group with eight team members working on a 2,000-function
point project usually range between about 16 months and 28 months and
would average perhaps 18 calendar months for the entire application.
Quality The quality levels for matrix organizations often are average.
Defect potentials run to about 5.0 bugs per function point, and defect
removal efficiency is about 85 percent. Delivered defects would average
about 0.75 per function point. Matrix and hierarchical organizations are
identical in quality, unless special methods such as formal inspections,
static analysis, automated testing, and other state-of-the-art approaches
have been introduced.
Therefore, an application of 2,000 function points developed by a
single matrix department might have a total of about 10,000 bugs, of
which 1,500 would still be present at release. Of these, about 225 might
be serious bugs.
However, if pretest inspections are used, and if tools such as automated
static analysis and automated testing are used, then defect removal effi-
ciency can approach 97 percent. In this situation, only about 300 bugs
might be present at release. Of these, perhaps 40 might be serious.
As application sizes increase, defect potentials also increase, while
defect removal efficiency levels decline.
Specialization The main purpose of the matrix organization structure is
to support specialization. That being said, there are few studies to date
on the kinds of specialization in matrix software organization structures.
As of 2009, topics such as the numbers of architects needed, the number
of testers needed, and the number of quality assurance personnel needed
for applications of various sizes remain ambiguous.
Typical kinds of specialization are usually needed for large applica-
tions. The kinds of specialists that might be useful would be security
specialists, test specialists, quality assurance specialists, database
specialists, user-interface specialists, network specialists, perfor-
mance specialists, and technical writers.
Cautions and counter indications The main caution about matrix organi-
zation structures is that of political disputes between the skill managers
and the project managers.
Another caution, although hard to evaluate, is that roughly half of the
studies and literature about matrix organization assert that the matrix
approach is harmful rather than beneficial. The other half, however,
says the opposite and claims significant value from matrix organiza-
tions. But any approach with 50 percent negative findings needs to be
considered carefully and not adopted blindly.
A common caution for both matrix and hierarchical organizations is
that software work tends to be artificially divided to match the abilities
of eight-person departments, rather than segmented based on the archi-
tecture and design of the applications. As a result, some large functions in
large systems are arbitrarily divided between two or more departments
when they should be handled by a single group.
While technical communication within a given department is easy
and spontaneous, communication between departments tends to slow
down due to managers guarding their own territories. Thus, for large
projects with multiple hierarchical or matrix departments, there are
high probabilities of power struggles and disruptive social conflicts,
primarily among the management community.
Conclusions The literature on matrix organizations is so strongly polar-
ized that it is hard to find a consensus. With half of the literature praising
matrix organizations and the other half blaming them for failures and
disasters, it is not easy to find solid empirical data that is convincing.
From observations made during litigation for projects that failed or
never operated successfully, there seems to be little difference between
hierarchical and matrix organizations. Both matrix and hierarchical
organizations end up in court about the same number of times.
What does make a difference is the competence of the managers and
technical staff, and the emphasis on effective quality control and change
management control. Effective estimating and careful progress tracking
also make a difference, but none of these factors are directly related to
either the hierarchical or matrix organization styles.
Specialist Organizations in Large Companies
Because development software engineers are not the only or even the
largest occupation group in big companies and government agencies,
it is worthwhile to consider what kinds of organizations best serve the
needs of the most common occupation groups.
In approximate numerical order by numbers of employees, the major
specialist occupations would be
1. Maintenance software engineers
2. Test personnel
3. Business analysts and systems analysts
4. Customer support personnel
5. Quality assurance personnel
6. Technical writing personnel
7. Administrative personnel
8. Configuration control personnel
9. Project office staff
      Estimating specialists
      Planning specialists
      Measurement and metrics specialists
      Scope managers
      Process improvement specialists
      Standards specialists
Many other kinds of personnel perform technical work such as net-
work administration, operating data centers, repair of workstations and
personal computers, and other activities that center around operations
rather than software. These occupations are important, but are outside
the scope of this book.
Following are discussions of organization structures for selected
specialist groups.
Software Maintenance Organizations
For small companies with fewer than perhaps 50 software personnel,
maintenance and development are usually carried out by the same
people, and there are no separate maintenance groups. For that matter,
some forms of customer support may also be tasked to the software
engineering community in small companies.
However, as companies grow larger, maintenance specialization tends
to occur. For companies with more than about 500 software personnel,
maintenance groups are the norm rather than the exception.
(Note: The International Software Benchmarking Standards Group
(ISBSG) has maintenance benchmark data available for more than
400 projects and is adding new data monthly. Refer to www.ISBSG.org
for additional information.)
The issue of separating maintenance from development has both
detractors and adherents.
The detractors of separate maintenance groups state that separating
maintenance from development may require extra staff to become famil-
iar with the same applications, which might artificially increase overall
staffing. They also assert that if enhancements and defect repairs are
taking place at the same time for the same applications and are done by
two different people, the two tasks might interfere with each other.
The adherents of separate maintenance groups assert that because
bugs occur randomly and in fairly large numbers, they interfere with
development schedules. If the same person is responsible for adding a
new feature to an application and for fixing bugs, and suddenly a high-
severity bug is reported, fixing the bug will take precedence over doing
development. As a result, development schedules will slip, perhaps
so badly that the ROI of the application may turn negative.
Although both sets of arguments have some validity, the author's
observations support the view that separate maintenance organizations
are the most useful for larger companies that have significant volumes
of software to maintain.
Separate maintenance teams have higher productivity rates in find-
ing and fixing problems than do developers. Also, having separate main-
tenance change teams makes development more predictable and raises
development productivity.
Some maintenance groups also handle small enhancements as well
as defect repairs. There is no exact definition of a "small enhancement,"
but a working definition is an update that can be done by one person in
less than one week. That would limit the size of small enhancements to
about 5 or fewer function points.
Although defect repairs and enhancements are the two most common
forms of maintenance, there are actually 23 different kinds of mainte-
nance work performed by large organizations, as shown in Table 5-2.
Although the 23 maintenance topics are different in many respects,
they all have one common feature that makes a group discussion pos-
sible: they all involve modifying an existing application rather than
starting from scratch with a new application.
Each of the 23 forms of modifying existing applications has a dif-
ferent reason for being carried out. However, it often happens that
several of them take place concurrently. For example, enhancements
and defect repairs are very common in the same release of an evolving
application.
The maintenance literature has a number of classifications for main-
tenance tasks such as "adaptive," "corrective," or "perfective." These seem
TABLE 5-2 Twenty-Three Kinds of Maintenance Work
1. Major enhancements (new features of greater than 20 function points)
2. Minor enhancements (new features of less than 5 function points)
3. Maintenance (repairing defects for good will)
4. Warranty repairs (repairing defects under formal contract)
5. Customer support (responding to client phone calls or problem reports)
6. Error-prone module removal (eliminating very troublesome code segments)
7. Mandatory changes (required or statutory changes)
8. Complexity or structural analysis (charting control flow plus complexity metrics)
9. Code restructuring (reducing cyclomatic and essential complexity)
10. Optimization (increasing performance or throughput)
11. Migration (moving software from one platform to another)
12. Conversion (changing the interface or file structure)
13. Reverse engineering (extracting latent design information from code)
14. Reengineering (transforming legacy applications to modern forms)
15. Dead code removal (removing segments no longer utilized)
16. Dormant application elimination (archiving unused software)
17. Nationalization (modifying software for international use)
18. Mass updates such as Euro or Year 2000 repairs
19. Refactoring, or reprogramming applications to improve clarity
20. Retirement (withdrawing an application from active service)
21. Field service (sending maintenance members to client locations)
22. Reporting bugs or defects to software vendors
23. Installing updates received from software vendors
to be classifications that derive from academia. While there is nothing
wrong with them, they manage to miss the essential point. Maintenance
overall has only two really important economic distinctions:
1. Changes that are charged to and paid for by customers (enhance-
ments)
2. Changes that are absorbed by the company that built the software
(bug repairs)
Whether a company uses standard academic distinctions of mainte-
nance activities or the more detailed set of 23 shown here, it is important
to separate costs into the two buckets of customer-funded or self-funded
expenses.
Some companies such as Symantec charge customers for service
calls, even for reporting bugs. The author regards such charges as being
unprofessional and a cynical attempt to make money out of incompetent
quality control.
There are also common sequences or patterns to these modification
activities. For example, reverse engineering often precedes reengineer-
ing, and the two occur so often together as to almost constitute a linked
set. For releases of large applications and major systems, the author
has observed from six to ten forms of maintenance all leading up to the
same release.
In recent years, the Information Technology Infrastructure Library
(ITIL) has had a significant impact on maintenance, customer sup-
port, and service management in general. The ITIL is a rather large
collection of more than 30 books and manuals that deal with service
management, incident reporting, change teams, reliability criteria,
service agreements, and a host of other topics. As this book is being
written in 2009, the third release of the ITIL is under way.
It is an interesting phenomenon of the software world that while
ITIL has become a major driving force in service agreements within
companies for IT service, it is almost never used by commercial vendors
such as Microsoft and Symantec for agreements with their customers.
In fact, it is quite instructive to read the small print in the end-user
license agreements (EULAs) that are always required prior to using
the software.
When these agreements are read, it is disturbing to see clauses that
assert that the vendors have no liabilities whatsoever, and that the
software is not guaranteed to operate or to have any kind of quality
levels.
The reason for these one-sided EULA agreements is that software
quality control is so bad that even major vendors would go bankrupt if
sued for the damages that their products can cause.
For many IT organizations and also for commercial software groups,
a number of functions are joined together under a larger umbrella: cus-
tomer support, maintenance (defect repairs), small enhancements (less
than 5 function points), and sometimes integration and configuration
control.
In addition, several forms of maintenance work deal with software
not developed by the company itself:
1. Maintenance of commercial applications such as those acquired
from SAP, Oracle, Microsoft, and the like. The maintenance tasks
here involve reporting bugs, installing new releases, and possibly
making custom changes for local conditions.
2. Maintenance of open-source and freeware applications such as
Firefox, Linux, Google, and the like. Here, too, the maintenance
tasks involve reporting bugs and installing new releases, plus cus-
tomization as needed.
3. Maintenance of software added to corporate portfolios via mergers
or acquisitions with other companies. This is a very tricky situa-
tion that is fraught with problems and hazards. The tasks here can
be quite complex and may involve renovation, major updates, and
possibly migration from one database to another.
In addition to normal maintenance, which combines defect repairs
and enhancements, legacy applications may undergo thorough and
extensive modernization, called renovation.
Software renovation can include surgical removal of error-prone
modules, automatic or manual restructuring to reduce complexity,
revision or replacement of comments, removal of dead code segments,
and possibly even automatic conversion of the legacy application
from old or obsolete programming languages into newer program-
ming languages.
Renovation may also include data mining to extract business rules
and algorithms embedded in the code but missing from specifications
and written descriptions of the code. Static analysis and automatic test-
ing tools may also be included in renovation. Also, it is now possible to
generate function point totals for legacy applications automatically, and
this may also occur as part of renovation activities.
The observed effect of software renovation is to stretch out the useful
life of legacy applications by an additional ten years. Renovation reduces
the number of latent defects in legacy code, and therefore reduces future
maintenance costs by about 50 percent per calendar year for the applica-
tions renovated. Customer support costs are also reduced.
As the recession deepens and lengthens, software renovation will
become more and more valuable as a cost-effective alternative to retir-
ing legacy applications and redeveloping them. Renovation could reduce maintenance costs so significantly that eventual redevelopment could be funded out of the accrued savings.
If a company does plan to renovate legacy applications, it is appro-
priate to fix some of the chronic problems that no doubt are present in
the original legacy code. The most obvious of these would be to remove
security vulnerabilities, which tend to be numerous in legacy applications.
The second would be to improve quality by using inspections, static
analysis, automated testing, and other modern techniques such as TSP
during renovations.
A combination of the Team Software Process (TSP), the Caja security
architecture from Google, and perhaps the E programming language, which
is more secure than most languages, might be considered for renovating
applications that deal with financial or valuable proprietary data.
For predicting the staffing and effort associated with software main-
tenance, some useful rules of thumb have been developed based on
observations of maintenance groups in companies such as IBM, EDS,
Software Productivity Research, and a number of others.
Maintenance assignment scope = the amount of software that one
maintenance programmer can successfully maintain in a single calen-
dar year. The U.S. average as of 2009 is about 1,000 function points. The
range is between a low of about 350 function points and a high of about
5,500 function points. Factors that affect maintenance assignment scope
include the experience of the maintenance team, the complexity of the
code, the number of latent bugs in the code, the presence or absence of
"error-prone modules" in the code, and the available tool suites such as
static analysis tools, data mining tools, and maintenance workbenches.
This is an important metric for predicting the overall number of main-
tenance programmers needed.
(For large applications, knowledge of the internal structure is vital
for effective maintenance and modification. Therefore, major systems
usually have their own change teams. The number of maintenance pro-
grammers in such a change team can be calculated by dividing the size
of the application in function points by the appropriate maintenance
assignment scope, as shown in the previous paragraph.)
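The change-team sizing rule described above can be sketched as a small calculation. The 1,000-function-point U.S. average and the 350-to-5,500 range come from the text; the function name, the use of ceiling division, and the sample figures are illustrative assumptions:

```python
import math

def change_team_size(application_fp, assignment_scope_fp=1000):
    """Estimate the number of maintenance programmers for one
    application: application size divided by maintenance assignment
    scope (U.S. average cited as about 1,000 function points;
    observed range roughly 350 to 5,500)."""
    return math.ceil(application_fp / assignment_scope_fp)

# A 10,000-function-point system at the average assignment scope:
print(change_team_size(10000))        # 10 programmers
# The same system with experienced staff and good tool support:
print(change_team_size(10000, 2500))  # 4 programmers
```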
Defect repair rates = the average number of bugs or defects that
a maintenance programmer can fix in a calendar month of 22 working
days. The U.S. average is about 10 bugs repaired per calendar month.
The range is from fewer than 5 to about 17 bugs per staff month. Factors
that affect this rate include the experience of the maintenance program-
mer, the complexity of the code, and "bad-fix injections," or new bugs
accidentally injected into the code created to repair a previous bug. The
U.S. average for bad-fix injections is about 7 percent.
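Taken together, the 10-bugs-per-staff-month repair rate and the 7 percent bad-fix injection rate imply that clearing a backlog requires more repair actions than there are original bugs, since some fixes create new defects. A minimal sketch, assuming injected defects are repaired at the same rates (the function names are illustrative):

```python
def total_repairs(initial_bugs, bad_fix_rate=0.07):
    """Total repair actions needed to clear a backlog when each fix
    has a bad_fix_rate chance of injecting a new defect; the chain
    of injected defects forms a geometric series."""
    return initial_bugs / (1 - bad_fix_rate)

def staff_months_to_clear(initial_bugs, repairs_per_month=10,
                          bad_fix_rate=0.07):
    """Staff months to work off the backlog at the cited U.S.
    average of about 10 bug repairs per staff month."""
    return total_repairs(initial_bugs, bad_fix_rate) / repairs_per_month

# Clearing 750 latent defects implies roughly 806 repair actions:
print(round(total_repairs(750)))          # 806
print(round(staff_months_to_clear(750)))  # 81 staff months
```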
Renovation productivity = the average number of function points
per staff month for renovating software applications using a full suite
of renovation support tools. The U.S. average is about 65 function points
per staff month. The range is from a low of about 25 function points per
staff month for highly complex applications in obscure languages to
more than 125 function points per staff month for applications of mod-
erate complexity in fairly modern languages. Other factors that affect
this rate include the overall size of the applications, the presence or
absence of "error-prone modules" in the application, and the experience
of the renovation team.
(Manual renovation without automated support is much more dif-
ficult, and hence productivity rates are much lower--in the vicinity of
14 function points per staff month. This is somewhat higher than new
development, but still close to being marginal in terms of return on
investment.)
Software does not age gracefully. Once software is put into production,
it continues to change in three important ways:
1. Latent defects still present at release must be found and fixed after
deployment.
2. Applications continue to grow and add new features at a rate of
between 5 percent and 10 percent per calendar year, due either to
changes in business needs, or to new laws and regulations, or both.
3. The combination of defect repairs and enhancements tends to
gradually degrade the structure and increase the complexity of
the application. This increase in complexity over time is called entropy. The average rate at which software entropy increases is about 1 percent to 3 percent per calendar year.
A special problem with software maintenance is caused by the
fact that some applications use multiple programming languages.
As many as 15 different languages have been found within a single
large application.
Multiple languages are troublesome for maintenance because they
add to the learning chores of the maintenance teams. Also, some (or all) of these languages may be "dead" in the sense that there are no longer
working compilers or interpreters. This situation chokes productivity
and raises the odds of bad-fix injections.
Because software defect removal and quality control are imperfect,
there will always be bugs or defects to repair in delivered software appli-
cations. The current U.S. average for defect removal efficiency is only
about 85 percent of the bugs or defects introduced during development.
This has been the average for more than 20 years.
The actual values are about 5 bugs per function point created during
development. If 85 percent of these are found before release, about
0.75 bug per function point will be released to customers.
For a typical application of 1,000 function points or 100,000 source
code statements, that implies about 750 defects present at delivery.
About one fourth, or 185 defects, will be serious enough to stop the
application from running or will create erroneous outputs.
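The arithmetic in the last two paragraphs can be expressed as a small estimator. The averages (5 defects per function point created, 85 percent removal, about one fourth of latent defects serious) come from the text; the function itself is only an illustrative sketch:

```python
def delivered_defects(function_points, defect_potential=5.0,
                      removal_efficiency=0.85, serious_fraction=0.25):
    """Estimate latent defects at delivery and the serious subset,
    using the U.S. averages cited in the text."""
    latent = function_points * defect_potential * (1 - removal_efficiency)
    return latent, latent * serious_fraction

# A typical 1,000-function-point application (the text rounds the
# serious subset to about 185):
latent, serious = delivered_defects(1000)
print(round(latent), round(serious))  # 750 latent, ~188 serious
```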
Since defect potentials tend to rise with the overall size of the appli-
cation, and since defect removal efficiency levels tend to decline with
the overall size of the application, the overall volume of latent defects
delivered with the application rises with size. This explains why super-
large applications in the range of 100,000 function points, such as
Microsoft Windows and many enterprise resource planning (ERP)
applications, may require years to reach a point of relative stability.
These large systems are delivered with thousands of latent bugs or
defects.
Of course, average values are far worse than best practices. A com-
bination of formal inspections, static analysis, and automated testing
can bring cumulative defect removal efficiency levels up to 99 percent.
Methods such as the Team Software Process (TSP) can lower defect
potentials down below 3.0 per function point.
Unless very sophisticated development practices are followed, the first
year of the release of a new software application will include a heavy
concentration of defect repair work and only minor enhancements.
However, after a few years, the application will probably stabilize as
most of the original defects are found and eliminated. Also after a few
years, new features will increase in number.
As a result of these trends, maintenance activities will gradually
change from the initial heavy concentration on defect repairs to a longer-
range concentration on new features and enhancements.
Not only is software deployed with a significant volume of latent
defects, but the phenomenon of bad-fix injection has been observed
for more than 50 years. Roughly 7 percent of all defect repairs will
contain a new defect that was not there before. For very complex and
poorly structured applications, these bad-fix injections have topped
20 percent.
Even more alarming, once a bad fix occurs, it is very difficult to cor-
rect the situation. Although the U.S. average for initial bad-fix injection
rates is about 7 percent, the secondary injection rate against previous
bad fixes is about 15 percent for the initial repair and 30 percent for the
second. A string of up to five consecutive bad fixes has been observed,
with each attempted repair adding new problems and failing to correct
the initial problem. Finally, the sixth repair attempt was successful.
In the 1970s, the IBM Corporation did a distribution analysis of
customer-reported defects against their main commercial software
applications. The IBM personnel involved in the study, including the
author, were surprised to find that defects were not randomly distrib-
uted through all of the modules of large applications.
In the case of IBM's main operating system, about 5 percent of the
modules contained just over 50 percent of all reported defects. The most
extreme example was a large database application, where 31 modules
out of 425 contained more than 60 percent of all customer-reported bugs.
These troublesome areas were known as error-prone modules.
Similar studies by other corporations such as AT&T and ITT found
that error-prone modules were endemic in the software domain. More
than 90 percent of applications larger than 5,000 function points were
found to contain error-prone modules in the 1980s and early 1990s.
Summaries of the error-prone module data from a number of companies
were published in the author's book Software Quality: Analysis and
Guidelines for Success.
Fortunately, it is possible to surgically remove error-prone modules
once they are identified. It is also possible to prevent them from occur-
ring. A combination of defect measurements, formal design inspections,
formal code inspections, and formal testing and test-coverage analysis
have proven to be effective in preventing error-prone modules from
coming into existence.
Today, in 2009, error-prone modules are almost nonexistent in organiza-
tions that are higher than level 3 on the capability maturity model (CMM)
of the Software Engineering Institute. Other development methods such
as the Team Software Process (TSP) and Rational Unified Process (RUP)
are also effective in preventing error-prone modules. Several forms of
Agile development such as extreme programming (XP) also seem to be
effective in preventing error-prone modules from occurring.
Removal of error-prone modules is a normal aspect of renovating
legacy applications, so those software applications that have under-
gone renovation will have no error-prone modules left when the work
is complete.
However, error-prone modules remain common and troublesome for
CMMI level 1 organizations. They are also alarmingly common in legacy
applications that have not been renovated and that are maintained
without careful measurement of defects.
Once deployed, most software applications continue to grow at annual
rates of between 5 percent and 10 percent of their original functionality.
Some applications, such as Microsoft Windows, have increased in size
by several hundred percent over a ten-year period.
The combination of continuous growth of new features coupled with
continuous defect repairs tends to drive up the complexity levels of aging
software applications. Structural complexity can be measured via met-
rics such as cyclomatic and essential complexity using a number of com-
mercial tools. If complexity is measured on an annual basis and there
is no deliberate attempt to keep complexity low, the rate of increase is
between 1 percent and 3 percent per calendar year.
However, and this is important, the rate at which entropy or com-
plexity increases is directly proportional to the initial complexity of the
application. For example, if an application is released with an average
cyclomatic complexity level of less than 10, it will tend to stay well struc-
tured for at least five years of normal maintenance and enhancement
changes.
But if an application is released with an average cyclomatic com-
plexity level of more than 20, its structure will degrade rapidly, and its
complexity levels might increase by more than 2 percent per year. The
rate of entropy and complexity will even accelerate after a few years.
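The compound-growth claim above can be illustrated numerically. This is only a sketch of the 1 to 3 percent annual entropy figures cited in the text, not a measurement tool:

```python
def projected_complexity(initial_cc, annual_growth, years):
    """Project average cyclomatic complexity under compound annual
    growth (the text cites roughly 1 to 3 percent per year when no
    deliberate complexity control is applied)."""
    return initial_cc * (1 + annual_growth) ** years

# A well-structured release (average cyclomatic complexity below 10)
# stays safe for years, while a complex release degrades faster:
print(round(projected_complexity(9, 0.01, 5), 1))   # 9.5
print(round(projected_complexity(22, 0.03, 5), 1))  # 25.5
```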
As it happens, both bad-fix injections and error-prone modules tend to
correlate strongly (although not perfectly) with high levels of complexity.
A majority of error-prone modules have cyclomatic complexity levels
of 10 or higher. Bad-fix injection levels for modifying high-complexity
applications are often higher than 20 percent.
Here, too, renovation can reverse software entropy and bring cyclo-
matic complexity levels down below 10, which is the maximum safe
level of code complexity.
There are several difficulties in exploring software maintenance costs
with accuracy. One of these difficulties is the fact that maintenance
tasks are often assigned to development personnel who interleave both
development and maintenance as the need arises. This practice makes
it difficult to distinguish maintenance costs from development costs,
because the programmers are often rather careless in recording how
time is spent.
Another and very significant problem is that a great deal of software
maintenance consists of making very small changes to software appli-
cations. Quite a few bug repairs may involve fixing only a single line of
code. Adding minor new features, such as perhaps a new line-item on a
screen, may require fewer than 50 source code statements.
These small changes are below the effective lower limit for counting
function point metrics. The function point metric includes weighting
factors for complexity, and even if the complexity adjustments are set to
the lowest possible point on the scale, it is still difficult to count function
points below a level of perhaps 15 function points.
An experimental method called micro function points has been devel-
oped for small maintenance changes and bug repairs. This method is
similar to standard function points, but drops down to three decimal
places of precision and so can deal with fractions of a single function
point.
Of course, the work of making a small change measured with micro function points may take only an hour or less. But in large companies,
where as many as 20,000 such changes are made in a year, the cumula-
tive costs are not trivial. Micro function points are intended to eliminate
the problem that small maintenance updates have not been subject to
formal economic analysis.
Quite a few maintenance tasks involve changes that are either a frac-
tion of a function point, or may at most be fewer than 5 function points
or about 250 Java source code statements. Although normal counting
of function points is not feasible for small updates, and micro function
points are still experimental, it is possible to use the backfiring method
of converting counts of logical source code statements into equivalent
function points. For example, suppose an update requires adding 100
Java statements to an existing application. Since it usually takes about
50 Java statements to encode 1 function point, it can be stated that this
small maintenance project is about 2 function points in size.
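The backfiring arithmetic from the example above, sketched in code. The Java ratio of about 50 logical statements per function point comes from the text; the C and COBOL ratios are commonly published backfiring values and should be treated as rough assumptions:

```python
# Approximate logical statements per function point.
STATEMENTS_PER_FP = {"java": 50, "c": 128, "cobol": 107}

def backfire_fp(logical_statements, language):
    """Convert a count of logical source statements into an
    approximate function point size via backfiring."""
    return logical_statements / STATEMENTS_PER_FP[language.lower()]

# The example from the text: a 100-statement Java update is ~2 FP.
print(backfire_fp(100, "Java"))  # 2.0
```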
Because of the combination of 23 separate kinds of maintenance work
mixed with both large and small updates, maintenance effort is harder to
estimate and harder to measure than in conventional software develop-
ment. As a result, there are many fewer maintenance benchmarks than
development benchmarks. In fact, there is much less reliable information
about maintenance than about almost any other aspect of software.
Maintenance activities are frequently outsourced to either domestic or
offshore outsource companies. For a variety of business reasons, main-
tenance outsource contracts seem to be more stable and less likely to
end up in court than software development contracts.
The success of maintenance outsource contracts stems from two major factors:
1. Small maintenance changes do not have the huge cost and schedule
slippage rates associated with major development projects.
2. Small maintenance changes to existing software almost never fail
completely. A significant number of development projects do fail and
are never completed at all.
There may be other reasons as well, but the fact remains that main-
tenance outsource contracts seem more stable and less likely to end up
in court than development outsource contracts.
Maintenance is the dominant work of the software industry in 2009
and will probably stay the dominant activity for the indefinite future.
For software, as with many other industries, once the industry passes
50 years of age, more workers are involved with repairing existing prod-
ucts than there are workers involved with building new products.
Demographics In the software world, separate maintenance organiza-
tions are found most often in large companies that employ between
perhaps 500 and 50,000 total software personnel.
The author estimates that there are about 2,500 such large companies in the United States with separate maintenance organizations. The
number of software personnel working on maintenance in maintenance
organizations is perhaps 800,000 in the United States as of 2009. (The
number of software personnel who perform both development and main-
tenance is perhaps 400,000.)
Project size The average size of software defects is less than 1 function point, which is why micro function points are needed. Enhancements
or new features typically range from a low of perhaps 5 function points
to a high of perhaps 500 function points. However, there are so many enhancements that software applications typically grow at a rate of
around 8 percent per calendar year for as long as they are being used.
Productivity rates Productivity rates for defect repairs are only about 10 function points per staff month, due to the difficulty of finding the
exact problem, plus the need for regression testing and constructing
new releases. Another way of expressing defect repair productivity is to
use defects or bugs fixed per month, and a typical value would be about
10 bugs per staff month.
The productivity rates for enhancements average about 15 function
points per staff month, but vary widely due to the nature and size of the
enhancement, the experience of the team, the complexity of the code,
and the rate at which requirements change during the enhancement.
The range for enhancements can be as low as about 5 function points
per staff month, or as high as 35 function points per staff month.
Schedules Development schedules for defect repairs range from a few hours to a few days, with one major exception. Defects that are abeyant,
or cannot be replicated by the change teams, may take weeks to repair
because the internal version of the application used by the change team
may not have the defect. It is necessary to get a great deal more infor-
mation from users in order to isolate abeyant defects.
Fixing a bug is not the same as issuing a new release. Within some
companies such as IBM, maintenance schedules in the sense of defect
repairs vary with the severity level of the bugs reported; that is, severity
1 bugs (most serious), about 1 week; severity 2 bugs, about two weeks;
severity 3 bugs, next release; severity 4 bugs, next release or whenever
it is convenient.
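The IBM-style severity targets described above amount to a simple lookup table. The dictionary below is a hypothetical encoding of those targets, not IBM's actual process definition:

```python
# Hypothetical encoding of severity-driven repair-schedule targets.
REPAIR_TARGET = {
    1: "about 1 week",                     # most serious defects
    2: "about 2 weeks",
    3: "next release",
    4: "next release, or when convenient",
}

def repair_target(severity):
    """Look up the repair-schedule target for a reported severity."""
    return REPAIR_TARGET[severity]

print(repair_target(1))  # about 1 week
```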
Development schedules for enhancements usually run from about
1 month up to 9 months. However, many companies have fixed release
intervals that aggregate a number of enhancements and defect repairs
and release them at the same time. Microsoft "service packs" are one
example, as are the intermittent releases of Firefox. Normally, fixed
release intervals are either every six months or once a year, although
some may be quarterly.
Quality The main quality concerns for maintenance or defect repairs
are threefold: (1) higher defect potentials for maintenance and enhance-
ments than for new development, (2) the presence or absence of error-
prone modules in the application, and (3) the bad-fix injection rates for
defect repairs, which average about 7 percent.
Maintenance and enhancement defect potentials are higher than for
new development and run to about 6.0 bugs per function point. Defect
removal efficiency is usually lower than for new development and is only
about 83 percent. As a result, delivered defects would average about 1.02 per function point (6.0 times the 17 percent not removed).
An additional quality concern that grows slowly worse over a period
of years is that application complexity (as measured by cyclomatic com-
plexity) slowly increases because changes tend to degrade the original
structure. As a result, each year, defect potentials may be slightly higher
than the year before, while bad-fix injections may increase. Unless the
application is renovated, these problems tend to become so bad that
eventually the application can no longer be safely modified.
In addition to renovation, other approaches such as formal inspections
for major enhancements and significant defect repairs, static analysis,
and automatic testing can raise defect removal efficiency levels above
95 percent. However, bad-fix injections and error-prone modules are
still troublesome.
Specialization The main purpose of the maintenance organization structures is to support maintenance specialization. While not every-
one enjoys maintenance, it happens that quite a few programmers and
software engineers do enjoy it.
Other specialist work in a maintenance organization includes inte-
gration and configuration control. Maintenance software engineers
normally do most of the testing on small updates and small enhance-
ments, although formal test organizations may do some specialized
testing such as system testing prior to a major release.
Curiously, software quality assurance (SQA) is seldom involved
with defect repairs and minor enhancements carried out by main-
tenance groups. However, SQA specialists usually do work on major
enhancements.
Technical writers don't have a major role in software maintenance,
but may occasionally be involved if enhancements trigger changes in
user manuals or HELP text.
That being said, few studies to date deal with either personality or
technical differences between successful maintenance programmers and
successful development programmers.
Cautions and counter indications The main caution about maintenance specialization and maintenance organizations is that they tend to lock
personnel into narrow careers, sometimes limited to repairing a single
application for a period of years. There is little chance of career growth
or knowledge expansion if a software engineer spends years fixing bugs
in a single software application. Occasionally, switching back and forth
from maintenance to development is a good practice for minimizing
occupational boredom.
Conclusions The literature on maintenance organizations is very sparse compared with the literature on development. Although there are
some good books, there are few long-range studies that show application
growth, entropy increase, and defect trends over multiple years.
Given that software maintenance is the dominant activity of the
software industry in 2009, a great deal more research and study are
indicated. Research is needed on data mining of legacy applications to
extract business rules; on removing security vulnerabilities from legacy
code; on the costs and value of software renovation; and on the applica-
tion of quality control methods such as inspections, static analysis, and
automated testing to legacy code.
Customer Support Organizations
In small companies with few software applications and few custom-
ers or application users, support may be carried out on an informal
basis by the development team itself. However, as numbers of customers
increase and numbers of applications needing support increase, a point
will soon be reached where a formal customer support organization will
be needed.
Informal rules of thumb for customer support indicate that customer
support staffing is dependent on three variables:
1. Number of customers
2. Number of latent bugs or defects in released software
3. Application size measured in terms of function points or lines
of code
One full-time customer support person would probably be needed for
applications that meet these criteria: 150 customers, 500 latent bugs in
the software (75 serious bugs), and 10,000 function points or 500,000
source code statements in a language such as Java.
The most effective known method for improving customer support is to
achieve much better application quality levels than are typical today in
2009. Every reduction of about 220 latent defects at delivery can reduce
customer support staffing needs by one person. This is based on the
assumption that customer support personnel speak to about 30 custom-
ers per day, and each released defect is encountered by 30 customers.
Therefore, each released defect occupies one day for one customer sup-
port staff member, and there are 220 working days per year.
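The staffing rules of thumb above can be sketched as a rough estimator. The per-variable thresholds (150 customers, 500 latent bugs, 10,000 function points) come from the text; combining them by taking whichever demand is largest is an assumption, since the text gives only a single combined example:

```python
def support_staff_estimate(customers, latent_defects, size_fp):
    """Rough customer-support staffing sketch: roughly one person
    per 150 customers, per 500 latent defects, or per 10,000
    function points, whichever implies the most staff."""
    return max(customers / 150, latent_defects / 500, size_fp / 10000)

# The example from the text: about one full-time support person.
print(support_staff_estimate(150, 500, 10000))  # 1.0
# Ten times the customer base dominates the other factors:
print(support_staff_estimate(1500, 500, 10000))  # 10.0
```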
Some companies attempt to reduce customer support costs by charg-
ing for support calls, even to report bugs in the applications! This is
an extremely bad business practice that primarily offends customers
without benefiting the companies. Every customer faced with a charge
for customer support is an unhappy customer who is actively in search
of a more sensible competitive product.
Also, since software is routinely delivered with hundreds of serious
bugs, and since customer reports of those bugs are valuable to soft-
ware vendors, charging for customer support is essentially cutting off
a valuable resource that can be used to lower maintenance costs. Few
companies that charge for support have many happy customers, and
many are losing market shares.
Unfortunately, customer support organizations are among the most
difficult of any kind of software organization to staff and organize well.
There are several reasons for this. The first is that unless a company
charges for customer support (not a recommended practice), the costs
can be high. The second is that customer-support work tends to have
limited career opportunities, and this makes it difficult to attract and
keep personnel.
As a result, customer support was one of the first business activities
to be outsourced to low-cost offshore providers. Because customer sup-
port is labor intensive, it was also among the first business activities
to attempt to automate at least some responses. To minimize the time
required for discussions with live support personnel, there are a variety
of frequently asked questions (FAQ) and other topics that users can
access by phone or e-mail prior to speaking with a real person.
Unfortunately, these automated techniques are often frustrating to
users because they require minutes of time dealing with sometimes
arcane voice messages before reaching a real person. Even worse,
these automated voice messages are almost useless for the hard of
hearing.
That being said, companies in the customer support business have
made some interesting technical innovations with voice response sys-
tems and also have developed some fairly sophisticated help-desk pack-
ages that keep track of callers or e-mails, identify bugs or defects that
have been previously reported, and assist with other administrative
functions.
Because calls and e-mail from customers contain a lot of potentially
valuable information about deep bugs and security flaws, prudent com-
panies want to capture this information for analysis and to use it as part
of their quality and security improvement programs.
At a sociological level, an organization called the Service and Support
Professionals Association (SSPA) not only provides useful information
for support personnel, but also evaluates the customer support of vari-
ous companies and issues awards and citations for excellence. The SSPA
group also has conferences and events dealing with customer support.
(The SSPA web site is www.thesspa.com.)
SSPA has an arrangement with the well-known J.D. Power and
Associates to evaluate customer service in order to motivate companies
by issuing various awards. As an example, the SSPA web site mentions
the following recent awards as of 2009:
ProQuest Business Solutions--Most improved
IBM Rochester--Sustained excellence for three consecutive years
Oracle Corporation--Innovative support
Dell--Mission critical support
RSA Security--Best support for complex systems
For in-house support, as opposed to commercial companies that sell
software, the massive compendium of information contained in the
Information Technology Infrastructure Library (ITIL) spells out topics
such as help-desk response-time targets, service agreements, incident
management, and hundreds of other items of information.
Software customer support is organized in a multitier arrangement
that uses automated responses and FAQs as the initial level, and then
brings in more expertise at higher levels. An example of such a multitier
arrangement might resemble the following:
Level 0--Automated voice messages, FAQ, and pointers to available
downloads
Level 1--Personnel who know basics of the application and common
bugs
Level 2--Experts in selected topics
Level 3--Development personnel or top-gun experts
The idea behind the multilevel approach is to minimize the time
requirements of developers and experts, while providing as much useful
information as possible in what is hopefully an efficient manner.
As mentioned in a number of places in this book, the majority of customer
service calls and e-mails are due to poor quality and excessive numbers
of bugs. Therefore, more sophisticated development approaches such as
using Team Software Process (TSP), formal inspections, static analysis,
automated testing, and the like will not only reduce development costs and
schedules, but will also reduce maintenance and customer support costs.
It is interesting to consider how one of the J.D. Power award recipi-
ents, IBM Rochester, goes about customer support:
"There is a strong focus on support responsiveness, in terms of both time
to response as well as the ability to provide solutions. When customers
call in, there is a target that within a certain amount of time (a minute or
a couple of minutes), the call must be answered. IBM does not want long
hold times where customers spend >10 minutes just waiting for the phone
to be answered.
When problems/defects are reported, the formal fix may take some time.
Before the formal fix is available, the team will provide a temporary solu-
tion as soon as possible, and a key metric used is "time to first relief."
The first-relief temporary repairs may take less than 24 hours for some
new problems, and even less if the problem is already known.
When formal fixes are provided, a key metric used by IBM Rochester is
the quality of the fixes: percent of defective fixes. Rochester's defective-
fix rate is the lowest among the major platforms in IBM. (Since the
industry average for bad-fix injection is about 7%, it is commendable that
IBM addresses this issue.)
The IBM Rochester support center also conducts a "trailer survey." This is
a survey of customer satisfaction about the service or fix. These surveys
are based on samples of problem records that are closed. IBM Rochester's
trailer survey satisfaction is in the high 90s in terms of percentages of
satisfied customers.
Another IBM Rochester factor could be called the "cultural factor." IBM as
a corporation and Rochester as a lab both have a long tradition of focus on
quality (e.g., winning the Malcolm Baldrige quality award). Because cus-
tomer satisfaction correlates directly with quality, the IBM Rochester prod-
ucts have long had a reputation for excellence (IBM System/34, System/36,
System/38, AS/400, System i, etc.). IBM and Rochester employees are proud
of the quality that they deliver for both products and services."
For major customer problems, teams (support, development, test, etc.)
work together to come up with solutions. Customer feedback has long
been favorable for IBM Rochester, which explains their multiyear award
for customer support excellence. Often when surveyed customers men-
tion explicitly and favorably the amount of support and problem solving
that they receive from the IBM Rochester site.
Demographics In the software world, in-house customer support staffed
by actual employees is in rapid decline due to the recession. Probably a
few hundred large companies still provide such support, but as layoffs
and downsizing continue to escalate, their numbers will be reduced.
However, for small companies that have never employed full-time
customer support personnel, no doubt the software engineers will still
continue to field customer calls and respond to e-mails. There are prob-
ably 10,000 or more U.S. organizations with between 1 and 50 employees
where customer support tasks are performed informally by software
engineers or programmers.
For commercial software organizations, outsourcing of customer sup-
port to specialized support companies is now the norm. While some of
these support companies are domestic, there are also dozens of customer
support organizations in other countries with lower labor costs than
the United States or Europe. However, as the recession continues, labor
costs will decline in the United States, which now has large pools of
unemployed software technical personnel. Customer support, mainte-
nance, and other labor-intensive tasks may well start to move back to
the United States.
Project size The average size of applications where formal customer
support is close to being mandatory is about 10,000 function points. Of
course, for any size application, customers will have questions and need
to report bugs. But applications in the 10,000-function point range usu-
ally have many customers. In addition, these large systems are always
released with thousands of latent bugs.
Productivity rates Productivity rates for customer support are not mea-
sured using function points, but rather numbers of customers assisted.
Typically, one tier-1 customer support person on a telephone support
desk can talk to about 30 people per day, which translates into each call
taking about 16 minutes.
For tier 2 and tier 3 customer support, where experts are used, the
work of talking to customers is probably not full time. However, for
problems serious enough to reach tier 2, expect each call to take about
70 minutes. For problems that reach tier 3, there will no doubt be mul-
tiple calls back and forth and probably some internal research. Expect
tier 3 calls to take about 240 minutes.
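The call-rate arithmetic above can be sketched in a few lines of Python; the 8-hour (480-minute) working day is an assumption, and the helper name is illustrative:

```python
# Support-call throughput sketch. Assumes an 8-hour working day;
# the per-call minutes (16, 70, 240) are the figures from the text.

MINUTES_PER_DAY = 8 * 60  # 480 working minutes (assumption)

def calls_per_day(minutes_per_call: float) -> float:
    """Approximate calls one support person can handle per day."""
    return MINUTES_PER_DAY / minutes_per_call

tier1 = calls_per_day(16)   # tier 1: 30 calls per day
tier2 = calls_per_day(70)   # tier 2: roughly 7 calls per day
tier3 = calls_per_day(240)  # tier 3: about 2 calls per day
```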
If a customer is reporting a new bug that has not been identified or fixed,
then days or even weeks may be required. (The author worked as an expert
witness in a lawsuit where the time required to fix one bug in a financial
application was more than nine calendar months. In the course of fixing
this bug, the first four attempts each took about two months. They not only
failed to fix the original bug, but added new bugs in each fix.)
Schedules The primary schedule issue for customer support is the wait
or hold time before speaking to a live support person. Today, in 2009,
reaching a live person can take from 10 minutes to more than 60
minutes of hold time. Needless to say, this is very frustrating to clients.
Improving quality should also reduce wait times. Assuming constant
support staffing, every reduction of ten severity 1 or 2 defects released
in software should reduce wait times by about 30 seconds.
Quality Customer support calls are directly proportional to the number
of released defects or bugs in software. It is theoretically possible that
releasing software with zero defects might reduce the number of cus-
tomer support calls to zero, too. In today's world, where defect removal
efficiency only averages 85 percent and hundreds or thousands of seri-
ous bugs are routinely still present when software is released, there will
be hundreds of customer support calls and e-mails.
It is interesting that some open-source and freeware applications such
as Linux, Firefox, and Avira seem to have better quality levels than equiv-
alent applications released by established vendors such as Microsoft and
Symantec. In part this may be due to the skills of the developers, and in
part it may be due to routinely using tools such as static analysis prior
to release.
Specialization The role of tier-1 customer support is very specialized.
Effective customer support requires a good personality when dealing
with crabby customers plus fairly sophisticated technical skills. Of these
two, the criterion for technical skill is easier to satisfy than the criterion for
a good personality when dealing with angry or outraged customers. That
being said, there are few studies to date that deal with either personal-
ity or technical skills in support organizations.
In addition to customer support provided by vendors of software, some
user associations and nonprofit groups provide customer support on
a volunteer basis. Many freeware and open-source applications have
user groups that can answer technical questions. Even for commercial
software, it is sometimes easier to get an informed response to a ques-
tion from an expert user than it is from the company that built the
software.
Cautions and counter indications The main caution about customer
support work is that it tends to lock personnel into narrow careers, some-
times limited to discussing a single application such as Oracle or SAP
for a period of years. There is little chance of career growth or knowledge
expansion.
Another caution is that improving customer support via automation
and expert systems is technically feasible, but many existing patents
cover such topics. As a result, attempts to develop improved customer
support automation may require licensing of intellectual property.
Conclusions The literature on customer support is dominated by two
very different forms of information. The Information Technology
Infrastructure Library (ITIL) contains more than 30 volumes and more
than 5,000 pages of information on every aspect of customer support.
However, the ITIL library is aimed primarily at in-house customer sup-
port and is not used very much by commercial software vendors.
For commercial software customer support, some trade books are
available, but the literature tends to be dominated by white papers
and monographs published by customer support outsource companies.
Although these tend to be marketing texts, some of them do provide
useful information about the mechanics of customer support. There
are also interesting reports available from companies that provide cus-
tomer-support automation, which is both plentiful and seems to cover
a wide range of features.
Given the fact that customer support is a critical activity of the
software industry in 2009, a great deal more research and study are
indicated. Research is needed on the relationship between quality and
customer support, on the role of user associations and volunteer groups,
and on the potential automation that might improve customer support.
In particular, research is needed on providing customer support for deaf
and hard-of-hearing customers, blind customers, and those with other
physical challenges.
Software Test Organizations
There are ten problems with discussing software test organizations that
need to be highlighted:
1. There are more than 15 different kinds of software testing.
2. Many kinds of testing can be performed either by developers, by
in-house test organizations, by outsource test organizations, or by
quality assurance teams based on company test strategies.
3. With Agile teams and with hierarchical organizations, testers will
probably be embedded with developers and not have separate
departments.
4. In matrix organizations, testers would probably be in a separate
testing organization reporting to a skill manager, but assigned to
specific projects as needed.
5. Some test organizations are part of quality assurance organizations
and therefore have several kinds of specialists besides testing.
6. Some quality assurance organizations collect data on test results,
but do no testing of their own.
7. Some testing organizations are called "quality assurance" and per-
form only testing. These may not perform other QA activities such
as moderating inspections, measuring quality, predicting quality,
teaching quality, and so on.
8. For any given software application, the number of separate kinds
of testing steps ranges from a low of 1 form of testing to a high of
17 forms of testing based on company test strategies.
9. For any given software application, the number of test and/or qual-
ity assurance organizations that are part of its test strategy can
range from a low of one to a high of five, based on company quality
strategies.
10. For any given defect removal activity, including testing, as many as
11 different kinds of specialists may take part.
As can perhaps be surmised from the ten points just highlighted,
there is no standard way of testing software applications in 2009. Not
only is there no standard way of testing, but there are no standard
measures of test coverage or defect removal efficiency, although both
are technically straightforward measurements.
The most widely used form of test measurement is that of test cover-
age, which shows the amount of code actually executed by test cases.
Test coverage measures are fully automated and therefore easy to do.
This is a useful metric, but much more useful would be to measure
defect removal efficiency as well.
Defect removal efficiency is more complicated and not fully auto-
mated. To measure the defect removal efficiency of a specific test stage
such as unit test, all defects found by the test are recorded. After unit
test is finished, all other defects found by all other tests are recorded,
as are defects found by customers in the first 90 days. When all defects
have been totaled, then removal efficiency can be calculated.
Assume unit test found 100 defects, function test and later test stages
found 200 defects, and customers reported 100 defects in the first
90 days of use. The total number of defects found was 400. Since unit
test found 100 out of 400 defects, in this example, its efficiency is 25
percent, which is actually not far from the 30 percent average value of
defect removal efficiency for unit test.
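A minimal sketch of this calculation, using the numbers from the example (the function name is illustrative, not from any measurement tool):

```python
def removal_efficiency(found_by_stage: int, found_later: int) -> float:
    """Defect removal efficiency of one stage: defects found by the
    stage divided by all defects eventually found (later stages plus
    customer reports in the first 90 days)."""
    return found_by_stage / (found_by_stage + found_later)

# Unit test found 100 defects; later test stages found 200 and
# customers reported 100 in the first 90 days of use:
# 100 / 400 = 25 percent removal efficiency for unit test.
unit_test_efficiency = removal_efficiency(100, 200 + 100)
```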
(A quicker but less reliable method for determining defect removal
efficiency is that of defect seeding. For example, if 100 known bugs were
seeded into the software discussed in the previous paragraph and 25
were found, then the defect removal efficiency level of 25 percent could
be calculated immediately. However, there is no guarantee that the
"tame" bugs that were seeded would be found at exactly the same rate
as "wild" bugs that are made by accident.)
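The seeding estimate works the same way; a hedged sketch, with an illustrative function name:

```python
def seeded_efficiency(bugs_seeded: int, seeded_found: int) -> float:
    """Estimate a stage's defect removal efficiency from the fraction
    of deliberately seeded ('tame') bugs that the stage found."""
    return seeded_found / bugs_seeded

# 100 known bugs seeded, 25 found by the test stage: estimated
# removal efficiency of 25 percent, available immediately.
estimate = seeded_efficiency(100, 25)
```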
It is an unfortunate fact that most forms of testing are not very effi-
cient and find only about 25 percent to 40 percent of the bugs that are
actually present, although the range is from less than 20 percent to
more than 70 percent.
It is interesting that there is much debate over black box testing,
which lacks information on internals; white box testing, with full vis-
ibility of internal code; and gray box testing, which has visibility of
internals but tests at the external level.
So far as can be determined, the debate is theoretical, and few experi-
ments have been performed to measure the defect removal efficiency
levels of black, white, or gray box testing. When measures of efficiency
are taken, white box testing seems to have higher levels of defect
removal efficiency than black box testing.
Because many individual test stages such as unit test are so low
in efficiency, it can be seen why several different kinds of testing are
needed. The term cumulative defect removal efficiency refers to the
overall efficiency of an entire sequence of tests or defect removal
operations.
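Cumulative defect removal efficiency can be approximated by assuming each stage removes its stated fraction of whatever defects remain; a sketch under that independence assumption (the stage values are illustrative):

```python
def cumulative_efficiency(stage_efficiencies) -> float:
    """Overall fraction of defects removed by a sequence of stages,
    assuming each stage removes its fraction of the defects still
    present when it runs (stages treated as independent)."""
    remaining = 1.0
    for efficiency in stage_efficiencies:
        remaining *= (1.0 - efficiency)
    return 1.0 - remaining

# Three stages of 30 percent each leave 0.7 * 0.7 * 0.7 of the bugs,
# so the cumulative efficiency is only about 66 percent -- which is
# why long sequences of removal operations are needed.
sequence = cumulative_efficiency([0.30, 0.30, 0.30])
```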
As a result of lack of testing standards and lack of widespread test-
ing effectiveness measurements, testing by itself does not seem to be a
particularly cost-effective approach for achieving high levels of quality.
Companies that depend purely upon testing for defect removal almost
never top 90 percent in cumulative defect removal, and often are below
75 percent.
The newer forms of testing such as test-driven development (TDD)
use test cases as a form of specification and create the test cases first,
before the code itself is created. As a result, the defect removal efficiency
of TDD is higher than many forms of testing and can top 85 percent.
However, even with TDD, bad-fix injection needs to be factored into the
equation. About 7 percent of attempts to fix bugs accidentally include
new bugs in the fixes.
If TDD is combined with other approaches such as formal inspection
of the test cases and static analysis of the code, then defect removal
efficiency can top 95 percent.
There is some ambiguity in the data that deals with automatic testing
versus manual testing. In theory, automatic testing should have higher
defect removal efficiency than manual testing in at least 70 percent
of trials. For example, manual unit testing averages about 30 percent
in terms of defect removal efficiency, while automatic testing may top
50 percent. However, testing skills vary widely among software engi-
neers and programmers, and automatic testing also varies widely. More
study of this topic is indicated.
The poor defect removal efficiency of normal testing brings up an
important question: If testing is not very effective in finding and remov-
ing bugs, what is effective? This is an important question, and it is
also a question that should be answered in a book entitled Software
Engineering Best Practices.
The answer to the question of "What is effective in achieving high
levels of quality?" is that a combination of defect prevention and mul-
tiple forms of defect removal is needed for optimum effectiveness.
Defect prevention refers to methods and techniques that can lower
defect potentials from U.S. averages of about 5.0 per function point.
Examples of methods that have demonstrated effectiveness in terms
of defect prevention include the higher levels of the capability matu-
rity model integration (CMMI), joint application design (JAD), qual-
ity function deployment (QFD), root-cause analysis, Six Sigma for
software, the Team Software Process (TSP), and also the Personal
Software Process (PSP).
For small applications, the Agile method of having an embedded user
as part of the team can also reduce defect potentials. (The caveat with
embedded users is that for applications with more than about 50 users,
one person cannot speak for the entire set of users. For applications with
thousands of users, having a single embedded user is not adequate. In such
cases, focus groups and surveys of many users are necessary.)
As it happens, formal inspections of requirements, design, and code
serve double duty and are very effective in terms of defect prevention as
well as being very effective in terms of defect removal. This is because
participants in formal inspections spontaneously avoid making the
same mistakes that are found during the inspections.
The combination of methods that have been demonstrated to raise
defect removal efficiency levels includes formal inspections of require-
ments, design, code, and test materials; static analysis of code prior to
testing; and then a test sequence that includes at least eight forms of
testing: (1) unit test, (2) new function test, (3) regression test, (4) per-
formance test, (5) security test, (6) usability test, (7) system test, and
(8) some form of external test with customers or clients, such as beta
test or acceptance test.
Such a combination of pretest inspections, static analysis, and at least
eight discrete test stages will usually approach 99 percent in terms of
cumulative defect removal efficiency levels. Not only does this combination
raise defect removal efficiency levels, but it is also very cost-effective.
Projects that top 95 percent in defect removal efficiency levels usually
have shorter development schedules and lower costs than projects that
skimp on quality. And, of course, they have much lower maintenance
and customer support costs, too.
Testing is a teachable skill, and there are a number of for-profit and
nonprofit organizations that offer seminars, classes, and several flavors
of certification for test personnel. While there is some evidence that
certified test personnel do end up with higher levels of defect removal
efficiency than uncertified test personnel, the poor measurement and
benchmark practices of the software industry make that claim some-
what anecdotal. It would be helpful if test certification included a
learning segment on how to measure defect removal efficiency.
Following in Table 5-3 are examples of a number of different forms
of software inspection, static analysis, and testing, with the probable
organization that performs each activity indicated.
TABLE 5-3   Forms of Software Defect Removal Activities

Pretest Removal Inspections                          Performed by
 1. Requirements                                     Analysts
 2. Design                                           Designers
 3. Code                                             Programmers
 4. Test plans                                       Testers
 5. Test cases                                       Testers
 6. Static analysis                                  Programmers
General Testing
 7. Subroutine test                                  Programmers
 8. Unit test                                        Programmers
 9. New function test                                Testers or programmers
10. Regression test                                  Testers or programmers
11. System test                                      Testers or programmers
Special Testing
12. Performance testing                              Performance specialists
13. Security testing                                 Security specialists
14. Usability testing                                Human factors specialists
15. Component testing                                Testers
16. Integration testing                              Testers
17. Nationalization testing                          Foreign language experts
18. Platform testing                                 Platform specialists
19. SQA validation testing                           Software quality assurance
20. Lab testing                                      Hardware specialists
External Testing
21. Independent testing                              External test company
22. Beta testing                                     Customers
23. Acceptance testing                               Customers
Special Activities
24. Audits                                           Auditors, SQA
25. Independent verification and validation (IV&V)   IV&V contractors
26. Ethical hacking                                  Hacking consultants
Table 5-3 shows 26 different kinds of defect removal activity carried
out by a total of 11 different kinds of internal specialists, 3 specialists
from outside companies, and also by customers. However, only very large
and sophisticated high-technology companies would have such a rich
mixture of specialization and would utilize so many different kinds of
defect removal.
Smaller companies would either have the testing carried out by software
engineers or programmers (who often are not well trained), or they would
have a testing group staffed primarily by testing specialists. Testing can
also be outsourced, although as of 2009, this activity is not common.
At this point, it is useful to address three topics that are not well
covered in the testing literature:
1. How many testers are needed for various kinds of testing?
2. How many test cases are needed for various kinds of testing?
3. What is the defect removal efficiency of various kinds of testing?
Table 5-4 shows the approximate staffing levels for the 17 forms of
testing that were illustrated in Table 5-3. Note that this information is
only approximate, and there are wide ranges for each form of testing.
Because testing executes source code, the information in Table 5-4
is based on source code counts rather than on function points. With
more than 700 programming languages ranging from assembly through
TABLE 5-4   Test Staffing for Selected Test Stages

Application language:   Java
Application code size:  50,000
Application KLOC:       50
Function points:        1,000

General Testing                  Assignment Scope    Test Staff
 1. Subroutine test                    10,000           5.00
 2. Unit test                          10,000           5.00
 3. New function test                  25,000           2.00
 4. Regression test                    25,000           2.00
 5. System test                        50,000           1.00
Special Testing
 6. Performance testing                50,000           1.00
 7. Security testing                   50,000           1.00
 8. Usability testing                  25,000           2.00
 9. Component testing                  25,000           2.00
10. Integration testing                50,000           1.00
11. Nationalization testing           150,000           0.33
12. Platform testing                   50,000           1.00
13. SQA validation testing             75,000           0.67
14. Lab testing                        50,000           1.00
External Testing
15. Independent testing                 7,500           6.67
16. Beta testing                       25,000           2.00
17. Acceptance testing                 25,000           2.00
modern languages such as Ruby and E, the same application illustrated
in Table 5-4 might vary by more than 500 percent in terms of source
code size. Java is the language used in Table 5-4 because it is one of the
most common languages in 2009.
The column labeled "Assignment Scope" illustrates the amount of
source code that one tester will probably be responsible for testing.
Note that there are very wide ranges in assignment scopes based on the
experience levels of test personnel, on the cyclomatic complexity of the
code, and to a certain extent, on the specific language or combination of
languages in the application being tested.
Because the testing shown in Table 5-4 involves a number of differ-
ent people with different skills who probably would be from different
departments, the staffing breakdown for all 17 tests would include
5 developers through unit test; 2 test specialists for integration and
system test; 3 specialists for security, nationalization, and usability
test; 1 SQA specialist; 7 outside specialists from other companies; and
2 customers: 20 people in all.
Of course, it is unlikely that any small application of 1,000 function
points or 50 KLOC (thousands of lines of code) would use (or need) all
17 of these forms of testing. The most probable sequence for a 50-KLOC
Java application would be 6 kinds of testing performed by 5 developers,
2 test specialists, and 2 users, for a total of 9 test personnel in all.
In Table 5-5, data from the previous tables is used as the base for
staffing, but the purpose of Table 5-5 is to show the approximate num-
bers of test cases produced for each test stage, and then the total number
of test cases for the entire application. Here, too, there are major varia-
tions, so the data is only approximate.
The code defect potential for the 50 KLOC code sample of the Java
application would be about 1,500 total bugs, which is equal to 1.5 code
bugs per function point, or 30 bugs per KLOC. (Note that earlier bugs
in requirements and design are excluded and assumed to have been
removed before testing begins.)
If all 17 of the test stages were used, they would probably detect about 95
percent of the total bugs present, or 1,425 in all. That would leave 75 bugs
latent when the application is delivered. Assuming both the numbers for
potential defects and the numbers for test cases are reasonably accurate
(a questionable assumption), then it takes an average of 1.98 test cases to
find 1 bug.
Of course, since only about 6 out of the 17 test stages are usually per-
formed, the removal efficiency would probably be closer to 75 percent, which
is why additional nontest methods such as inspections and static analysis
are needed to achieve really high levels of defect removal efficiency.
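The delivered-defect arithmetic above can be sketched as follows; the 30-bugs-per-KLOC potential and the efficiency percentages are the figures from the text, and the names are illustrative:

```python
def latent_defects(defect_potential: float, removal_efficiency: float) -> int:
    """Bugs still present at delivery: total defect potential times the
    fraction NOT removed before release."""
    return round(defect_potential * (1.0 - removal_efficiency))

potential = 50 * 30  # 50 KLOC at 30 code bugs per KLOC = 1,500 bugs

# All 17 test stages at roughly 95 percent cumulative efficiency
with_all_stages = latent_defects(potential, 0.95)  # about 75 latent bugs

# The usual 6 stages at roughly 75 percent cumulative efficiency
with_six_stages = latent_defects(potential, 0.75)  # about 375 latent bugs
```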
If even this small 50-KLOC example uses more than 2,800 test cases,
it is obvious that corporations with hundreds of software applications
TABLE 5-5   Test Cases for Selected Test Stages

Application language:   Java
Application code size:  50,000
Application KLOC:       50
Function points:        1,000

                                  Test    Test Cases   Total Test   Test Cases
General Testing                   Staff    per KLOC      Cases      per Person
 1. Subroutine test               5.00      12.00          600        120.00
 2. Unit test                     5.00      10.00          500        100.00
 3. New function test             2.00       5.00          250        125.00
 4. Regression test               2.00       4.00          200        100.00
 5. System test                   1.00       3.00          150        150.00
Special Testing
 6. Performance testing           1.00       1.00           50         50.00
 7. Security testing              1.00       3.00          150        150.00
 8. Usability testing             2.00       3.00          150         75.00
 9. Component testing             2.00       1.50           75         37.50
10. Integration testing           1.00       1.50           75         75.00
11. Nationalization testing       0.33       0.50           25         75.76
12. Platform testing              1.00       2.00          100        100.00
13. SQA validation testing        0.67       1.00           50         74.63
14. Lab testing                   1.00       1.00           50         50.00
External Testing
15. Independent testing           6.67       4.00          200         29.99
16. Beta testing                  2.00       2.00          100         50.00
17. Acceptance testing            2.00       2.00          100         50.00

TOTAL TEST CASES                                          2825
TEST CASES PER KLOC                                      56.50
TEST CASES PER PERSON (20 TESTERS)                      141.25
will eventually end up with millions of test cases. Once created, test cases
have residual value for regression test purposes. Fortunately, a number
of automated tools can be used to store and manage test case libraries.
The existence of such large test libraries is a necessary overhead
of software development and maintenance. However, this topic needs
additional study. Creating reusable test cases would seem to be of value.
Also, there are often errors in test cases, which is why inspections of test
plans and test cases are useful.
With hundreds of different people creating test cases in large com-
panies and government agencies, there is a good chance that duplicate
tests will accidentally be created. In fact, this does occur, and a study at
IBM noted about 30 percent redundancy or duplicates in one software
lab's test library.
The final Table 5-6 in this section shows defect removal efficiency
levels against six sources of error: requirements defects, design defects,
coding defects, security defects, defects in test cases, and performance
defects.
Table 5-6 is complicated by the fact that not every defect removal
method is equally effective against each type of defect. In fact, many
TABLE 5-6   Defect Removal Efficiency by Defect Type

                               Req.     Des.     Code     Sec.     Test     Perf.
                             defects  defects  defects  defects  defects  defects
Pretest Removal Inspections:
 1. Requirements              85.00%
 2. Design                             85.00%            25.00%
 3. Code                                        85.00%   40.00%            15.00%
 4. Test plans                                                    85.00%
 5. Test cases                                                    85.00%
 6. Static analysis                    30.00%   87.00%   25.00%            20.00%
General Testing
 7. Subroutine test                             35.00%                     10.00%
 8. Unit test                                   30.00%                     10.00%
 9. New function test                  15.00%   35.00%                     10.00%
10. Regression test                             15.00%
11. System test               10.00%   20.00%   25.00%    7.00%            25.00%
Special Testing
12. Performance testing                 5.00%   10.00%                     70.00%
13. Security testing                                     65.00%
14. Usability testing         10.00%   10.00%
15. Component testing                  10.00%   25.00%
16. Integration testing                10.00%   30.00%
17. Nationalization testing                      3.00%
18. Platform testing                            10.00%
19. SQA validation testing     5.00%    5.00%   15.00%
20. Lab testing                        10.00%   10.00%   10.00%            20.00%
External Testing
21. Independent testing                 5.00%   30.00%    5.00%    5.00%   10.00%
22. Beta testing              30.00%   25.00%   10.00%                     15.00%
23. Acceptance testing        30.00%   20.00%    5.00%                     15.00%
Special Activities
24. Audits                    15.00%   10.00%
25. Independent verification
    and validation (IV&V)     10.00%   10.00%   10.00%
26. Ethical hacking                                      85.00%
Software Team Organization and Specialization
337
forms of defect removal have 0 percent efficiency against security flaws.
Coding defects are the easiest type of defect to remove; requirements
defects, security defects, and defects in test materials are the most dif-
ficult to eliminate.
Historically, formal inspections have the highest levels of defect
removal efficiency against the broadest range of defects. The more
recent method of static analysis has a commendably high level of defect
removal efficiency against coding defects, but currently operates only on
about 15 programming languages out of more than 700.
The data in Table 5-6 has a high margin of error, but the table itself
shows the kind of data that needs to be collected in much greater volume
to improve software quality and raise overall levels of defect removal effi-
ciency across the software industry. In fact, every software application
larger than 1,000 function points in size should collect this kind of data.
One important source of defects is not shown in Table 5-6: bad-fix
injection. About 7 percent of bug repairs contain a fresh
bug in the repair itself. Assume that unit testing found and removed
100 bugs in an application. Then there is a high probability that about 7 new
bugs would be accidentally injected into the application due to errors
in the fixes themselves. (Bad-fix injections greater than 25 percent may
occur with error-prone modules.)
Bad-fix injection is a very common source of defects in software, but it
is not well covered either in the literature on testing or in the literature
on software quality assurance.
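The arithmetic compounds, because fixes for the injected bugs can themselves inject bugs. A sketch of the resulting geometric tail, using the average 7 percent rate cited above (function names are illustrative):

```python
def injected_bugs(initial_fixes, injection_rate=0.07):
    """Bugs injected by each successive repair cycle: each cycle's fixes
    inject injection_rate new bugs per fix, which must then be fixed too."""
    waves = []
    new_bugs = initial_fixes * injection_rate
    while new_bugs >= 0.01:          # stop once the tail is negligible
        waves.append(new_bugs)
        new_bugs *= injection_rate
    return waves

waves = injected_bugs(100)           # 100 bugs fixed during unit test
print([round(w, 2) for w in waves])  # roughly [7.0, 0.49, 0.03]
print(round(sum(waves), 1))          # about 7.5 extra bugs in total
```

At a 25 percent injection rate, as seen with error-prone modules, the same 100 fixes generate roughly 33 extra bugs, which helps explain why such modules resist repair.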
Another quality issue that is not well covered is that of error-
prone modules. As mentioned elsewhere in this book, bugs are not
randomly distributed, but tend to clump in a small number of very
buggy modules.
If an application contains one or more error-prone modules, then
defect removal efficiency levels against those modules may be only half
of the values shown in Table 5-6, and bad-fix injection rates may top
25 percent. This is why error-prone modules can seldom be repaired, but
need to be surgically removed and replaced by a new module.
In spite of the long history of testing and the large number of test per-
sonnel employed by the software industry, a great deal more research
is needed. Some of the topics that need research are automatic genera-
tion of test cases from specifications, developing reusable test cases,
better predictions of test case numbers and removal efficiency, and
much better measurement of test results in terms of defect removal
efficiency levels.
Demographics In the software world, testing has long been one of the
major development activities, and test personnel are among the largest
software occupation groups. But to date there is no accurate census of
test personnel, due in part to the fact that so many different kinds of
specialists get involved in testing.
Because testing is on the critical path for releasing software, there
is a tendency for software project managers or even senior executives
to put pressure on test personnel to truncate testing when schedules
are slipping. Having test organizations report to separate skill
managers, as opposed to project or application managers, adds a
measure of independence.
However, testing is such an integral part of software development that
test personnel need to be involved essentially from the first day that
development begins. Whether testers report to skill managers or are
embedded in project teams, they need early involvement during require-
ment and design. This is especially true with test-driven development
(TDD), where test cases are an integral part of the requirements and
design processes.
Project size The minimum size of applications where formal testing
is mandatory is about 100 function points. As a rule, the larger the
application, the more kinds of pretest defect removal activities and
more forms of testing are needed to succeed, or even to finish the
application at all.
For large systems of up to 10,000 function points, inspections, static
analysis, security analysis, and about ten forms of testing are needed
to achieve high levels of defect removal efficiency. Unfortunately, many
companies skimp on both testing and nontest activities, so U.S. average
results are embarrassingly bad: about 85 percent cumulative defect
removal efficiency, essentially flat from 1996 through 2009.
Productivity rates There are no effective productivity rates for testing,
in part because there are no effective size metrics for test cases. At a macro level, testing
productivity can be measured by using "work hours per function point"
or the reciprocal "function points per staff month," but those measures
are abstract and don't really capture the essence of testing.
Measures such as "test cases created per month" or "test cases exe-
cuted per month" send the wrong message, because they might encour-
age extra testing simply to puff up the results and not raise defect
removal efficiency.
Measures such as "defects detected per month" are unreliable, because
for really top-gun developers, there may not be very many defects to
find. The "cost per defect" metric is also unreliable for the same reason.
Testers will still run many test cases whether an application has any
bugs or not. As a result, cost per defect rises as defect quantities go
down; hence the cost per defect metric penalizes quality.
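The distortion is easy to demonstrate with hypothetical numbers: if preparing and executing a test stage costs a fixed $50,000 no matter how many bugs it finds, the metric rises exactly as quality improves:

```python
def cost_per_defect(fixed_test_cost, defects_found):
    # Test preparation and execution costs are largely fixed, so the
    # metric rises as quality improves and fewer defects remain to find.
    return fixed_test_cost / defects_found

for defects in (500, 50, 5):
    print(defects, f"${cost_per_defect(50_000, defects):,.0f}")
# prints: 500 $100 / 50 $1,000 / 5 $10,000
```

The high-quality application looks 100 times worse by this metric, even though its total test cost is identical; this is the sense in which cost per defect penalizes quality.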
Schedules The primary schedule issues for test personnel are those of
test case creation and test case execution. But testing schedules depend
more upon the number of bugs found and the time it takes to repair the
bugs than on test cases.
One factor that is seldom measured but also delays test schedules
is bugs or defects in test cases themselves. A study done some years
ago by IBM found more bugs in test cases than in the applications
being tested. This topic is not well covered by the testing literature.
(This was the same study that had found about 30 percent redun-
dant or duplicate test cases in test libraries.) Running duplicate test
cases adds to testing costs and schedules, but not to defect removal
efficiency levels.
When testing starts on applications with high volumes of defects, the
entire schedule for the project is at risk, because testing schedules will
extend far beyond their planned termination. In fact, testing delays due
to excessive defect volumes are the main reason for software schedule
delays.
The most effective way to minimize test schedules is to have very few
defects present because pretest inspections and static analysis found
most of them before testing began. Defect prevention methods such as the
Team Software Process (TSP) or joint application design (JAD) can also
speed up test schedules.
For the software industry as a whole, delays in testing due to excessive
bugs are a major cause of application cost and schedule overruns
and also of project cancellations. Because long delays and cancellations
trigger a great deal of litigation, high defect potentials and low levels
of defect removal efficiency are causative factors in breach of contract
lawsuits.
Quality Testing by itself has not been efficient enough in finding bugs to
be the only form of defect removal used on major software applications.
Testing alone almost never tops 85 percent defect removal efficiency,
with the exception of the newer test-driven development (TDD), which
can hit 90 percent.
Testing combined with formal inspections and static analysis achieves
higher levels of defect removal efficiency, shorter schedules, and lower
costs than testing alone. Moreover, these savings not only benefit devel-
opment, but also lower the downstream costs of customer support and
maintenance.
Readers who are executives and qualified to sign contracts are
advised to consider 95 percent as the minimum acceptable level of
defect removal efficiency. Every outsource contract, every internal qual-
ity plan, and every license with a software vendor should require proof
that the development organization will top 95 percent in defect removal
efficiency.
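As a sketch of how such proof might be computed: defect removal efficiency is conventionally the fraction of all defects found before release, with defects reported during roughly the first 90 days of production use counted as escapes. The numbers below are hypothetical:

```python
def defect_removal_efficiency(internal_defects, field_defects_90_days):
    """Defects found before release divided by all defects found,
    counting customer-reported defects from the first ~90 days of use."""
    total = internal_defects + field_defects_90_days
    return internal_defects / total if total else 1.0

# Hypothetical release: 950 defects removed in-house, 50 escaped to customers.
print(f"{defect_removal_efficiency(950, 50):.0%}")  # 95%
```

Note that the metric can only be confirmed some months after release, which is why contracts should require the measurement itself, not just a promise.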
Specialization Testing specialization covers a wide range of skills.
However, for many small companies with a generalist philosophy, soft-
ware developers may also serve as software testers even though they
may not be properly trained for the role.
For large companies, a formal testing department staffed by testing
specialists will give better results than development testing by itself.
For very large multinational companies and for companies that build
systems and embedded software, test and quality assurance specialists
will be numerous and have many diverse skills.
There are several forms of test certification available. Testers who go
to the trouble of achieving certification are to be commended for taking
their work seriously. However, there is not a great deal of empirical
data that compares the defect removal efficiency levels of tests carried
out by certified testers versus the same kind of testing performed by
uncertified testers.
Cautions and counter indications The main caution about testing is that
it does not find very many bugs or defects. For more than 50 years, the
software industry has routinely delivered large software applications
with hundreds of latent bugs, in spite of extensive testing.
A second caution about testing is that testing cannot find require-
ments errors such as the famous Y2K problem. Once an error becomes
embedded in requirements and is not found via inspections, quality func-
tion deployment (QFD), or some other nontest approach, all that testing
will accomplish is to confirm the error. This is why correct requirements
and design documents are vital for successful testing. This also explains
why formal inspections of requirements and design documents raise
testing efficiency by about 5 percent per test stage.
Conclusions The literature on testing is extensive but almost totally
devoid of quantitative data that deals with defect removal efficiency,
with testing costs, with test staffing, with test specialization, with
return on investment (ROI), or with the productivity of test personnel.
However, there are dozens of books and hundreds of web sites with
information on testing.
Several nonprofit organizations are involved with testing, such as the
Association for Software Testing (AST) and the American Society for
Quality (ASQ). There is also a Global Association for Software Quality
(GASQ).
There are local and regional software quality organizations in many
cities. There are also for-profit test associations that hold a number of
conferences and workshops, and also offer certification exams.
Given the central role of testing over the past 50 years of software
engineering, the gaps in the test literature are surprising and dismaying.
A technical occupation that has no clue about the most efficient and cost-
effective methods for preventing or removing serious errors is not qualified
to be called "engineering."
Some of the newer forms of testing such as test-driven development
(TDD) are moving in a positive direction by shifting test case develop-
ment to earlier in the development cycle, and by joining test cases with
requirements and design. These changes in test strategy result in higher
levels of defect removal efficiency coupled with lower costs as well.
But to achieve really high levels of quality in a cost-effective manner,
testing alone has always been insufficient and remains insufficient in
2009. A synergistic combination of defect prevention and a multiphase
suite of defect removal activities that combine inspections, static analysis,
automated testing, and manual testing provides the best overall results.
For the software industry as a whole, defect potentials have been far
too high, and defect removal efficiency far too low for far too many years.
This unfortunate combination has raised development costs, stretched
out development schedules, caused many failures and also litigation,
and raised maintenance and customer support costs far higher than
they should be.
Defect prevention methods such as Team Software Process (TSP),
quality function deployment (QFD), Six Sigma for software, joint appli-
cation design (JAD), participation in inspections, and certified reusable
components have the theoretical potential of lowering defect potentials
by 80 percent or more compared with 2009. In other words, defect poten-
tials could drop from about 5.0 per function point down to about 1.0 per
function point or lower.
Defect removal combinations that include formal inspections, static
analysis, test-driven development, using both automatic and manual
testing, and certified reusable test cases could raise average defect
removal efficiency levels from today's approximate average of about
85 percent in 2009 up to about 97 percent. Levels that approach
99.9 percent could even be achieved in many cases.
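The combined effect of these two levers is simple arithmetic. The following sketch uses the 2009 averages cited above applied to a hypothetical 10,000-function point system:

```python
def delivered_defects(size_fp, potential_per_fp, removal_efficiency):
    """Latent defects at release: total defect potential times the
    fraction of defects NOT removed before delivery."""
    return size_fp * potential_per_fp * (1.0 - removal_efficiency)

SIZE = 10_000  # function points (hypothetical large system)
print(round(delivered_defects(SIZE, 5.0, 0.85)))  # ~2009 average: about 7,500
print(round(delivered_defects(SIZE, 1.0, 0.97)))  # improved: about 300
```

Moving from a 5.0 defect potential at 85 percent removal to a 1.0 potential at 97 percent removal cuts latent defects by a factor of roughly 25, which is why prevention and removal must be attacked together.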
Effective combinations of defect prevention and defect removal
activities are available in 2009 but seldom used except by a few very
sophisticated organizations. What is lacking is not so much the tech-
nologies that improve quality, but awareness of how effective the best
combinations really are. Also lacking is awareness of how ineffective
testing alone can be. It is lack of widespread quality measurements
and lack of quality benchmarks that are delaying improvements in
software quality.
Also valuable are predictive estimating tools that can predict both
defect potentials and the defect removal efficiency levels of any com-
bination of review, inspection, static analysis, automatic test stage,
and manual test stage. Such tools exist in 2009 and are marketed by
companies such as Software Productivity Research (SPR), SEER, Galorath,
and Price Systems. Even more sophisticated tools that can predict the
damages that latent defects cause to customers exist in prototype form.
The final conclusion is that until the software industry can routinely
top 95 percent in average defect removal efficiency levels, and hit 99
percent for critical software applications, it should not even pretend
to be a true engineering discipline. The phrase "software engineering"
without effective quality control is a hoax.
Software Quality Assurance (SQA)
Organizations
The author of this book worked for five years in IBM's Software Quality
Assurance organizations in Palo Alto and Santa Teresa, California. As a
result, the author may have a residual bias in favor of SQA groups that
function along the lines of IBM's SQA groups.
Within the software industry, there is some ambiguity about the role
and functions of SQA groups. Among the author's clients (primarily
Fortune 500 companies), following is an approximate distribution of
how SQA organizations operate:
In about 50 percent of companies, SQA is primarily a testing orga-
nization that performs regression tests, performance tests, system
tests, and other kinds of testing that are used for large systems as
they are integrated. The SQA organization reports to a vice president
of software engineering, to a CIO, or to local development managers
and is not an independent organization. There may be some respon-
sibility for measuring quality, but testing is the main focus. These
SQA organizations tend to be quite large and may employ more than
25 percent of total software engineering personnel.
In about 35 percent of companies, SQA is a focal point for estimating
and measuring quality and ensuring adherence to local and national
quality standards. But the SQA group is separate from testing orga-
nizations, and performs only limited and special testing such as stan-
dards adherence. To have an independent view, the SQA organization
reports to its own vice president of quality and is not part of the devel-
opment or test organizations. (This is the form of SQA that IBM had
when the author worked there.) These organizations tend to be fairly
small and employ between 1 percent and 3 percent of total software
engineering personnel.
About 10 percent of companies have a testing organization but no
SQA organization at all. The testing group usually reports to the CIO
or to a vice president or senior software executive. In such situations,
testing is the main focus, although there may be some measurement
of quality. While the testing organization may be large, the staffing
for SQA is zero.
In about 5 percent of companies, there is a vice president of SQA and
possibly one or two assistants, but nobody else. In this situation, SQA
is clearly nothing more than an act that can be played when custom-
ers visit. Such organizations may have testing groups that report
to various development managers. The so-called SQA organizations
where there are executives but no SQA personnel employ less than
one-tenth of one percent of total software engineering personnel.
Because software quality assurance (SQA) is concerned with more than
testing, it is interesting to look at the activities and roles of "traditional"
SQA groups that operate independently from test organizations.
1. Collecting and measuring software quality during development and
after release, including analyzing test results and test coverage. In
some organizations such as IBM, defect removal efficiency levels
are also calculated.
2. Predicting software quality levels for major new applications,
including construction of special quality estimating tools.
3. Performing statistical studies of quality or carrying out root-cause
analysis.
4. Examining and teaching quality methods such as quality function
deployment (QFD) or Six Sigma for software.
5. Participating in software inspections as moderators or recorders,
and also teaching inspections.
6. Ensuring that local, national, and international quality standards
are followed. SQA groups are important for achieving ISO 9000
certification, for example.
7. Monitoring the activities associated with the various levels of the
capability maturity model integration (CMMI). SQA groups play a
major part in software process improvements and ascending to the
higher levels of the CMMI.
8. Performing specialized testing such as standards adherence.
9. Teaching software quality topics to new employees.
10. Acquiring quality benchmark data from external organizations
such as the International Software Benchmarking Standards
Group (ISBSG).
A major responsibility of IBM's SQA organization was determining
whether the quality level of new applications was likely to be good
enough to ship the application to customers. The SQA organization
could stop delivery of software that was felt to have insufficient quality
levels.
Development managers could appeal an SQA decision to stop the
release of questionable software, and the appeal would be decided by
IBM's president or by a senior vice president. This did not happen often,
but when it did, the event was taken very seriously by all concerned.
The fact that the SQA group was vested with this power was a strong
incentive for development managers to take quality seriously.
Obviously, for SQA to have the power to stop delivery of a new applica-
tion, the SQA team had to have its own chain of command and its own
senior vice president independent of the development organization. If
SQA had reported to a development executive, then threats or coercion
might have made the SQA role ineffective.
One unique feature of the IBM SQA organization was a formal "SQA
research" function, which provided time and resources for carrying out
research into topics that were beyond the state of the art currently avail-
able. For example, IBM's first quality estimation tool was developed
under this research program. Researchers could submit proposals for
topics of interest, and those selected and approved would be provided
with time and with some funding if necessary.
Several companies encourage SQA and other software engineering
personnel to write technical books and articles for outside journals such
as CrossTalk (the U.S. Air Force software journal) or some of the IEEE
journals.
One company, ITT, as part of its software engineering research lab,
allowed articles to be written during business hours and even provided
assistance in creating camera-ready copy for books. It is a significant
point that authors should be allowed to keep the royalties from the
technical books that they publish.
It is an interesting phenomenon that almost every company with
defect removal efficiency levels that average more than 90 percent has
a formal and active SQA organization. Although formal and active SQA
groups are associated with better-than-average quality, the data is not
sufficient to assert that SQA is the primary cause of high quality.
The reason is that most organizations that have low software quality
don't have any measurements in place, and their poor quality levels only
show up if they commission a special assessment, or if they are sued
and end up in court.
It would be nice to say that organizations with formal SQA teams aver-
age greater than 90 percent in defect removal efficiency and that similar
companies doing similar software that lack formal SQA teams average
less than 80 percent in defect removal efficiency. But the unfortunate fact
is that only the companies with formal SQA teams are likely to know
what their defect removal efficiency levels are. In fact, quality measure-
ment practices are so poor that even some companies that do have an
SQA organization do not know their defect removal efficiency levels.
Demographics In the software world, SQA is not large numerically,
but has been a significant source of quality innovation. There are per-
haps 5,000 full-time SQA personnel employed in the United States as
of 2009.
SQA organizations are very common in companies that build sys-
tems software, embedded software, or commercial software, such as SAP,
Microsoft, Oracle, and the like. SQA organizations are less common in
IT groups such as banks and finance companies, although they do occur
within the larger companies.
Many cities have local SQA organizations, and there are also national
and international quality associations.
There is one interesting anomaly with SQA support of software appli-
cations. Development teams that use the Team Software Process (TSP)
have their own internal equivalent of SQA and also collect extensive
data on bugs and quality. Therefore, TSP teams normally do not have
any involvement from corporate SQA organizations. They of course
provide data to the SQA organization for corporate reporting purposes,
but they don't have embedded SQA personnel.
Project size Normally, SQA involvement is mandatory for large
applications above about 2,500 function points. While SQA involvement
might be useful for smaller applications, small applications tend to have
better quality than large ones. Since SQA resources are limited,
concentrating on large applications is perhaps the best use of SQA
personnel.
Productivity rates There are no effective productivity rates for SQA
groups. However, it is an interesting and important fact that produc-
tivity rates for software applications that do have SQA involvement,
and which manage to top 95 percent in defect removal efficiency,
are usually much better than applications of the same size that
lack SQA.
Even if SQA productivity itself is ambiguous, measuring the quality
and productivity of the applications that are supported by SQA teams
indicates that SQA has significant business value.
Schedules The primary schedule issues for SQA teams are the overall
schedules for the applications that they support. As with productivity
and quality, there is evidence that an SQA presence on an application
tends to prevent schedule delays.
Indeed if SQA is successful in introducing formal inspections, sched-
ules can even be shortened.
The most effective way to shorten software development schedules is
to have very few defects due to defect prevention, and to remove most
of them prior to testing due to pretest inspections and static analysis.
Since SQA groups push hard for both defect prevention and early defect
removal, an effective SQA group will benefit development schedules--and
especially so for large applications, which typically run late.
For the software industry as a whole, delays due to excessive bugs
are a major cause of application cost and schedule overruns and also of
project cancellations. Effective SQA groups can minimize the endemic
problems.
It is a proven fact that an effective SQA organization can lead to
significant cost reductions and significant schedule improvements for
software projects. Yet because the top executives in many companies do
not understand the economic value of high quality and regard quality
as a luxury rather than a business necessity, SQA personnel are among
the first to be let go during a recession.
Quality The roles of SQA groups center on quality, including quality
measurement, quality predictions, and long-range quality improvement.
SQA groups also have a role in ISO standards and the CMMI. SQA
organizations also teach quality courses and assist in the deployment
of methods such as quality function deployment (QFD) and Six Sigma
for software. In fact, it is not uncommon for many SQA personnel to be
Six Sigma black belts.
There is some uncertainty in 2009 about the role of SQA groups when
test-driven development (TDD) is utilized. Because TDD is fairly new,
the intersection of TDD and SQA is still evolving.
As already mentioned in the testing section of this chapter, read-
ers who are executives and qualified to sign contracts are advised to
consider 95 percent as the minimum acceptable level of defect removal
efficiency. Every outsource contract, every internal quality plan, and
every license with a software vendor should require proof that the devel-
opment organization will top 95 percent in defect removal efficiency.
There is one troubling phenomenon that needs more study. Large
systems above 10,000 function points are often released with hundreds
of latent bugs in spite of extensive testing and sometimes in spite of
large SQA teams. Some of these large systems ended up in lawsuits
where the author happened to be an expert witness. It usually happened
that the advice of the SQA teams was not taken, and that the project
manager skimped on quality control in a misguided attempt to compress
schedules.
Specialization SQA specialization covers a wide range of skills that can
include statistical analysis, function point analysis, and also testing.
Other special skills include Six Sigma, complexity analysis, and root-
cause analysis.
Cautions and counter indications The main caution about SQA is that it
is there to help, and not to hinder. Dogmatic attitudes are counterpro-
ductive for effective cooperation with development and testing groups.
Conclusions An effective SQA organization can benefit not only quality,
but also schedules and costs. Unfortunately, during recessions, SQA
teams are among the first to be affected by layoffs and downsizing. As
the recession of 2009 stretches out, it causes uncertainty about the
future of SQA in U.S. business.
Because quality benefits costs and schedules, it is urgent for SQA
teams to take positive steps to include measures of defect removal effi-
ciency and measures of the economic value of quality as part of their
standard functions. If SQA could expand the number of formal quality
benchmarks brought in to companies, and collect data for submission
to benchmark groups, the data would benefit both companies and the
software industry.
Several nonprofit organizations are involved with SQA, such as the
American Society for Quality (ASQ). There is also a Global Association
for Software Quality (GASQ).
Local and regional software quality organizations exist in many
cities. Also, for-profit SQA associations such as the Quality Assurance
Institute (QAI) hold a number of conferences and workshops, and also
offer certification exams.
SQA needs to assist in introducing a synergistic combination of defect
prevention and a multiphase suite of defect removal activities that
combine inspections, static analysis, automated testing, and manual
testing. There is no silver bullet for quality, but fusions of a variety
of quality methods can be very effective. SQA groups are the logical
place to provide information and training for these effective hybrid
methods.
Effective combinations of defect prevention and defect removal
activities are available in 2009, but seldom used except by a few very
sophisticated organizations. As mentioned in the testing section of
this chapter, what is lacking is not so much the technologies that
improve quality, but awareness of how effective the best combinations
really are. It is lack of widespread quality measurements and lack
of quality benchmarks that are delaying improvements in software
quality.
Also valuable are predictive estimating tools that can predict both defect
potentials and the defect removal efficiency levels of any combination of
review, inspection, static analysis, automatic test stage, and manual test
stage. Normally, SQA groups will have such tools and use them frequently.
In fact, the industry's first software quality prediction tool was developed
by the IBM SQA organization in 1973 in San Jose, California.
The final conclusion is that SQA groups need to keep pushing until
the software industry can routinely top 95 percent in average defect
removal efficiency levels, and hit 99 percent for critical software applica-
tions. Any results less than these are insufficient and unprofessional.
Summary and Conclusions
Fred Brooks, one of the pioneers of software at IBM, observed in his clas-
sic book The Mythical Man-Month that software was strongly affected
by organization structures. Not long after Fred published, the author
of this book, who also worked at IBM, noted that large systems tended
to be decomposed to fit existing organization structures. In particular,
some major features were artificially divided to fit standard eight-
person departments.
This book only touches the surface of organizational issues. Deeper
study is needed on the relative merits of small teams versus large teams.
In addition, the "average" span of control of eight employees reporting
to one manager may well be in need of revision. Studies of the effective-
ness of various team sizes found that raising the span of control from
8 up to 12 would allow marginal managers to return to technical work
and would minimize managerial disputes, which tend to be endemic.
Further, since software application sizes are increasing, larger spans of
control might be a better match for today's architecture.
Another major topic that needs additional study is that of really large
software teams that may include 500 or more personnel and dozens
of specialists. There is very little empirical data on the most effective
methods for dealing with such large groups with diverse skills. If such
teams are geographically dispersed, that adds yet another topic that is
in need of additional study.
More recently Dr. Victor Basili, Nachiappan Nagappan, and Brendan
Murphy studied organization structures at Microsoft and concluded
that many of the problems with Microsoft Vista could be traced back to
organizational structure issues.
However, in 2009, the literature on software organization structures
and their impact is sparse compared with other topics that influence
software engineering such as methods, tools, programming languages,
and testing.
Formal organization structures tend to be territorial because manag-
ers are somewhat protective of their spheres of influence. This tends
to narrow the focus of teams. Newer forms of informal organizations
that support cross-functional communication are gaining in popularity.
Cross-functional contacts also increase the chances of innovation and
problem solving.
Software organization structures should be dynamic and change with
technology, but unfortunately, they often are a number of years behind
where they should be.
As the recession of 2009 continues, it may spur additional research
into organizational topics. For example, new subjects that need to be
examined include wiki sites, virtual departments that communicate
using virtual reality, and the effectiveness of home offices to minimize
fuel consumption.
A very important topic with almost no literature is that of dealing
with layoffs and downsizing in the least disruptive way. That topic is
discussed in Chapters 1 and 2 of this book, but few additional citations
exist. Because companies tend to get rid of the wrong people, layoffs
often damage operational efficiency levels for years afterwards.
Another important topic that needs research, given the slow develop-
ment schedules for software, is the study of global organizations located
in separate time zones eight hours apart. Such organizations could shift
software applications and work products around the globe from team
to team, permitting 24-hour development instead of 8-hour development.
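The coverage argument behind 24-hour development can be verified with a short calculation. The sketch below assumes three hypothetical sites whose local workdays are offset by eight hours (the site names and offsets are illustrative), and checks that their combined shifts cover the full clock:

```python
# Hypothetical "follow the sun" rotation: each site works one 8-hour
# shift, then hands its work products to the next site eight time
# zones away.
sites = [("Site A", 0), ("Site B", 8), ("Site C", 16)]  # UTC hour each shift starts

def coverage(sites, workday_hours=8):
    """Return the sorted set of UTC hours covered by the sites' shifts."""
    hours = set()
    for _, start in sites:
        for h in range(workday_hours):
            hours.add((start + h) % 24)
    return sorted(hours)

print(len(coverage(sites)))  # 24 distinct hours: round-the-clock development
```

Three sites at eight-hour offsets cover all 24 hours with no overlap; two sites would leave an 8-hour gap each day, which is why the paragraph above specifies time zones eight hours apart.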
A final organizational topic that needs additional study is that of the
optimum organizations for creating reusable modules and other reusable
deliverables, and then constructing software applications from reusable
components rather than coding them on a line-by-line basis.
Readings and References
Brooks, Fred. The Mythical Man-Month. Reading, MA: Addison Wesley, 1995.
Charette, Bob. Software Engineering Risk Analysis and Management. New York:
McGraw-Hill, 1989.
Crosby, Philip B. Quality is Free. New York: New American Library, Mentor Books, 1979.
DeMarco, Tom. Controlling Software Projects. New York: Yourdon Press, 1982.
DeMarco, Tom, and Timothy Lister. Peopleware: Productive Projects and Teams. New York:
Dorset House, 1999.
Glass, Robert L. Software Creativity, Second Edition. Atlanta: developer.* Books, 2006.
Glass, Robert L. Software Runaways: Lessons Learned from Massive Software Project Failures.
Englewood Cliffs, NJ: Prentice Hall, 1998.
Humphrey, Watts. Managing the Software Process. Reading, MA: Addison Wesley, 1989.
Humphrey, Watts. PSP: A Self-Improvement Process for Software Engineers. Upper
Saddle River, NJ: Addison Wesley, 2005.
Humphrey, Watts. TSP: Leading a Development Team. Boston: Addison Wesley, 2006.
Humphrey, Watts. Winning with Software: An Executive Strategy. Boston: Addison
Wesley, 2002.
Jones, Capers. Applied Software Measurement, Third Edition. New York: McGraw-Hill,
2008.
Jones, Capers. Estimating Software Costs. New York: McGraw-Hill, 2007.
Jones, Capers. Software Assessments, Benchmarks, and Best Practices. Boston: Addison
Wesley Longman, 2000.
Kan, Stephen H. Metrics and Models in Software Quality Engineering, Second Edition.
Boston: Addison Wesley Longman, 2003.
Kuhn, Thomas. The Structure of Scientific Revolutions. Chicago: University of Chicago
Press, 1996.
Nagappan, Nachiappan, B. Murphy, and V. Basili. The Influence of Organizational
Structure on Software Quality. Microsoft Technical Report MSR-TR-2008-11.
Microsoft Research, 2008.
Pressman, Roger. Software Engineering: A Practitioner's Approach, Sixth Edition. New
York: McGraw-Hill, 2005.
Strassmann, Paul. The Squandered Computer. Stamford, CT: Information Economics
Press, 1997.
Weinberg, Gerald M. Becoming a Technical Leader. New York: Dorset House, 1986.
Weinberg, Gerald M. The Psychology of Computer Programming. New York: Van
Nostrand Reinhold, 1971.
Yourdon, Ed. Outsource: Competing in the Global Productivity Race. Upper Saddle River,
NJ: Prentice Hall PTR, 2005.
Yourdon, Ed. Death March: The Complete Software Developer's Guide to Surviving
"Mission Impossible" Projects. Upper Saddle River, NJ: Prentice Hall PTR, 1997.