Chapter 5
Software Team Organization and Specialization
Introduction
More than almost any other technical or engineering field, software devel-
opment depends upon the human mind, upon human effort, and upon
human organizations. From the day a project starts until it is retired
perhaps 30 years later, human involvement is critical to every step in
development, enhancement, maintenance, and customer support.
Software requirements are derived from human discussions of appli-
cation features. Software architecture depends upon the knowledge of
human specialists. Software design is based on human understanding
augmented by tools that handle some of the mechanical aspects, but
none of the intellectual aspects.
Software code is written line-by-line by craftspeople as custom arti-
facts and involves the highest quantity of human effort of any modern
manufactured product. (Creating sculpture and building special prod-
ucts such as 12-meter racing yachts or custom furniture require similar
amounts of manual effort by skilled artisans, but these are not main-
stream products that are widely utilized by thousands of companies.)
Although automated static analysis tools and some forms of auto-
mated testing exist, the human mind is also a primary tool for finding
bugs and security flaws. Both manual inspections and manual creation
of test plans and test cases are used for over 95 percent of software
applications, and for almost 100 percent of software applications larger
than 1,000 function points in size. Unfortunately, both quality and secu-
rity remain weak links for software.
As the economy sinks into global recession, the high costs and mar-
ginal quality and security of custom software development are going
to attract increasingly critical executive attention. It may well be that
the global recession will provide a strong incentive to begin to migrate
from custom development to construction from standard reusable com-
ponents. The global recession may also provide motivation for designing
more secure software with higher quality, and for moving toward higher
levels of automation in quality control and security control.
Although software has the highest labor content of any manufactured
product, the topic of software team organization structure is not well
covered in the software literature.
There are anecdotal reports on the value of such topics as pair
programming, small self-organizing teams, Agile teams, colocated teams,
matrix versus hierarchical organizations, project offices, and several
others. But these reports lack quantification of results. It is hard to
find empirical data that shows side-by-side results of different kinds of
organizations for the same kinds of applications.
One of the larger collections of team-related information that is avail-
able to the general public is the set of reports and data published by the
International Software Benchmarking Standards Group (ISBSG). For
example, this organization has productivity data and average application
sizes for teams ranging between 1 and 20 personnel. It also has data
on larger teams, although really large teams in excess of 500 people
are seldom reported to any benchmark organization.
Quantifying Organizational Results
This chapter will deal with organizational issues in a somewhat unusual
fashion. As various organization structures and sizes are discussed,
information will be provided that attempts to show in quantified form
a number of important topics:
1. Typical staffing complements in terms of managers, software engi-
neers, and specialists.
2. The largest software projects that a specific organization size and
type can handle.
3. The average size of software projects a specific organization size
and type handles.
4. The average productivity rates observed with specific organization
sizes and types.
5. The average development schedules observed with specific organi-
zation sizes and types.
6. The average quality rates observed with specific organization sizes
and types.
7. Demographics, or the approximate usage of various organization
structures.
8. Demographics in the sense of the kinds of specialists often deployed
under various organizational structures.
Of course, there will be some overlap among various sizes and kinds
of organization structures. The goal of the chapter is to narrow down
the ranges of uncertainty and to show what forms of organization are
best suited to software projects of various sizes and types.
Organizations in this chapter are discussed in terms of typical depart-
mental sizes, starting with one-person projects and working upward
to large, multinational, multidisciplinary teams that may have 1,000
personnel or more.
Observations of various kinds of organization structures are derived
from on-site visits to a number of organizations over a multiyear period.
Examples of some of the organizations visited by the author include
Aetna Insurance, Apple, AT&T, Boeing, Computer Aid Incorporated
(CAI), Electronic Data Systems (EDS), Exxon, Fidelity, Ford Motors,
General Electric, Hartford Insurance, IBM, Microsoft, NASA, NSA,
Sony, Texas Instruments, the U.S. Navy, and more than 100 other
organizations.
Organization structures are important aspects of successful software
projects, and a great deal more empirical study is needed on organiza-
tional topics.
The Separate Worlds of Information
Technology and Systems Software
Many medium and large companies such as banks and insurance com-
panies only have information technology (IT) organizations. While there
are organizational problems and issues within such companies, there
are larger problems and issues within companies such as Apple, Cisco,
Google, IBM, Intel, Lockheed, Microsoft, Motorola, Oracle, Raytheon,
SAP and the like, which develop systems and embedded software as
well as IT software.
Within most companies that build both IT and systems software, the
two organizations are completely different. Normally, the IT organiza-
tion reports to a chief information officer (CIO). The systems software
groups usually report to a chief technology officer (CTO).
The CIO and the CTO are usually at the same level, so neither has
authority over the other. Very seldom do these two disparate software
organizations share much in the way of training, tools, methodologies,
or even programming languages. Often they are located in different
buildings, or even in different countries.
Because the systems software organization tends to operate as a profit
center, while the IT organization tends to operate as a cost center, there
is often friction and even some dislike between the two groups.
The systems software group brings in revenues, but the IT organi-
zation usually does not. The friction is made worse by the fact that
compensation levels are often higher in the systems software domain
than in the IT domain.
While there are significant differences between IT and systems soft-
ware, there are also similarities. As the global recession intensifies and
companies look for ways to save money, sharing information between
IT and systems groups would seem to be advantageous.
Both sides need training in security, in quality assurance, in testing,
and in software reusability. The two sides tend to be on different business
cycles, so it is possible that the systems software side might be growing
while the IT side is downsizing, or vice versa. Coordinating position open-
ings between the two sides would be valuable in a recession.
Also valuable would be shared resources for certain skills that both
sides use. For example, there is a chronic shortage of good technical
writers, and there is no reason why technical communications could not
serve the IT organization and the systems organization concurrently.
Other groups such as testing, database administration, and quality
assurance might also serve both the systems and IT organizations.
So long as the recession is lowering sales volumes and triggering
layoffs, organizations that employ both systems software and IT groups
would find it advantageous to consider cooperation.
Both sides usually have less than optimal quality, although systems
software is usually superior to IT applications in that respect. It is pos-
sible that methods such as PSP, TSP, formal inspections, static analysis,
automated testing, and other sophisticated quality control methods could
be used by both the IT side and the systems side, which would simplify
training and also allow easier transfers of personnel from one side to
the other.
Colocation vs. Distributed Development
The software engineering literature supports a hypothesis that develop-
ment teams that are colocated in the same complex are more productive
than distributed teams of the same size located in different cities or
countries.
Indeed a study carried out by the author that dealt with large soft-
ware applications such as operating systems and telecommunication
systems noted that for each city added to the development of the same
applications, productivity declined by about 5 percent compared with
teams of identical sizes located in a single site.
The same study quantified the costs of travel from city to city. For one
large telecommunications application that was developed jointly between
six cities in Europe and one city in the United States, the actual costs of
airfare and travel were higher than the costs of programming or coding.
The overall team size for this application was about 250, and no fewer
than 30 of these software engineers or specialists were traveling from
country to country every week, and did so for more than three years.
Unfortunately, the fact that colocation is beneficial for software is an
indication that "software engineering" is a craft or art form rather than
an engineering field. For most engineered products such as aircraft,
automobiles, and cruise ships, many components and subcomponents
are built by scores of subcontractors who are widely dispersed geograph-
ically. While these manufactured parts have to be in one location for
final assembly, they do not have to be constructed in the same building
to be cost-effective.
Software engineering lacks sufficient precision in both design and
development to permit construction from parts that can be developed
remotely and then delivered for final construction. Of course, software
does involve both outsourcers and remote development teams, but the
current results indicate lower productivity than for colocated teams.
The author's study of remote development was done in the 1980s,
before the Web and the Internet made communication easy across geo-
graphic boundaries.
Today in 2009, conference calls, webinars, wiki groups, Skype, and
other high-bandwidth communication methods are readily available.
In the future, even more sophisticated communication methods will
become available.
It is possible to envision three separate development teams located
eight hours apart, so that work on large applications could be transmitted
from one time zone to another at the end of every shift. This would
permit 24-hour development by passing the work among three different
countries. Given the sluggish multiyear development schedules of large
software applications, this form of distributed development might cut
schedules by perhaps 60 percent compared with a single colocated team.
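The schedule arithmetic behind that figure can be sketched with a simple model. The 20 percent handoff overhead below is an illustrative assumption, not data from any study:

```python
def follow_the_sun_schedule(baseline_months, sites=3, handoff_overhead=0.20):
    """Estimate a round-the-clock schedule for work passed among sites.

    baseline_months  -- schedule for a single colocated team
    sites            -- teams working staggered 8-hour shifts
    handoff_overhead -- assumed fraction of each shift lost to
                        transferring work between sites (hypothetical)
    """
    effective_speedup = sites * (1.0 - handoff_overhead)
    return baseline_months / effective_speedup

baseline = 36.0  # a sluggish multiyear schedule, in calendar months
compressed = follow_the_sun_schedule(baseline)
reduction = 1.0 - compressed / baseline
```

With three sites and a 20 percent handoff loss, a 36-month schedule compresses to 15 months, a reduction of about 58 percent, in line with the "perhaps 60 percent" estimate.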
For this to happen, it is obvious that software would need to be an engi-
neering discipline rather than a craft or art form, so that the separate
teams could work in concert rather than damaging each other's results.
In particular, the architecture, design, and coding practices would have
to be understood and shared by the teams at all three locations.
What might occur in the future would be a virtual development environ-
ment that was available 24 hours a day. In this environment, avatars of
the development teams could communicate "face to face" by using either
their own images or generic images. Live conversations via Skype or the
equivalent could also be used as well as e-mail and various specialized
tools for activities such as remote design and code inspections.
In addition, suites of design tools and project planning tools would also
be available in the virtual environment so that both technical and busi-
ness discussions could take place without the need for expensive travel.
In fact, a virtual war room with every team's status, bug reports, issues,
schedules, and other project materials could be created that might even
be more effective than today's colocated organizations.
The idea is to allow three separate teams located thousands of miles
apart to operate with the same efficiency as colocated teams. It is also
desirable for quality to be even better than today. Of course, with 24-hour
development, schedules would be much shorter than they are today.
As of 2009, virtual environments are not yet at the level of sophisti-
cation needed to be effective for large system development. But as the
recession lengthens, methods that lower costs (especially travel costs)
need to be reevaluated at frequent intervals.
An even more sophisticated and effective form of software engineer-
ing involving distributed development would be that of just-in-time
software engineering practices similar to those used on the construction
of automobiles, aircraft, and large cruise ships.
In this case, there would need to be standard architectures that sup-
ported construction from reusable components. The components might
either be already in stock, or developed by specialized vendors whose
geographic locations might be anywhere on the planet.
The fundamental idea is that rather than custom design and custom
coding, standard architectures and standard designs would allow con-
struction from standard reusable components.
Of course, this idea involves many software engineering technical
topics that don't fully exist in 2009, such as parts lists, standard inter-
faces, certification protocols for quality and security, and architectural
methods that support reusable construction.
As of 2009, the cost of developing custom-built software applications
ranges between $1,000 per function point and $3,000 per function point.
Software maintenance and enhancements range between about $100
and $500 per function point per year, forever. These high costs make
software among the most expensive business "machines" ever created.
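A rough cost-of-ownership sketch using these cited ranges illustrates the point; the 10,000-function point size and ten-year service life are assumptions for the example:

```python
def cost_of_ownership(function_points, dev_cost_per_fp,
                      maint_cost_per_fp_year, years):
    """Development cost plus cumulative maintenance and enhancement cost."""
    development = function_points * dev_cost_per_fp
    maintenance = function_points * maint_cost_per_fp_year * years
    return development + maintenance

# Low end of the cited ranges: $1,000 per FP to build, $100 per FP per year.
low_end = cost_of_ownership(10_000, 1_000, 100, years=10)   # $20 million
# High end of the cited ranges: $3,000 per FP and $500 per FP per year.
high_end = cost_of_ownership(10_000, 3_000, 500, years=10)  # $80 million
```

Even at the low end, a single large application represents a $20 million commitment over a decade, which is why the text calls software one of the most expensive business "machines" ever created.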
As the recession lengthens, it is obvious that the high costs of custom
software development need to be analyzed and more cost-effective meth-
ods developed. A combination of certified reusable components that could
be assembled by teams that are geographically dispersed could, in theory,
lead to significant cost reductions and schedule reductions also.
A business goal for software engineers would be to bring software
development costs down below $100 per function point, and annual
maintenance and enhancement costs below $50 per function point.
A corollary business goal might be to reduce development schedules
for 10,000-function point applications from today's averages of greater
than 36 calendar months down to 12 calendar months or less.
Defect potentials should be reduced from today's averages of greater
than 5.00 per function point down to less than 2.50 per function point.
At the same time, average levels of defect removal efficiency should
rise from today's average of less than 85 percent up to greater than 95
percent, and ideally greater than 97 percent.
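Delivered defects follow directly from these two numbers: the defect potential multiplied by the fraction of defects that removal misses. A quick check of the targets above:

```python
def delivered_defects_per_fp(defect_potential, removal_efficiency):
    """Defects per function point still present at release."""
    return defect_potential * (1.0 - removal_efficiency)

# Today's averages: 5.00 defects per FP, 85 percent removal efficiency.
today = delivered_defects_per_fp(5.00, 0.85)  # 0.75 delivered defects per FP
# The stated goals: 2.50 defects per FP, 97 percent removal efficiency.
goal = delivered_defects_per_fp(2.50, 0.97)   # 0.075 delivered defects per FP
```

Meeting both targets at once would cut delivered defects by a factor of ten, which is why defect potentials and removal efficiency need to improve together.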
Colocation cannot achieve such major reductions in costs, schedules,
and quality, but a combination of remote development, virtual develop-
ment environments, and standard reusable components might well turn
software engineering into a true engineering field, and also lower both
development and maintenance costs by significant amounts.
The Challenge of Organizing
Software Specialists
In a book that includes "software engineering" in the title, you might
suppose that the primary audience is software engineers working on
development of new applications. While such software engineers are a
major part of the audience, they actually comprise less than one-third
of the personnel who work on software in large corporations.
In today's world of 2009, many companies have more personnel working
on enhancing and modifying legacy applications than on new develop-
ment. Some companies have about as many test personnel as they do
conventional software engineering personnel, and sometimes even more.
Some of the other software occupations are just as important as soft-
ware engineers for leading software projects to a successful outcome.
These other key staff members work side-by-side with software engi-
neers, and major applications cannot be completed without their work.
A few examples of other important and specialized skills employed on
software projects include architects, business analysts, database admin-
istrators, test specialists, technical writers, quality assurance special-
ists, and security specialists.
As discussed in Chapter 4 and elsewhere, the topic of software spe-
cialization is difficult to study because of inconsistencies in job titles,
inconsistencies in job descriptions, and the use of abstract titles such
as "member of the technical staff" that might encompass as many as
20 different jobs and occupations.
In this chapter, we deal with an important issue. In the presence of so
many diverse skills and occupations, all of which are necessary for soft-
ware projects, what is the best way to handle organization structures?
Should these specialists be embedded in hierarchical structures? Should
they be part of matrix software organization structures and report in to
their own chain of command while reporting via "dotted lines" to project
managers? Should they be part of small self-organizing teams?
This topic of organizing specialists is surprisingly ambiguous as of
2009 and has very little solid data based on empirical studies. A few
solid facts are known, however:
1. Quality assurance personnel need to be protected from coercion in
order to maintain a truly objective view of quality and to report
honestly on problems. Therefore, the QA organization needs to be
separate from the development organization all the way up to the
level of a senior vice president of quality.
2. Because the work of maintenance and bug repairs is rather differ-
ent from the work of new development, large corporations that have
extensive portfolios of legacy software applications should consider
using separate maintenance departments for bug repairs.
3. Some specialists such as technical writers would have little oppor-
tunity for promotion or job enrichment if embedded in departments
staffed primarily by software engineers. Therefore, a separate
technical publications organization would provide better career
opportunities.
The fundamental question for specialists is whether they should be
organized in skill-based units with others who share the same skills
and job titles, or embedded in functional departments where they will
actually exercise those skills.
The advantage of skill-based units is that they offer specialists wider
career opportunities and better educational opportunities. Also, in case
of injury or incapacity, the skill-based organizations can usually assign
someone else to take over.
The advantage of the functional organization where specialists are
embedded in larger units with many other kinds of skills is that the
specialists are immediately available for the work of the unit.
In general, if there are a great many of a certain kind of special-
ist (technical writers, testers, quality assurance, etc.), the skill-based
organizations seem advantageous. But for rare skills such as security
and architecture, there may not be enough people in the same occupation
for a skill-based group to even be created.
In this chapter, we will consider various alternative methods for deal-
ing with the organization of key specialists associated with software.
There are more than 120 software-related specialties in all, and for
some of these, there may only be one or two employed even in fairly
large companies.
This chapter concentrates on key specialties whose work is critical
to the success of large applications in large companies. Assume the
software organization in a fairly large company employs a total of 1,000
personnel. In this total of 1,000 people, how many different kinds of spe-
cialists and how many specific individuals are likely to be employed? For
that matter, what are the specialists that are most important to success?
Table 5-1 identifies a number of these important specialists and the
approximate distribution out of a total of 1,000 software personnel.
TABLE 5-1   Distribution of Software Specialists for 1,000 Total Software Staff

                                            Number    Percent
 1. Maintenance specialists                    315     31.50%
 2. Development software engineers             275     27.50%
 3. Testing specialists                        125     12.50%
 4. First-line managers                        120     12.00%
 5. Quality assurance specialists               25      2.50%
 6. Technical writing specialists               23      2.30%
 7. Customer support specialists                20      2.00%
 8. Configuration control specialists           15      1.50%
 9. Second-line managers                         9      0.90%
10. Business analysts                            8      0.80%
11. Scope managers                               7      0.70%
12. Administrative support                       7      0.70%
13. Project librarians                           5      0.50%
14. Project planning specialists                 5      0.50%
15. Architects                                   4      0.40%
16. User interface specialists                   4      0.40%
17. Cost estimating specialists                  3      0.30%
18. Measurement/metric specialists               3      0.30%
19. Database administration specialists          3      0.30%
20. Nationalization specialists                  3      0.30%
21. Graphical artists                            3      0.30%
22. Performance specialists                      3      0.30%
23. Security specialists                         3      0.30%
24. Integration specialists                      3      0.30%
25. Encryption specialists                       2      0.20%
26. Reusability specialists                      2      0.20%
27. Test library control specialists             2      0.20%
28. Risk specialists                             1      0.10%
29. Standards specialists                        1      0.10%
30. Value analysis specialists                   1      0.10%
    TOTAL SOFTWARE EMPLOYMENT                1,000    100.00%
As can be seen from Table 5-1, software engineers do not operate all by
themselves. A variety of other skills are needed in order to develop and
maintain software applications in the modern world. Indeed, as of 2009,
the number and kinds of software specialists are increasing, although
the recession may reduce the absolute number of software personnel if
it lengthens and stays severe.
Software Organization Structures
from Small to Large
The observed sizes of software organization structures range from a low
of one individual up to a high that consists of multidisciplinary teams
of 30 personnel or more.
For historical reasons, the "average" size of software teams tends to be
about eight personnel reporting to a manager or team leader. However,
both smaller and larger teams are quite common.
This section of Chapter 5 examines the sizes and attributes of soft-
ware organization structures from small to large, starting with one-
person projects.
One-Person Software Projects
The most common corporate purpose for one-person projects is that of
carrying out maintenance and small enhancements to legacy software
applications. For new development, building web sites is a typical one-
person activity in a corporate context.
However, a fairly large number of one-person software companies
actually develop small commercial software packages such as iPhone
applications, shareware, freeware, computer games, and other small
applications. In fact, quite a lot of innovative new software and product
ideas originate from one-person companies.
Demographics Because small software maintenance projects are
common, on any given day, probably close to 250,000 one-person projects
are under way in the United States, with the majority being mainte-
nance and enhancements.
In terms of one-person companies that produce small applications, the
author estimates that as of 2009, there are probably more than 10,000 in
the United States. These companies have been a surprisingly fruitful
source of innovation, and are also a significant presence in the open-
source, freeware, and shareware domains.
Project size The average size of new applications done by one-person
projects is about 50 function points, and the maximum size is below 1,000
function points. For maintenance or defect repair work, the average size is
less than 1 function point and seldom tops 5 function points. For enhance-
ment to legacy applications, the average size is about 5 to 10 function
points for each new feature added, and seldom tops 15 function points.
Productivity rates Productivity rates for one-person efforts are usually
quite good, and top 30 function points per staff month. One caveat is
that if the one-person development team also has to write user manuals
and provide customer support, then productivity gets cut approximately
in half.
Another caveat is that many one-person companies are home based.
Therefore unexpected events such as a bout of flu, a new baby, or some
other normal family event such as weddings and funerals can have a
significant impact on the work at hand.
A third caveat is that one-person software projects are very sensitive
to the skill and work practices of specific individuals. Controlled experi-
ments indicate about a 10-to-1 difference between the best and worst
results for tasks such as coding and bug removal. That being said, quite
a few of the people who migrate into one-person positions tend to be at
the high end of the competence and performance scale.
Schedules Development schedules for one-person maintenance and
enhancement projects usually range between a day and a week. For new
development by one person, schedules usually range between about two
months and six months.
Quality The quality levels for one-person applications are not too bad.
Defect potentials run to about 2.5 bugs per function point, and defect
removal efficiency is about 90 percent. Therefore a small iPhone applica-
tion of 25 function points might have a total of about 60 bugs, of which
6 will still be present at release.
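These figures follow directly from the per-function point rates, as the arithmetic below confirms:

```python
def release_defects(function_points, potential_per_fp, removal_efficiency):
    """Total defect potential and the defects remaining at release."""
    total = function_points * potential_per_fp
    remaining = total * (1.0 - removal_efficiency)
    return total, remaining

# The 25-function point iPhone application from the text:
# 2.5 bugs per FP and 90 percent defect removal efficiency.
total_bugs, shipped_bugs = release_defects(25, 2.5, 0.90)
# total_bugs is 62.5 ("about 60"); shipped_bugs is about 6.25 ("6 ... at release")
```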
Specialization You might think that one-person projects would be the
domain of generalists, since it is obvious that special skills such as
testing and documentation all have to be found in the same individual.
However, one of the more surprising results of examining one-person
projects is that many of them are carried out by people who are not
software engineers or programmers at all.
For embedded and systems software, many one-person software
projects are carried out by electrical engineers, telecommunication
engineers, automotive engineers, or some other type of engineer. Even
for business software, some one-person projects may be carried out by
accountants, attorneys, business analysts, and other domain experts who
are also able to program. This is one of the reasons why such a significant
286
Chapter Five
number of inventions and new ideas flow from small companies and
one-person projects.
Cautions and counter indications The major caution about one-person
projects for either development or maintenance is lack of backup in case
of illness or incapacity. If something should happen to that one person,
work will stop completely.
A second caution is if the person developing software is a domain
expert (i.e., accountant, business analyst, statistician, etc.) who is
building an application for personal use in a corporation, there may be
legal questions involving the ownership of the application should the
employee leave the company.
A third caution is that there may be liability issues in case the soft-
ware developed by a knowledge worker contains errors or does some
kind of damage to the company or its clients.
Conclusions One-person projects are the norm and are quite effective
for small enhancement updates and for maintenance changes to legacy
applications.
Although one-person development projects must necessarily be rather
small, a surprising number of innovations and good ideas have origi-
nated from brilliant individual practitioners.
Pair Programming for Software
Development and Maintenance
The idea of pair programming is for two software developers to share one
computer and take turns doing the coding, while the other member of
the team serves as an observer. The roles switch back and forth between
the two at frequent intervals, such as every 30 minutes to an hour. The
team member doing the coding is called the driver, and the other member
is the navigator or observer.
As of 2009, the results of pair programming are ambiguous. Several
studies indicate fewer defects from pair programming, and some
assert that development schedules are improved as well.
However, all of the experiments were fairly small in scale and fairly
narrow in focus. For example, no known study of pair-programming defects
compared the results against an individual programmer who used static
analysis and automatic testing. Neither have studies compared top-gun
individuals against average to mediocre pairs, or vice versa.
There are also no known studies that compare the quality results of
pair programming against proven quality approaches such as formal
design and code inspections, which have almost 50 years of empirical
data available, and which also utilize the services of other people for
finding software defects.
While many of the pair-programming experiments indicate shorter
development schedules, none indicate reduced development effort or costs
from having two people perform work that is normally performed by one
person.
For pair programming to lower development costs, schedules would
have to be reduced by more than 50 percent. However, experiments
and data collected to date indicate schedule reductions of only about
15 percent to 30 percent, which would have the effect of raising develop-
ment costs by roughly 40 to 70 percent compared with a single individual
doing the same work.
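The cost arithmetic is simple enough to make explicit: two people working for some fraction of the solo schedule expend effort proportional to twice that fraction.

```python
def pair_cost_multiplier(schedule_reduction):
    """Effort of a pair relative to one developer doing the same work.

    The pair finishes in (1 - schedule_reduction) of the solo schedule,
    but two people are charged for that entire period.
    """
    return 2.0 * (1.0 - schedule_reduction)

best_case = pair_cost_multiplier(0.30)   # 1.4x solo cost (40 percent more)
worst_case = pair_cost_multiplier(0.15)  # 1.7x solo cost (70 percent more)
break_even = pair_cost_multiplier(0.50)  # 1.0x: pairs break even only at a 50 percent cut
```

The break-even case shows why the observed 15 to 30 percent schedule reductions fall well short of making pairs cost-neutral.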
Pair-programming enthusiasts assert that better quality will com-
pensate for higher development effort and costs, but that claim is not
supported by studies that included static analysis, automatic testing,
formal inspections, and other sophisticated defect removal methods. The
fact that two developers who use manual defect removal methods might
have lower defects than one developer using manual defect removal
methods is interesting but unconvincing.
Pair programming might be an interesting and useful method for
developing reusable components, which need to have very high quality
and reliability, but where development effort and schedules are com-
paratively unimportant. However, Watts Humphrey's Team Software
Process (TSP) is also an excellent choice for reusable components and
has far more historical data available than pair programming does.
Subjectively, the pair-programming concept seems to be enjoyable to
many who have experienced it. The social situation of having another
colleague involved with complicated algorithms and code structures is
perceived as being advantageous.
As the recession of 2009 deepens and layoffs become more numerous,
it is very likely that pair programming will decline, because companies
will be reducing software staffs to minimal levels and can no longer
afford the extra overhead.
Most of the literature on pair programming deals with colocation in
a single office. However, remote pair-programming, where the partners
are in different cities or countries, is occasionally cited.
Pair programming is an interesting form of collaboration, and collabo-
ration is always needed for applications larger than about 100 function
points in size.
In the context of test-driven development, one interesting variation
of pair programming would be for one of the pair to write test cases and
the other to write code, and then to switch roles.
Another area where pair programming has been used successfully
is that of maintenance and bug repairs. One maintenance outsource
company has organized their maintenance teams along the lines of an
urban police station. The reason for this is that bugs come in at random
intervals, and there is always a need to have staff available when a new
bug is reported, especially a new high-severity bug.
In the police model of maintenance, a dispatcher and several pairs of
maintenance programmers work as partners, just as police detectives
work as partners.
During defect analysis, having two team members working side by
side speeds up finding the origins of reported bugs. Having two people
work on the defect repairs as partners also speeds up the repair inter-
vals and reduces bad-fix injections. (Historically, about 7 percent of
attempts to repair a bug accidentally introduce a new bug in the fix
itself. These are called bad fixes.)
In fact, pair programming for bug repairs and maintenance activities
looks as if it may be the most effective use of pairs yet noted.
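The 7 percent bad-fix rate compounds, because fixes to bad fixes can themselves be bad. A minimal sketch of the resulting geometric series (the function name is illustrative; the rate is the historical figure quoted above):

```python
def total_repair_attempts(reported_bugs: float, bad_fix_rate: float = 0.07) -> float:
    """Expected total repair attempts when each fix has a given chance
    of injecting a new bug that must itself be repaired.

    The batches form a geometric series n + n*r + n*r**2 + ...,
    which converges to n / (1 - r).
    """
    return reported_bugs / (1.0 - bad_fix_rate)

# 100 reported bugs ultimately require roughly 107.5 repair attempts.
attempts = total_repair_attempts(100)
```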
Demographics Because pair programming is an experimental approach,
the method is not widely deployed. As the recession lengthens, there may
be even less pair programming. The author estimates that as of 2009,
perhaps 500 to 1,000 pairs are currently active in the United States.
Project size The average size of new applications done by pair-program-
ming teams is about 75 function points, and the maximum size is fewer
than 1,000 function points. For maintenance or defect repair work, the
average size is less than 1 function point. For enhancement to legacy
applications, the average size is about 5 to 10 function points for each
new feature added.
Productivity rates Productivity rates for pair-programming efforts
are usually in the range of 16 to 20 function points per staff month, or
about 30 percent less than the same project done by one person.
Pair-programming software projects are very sensitive to the skill
and work practices of specific individuals. As previously mentioned, con-
trolled experiments indicate about a 10-to-1 range difference between
the best and worst results for tasks such as coding and bug removal by
individual participants in such studies.
Some psychological studies of software personnel indicate a tendency
toward introversion, which may make the pair-programming concept
uncomfortable to some software engineers. The literature on pair pro-
gramming does indicate social satisfaction.
Schedules Development schedules for pair-programming maintenance
and enhancement projects usually range between a day and a week.
For new development by pairs, schedules usually range between about
two months and six months. Schedules tend to be about 10 percent to
30 percent shorter than one-person efforts for the same number of func-
tion points.
Quality The quality levels for pair-programming applications are not
bad. Defect potentials run to about 2.5 bugs per function point, and
defect removal efficiency is about 93 percent. Therefore, a small iPhone
application of 25 function points might have a total of about 60 bugs, of
which 4 will still be present at release. This is perhaps 15 percent better
than individual developers using manual defect removal and testing.
However, there is no current data that compares pair programming with
individual programming efforts where automated static analysis and
automated testing are part of the equation.
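The arithmetic behind these estimates is simple enough to sketch; the figures are the ones quoted above, and the helper function is illustrative:

```python
def release_estimate(size_fp: float, defect_potential: float,
                     removal_efficiency: float) -> tuple[float, float]:
    """Return (total defects, defects still present at release)."""
    total = size_fp * defect_potential
    return total, total * (1.0 - removal_efficiency)

# A 25-function point application at 2.5 bugs per function point
# with 93 percent defect removal efficiency:
total, delivered = release_estimate(25, 2.5, 0.93)
# total is about 62.5 ("a total of about 60 bugs");
# delivered is about 4.4 (the "4 still present at release")
```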
Specialization There are few studies to date on the role of specialization
in a pair-programming context. However, there are reports of interesting
distributions of effort. For example, one of the pair might write test
cases while the other is coding, or one might write user stories while
the other codes.
To date there are no studies of pair programming that concern teams
with notably different backgrounds working on the same application;
that is, a software engineer teamed with an electrical engineer or an
automotive engineer; a software engineer teamed with a medical doctor;
and so forth. The pairing of unlike disciplines would seem to be a topic
that might be worth experimenting with.
Cautions and counter indications The topic of pair programming needs
additional experimentation before it can become a mainstream approach,
if indeed it ever does. The experiments need to include more sophisticated
quality control, and also to compare pairs against top-gun individual
programmers.
The higher costs of pair programming are not likely to gain adherents
during a strong recession.
Conclusions There is scarcely enough empirical data about pair
programming to draw solid conclusions. Experiments and anecdotal results
are generally favorable, but the experiments to date cover only a few
variables and ignore important topics such as the role of static analysis,
automatic testing, inspections, and other quality factors. As the global
recession lengthens and deepens, pair programming may drop from
view due to layoffs and downsizing of software organizations.
Self-Organizing Agile Teams
For several years, as the Agile movement gained adherents, the concept
of small self-organizing teams also gained adherents. The concept of
self-organized teams is that rather than have team members reporting
to a manager or formal team leader, the members of the team would
migrate to roles that they felt most comfortably matched their skills.
In a self-organizing team, every member will be a direct contribu-
tor to the final set of deliverables. In an ordinary department with a
manager, the manager is usually not a direct contributor to the code
or other deliverables that reach end users. Therefore, self-organizing teams
should be slightly more efficient than ordinary departments of the same
size, because they would have one additional worker.
In U.S. businesses, ordinary departments average about eight employ-
ees per manager. The number of employees reporting to a manager is
called the span of control. (The actual observed span of control within
large companies such as IBM has ranged from a low of 2 to a high of
30 employees per manager.)
For self-organizing teams, the nominal range of size is about "7 plus or
minus 2." However, to truly match any given size of software project, team
sizes need to range from a low of two up to a maximum of about 12.
A significant historical problem with software has been that of decom-
posing applications to fit existing organization structures, rather than
decomposing the applications into logical pieces based on the funda-
mental architecture.
The practical effect has been to divide large applications into multiple
segments that can be developed by an eight-person department whether
or not that matches the architecture of the application.
In an Agile context, a user representative may be a member of the
team, providing input on the features that are needed as well as
experiential reports based on running the pieces of the application
as they are finished. The user representative has a special role and
normally does not do any code development, although some test cases
may be created by the embedded user representative. Obviously, the
user will provide inputs in terms of user stories, use cases, and informal
descriptions of the features that are needed.
In theory, self-organizing teams are cross-functional, and everyone
contributes to every deliverable on an as-needed basis. However, it is
not particularly effective for people to depart from their main areas of
competence. Technical writers may not make good programmers. Very
few people are good technical writers. Therefore, the best results tend
to be achieved when team members follow their strengths.
However, in areas where everyone (or no one) is equally skilled, all can
participate. Creating effective test cases may be an example where skills
are somewhat sparse throughout. Dealing with security of code is an
area where so few people are skilled that if it is a serious concern, out-
side expertise will probably have to be imported to support the team.
Another aspect of self-organizing teams is the usage of daily status
meetings, which are called Scrum sessions, using a term derived from
the game of rugby. Typically, Scrum sessions are short and deal with
three key issues: (1) what has been accomplished since the last Scrum
session, (2) what is planned between today and the next Scrum session,
and (3) what problems or obstacles have been encountered.
(Scrum is not the only method of meeting and sharing information.
Phone calls, e-mails, and informal face-to-face meetings occur every day.
There may also be somewhat larger meetings among multiple teams,
on an as-needed basis.)
One of the controversial roles with self-organizing teams is that of
Scrum master. Nominally, the Scrum master is a form of coordinator
for the entire project and is charged with setting expectations for work
that spans multiple team members; that is, the Scrum master is a sort
of coach. This role means that the personality and leadership qualities
of the Scrum master exert a strong influence on the overall team.
Demographics Because Agile has been on a rapid growth path for sev-
eral years, the number of small Agile teams is still increasing. As of
2009, the author estimates that in the United States alone there are
probably 35,000 small self-organizing teams that collectively employ
about 250,000 software engineers and other occupations.
Project size The average size of new applications done by self-organizing
teams with seven members is about 1,500 function points, and the
maximum size is perhaps 3,000 function points. (Beyond 3,000 func-
tion points, teams of teams would be utilized.) Self-organizing teams
are seldom used for maintenance or defect repair work, since a bug's
average size is less than 1 function point and needs only one person. For
enhancements to legacy applications, self-organizing teams might be
used for major enhancements in the 150- to 500-function point range.
For smaller enhancements of 5 to 10 function points, individuals would
probably be used for coding, with perhaps some assistance from testers,
technical writers, and integration specialists.
Although there are methods for scaling up small teams to encom-
pass teams of teams, scaling has been a problem for self-organizing
teams. In fact, the entire Agile philosophy seems better suited to
applications below about 2,500 function points. Very few examples
of large systems greater than 10,000 function points have even been
attempted using Agile or self-organizing teams.
Productivity rates Productivity rates for self-organizing teams on proj-
ects of 1,500 function points are usually in the range of 15 function
points per staff month. They sometimes top 20 function points per staff
month for applications where the team has significant expertise and may
drop below 10 function points per staff month for unusual or complex
projects.
Productivity rates for individual sprints are higher, but that fact is
somewhat irrelevant because the sprints do not include final integration
of all components, system test of the entire application, and the final
user documentation.
Self-organizing team projects tend to minimize the performance
ranges of individuals and may help to bring novices up to speed fairly
quickly. However, if the range of performance on a given team exceeds
about 2-to-1, those at the high end of the performance range will become
dissatisfied with the work of those at the low end of the range.
Schedules Development schedules for new development by self-organizing
teams for typical 1,500-function point projects usually range
between about 9 months and 18 months and would average perhaps
12 calendar months for the entire application.
However, the Agile approach is to divide the entire application into a
set of segments that can be developed independently. These are called
sprints and would typically be of a size that can be completed in perhaps
one to three months. For an application of 1,500 function points, there
might be five sprints of about 300 function points each. The schedule
for each sprint might be around 2.5 calendar months.
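These figures can be cross-checked against the productivity numbers given earlier. A rough sketch in Python, using only values cited in this section; real schedules also include integration and documentation work outside the sprints:

```python
import math

SIZE_FP = 1_500          # application size in function points
PRODUCTIVITY_FP = 15     # function points per staff month (cited above)
TEAM_SIZE = 7            # nominal self-organizing team
SPRINT_FP = 300          # typical sprint size for this application

staff_months = SIZE_FP / PRODUCTIVITY_FP     # 100 staff months of effort
calendar_months = staff_months / TEAM_SIZE   # ~14.3, inside the 9-18 range
sprints = math.ceil(SIZE_FP / SPRINT_FP)     # 5 sprints of ~2.5 months each
```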
Quality The quality levels for self-organizing teams are not bad, but
usually don't achieve the levels of methods such as Team Software
Process (TSP) where quality is a central issue. Typical defect potentials
run to about 4.5 bugs per function point, and defect removal efficiency
is about 92 percent.
Therefore, an application of 1,500 function points developed by a
self-organizing Agile team might have a total of about 6,750 bugs, of
which 540 would still be present at release. Of these, about 80 might
be serious bugs.
However, if tools such as automated static analysis and automated
testing are used, then defect removal efficiency can approach 97 percent.
In this situation, only about 200 bugs might be present at release. Of
these, perhaps 25 might be serious.
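As a sketch of that arithmetic, with the values taken directly from the paragraphs above:

```python
SIZE_FP = 1_500
DEFECT_POTENTIAL = 4.5   # bugs per function point for a typical Agile team

total_bugs = SIZE_FP * DEFECT_POTENTIAL        # 6,750 defects overall

# 92 percent removal efficiency leaves 8 percent at release:
released_baseline = total_bugs * (1 - 0.92)    # 540 bugs at release

# With static analysis and automated testing, ~97 percent removal:
released_with_tools = total_bugs * (1 - 0.97)  # ~203 bugs ("about 200")
```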
Specialization There are few studies to date on the role of specialization
in self-organizing teams. Indeed, some enthusiasts of self-organizing
teams encourage generalists. They tend to view specialization as being
similar to working on an assembly line. However, generalists often have
gaps in their training and experience. The kinds of specialists who might
be useful would be security specialists, test specialists, quality assur-
ance specialists, database specialists, user-interface specialists, network
specialists, performance specialists, and technical writers.
Cautions and counter indications The main caution about self-organizing
teams is that the lack of a standard and well-understood structure opens up
the team to the chance of power struggles and disruptive social conflicts.
A second caution is that scaling Agile up from small applications to
large systems with multiple teams in multiple locations has proven to
be complicated and difficult.
A third caution is that the poor measurement practices associated
with Agile and with many self-organizing teams give the method the
aura of a cult rather than of an engineering discipline. The failure to
measure either productivity or quality, or to report benchmarks using
standard metrics, is a serious deficiency.
Conclusions The literature and evidence for self-organizing Agile teams
is somewhat mixed and ambiguous. For the first five years of the Agile
expansion, self-organizing teams were garnering a majority of favorable
if subjective articles.
Since about the beginning of 2007, on the other hand, an increasing
number of articles and reports have appeared that raise questions about
self-organizing teams and that even suggest that they be abolished due
to confusion as to roles, disruptive power struggles within the teams,
and outright failures of the projects.
This is a typical pattern within the software industry. New develop-
ment methods are initially championed by charismatic individuals and
start out by gaining a significant number of positive articles and positive
books, usually without any empirical data or quantification of results.
After several years, problems begin to be noted, and increasing num-
bers of applications that use the method may fail or be unsuccessful. In
part this may be due to poor training, but the primary reason is that
almost no software development method is fully analyzed or used under
controlled conditions prior to deployment. Poor measurement practices
and a lack of benchmarks are also chronic problems that slow down
evaluation of software methods.
Unfortunately, self-organizing teams originated in the context of Agile
development. Agile has been rather poor in measuring either productiv-
ity or quality, and creates almost no effective benchmarks. When Agile
projects are measured, they tend to use special metrics such as story
points or use-case points, which are not standardized and lack empirical
collections of data and benchmarks.
Team Software Process (TSP) Teams
The concept of Team Software Process (TSP) was developed by Watts
Humphrey based on his experiences at IBM and as the originator of
the capability maturity model (CMM) for the Software Engineering
Institute (SEI).
The TSP concept deals with the roles and responsibilities needed to
achieve successful software development. But TSP is built on individual
skills and responsibilities, so it needs to be considered in context with
the Personal Software Process (PSP). Usually, software engineers and
specialists learn PSP first, and then move to TSP afterwards.
Because of the background of Watts Humphrey with IBM and with
the capability maturity model, the TSP approach is congruent with the
modern capability maturity model integrated (CMMI) and appears to
satisfy many of the criteria for CMMI level 5, which is the top or highest
level of the CMMI structure.
Because TSP teams are self-organizing teams, they have a surface
resemblance to Agile teams, which are also self-organizing. However,
the Agile teams tend to adopt varying free-form structures based on the
skills and preferences of whoever is assigned to the team.
The TSP teams, on the other hand, are built on a solid underpinning
of specific roles and responsibilities that remain constant from project
to project. Therefore, with TSP teams, members are selected based on
specific skill criteria that have been shown to be necessary for successful
software projects. Employees who lack needed skills would probably not
become members of TSP teams, unless training were available.
Also, prior training in PSP is mandatory for TSP teams. Other kinds
of training such as estimating, inspections, and testing may also be used
as precursors.
Another interesting difference between Agile teams and TSP teams is
the starting point of the two approaches. The Agile methods were origi-
nated by practitioners whose main concerns were comparatively small
IT applications of 1,500 or fewer function points. The TSP approach was
originated by practitioners whose main concerns were large systems
software applications of 10,000 or more function points.
The difference in starting points leads to some differences in skill sets
and specialization. Because small applications use few specialists, Agile
teams are often populated by generalists who can handle design, coding,
testing, and even documentation on an as-needed basis.
Because TSP teams are often involved with large applications, they
tend to utilize specialists for topics such as configuration control, inte-
gration, testing, and the like.
While both Agile and TSP share a concern for quality, they tend to go
after quality in very different fashions. Some of the Agile methods are
based on test-driven development, or creating test cases prior to creat-
ing the code. This approach is fairly effective. However, Agile tends to
avoid formal inspections and is somewhat lax on recording defects and
measuring quality.
With TSP, formal inspections of key deliverables are an integral part, as
is formal testing. Another major difference is that TSP is very rigorous in
measuring every single defect encountered from the first day of require-
ments through delivery, while defect measures during Agile projects are
somewhat sparse and usually don't occur before testing.
Both Agile and TSP may utilize automated defect tracking tools, and
both may utilize approaches such as static analysis, automated testing,
and automated test library controls.
Some other differences between Agile and TSP do not necessarily affect
the outcomes of software projects, but they do affect what is known about
those outcomes. Agile tends to be lax on measuring productivity and qual-
ity, while TSP is very rigorous in measuring task hours, earned value,
defect counts, and many other quantified facts.
Therefore, when projects are finished, Agile projects have only vague
and unconvincing data that demonstrates either productivity or qual-
ity results. TSP, on the other hand, has a significant amount of reliable
quantified data available.
TSP can be utilized with both hierarchical and matrix organization
structures, although hierarchical structures are perhaps more common.
Watts Humphrey reports that TSP is used for many different kinds of
software, including defense applications, civilian government applica-
tions, IT applications, commercial software in companies such as Oracle
and Adobe, and even by some of the computer game companies, where
TSP has proven to be useful in eliminating annoying bugs.
Demographics TSP is most widely used by large organizations that
employ between perhaps 1,000 and 50,000 total software personnel.
Because of the synergy between TSP and the CMMI, it is also widely
used by military and defense software organizations. These large organi-
zations tend to have scores of specialized skills and hundreds of projects
going on at the same time.
The author estimates that there are about 500 companies in the
United States now using TSP. While usage may be experimental in some
of these companies, it is growing fairly rapidly due to the success
of the approach. The number of software personnel using TSP in 2009
is perhaps 125,000 in the United States.
Project size The average size of new applications done by TSP teams
with eight employees and a manager is about 2,000 function points.
However, TSP organizations can be scaled up to any arbitrary size, so
even large systems in excess of 100,000 function points can be handled
by TSP teams working in concert. For large applications with multiple
TSP teams, some specialist teams such as testing, configuration control,
and integration also support the general development teams.
Another caveat with multiple teams attempting to cooperate is that
when more than about a dozen teams are involved simultaneously,
some kind of a project office may be needed for overall planning and
coordination.
Productivity rates Productivity rates for TSP departments on projects of
2,000 function points are usually in the range of 14 to 18 function points
per staff month. They sometimes top 22 function points per staff month
for applications where the team has significant expertise, and may drop
below 10 function points per staff month for unusual or complex proj-
ects. Productivity tends to be inversely proportional to application size
and declines as applications grow larger.
Schedules Development schedules for new development by TSP groups
with eight team members working on a 2,000-function point project
usually range between about 12 months and 20 months and would aver-
age perhaps 14 calendar months for the entire application.
Quality The quality levels for TSP organizations are exceptionally good.
Average defect potentials with TSP run to about 4.0 bugs per func-
tion point, and defect removal efficiency is about 97 percent. Delivered
defects would average about 0.12 per function point.
Therefore, an application of 2,000 function points developed by a single
TSP department might have a total of about 8,000 bugs, of which 240 would
still be present at release. Of these, about 25 might be serious bugs.
However, if in addition to pretest inspections, tools such as automated
static analysis and automated testing are used, then defect removal
efficiency can approach 99 percent. In this situation, only about 80 bugs
might be present at release. Of these, perhaps 8 might be serious bugs,
which is a rate of only 0.004 per function point.
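A short sketch of the arithmetic behind these figures, using only the values quoted in this section:

```python
SIZE_FP = 2_000
DEFECT_POTENTIAL = 4.0   # bugs per function point with TSP

total_bugs = SIZE_FP * DEFECT_POTENTIAL   # 8,000 defects overall

released_97 = total_bugs * (1 - 0.97)     # 240 bugs at 97% removal
released_99 = total_bugs * (1 - 0.99)     # 80 bugs at 99% removal

per_fp_97 = released_97 / SIZE_FP         # 0.12 delivered defects per FP

# The text puts serious bugs at roughly a tenth of the released total:
serious_99 = 8
serious_rate = serious_99 / SIZE_FP       # 0.004 serious bugs per FP
```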
Generally, as application sizes increase, defect potentials also increase,
while defect removal efficiency levels decline. Interestingly, with TSP,
this rule may not apply. Some of the larger TSP applications achieve
more or less the same quality as small applications.
Another surprising finding with TSP is that productivity does not
seem to degrade significantly as application size goes up. Normally,
productivity declines with application size, but Watts Humphrey reports
no significant reductions across a wide range of application sizes. This
assertion requires additional study, because that would make TSP
unique among software development methods.
Specialization TSP envisions a wide variety of specialists. Most TSP
teams will have numerous specialists for topics such as architecture,
testing, security, database design, and many others.
Interestingly, the TSP approach does not recommend software quality
assurance (SQA) as being part of a standard TSP team. This is because
of the view that the TSP team itself is so rigorous in quality control that
SQA is not needed.
In companies where SQA groups are responsible for collecting quality
data, TSP teams will provide such data as needed, but it will be collected
by the team's own personnel rather than by an SQA person or staff
assigned to the project.
Cautions and counter indications The main caution about TSP
organizations and projects is that while they measure many important topics,
they do not use standard metrics such as function points. The TSP use
of task hours is more or less unique, and it is difficult to compare task
hours against standard resource metrics.
Another caution is that few if any TSP projects have ever submit-
ted benchmark data to any of the formal software benchmark groups
such as the International Software Benchmarking Standards Group
(ISBSG). As a result, it is almost impossible to compare TSP against
other methods without doing complicated data conversion.
It is technically feasible to calculate function point totals using sev-
eral of the new high-speed function point methods. In fact, quantifying
function points for both new applications and legacy software now takes
only a few minutes. Therefore, reporting on quality and productivity
using function points would not be particularly difficult.
Converting task-hour data into normal workweek and work-month
information would be somewhat more troublesome, but no doubt the
data could be converted using algorithms or some sort of rule-based
expert system.
It would probably be advantageous for both Agile and TSP projects
to adopt high-speed function point methods and to submit benchmark
results to one or more of the benchmark organizations such as ISBSG.
Conclusions The TSP approach tends to achieve a high level of successful
applications and few if any failures. As a result, it deserves to be studied
in depth.
From observations made during litigation for projects that failed or
never operated successfully, TSP has not yet had failures that ended up in
court. This may change as the number of TSP applications grows larger.
TSP emphasizes the competence of the managers and technical staff,
and it emphasizes effective quality control and change management
control. Effective estimating and careful progress tracking also are stan-
dard attributes of TSP projects. The fact that TSP personnel are carefully
trained before starting to use the method, and that experienced mentors
are usually available, explains why TSP is seldom misused.
With Agile, for example, there may be a dozen or more variations of
how development activities are performed, but they still use the name
"Agile" as an umbrella term. TSP activities are more carefully defined
and used, so when the author visited TSP teams in multiple companies,
the same activities carried out the same way were noted.
Because of the emphasis on quality, TSP would be a good choice as the
construction method for standard reusable components. It also seems to
be a good choice for hazardous applications where poor quality might
cause serious problems; that is, in medical systems, weapons systems,
financial applications, and the like.
Conventional Departments with Hierarchical
Organization Structures
The concept of hierarchical organizations is the oldest method for
assigning social roles and responsibilities on the planet. The etymology
of the word "hierarchy" is from the Greek, and the meaning is "rule by
priests." But the concept itself is older than Greece and was also found
in Egypt, Sumer, and most other ancient civilizations.
Many religions are organized in hierarchical fashion, as are military
organizations. Some businesses are hierarchical if they are privately
owned. Public companies with shareholders are usually semi-hierarchical,
in that the operating units report upward level-by-level to the president
or chief executive officer (CEO). The CEO, however, reports to a board
of directors elected by the shareholders, so the very top level of a public
company is not exactly a true hierarchy.
In a hierarchical organization, units of various sizes each have a formal
leader or manager who is appointed to the position by higher authorities.
While the appointing authority is often the leader of the next highest
level of organization in the structure, the actual power to appoint is usu-
ally delegated from the top of the hierarchy. Once appointed, each leader
reports to the next highest leader in the same chain of command.
While appointed leaders or managers at various levels have author-
ity to issue orders and to direct their own units, they are also required
to adhere to directives that descend from higher authorities. Progress
reports flow back up to higher authorities.
In business hierarchies, lower level managers are usually appointed
by the manager of the next highest level. But for executive positions
such as vice presidents the appointments may be made by a committee
of top executives. The purpose of this, at least in theory, is to ensure
the competence of the top executives of the hierarchy. However, the
recent turmoil in the financial sector and the expanding global reces-
sion indicates that top management tends to be a weak link in far too
many companies.
It should be noted that the actual hierarchical structure of an orga-
nization and its power structure may not be identical. For example,
in Japan during the Middle Ages, the emperor was at the top of the
formal government hierarchy, but actual ruling power was vested in a
military organization headed by a commander called the shogun. Only
the emperor could appoint the shogun, but the specific appointment
was dictated by the military leadership, and the emperor had almost
no military or political power.
A longstanding issue with hierarchical organizations is that if the
leader at the top of the pyramid is weak or incompetent, the entire
structure may be at some risk of failing. For hierarchical governments,
weak leadership may lead to revolutions or loss of territory to strong
neighbors.
For hierarchical business organizations, weak leadership at the top
tends to lead to loss of market share and perhaps to failure or bankruptcy.
Indeed, analysis of the recent business failures from Enron
through Lehman Brothers does indicate that the top of these hierarchies did
not have the competence and insight necessary to deal with serious
problems, or even to understand what the problems were.
It is an interesting business phenomenon that the life expectancy of a
hierarchical corporation is approximately equal to the life expectancies
of human beings. Very few companies live to be 100 years old. As the
global recession lengthens and deepens, a great many companies are
likely to expire, although some will expand and grow stronger.
A hierarchical organization has two broad classes of employees. One
of these classes consists of the workers or specialists who actually do
the work of the enterprise. The second class consists of the managers
and executives to whom the workers report. Of course, managers also
report to higher-level managers.
The distinction between technical work and managerial work is so
deeply embedded in hierarchical organizations that it has created two
very distinct career paths: management and technical work.
When starting out their careers, young employees almost always
begin as technical workers. For software, this means starting out as
software engineers, programmers, systems analysts, technical writers,
and the like. After a few years of employment, workers need to make
a career choice and either get promoted into management or stay with
technical work.
The choice is usually determined by personality and personal inter-
ests. Many people like technical work and never want to get into manage-
ment. Other people enjoy planning and coordination of group activities
and opt for a management career.
There is an imbalance in the numbers of managers and technical
workers. In most companies, the managerial community accounts for
about 15 percent of overall employment, while technical workers account
for about 85 percent. Since managers are not usually part of the production
process of the company, it is important not to have an excessive number
of managers and executives. Too many managers and executives tend to
degrade operational performance. This has been noted in both business
and military organizations.
It is interesting that up to a certain point, the compensation levels
of technical workers and managers are approximately the same. For
example, in most corporations, the top technical workers can have com-
pensation that equals that of third-line managers. However, at the very top of
corporations, there is a huge imbalance.
The CEOs of a number of corporations and some executive vice presi-
dents have compensation packages that are worth millions of dollars. In
fact, some executive compensation packages are more than 250 times the
compensation of the average worker within the company. As the global
recession deepens, these enormous executive compensation packages are
being challenged by both shareholders and government regulators.
Another topic that is beginning to be questioned is the span of control,
or the number of technical workers who report to one manager. For his-
torical reasons that are somewhat ambiguous, the average department
in the United States has about eight technical workers reporting to one
manager. The ranges observed run from two employees per manager to
about 30 employees per manager.
Assuming an average of eight technical workers per manager, then
about 12.5 percent of total employment would be in the form of first-line
managers. When higher-level managers are included, the overall total
is about 15 percent.
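The span-of-control arithmetic above can be written out as a short calculation. The eight-to-one ratio and the 12.5 and 15 percent figures come from the text; the function itself and its levels_factor parameter are illustrative assumptions chosen to reproduce those figures, not a published model.

```python
# Sketch of the span-of-control arithmetic described in the text.
# The 8-workers-per-manager figure and the ~15 percent overall
# management share come from the text; the helper and its
# levels_factor parameter are hypothetical illustrations.

def management_share(span_of_control: int, levels_factor: float = 1.2) -> dict:
    """Estimate management share of employment for a given span of control.

    span_of_control: technical workers reporting to one first-line manager.
    levels_factor:   assumed multiplier covering second-line and higher
                     managers (picked so that a span of 8 reproduces the
                     ~15 percent total the text cites).
    """
    first_line = 1.0 / span_of_control      # the text's approximation: 1/8 = 12.5%
    total = first_line * levels_factor      # ~15% once upper management is included
    return {"first_line_pct": round(first_line * 100, 1),
            "total_pct": round(total * 100, 1)}

print(management_share(8))   # {'first_line_pct': 12.5, 'total_pct': 15.0}
print(management_share(12))  # a span of 12 trims the total share to about 10 percent
```

Raising the span of control from 8 to 12, as the text suggests studying, would cut the management share by roughly a third under these assumptions.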
From analyzing appraisal scores and examining complaints against
managers in large corporations, it appears that fewer than 15 percent
of the population is qualified to be effective in management; in fact,
the figure may be closer to 10 percent or less.
That being said, it might be of interest to study raising the average
span of control from 8 workers per manager up to perhaps 12 workers
per manager. Weeding out unqualified managers and restoring them to
technical work might improve overall efficiency and reduce the social
discomfort caused by poor management.
Practicing managers state that increasing the span of control would
lower their ability to control projects and understand the actual work of
their subordinates. However, time and motion studies carried out by the
author in large corporations such as IBM found that software managers
tended to spend more time in meetings with other managers than in dis-
cussions or meetings with their own employees. In fact, a possible law of
business is "managerial meetings are inversely proportional to the span
of control." The more managers on a given project, the more time they
spend with other managers rather than with their own employees.
Another and more controversial aspect of this study had to do with
project failure rates, delays, and other mishaps. For large projects with
multiple managers, the failure rates seem to correlate more closely
to the number of managers involved with the projects than with the
number of software engineers and technical workers.
While the technical workers often managed to do their jobs and get
along with their colleagues in other departments, managerial efforts tended
to be diluted by power struggles and debates with other managers.
This study needs additional research and validation. However, it led
to the conclusion that increasing the span of control and reducing mana-
gerial numbers tends to raise the odds of a successful software project
outcome. This would especially be true if the displaced managers hap-
pened to be those of marginal competence for managerial work.
In many hierarchical departments with generalists, the same people
do both development and maintenance. It should be noted that if the
same software engineers are responsible for both development and
maintenance concurrently, it will be very difficult to estimate their
development work with accuracy. This is because maintenance work
involved with fixing high-severity defects tends to preempt software
development tasks and therefore disrupts development schedules.
Another topic of significance is that when exit interviews are reviewed
for technical workers, two troubling facts are noted: (1) technical work-
ers with the highest appraisal scores tend to leave in the largest num-
bers; and (2) the most common reason cited for leaving a company is
"I don't like working for bad management."
Another interesting phenomenon about management in hierarchical
organizations is termed "the Peter Principle" and needs to be mentioned
briefly. The Peter Principle was created by Dr. Laurence J. Peter and
Raymond Hull in their 1969 book of the same name. In essence, the Peter
Principle holds that in hierarchical organizations, workers and manag-
ers are promoted based on their competence and continue to receive
promotions until they reach a level where they are no longer competent.
As a result, a significant percentage of older employees and managers
occupy jobs for which they are not competent.
The Peter Principle may be amusing (it was first published in a
humorous book), but given the very large number of cancelled software
projects and the even larger number of schedule delays and cost over-
runs, it cannot be ignored or discounted in a software context.
Assuming that the atomic unit of a hierarchical software organization
consists of eight workers who report to one manager, what are their
titles, roles, and responsibilities?
Normally, the hierarchical mode of organization is found in compa-
nies that utilize more generalists than specialists. Because software
specialization tends to increase with company size, the implication is
that hierarchical organizations are most widely deployed for small to
midsize companies with small technical staffs. Most often, hierarchical
organizations are found in companies that employ between about 5 and
50 software personnel.
The primary job title in a hierarchical structure would be programmer
or software engineer, and such personnel would handle both develop-
ment and maintenance work.
However, the hierarchical organization is also found in larger companies
and in companies that do have specialists. In this case, an eight-person
department might have a staffing complement of five software engineers,
two testers, and a technical writer all reporting to the same manager.
Large corporations have multiple business units such as marketing,
sales, finance, human resources, manufacturing, and perhaps research.
Using hierarchical principles, each of these might have its own software
organization dedicated to building the software used by a specific business
unit; that is, financial applications, manufacturing support applications,
and so forth.
But what happens when some kind of a corporate or enterprise appli-
cation is needed that cuts across all business units? Cross-functional
applications turned out to be difficult in traditional hierarchical or
"stovepipe" organizations.
Two alternative approaches were developed to deal with cross-
functional applications. Matrix management was one, and it will be
discussed in the next section of this chapter. The second was enter-
prise resource planning (ERP) packages, which were created by large
software vendors such as SAP and Oracle to handle cross-functional
business applications.
As discussed in the next topic, the matrix-management organization
style is often utilized for software groups with extensive specializa-
tion and a need for cross-functional applications that support multiple
business units.
Demographics In the software world, hierarchical organizations are
found most often in small companies that employ between perhaps
5 and 50 total software personnel. These companies tend to adopt a
generalist philosophy and have few specialists other than some tech-
nical skills such as network administration and technical writing. In
a generalist context, hierarchical organizations of about five to eight
software engineers reporting to a manager handle development, testing,
and maintenance activities concurrently.
The author estimates that there are about 10,000 such small compa-
nies in the United States. The number of software personnel working
under hierarchical organization structures is perhaps 250,000 in the
United States as of 2009.
Hierarchical structures are also found in some large companies, so
perhaps another 500,000 people work in hierarchical structures inside
large companies and government agencies.
Project size The average size of new applications done by hierarchical
teams with eight employees and a manager is about 2,000 function
points. However, one of the characteristics of hierarchical organizations
is that they can cooperate on large projects, so even large systems in
excess of 100,000 function points can be handled by multiple depart-
ments working in concert.
The caveat with multiple departments attempting to cooperate is
that when more than about a dozen are involved simultaneously, some
kind of project office may be utilized for overall planning and coordina-
tion. Some of the departments involved may handle integration, testing,
configuration control, quality assurance, technical writing, and other
specialized topics.
Productivity rates Productivity rates for hierarchical departments on
projects of 2,000 function points are usually in the range of 12 func-
tion points per staff month. They sometimes top 20 function points per
staff month for applications where the team has significant expertise,
and may drop below 10 function points per staff month for unusual
or complex projects. Productivity tends to be inversely proportional to
application size and declines as applications grow larger.
Schedules Development schedules for new development by a single
hierarchical group with eight team members working on a 2,000-
function point project usually range between about 14 months and
24 months and would average perhaps 18 calendar months for the
entire application.
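The productivity and schedule figures above are mutually consistent, as a quick calculation shows. The inputs (2,000 function points, roughly 12 function points per staff month, a department of eight workers plus a manager) are the text's figures; the helper function itself is an illustrative sketch.

```python
# Quick check that the cited productivity and schedule figures agree.
# The inputs come from the text; the helper is illustrative.

def schedule_months(size_fp: float, fp_per_staff_month: float,
                    team_size: int) -> float:
    """Calendar months = total staff months of effort / team size."""
    effort_staff_months = size_fp / fp_per_staff_month
    return effort_staff_months / team_size

# Eight technical workers plus one manager = nine staff.
months = schedule_months(2000, 12, 9)
print(round(months, 1))  # 18.5 -- close to the 18-month average cited
```

At 12 function points per staff month, a 2,000-function point project requires about 167 staff months of effort, which a nine-person department delivers in roughly 18.5 calendar months.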
Quality The quality levels for hierarchical departments are fairly aver-
age. Defect potentials run to about 5.0 bugs per function point, and
defect removal efficiency is about 85 percent. Delivered defects would
average about 0.75 per function point.
Therefore, an application of 2,000 function points developed by a
single hierarchical department would have a total of about 10,000 bugs,
of which 1,500 would still be present at release. Of these, about 225
might be serious bugs.
However, if pretest inspections are used, and if tools such as auto-
mated static analysis and automated testing are used, then defect
removal efficiency can approach 97 percent. In this situation, only
about 300 bugs might be present at release. Of these, perhaps 40 might
be serious.
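The defect arithmetic above can be sketched as follows. The 5.0 defects per function point and the removal-efficiency percentages are the text's figures; the helper itself is a hypothetical illustration.

```python
# The defect arithmetic described in the text, written out.
# The defect potential (5.0 bugs per function point) and the
# removal-efficiency figures are the text's; the helper is a sketch.

def delivered_defects(size_fp: float, potential_per_fp: float = 5.0,
                      removal_efficiency: float = 0.85) -> tuple:
    """Return (total defects created, defects still present at release)."""
    total = size_fp * potential_per_fp
    return total, total * (1.0 - removal_efficiency)

total, shipped = delivered_defects(2000)
print(f"{total:.0f} total, {shipped:.0f} delivered")  # 10000 total, 1500 delivered

_, shipped97 = delivered_defects(2000, removal_efficiency=0.97)
print(f"{shipped97:.0f} delivered at 97% removal")  # 300 delivered at 97% removal
```

The serious-bug counts cited in the text (about 225 at 85 percent removal, perhaps 40 at 97 percent) follow from applying a small serious-defect fraction to these delivered totals.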
Specialization There are few studies to date on the role of specialization
in hierarchical software organization structures. Because of common
gaps in the training and experience of generalists, some kinds of special-
ization are needed for large applications. The kinds of specialists that
might be useful would be security specialists, test specialists, quality
assurance specialists, database specialists, user-interface specialists,
network specialists, performance specialists, and technical writers.
Cautions and counter indications The main caution about hierarchical
organization structures is that software work tends to be artificially
divided to match the abilities of eight-person departments, rather than
segmented based on the architecture and design of the applications
themselves. As a result, some large functions in large systems are arbi-
trarily divided between two or more departments when they should be
handled by a single group.
While communication within a given department is easy and sponta-
neous, communication between departments tends to slow down due to
managers guarding their own territories. Thus, for large projects with
multiple hierarchical departments, there are high probabilities of power
struggles and disruptive social conflicts, primarily among the manage-
ment community.
Conclusions The literature on hierarchical organizations is interesting
but incomplete. Much of the literature is produced by enthusiasts for
alternate forms of organization structures such as matrix management,
Agile teams, pair programming, clean-room development, and the like.
Hierarchical organizations have been in continuous use for software
applications since the industry began. While that fact might seem to
indicate success, it is also true that the software industry has been
characterized by having higher rates of project failures, cost overruns,
and schedule overruns than any other industry. The actual impact of
hierarchical organizations on software success or software failure is still
somewhat ambiguous as of 2009.
Other factors such as methods, employee skills, and management
skills tend to be intertwined with organization structures, and this
makes it hard to identify the effect of the organization itself.
Conventional Departments with Matrix
Organization Structures
The history of matrix management is younger than the history of soft-
ware development itself. The early literature on matrix management
seemed to start around the late 1960s, when it was used within NASA
for dealing with cross-functional projects associated with complex space
programs.
The idea of matrix management soon moved from NASA into the
civilian sector and was eventually picked up by software organizations
for dealing with specialization and cross-functional applications.
In a conventional hierarchical organization, software personnel of
various kinds report to managers within a given business unit. The
technical employees may be generalists, or the departments may include
various specialists too, such as software engineers, testers, and techni-
cal writers. If a particular business unit has ten software departments,
each of these departments might have a number of software engineers,
testers, technical writers, and so forth.
By contrast, in a matrix organization, various occupation groups and
specialists report to a skill or career manager. Thus all technical writers
might report to a technical publications group; all software engineers
might be in a software engineering group; all testers might be in a test
services group; and so forth.
By consolidating various kinds of knowledge workers within skill-
based organizations, greater job enrichment and more career opportu-
nities tend to occur than when specialists are isolated and fragmented
among multiple hierarchical departments.
Under a matrix organization, when specialists are needed for vari-
ous projects, they are assigned to projects and report temporarily to
the project managers for the duration of the projects. This of course
introduces the tricky concept of employees working for two managers
at the same time.
One of the managers (usually the skill manager) has appraisal and
salary authority over specialist employees, while the other (usually
the project manager) uses their services for completing the project.
The project managers may provide inputs to the skill managers about
job performance.
The manager with appraisal and salary authority over employees is
said to have solid line reporting authority. The manager who merely
borrows the specialists for specific tasks or a specific project is said to
have dotted line authority. These two terms reflect the way organization
charts are drawn.
It is an interesting phenomenon that matrix management is new
enough so that early versions of SAP, Oracle, and some other enterprise
resource planning (ERP) applications did not support dotted-line or
matrix organization structures. As of 2009, all ERP packages now sup-
port matrix organization diagrams.
The literature on matrix management circa 2009 is very strongly
polarized between enthusiasts and opponents. About half of the books
and articles regard matrix management as a major business achieve-
ment. The other half of the books and articles regard matrix manage-
ment as confusing, disruptive, and a significant business liability.
A Google search of the phrase "failures of matrix management"
returned 315,000 citations, while a search of the phrase "successes of
matrix management" returned 327,000 citations. As can be seen, this is
a strong polarization of opinion that is almost evenly divided.
Over the years, three forms of matrix organization have surfaced
called weak matrix, strong matrix, and balanced matrix.
The original form of matrix organization has now been classified
as a weak matrix. In this form of organization, the employees report
primarily to a skill manager and are borrowed by project managers on
an as-needed basis. The project managers have no appraisal author-
ity or salary authority over the employees and therefore depend upon
voluntary cooperation to get work accomplished. If there are conflicts
between the project managers and the skill managers in terms of
resource allocations, the project managers lack the authority to acquire
the skills their projects may need.
Because weak matrix organizations proved to be troublesome, the
strong matrix variation soon appeared. In a strong matrix, the special-
ists may still report to a skill manager, but once assigned to a project,
the needs of the project take precedence. In fact, the specialists may
even be formally assigned to the project manager for the duration of
the project and receive appraisals and salary reviews.
In a balanced matrix, responsibility and authority are nominally
equally shared between the skill manager and the project manager. While
this sounds like a good idea, it has proven to be difficult to accomplish.
As a result, the strong matrix form seems to be dominant circa 2009.
Demographics In the software world, matrix organizations are found
most often in large companies that employ between perhaps 1,000 and
50,000 total software personnel. These large companies tend to have
scores of specialized skills and hundreds of projects going on at the
same time.
The author estimates that there are about 250 such large companies
in the United States with primarily matrix organization. The number
of software personnel working under matrix organization structures is
perhaps 1 million in the United States as of 2009.
Project size The average size of new applications done by matrix teams
with eight employees and a manager is about 2,000 function points.
However, matrix organizations can be scaled up to any arbitrary size, so
even large systems in excess of 100,000 function points can be handled
by multiple matrix departments working in concert.
The caveat with multiple departments attempting to cooperate is that
when more than about a dozen are involved simultaneously, some kind
of a project office may be needed for overall planning and coordination.
With really large applications in excess of 25,000 function points,
some of the departments may be fully staffed by specialists who handle
topics such as integration, testing, configuration control, quality assur-
ance, technical writing, and other specialized topics.
Productivity rates Productivity rates for matrix departments on projects
of 2,000 function points are usually in the range of 10 function points
per staff month. They sometimes top 16 function points per staff month
for applications where the team has significant expertise, and may drop
below 6 function points per staff month for unusual or complex projects.
Productivity tends to be inversely proportional to application size and
declines as applications grow larger.
Schedules Development schedules for new development by a single
matrix group with eight team members working on a 2,000-function
point project usually range between about 16 months and 28 months and
would average perhaps 18 calendar months for the entire application.
Quality The quality levels for matrix organizations often are average.
Defect potentials run to about 5.0 bugs per function point, and defect
removal efficiency is about 85 percent. Delivered defects would average
about 0.75 per function point. Matrix and hierarchical organizations are
identical in quality, unless special methods such as formal inspections,
static analysis, automated testing, and other state-of-the-art approaches
have been introduced.
Therefore, an application of 2,000 function points developed by a
single matrix department might have a total of about 10,000 bugs, of
which 1,500 would still be present at release. Of these, about 225 might
be serious bugs.
However, if pretest inspections are used, and if tools such as automated
static analysis and automated testing are used, then defect removal effi-
ciency can approach 97 percent. In this situation, only about 300 bugs
might be present at release. Of these, perhaps 40 might be serious.
As application sizes increase, defect potentials also increase, while
defect removal efficiency levels decline.
Specialization The main purpose of the matrix organization structure is
to support specialization. That being said, there are few studies to date
on the kinds of specialization in matrix software organization structures.
As of 2009, topics such as the numbers of architects needed, the number
of testers needed, and the number of quality assurance personnel needed
for applications of various sizes remain ambiguous.
Typical kinds of specialization are usually needed for large applica-
tions. The kinds of specialists that might be useful would be security
specialists, test specialists, quality assurance specialists, database
specialists, user-interface specialists, network specialists, perfor-
mance specialists, and technical writers.
Cautions and counter indications The main caution about matrix organi-
zation structures is that of political disputes between the skill managers
and the project managers.
Another caution, although hard to evaluate, is that roughly half of the
studies and literature about matrix organization assert that the matrix
approach is harmful rather than beneficial. The other half, however,
says the opposite and claims significant value from matrix organiza-
tions. But any approach with 50 percent negative findings needs to be
considered carefully and not adopted blindly.
A common caution for both matrix and hierarchical organizations is
that software work tends to be artificially divided to match the abilities
of eight-person departments, rather than segmented based on the archi-
tecture and design of the applications. As a result, some large functions in
large systems are arbitrarily divided between two or more departments
when they should be handled by a single group.
While technical communication within a given department is easy
and spontaneous, communication between departments tends to slow
down due to managers guarding their own territories. Thus, for large
projects with multiple hierarchical or matrix departments, there are
high probabilities of power struggles and disruptive social conflicts,
primarily among the management community.
Conclusions The literature on matrix organizations is so strongly polar-
ized that it is hard to find a consensus. With half of the literature praising
matrix organizations and the other half blaming them for failures and
disasters, it is not easy to find solid empirical data that is convincing.
From observations made during litigation for projects that failed or
never operated successfully, there seems to be little difference between
hierarchical and matrix organizations. Both matrix and hierarchical
organizations end up in court about the same number of times.
What does make a difference is the competence of the managers and
technical staff, and the emphasis on effective quality control and change
management control. Effective estimating and careful progress tracking
also make a difference, but none of these factors are directly related to
either the hierarchical or matrix organization styles.
Specialist Organizations in Large Companies
Because development software engineers are not the only or even the
largest occupation group in big companies and government agencies,
it is worthwhile to consider what kinds of organizations best serve the
needs of the most common occupation groups.
In approximate numerical order by numbers of employees, the major
specialist occupations would be
1. Maintenance software engineers
2. Test personnel
3. Business analysts and systems analysts
4. Customer support personnel
5. Quality assurance personnel
6. Technical writing personnel
7. Administrative personnel
8. Configuration control personnel
9. Project office staff
      Estimating specialists
      Planning specialists
      Measurement and metrics specialists
      Scope managers
      Process improvement specialists
      Standards specialists
Many other kinds of personnel perform technical work such as net-
work administration, operating data centers, repair of workstations and
personal computers, and other activities that center around operations
rather than software. These occupations are important, but are outside
the scope of this book.
Following are discussions of organization structures for selected
specialist groups.
Software Maintenance Organizations
For small companies with fewer than perhaps 50 software personnel,
maintenance and development are usually carried out by the same
people, and there are no separate maintenance groups. For that matter,
some forms of customer support may also be tasked to the software
engineering community in small companies.
However, as companies grow larger, maintenance specialization tends
to occur. For companies with more than about 500 software personnel,
maintenance groups are the norm rather than the exception.
(Note: The International Software Benchmarking Standards Group
(ISBSG) has maintenance benchmark data available for more than
400 projects and is adding new data monthly. Refer to www.ISBSG.org
for additional information.)
The issue of separating maintenance from development has both
detractors and adherents.
The detractors of separate maintenance groups state that separating
maintenance from development may require extra staff to become famil-
iar with the same applications, which might artificially increase overall
staffing. They also assert that if enhancements and defect repairs are
taking place at the same time for the same applications and are done by
two different people, the two tasks might interfere with each other.
The adherents of separate maintenance groups assert that because
bugs occur randomly and in fairly large numbers, they interfere with
development schedules. If the same person is responsible for adding a
new feature to an application and for fixing bugs, and suddenly a high-
severity bug is reported, fixing the bug will take precedence over doing
development. As a result, development schedules will slip, perhaps
so badly that the ROI of the application may turn negative.
Although both sets of arguments have some validity, the author's
observations support the view that separate maintenance organizations
are the most useful for larger companies that have significant volumes
of software to maintain.
Separate maintenance teams have higher productivity rates in find-
ing and fixing problems than do developers. Also, having separate main-
tenance change teams makes development more predictable and raises
development productivity.
Some maintenance groups also handle small enhancements as well
as defect repairs. There is no exact definition of a "small enhancement,"
but a working definition is an update that can be done by one person in
less than one week. That would limit the size of small enhancements to
about 5 or fewer function points.
Although defect repairs and enhancements are the two most common
forms of maintenance, there are actually 23 different kinds of mainte-
nance work performed by large organizations, as shown in Table 5-2.
Although the 23 maintenance topics are different in many respects,
they all have one common feature that makes a group discussion pos-
sible: they all involve modifying an existing application rather than
starting from scratch with a new application.
Each of the 23 forms of modifying existing applications has a dif-
ferent reason for being carried out. However, it often happens that
several of them take place concurrently. For example, enhancements
and defect repairs are very common in the same release of an evolving
application.
The maintenance literature has a number of classifications for main-
tenance tasks such as "adaptive," "corrective," or "perfective." These seem
TABLE 5-2 Twenty-Three Kinds of Maintenance Work
1. Major enhancements (new features of greater than 20 function points)
2. Minor enhancements (new features of less than 5 function points)
3. Maintenance (repairing defects for good will)
4. Warranty repairs (repairing defects under formal contract)
5. Customer support (responding to client phone calls or problem reports)
6. Error-prone module removal (eliminating very troublesome code segments)
7. Mandatory changes (required or statutory changes)
8. Complexity or structural analysis (charting control flow plus complexity metrics)
9. Code restructuring (reducing cyclomatic and essential complexity)
10. Optimization (increasing performance or throughput)
11. Migration (moving software from one platform to another)
12. Conversion (changing the interface or file structure)
13. Reverse engineering (extracting latent design information from code)
14. Reengineering (transforming legacy applications to modern forms)
15. Dead code removal (removing segments no longer utilized)
16. Dormant application elimination (archiving unused software)
17. Nationalization (modifying software for international use)
18. Mass updates such as Euro or Year 2000 repairs
19. Refactoring, or reprogramming applications to improve clarity
20. Retirement (withdrawing an application from active service)
21. Field service (sending maintenance members to client locations)
22. Reporting bugs or defects to software vendors
23. Installing updates received from software vendors
to be classifications that derive from academia. While there is nothing
wrong with them, they manage to miss the essential point. Maintenance
overall has only two really important economic distinctions:
1. Changes that are charged to and paid for by customers (enhance-
ments)
2. Changes that are absorbed by the company that built the software
(bug repairs)
Whether a company uses standard academic distinctions of mainte-
nance activities or the more detailed set of 23 shown here, it is important
to separate costs into the two buckets of customer-funded or self-funded
expenses.
Some companies such as Symantec charge customers for service
calls, even for reporting bugs. The author regards such charges as being
unprofessional and a cynical attempt to make money out of incompetent
quality control.
There are also common sequences or patterns to these modification
activities. For example, reverse engineering often precedes reengineer-
ing, and the two occur so often together as to almost constitute a linked
set. For releases of large applications and major systems, the author
has observed from six to ten forms of maintenance all leading up to the
same release.
In recent years, the Information Technology Infrastructure Library
(ITIL) has had a significant impact on maintenance, customer sup-
port, and service management in general. The ITIL is a rather large
collection of more than 30 books and manuals that deal with service
management, incident reporting, change teams, reliability criteria,
service agreements, and a host of other topics. As this book is being
written in 2009, the third release of the ITIL is under way.
It is an interesting phenomenon of the software world that while
ITIL has become a major driving force in service agreements within
companies for IT service, it is almost never used by commercial vendors
such as Microsoft and Symantec for agreements with their customers.
In fact, it is quite instructive to read the small print in the end-user
license agreements (EULAs) that are always required prior to using
the software.
When these agreements are read, it is disturbing to see clauses that
assert that the vendors have no liabilities whatsoever, and that the
software is not guaranteed to operate or to have any kind of quality
levels.
The reason for these one-sided EULA agreements is that software
quality control is so bad that even major vendors would go bankrupt if
sued for the damages that their products can cause.
For many IT organizations and also for commercial software groups,
a number of functions are joined together under a larger umbrella: cus-
tomer support, maintenance (defect repairs), small enhancements (less
than 5 function points), and sometimes integration and configuration
control.
In addition, several forms of maintenance work deal with software
not developed by the company itself:
1. Maintenance of commercial applications such as those acquired
from SAP, Oracle, Microsoft, and the like. The maintenance tasks
here involve reporting bugs, installing new releases, and possibly
making custom changes for local conditions.
2. Maintenance of open-source and freeware applications such as
Firefox, Linux, Google, and the like. Here, too, the maintenance
tasks involve reporting bugs and installing new releases, plus cus-
tomization as needed.
3. Maintenance of software added to corporate portfolios via mergers
or acquisitions with other companies. This is a very tricky situa-
tion that is fraught with problems and hazards. The tasks here can
be quite complex and may involve renovation, major updates, and
possibly migration from one database to another.
In addition to normal maintenance, which combines defect repairs
and enhancements, legacy applications may undergo thorough and
extensive modernization, called renovation.
Software renovation can include surgical removal of error-prone
modules, automatic or manual restructuring to reduce complexity,
revision or replacement of comments, removal of dead code segments,
and possibly even automatic conversion of the legacy application
from old or obsolete programming languages into newer program-
ming languages.
Renovation may also include data mining to extract business rules
and algorithms embedded in the code but missing from specifications
and written descriptions of the code. Static analysis and automatic test-
ing tools may also be included in renovation. Also, it is now possible to
generate function point totals for legacy applications automatically, and
this may also occur as part of renovation activities.
The observed effect of software renovation is to stretch out the useful
life of legacy applications by an additional ten years. Renovation reduces
the number of latent defects in legacy code, and therefore reduces future
maintenance costs by about 50 percent per calendar year for the applica-
tions renovated. Customer support costs are also reduced.
As the recession deepens and lengthens, software renovation will
become more and more valuable as a cost-effective alternative to retir-
ing legacy applications and redeveloping them. Renovation could reduce maintenance costs so significantly that eventual redevelopment could be funded out of the accrued savings.
If a company does plan to renovate legacy applications, it is appro-
priate to fix some of the chronic problems that no doubt are present in
the original legacy code. The most obvious of these would be to remove
security vulnerabilities, which tend to be numerous in legacy applications.
The second would be to improve quality by using inspections, static
analysis, automated testing, and other modern techniques such as TSP
during renovations.
A combination of the Team Software Process (TSP), the Caja security
architecture from Google, and perhaps the E programming language, which
is more secure than most languages, might be considered for renovating
applications that deal with financial or valuable proprietary data.
For predicting the staffing and effort associated with software main-
tenance, some useful rules of thumb have been developed based on
observations of maintenance groups in companies such as IBM, EDS,
Software Productivity Research, and a number of others.
Maintenance assignment scope = the amount of software that one
maintenance programmer can successfully maintain in a single calen-
dar year. The U.S. average as of 2009 is about 1,000 function points. The
range is between a low of about 350 function points and a high of about
5,500 function points. Factors that affect maintenance assignment scope
include the experience of the maintenance team, the complexity of the
code, the number of latent bugs in the code, the presence or absence of
"error-prone modules" in the code, and the available tool suites such as
static analysis tools, data mining tools, and maintenance workbenches.
This is an important metric for predicting the overall number of main-
tenance programmers needed.
(For large applications, knowledge of the internal structure is vital
for effective maintenance and modification. Therefore, major systems
usually have their own change teams. The number of maintenance pro-
grammers in such a change team can be calculated by dividing the size
of the application in function points by the appropriate maintenance
assignment scope, as shown in the previous paragraph.)
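The change-team sizing rule described above can be sketched as a small calculation. The 1,000-function-point U.S. average and the 350-to-5,500 range come from the text; the function name, the use of ceiling division, and the sample figures are illustrative assumptions:

```python
import math

def change_team_size(application_fp, assignment_scope_fp=1000):
    """Estimate the number of maintenance programmers for one
    application: application size divided by maintenance assignment
    scope (U.S. average cited as about 1,000 function points;
    observed range roughly 350 to 5,500)."""
    return math.ceil(application_fp / assignment_scope_fp)

# A 10,000-function-point system at the average assignment scope:
print(change_team_size(10000))        # 10 programmers
# The same system with experienced staff and good tool support:
print(change_team_size(10000, 2500))  # 4 programmers
```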
Defect repair rates = the average number of bugs or defects that
a maintenance programmer can fix in a calendar month of 22 working
days. The U.S. average is about 10 bugs repaired per calendar month.
The range is from fewer than 5 to about 17 bugs per staff month. Factors
that affect this rate include the experience of the maintenance program-
mer, the complexity of the code, and "bad-fix injections," or new bugs
accidentally injected into the code created to repair a previous bug. The
U.S. average for bad-fix injections is about 7 percent.
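Taken together, the 10-bugs-per-staff-month repair rate and the 7 percent bad-fix injection rate imply that clearing a backlog requires more repair actions than there are original bugs, since some fixes create new defects. A minimal sketch, assuming injected defects are repaired at the same rates (the function names are illustrative):

```python
def total_repairs(initial_bugs, bad_fix_rate=0.07):
    """Total repair actions needed to clear a backlog when each fix
    has a bad_fix_rate chance of injecting a new defect; the chain
    of injected defects forms a geometric series."""
    return initial_bugs / (1 - bad_fix_rate)

def staff_months_to_clear(initial_bugs, repairs_per_month=10,
                          bad_fix_rate=0.07):
    """Staff months to work off the backlog at the cited U.S.
    average of about 10 bug repairs per staff month."""
    return total_repairs(initial_bugs, bad_fix_rate) / repairs_per_month

# Clearing 750 latent defects implies roughly 806 repair actions:
print(round(total_repairs(750)))          # 806
print(round(staff_months_to_clear(750)))  # 81 staff months
```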
Renovation productivity = the average number of function points
per staff month for renovating software applications using a full suite
of renovation support tools. The U.S. average is about 65 function points
per staff month. The range is from a low of about 25 function points per
staff month for highly complex applications in obscure languages to
more than 125 function points per staff month for applications of mod-
erate complexity in fairly modern languages. Other factors that affect
this rate include the overall size of the applications, the presence or
absence of "error-prone modules" in the application, and the experience
of the renovation team.
(Manual renovation without automated support is much more dif-
ficult, and hence productivity rates are much lower--in the vicinity of
14 function points per staff month. This is somewhat higher than new
development, but still close to being marginal in terms of return on
investment.)
Software does not age gracefully. Once software is put into production,
it continues to change in three important ways:
1. Latent defects still present at release must be found and fixed after
deployment.
2. Applications continue to grow and add new features at a rate of
between 5 percent and 10 percent per calendar year, due either to
changes in business needs, or to new laws and regulations, or both.
3. The combination of defect repairs and enhancements tends to
gradually degrade the structure and increase the complexity of
the application. This increase in complexity over time is called entropy. The average rate at which software entropy increases is about 1 percent to 3 percent per calendar year.
A special problem with software maintenance is caused by the
fact that some applications use multiple programming languages.
As many as 15 different languages have been found within a single
large application.
Multiple languages are troublesome for maintenance because they
add to the learning chores of the maintenance teams. Also, some (or all) of these languages may be "dead" in the sense that there are no longer
working compilers or interpreters. This situation chokes productivity
and raises the odds of bad-fix injections.
Because software defect removal and quality control are imperfect,
there will always be bugs or defects to repair in delivered software appli-
cations. The current U.S. average for defect removal efficiency is only
about 85 percent of the bugs or defects introduced during development.
This has been the average for more than 20 years.
The actual values are about 5 bugs per function point created during
development. If 85 percent of these are found before release, about
0.75 bug per function point will be released to customers.
For a typical application of 1,000 function points or 100,000 source
code statements, that implies about 750 defects present at delivery.
About one fourth, or 185 defects, will be serious enough to stop the
application from running or will create erroneous outputs.
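The arithmetic in the last two paragraphs can be expressed as a small estimator. The averages (5 defects per function point created, 85 percent removal, about one fourth of latent defects serious) come from the text; the function itself is only an illustrative sketch:

```python
def delivered_defects(function_points, defect_potential=5.0,
                      removal_efficiency=0.85, serious_fraction=0.25):
    """Estimate latent defects at delivery and the serious subset,
    using the U.S. averages cited in the text."""
    latent = function_points * defect_potential * (1 - removal_efficiency)
    return latent, latent * serious_fraction

# A typical 1,000-function-point application (the text rounds the
# serious subset to about 185):
latent, serious = delivered_defects(1000)
print(round(latent), round(serious))  # 750 latent, ~188 serious
```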
Since defect potentials tend to rise with the overall size of the appli-
cation, and since defect removal efficiency levels tend to decline with
the overall size of the application, the overall volume of latent defects
delivered with the application rises with size. This explains why super-
large applications in the range of 100,000 function points, such as
Microsoft Windows and many enterprise resource planning (ERP)
applications, may require years to reach a point of relative stability.
These large systems are delivered with thousands of latent bugs or
defects.
Of course, average values are far worse than best practices. A com-
bination of formal inspections, static analysis, and automated testing
can bring cumulative defect removal efficiency levels up to 99 percent.
Methods such as the Team Software Process (TSP) can lower defect
potentials down below 3.0 per function point.
Unless very sophisticated development practices are followed, the first
year of the release of a new software application will include a heavy
concentration of defect repair work and only minor enhancements.
However, after a few years, the application will probably stabilize as
most of the original defects are found and eliminated. Also after a few
years, new features will increase in number.
As a result of these trends, maintenance activities will gradually
change from the initial heavy concentration on defect repairs to a longer-
range concentration on new features and enhancements.
Not only is software deployed with a significant volume of latent
defects, but the phenomenon of bad-fix injection has been observed
for more than 50 years. Roughly 7 percent of all defect repairs will
contain a new defect that was not there before. For very complex and
poorly structured applications, these bad-fix injections have topped
20 percent.
Even more alarming, once a bad fix occurs, it is very difficult to cor-
rect the situation. Although the U.S. average for initial bad-fix injection
rates is about 7 percent, the secondary injection rate against previous
bad fixes is about 15 percent for the initial repair and 30 percent for the
second. A string of up to five consecutive bad fixes has been observed,
with each attempted repair adding new problems and failing to correct
the initial problem. Finally, the sixth repair attempt was successful.
In the 1970s, the IBM Corporation did a distribution analysis of
customer-reported defects against their main commercial software
applications. The IBM personnel involved in the study, including the
author, were surprised to find that defects were not randomly distrib-
uted through all of the modules of large applications.
In the case of IBM's main operating system, about 5 percent of the
modules contained just over 50 percent of all reported defects. The most
extreme example was a large database application, where 31 modules
out of 425 contained more than 60 percent of all customer-reported bugs.
These troublesome areas were known as error-prone modules.
Similar studies by other corporations such as AT&T and ITT found
that error-prone modules were endemic in the software domain. More
than 90 percent of applications larger than 5,000 function points were
found to contain error-prone modules in the 1980s and early 1990s.
Summaries of the error-prone module data from a number of companies
were published in the author's book Software Quality: Analysis and
Guidelines for Success.
Fortunately, it is possible to surgically remove error-prone modules
once they are identified. It is also possible to prevent them from occur-
ring. A combination of defect measurements, formal design inspections,
formal code inspections, and formal testing and test-coverage analysis
have proven to be effective in preventing error-prone modules from
coming into existence.
Today, in 2009, error-prone modules are almost nonexistent in organiza-
tions that are higher than level 3 on the capability maturity model (CMM)
of the Software Engineering Institute. Other development methods such
as the Team Software Process (TSP) and Rational Unified Process (RUP)
are also effective in preventing error-prone modules. Several forms of
Agile development such as extreme programming (XP) also seem to be
effective in preventing error-prone modules from occurring.
Removal of error-prone modules is a normal aspect of renovating
legacy applications, so those software applications that have under-
gone renovation will have no error-prone modules left when the work
is complete.
However, error-prone modules remain common and troublesome for
CMMI level 1 organizations. They are also alarmingly common in legacy
applications that have not been renovated and that are maintained
without careful measurement of defects.
Once deployed, most software applications continue to grow at annual
rates of between 5 percent and 10 percent of their original functionality.
Some applications, such as Microsoft Windows, have increased in size
by several hundred percent over a ten-year period.
The combination of continuous growth of new features coupled with
continuous defect repairs tends to drive up the complexity levels of aging
software applications. Structural complexity can be measured via met-
rics such as cyclomatic and essential complexity using a number of com-
mercial tools. If complexity is measured on an annual basis and there
is no deliberate attempt to keep complexity low, the rate of increase is
between 1 percent and 3 percent per calendar year.
However, and this is important, the rate at which entropy or com-
plexity increases is directly proportional to the initial complexity of the
application. For example, if an application is released with an average
cyclomatic complexity level of less than 10, it will tend to stay well struc-
tured for at least five years of normal maintenance and enhancement
changes.
But if an application is released with an average cyclomatic com-
plexity level of more than 20, its structure will degrade rapidly, and its
complexity levels might increase by more than 2 percent per year. The
rate of entropy and complexity will even accelerate after a few years.
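The compound-growth claim above can be illustrated numerically. This is only a sketch of the 1 to 3 percent annual entropy figures cited in the text, not a measurement tool:

```python
def projected_complexity(initial_cc, annual_growth, years):
    """Project average cyclomatic complexity under compound annual
    growth (the text cites roughly 1 to 3 percent per year when no
    deliberate complexity control is applied)."""
    return initial_cc * (1 + annual_growth) ** years

# A well-structured release (average cyclomatic complexity below 10)
# stays safe for years, while a complex release degrades faster:
print(round(projected_complexity(9, 0.01, 5), 1))   # 9.5
print(round(projected_complexity(22, 0.03, 5), 1))  # 25.5
```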
As it happens, both bad-fix injections and error-prone modules tend to
correlate strongly (although not perfectly) with high levels of complexity.
A majority of error-prone modules have cyclomatic complexity levels
of 10 or higher. Bad-fix injection levels for modifying high-complexity
applications are often higher than 20 percent.
Here, too, renovation can reverse software entropy and bring cyclo-
matic complexity levels down below 10, which is the maximum safe
level of code complexity.
There are several difficulties in exploring software maintenance costs
with accuracy. One of these difficulties is the fact that maintenance
tasks are often assigned to development personnel who interleave both
development and maintenance as the need arises. This practice makes
it difficult to distinguish maintenance costs from development costs,
because the programmers are often rather careless in recording how
time is spent.
Another and very significant problem is that a great deal of software
maintenance consists of making very small changes to software appli-
cations. Quite a few bug repairs may involve fixing only a single line of
code. Adding minor new features, such as perhaps a new line-item on a
screen, may require fewer than 50 source code statements.
These small changes are below the effective lower limit for counting
function point metrics. The function point metric includes weighting
factors for complexity, and even if the complexity adjustments are set to
the lowest possible point on the scale, it is still difficult to count function
points below a level of perhaps 15 function points.
An experimental method called micro function points has been devel-
oped for small maintenance changes and bug repairs. This method is
similar to standard function points, but drops down to three decimal
places of precision and so can deal with fractions of a single function
point.
Of course, the work of making a small change measured with micro function points may take only an hour or less. But in large companies,
where as many as 20,000 such changes are made in a year, the cumula-
tive costs are not trivial. Micro function points are intended to eliminate
the problem that small maintenance updates have not been subject to
formal economic analysis.
Quite a few maintenance tasks involve changes that are either a frac-
tion of a function point, or may at most be fewer than 5 function points
or about 250 Java source code statements. Although normal counting
of function points is not feasible for small updates, and micro function
points are still experimental, it is possible to use the backfiring method
of converting counts of logical source code statements into equivalent
function points. For example, suppose an update requires adding 100
Java statements to an existing application. Since it usually takes about
50 Java statements to encode 1 function point, it can be stated that this
small maintenance project is about 2 function points in size.
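The backfiring arithmetic from the example above, sketched in code. The Java ratio of about 50 logical statements per function point comes from the text; the C and COBOL ratios are commonly published backfiring values and should be treated as rough assumptions:

```python
# Approximate logical statements per function point.
STATEMENTS_PER_FP = {"java": 50, "c": 128, "cobol": 107}

def backfire_fp(logical_statements, language):
    """Convert a count of logical source statements into an
    approximate function point size via backfiring."""
    return logical_statements / STATEMENTS_PER_FP[language.lower()]

# The example from the text: a 100-statement Java update is ~2 FP.
print(backfire_fp(100, "Java"))  # 2.0
```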
Because of the combination of 23 separate kinds of maintenance work
mixed with both large and small updates, maintenance effort is harder to
estimate and harder to measure than in conventional software develop-
ment. As a result, there are many fewer maintenance benchmarks than
development benchmarks. In fact, there is much less reliable information
about maintenance than about almost any other aspect of software.
Maintenance activities are frequently outsourced to either domestic or
offshore outsource companies. For a variety of business reasons, main-
tenance outsource contracts seem to be more stable and less likely to
end up in court than software development contracts.
The success of maintenance outsource contracts stems from two major factors:
1. Small maintenance changes do not have the huge cost and schedule
slippage rates associated with major development projects.
2. Small maintenance changes to existing software almost never fail
completely. A significant number of development projects do fail and
are never completed at all.
There may be other reasons as well, but the fact remains that main-
tenance outsource contracts seem more stable and less likely to end up
in court than development outsource contracts.
Maintenance is the dominant work of the software industry in 2009
and will probably stay the dominant activity for the indefinite future.
For software, as with many other industries, once the industry passes
50 years of age, more workers are involved with repairing existing prod-
ucts than there are workers involved with building new products.
Demographics In the software world, separate maintenance organiza-
tions are found most often in large companies that employ between
perhaps 500 and 50,000 total software personnel.
The author estimates that there are about 2,500 such large companies in the United States with separate maintenance organizations. The
number of software personnel working on maintenance in maintenance
organizations is perhaps 800,000 in the United States as of 2009. (The
number of software personnel who perform both development and main-
tenance is perhaps 400,000.)
Project size The average size of software defects is less than 1 function point, which is why micro function points are needed. Enhancements
or new features typically range from a low of perhaps 5 function points
to a high of perhaps 500 function points. However, there are so many enhancements that software applications typically grow at a rate of
around 8 percent per calendar year for as long as they are being used.
Productivity rates Productivity rates for defect repairs are only about 10 function points per staff month, due to the difficulty of finding the
exact problem, plus the need for regression testing and constructing
new releases. Another way of expressing defect repair productivity is to
use defects or bugs fixed per month, and a typical value would be about
10 bugs per staff month.
The productivity rates for enhancements average about 15 function
points per staff month, but vary widely due to the nature and size of the
enhancement, the experience of the team, the complexity of the code,
and the rate at which requirements change during the enhancement.
The range for enhancements can be as low as about 5 function points
per staff month, or as high as 35 function points per staff month.
Schedules Development schedules for defect repairs range from a few hours to a few days, with one major exception. Defects that are abeyant,
or cannot be replicated by the change teams, may take weeks to repair
because the internal version of the application used by the change team
may not have the defect. It is necessary to get a great deal more infor-
mation from users in order to isolate abeyant defects.
Fixing a bug is not the same as issuing a new release. Within some
companies such as IBM, maintenance schedules in the sense of defect
repairs vary with the severity level of the bugs reported; that is, severity
1 bugs (most serious), about 1 week; severity 2 bugs, about two weeks;
severity 3 bugs, next release; severity 4 bugs, next release or whenever
it is convenient.
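The IBM-style severity targets described above amount to a simple lookup table. The dictionary below is a hypothetical encoding of those targets, not IBM's actual process definition:

```python
# Hypothetical encoding of severity-driven repair-schedule targets.
REPAIR_TARGET = {
    1: "about 1 week",                     # most serious defects
    2: "about 2 weeks",
    3: "next release",
    4: "next release, or when convenient",
}

def repair_target(severity):
    """Look up the repair-schedule target for a reported severity."""
    return REPAIR_TARGET[severity]

print(repair_target(1))  # about 1 week
```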
Development schedules for enhancements usually run from about
1 month up to 9 months. However, many companies have fixed release
intervals that aggregate a number of enhancements and defect repairs
and release them at the same time. Microsoft "service packs" are one
example, as are the intermittent releases of Firefox. Normally, fixed
release intervals are either every six months or once a year, although
some may be quarterly.
Quality The main quality concerns for maintenance or defect repairs
are threefold: (1) higher defect potentials for maintenance and enhance-
ments than for new development, (2) the presence or absence of error-
prone modules in the application, and (3) the bad-fix injection rates for
defect repairs, which average about 7 percent.
Maintenance and enhancement defect potentials are higher than for
new development and run to about 6.0 bugs per function point. Defect
removal efficiency is usually lower than for new development and is only
about 83 percent. As a result, delivered defects would average about 1.02 per function point (6.0 times the 17 percent not removed).
An additional quality concern that grows slowly worse over a period
of years is that application complexity (as measured by cyclomatic com-
plexity) slowly increases because changes tend to degrade the original
structure. As a result, each year, defect potentials may be slightly higher
than the year before, while bad-fix injections may increase. Unless the
application is renovated, these problems tend to become so bad that
eventually the application can no longer be safely modified.
In addition to renovation, other approaches such as formal inspections
for major enhancements and significant defect repairs, static analysis,
and automatic testing can raise defect removal efficiency levels above
95 percent. However, bad-fix injections and error-prone modules are
still troublesome.
Specialization The main purpose of the maintenance organization structures is to support maintenance specialization. While not every-
one enjoys maintenance, it happens that quite a few programmers and
software engineers do enjoy it.
Other specialist work in a maintenance organization includes inte-
gration and configuration control. Maintenance software engineers
normally do most of the testing on small updates and small enhance-
ments, although formal test organizations may do some specialized
testing such as system testing prior to a major release.
Curiously, software quality assurance (SQA) is seldom involved
with defect repairs and minor enhancements carried out by main-
tenance groups. However, SQA specialists usually do work on major
enhancements.
Technical writers don't have a major role in software maintenance,
but may occasionally be involved if enhancements trigger changes in
user manuals or HELP text.
That being said, few studies to date deal with either personality or
technical differences between successful maintenance programmers and
successful development programmers.
Cautions and counter indications The main caution about maintenance specialization and maintenance organizations is that they tend to lock
personnel into narrow careers, sometimes limited to repairing a single
application for a period of years. There is little chance of career growth
or knowledge expansion if a software engineer spends years fixing bugs
in a single software application. Occasionally, switching back and forth
from maintenance to development is a good practice for minimizing
occupational boredom.
Conclusions The literature on maintenance organizations is very sparse compared with the literature on development. Although there are
some good books, there are few long-range studies that show application
growth, entropy increase, and defect trends over multiple years.
Given that software maintenance is the dominant activity of the
software industry in 2009, a great deal more research and study are
indicated. Research is needed on data mining of legacy applications to
extract business rules; on removing security vulnerabilities from legacy
code; on the costs and value of software renovation; and on the applica-
tion of quality control methods such as inspections, static analysis, and
automated testing to legacy code.
Customer Support Organizations
In small companies with few software applications and few custom-
ers or application users, support may be carried out on an informal
basis by the development team itself. However, as numbers of customers
increase and numbers of applications needing support increase, a point
will soon be reached where a formal customer support organization will
be needed.
Informal rules of thumb for customer support indicate that customer
support staffing is dependent on three variables:
1. Number of customers
2. Number of latent bugs or defects in released software
3. Application size measured in terms of function points or lines
of code
One full-time customer support person would probably be needed for
applications that meet these criteria: 150 customers, 500 latent bugs in
the software (75 serious bugs), and 10,000 function points or 500,000
source code statements in a language such as Java.
The most effective known method for improving customer support is to
achieve much better application quality levels than are typical today in
2009. Every reduction of about 220 latent defects at delivery can reduce
customer support staffing needs by one person. This is based on the
assumption that customer support personnel speak to about 30 custom-
ers per day, and each released defect is encountered by 30 customers.
Therefore, each released defect occupies one day for one customer sup-
port staff member, and there are 220 working days per year.
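The staffing rules of thumb above can be sketched as a rough estimator. The per-variable thresholds (150 customers, 500 latent bugs, 10,000 function points) come from the text; combining them by taking whichever demand is largest is an assumption, since the text gives only a single combined example:

```python
def support_staff_estimate(customers, latent_defects, size_fp):
    """Rough customer-support staffing sketch: roughly one person
    per 150 customers, per 500 latent defects, or per 10,000
    function points, whichever implies the most staff."""
    return max(customers / 150, latent_defects / 500, size_fp / 10000)

# The example from the text: about one full-time support person.
print(support_staff_estimate(150, 500, 10000))  # 1.0
# Ten times the customer base dominates the other factors:
print(support_staff_estimate(1500, 500, 10000))  # 10.0
```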
Some companies attempt to reduce customer support costs by charg-
ing for support calls, even to report bugs in the applications! This is
an extremely bad business practice that primarily offends customers
without benefiting the companies. Every customer faced with a charge
for customer support is an unhappy customer who is actively in search
of a more sensible competitive product.
Also, since software is routinely delivered with hundreds of serious
bugs, and since customer reports of those bugs are valuable to soft-
ware vendors, charging for customer support is essentially cutting off
a valuable resource that can be used to lower maintenance costs. Few
companies that charge for support have many happy customers, and
many are losing market shares.
Unfortunately, customer support organizations are among the most
difficult of any kind of software organization to staff and organize well.
There are several reasons for this. The first is that unless a company
charges for customer support (not a recommended practice), the costs
can be high. The second is that customer-support work tends to have
limited career opportunities, and this makes it difficult to attract and
keep personnel.
As a result, customer support was one of the first business activities
to be outsourced to low-cost offshore providers. Because customer sup-
port is labor intensive, it was also among the first business activities
to attempt to automate at least some responses. To minimize the time
required for discussions with live support personnel, there are a variety
of frequently asked questions (FAQ) and other topics that users can
access by phone or e-mail prior to speaking with a real person.
Unfortunately, these automated techniques are often frustrating to
users because they require minutes of time dealing with sometimes
arcane voice messages before reaching a real person. Even worse,
these automated voice messages are almost useless for the hard of
hearing.
That being said, companies in the customer support business have
made some interesting technical innovations with voice response sys-
tems and also have developed some fairly sophisticated help-desk pack-
ages that keep track of callers or e-mails, identify bugs or defects that
have been previously reported, and assist with other administrative
functions.
Because calls and e-mail from customers contain a lot of potentially
valuable information about deep bugs and security flaws, prudent com-
panies want to capture this information for analysis and to use it as part
of their quality and security improvement programs.
At a sociological level, an organization called the Service and Support
Professionals Association (SSPA) not only provides useful information
for support personnel, but also evaluates the customer support of vari-
ous companies and issues awards and citations for excellence. The SSPA
group also has conferences and events dealing with customer support.
(The SSPA web site is www.thesspa.com.)
SSPA has an arrangement with the well-known J.D. Power and
Associates to evaluate customer service in order to motivate companies
by issuing various awards. As an example, the SSPA web site mentions
the following recent awards as of 2009:
ProQuest Business Solutions--Most improved
IBM Rochester--Sustained excellence for three consecutive years
Oracle Corporation--Innovative support
Dell--Mission critical support
RSA Security--Best support for complex systems
For in-house support, as opposed to commercial companies that sell
software, the massive compendium of information contained in the
Information Technology Infrastructure Library (ITIL) spells out topics
such as help-desk response-time targets, service agreements, incident
management, and hundreds of other items of information.
Software customer support is organized in a multitier arrangement
that uses automated responses and FAQs as the initial level, and then
brings in more expertise at higher levels. An example of such a multitier
arrangement might resemble the following:
Level 0--Automated voice messages, FAQ, and pointers to available
downloads
Level 1--Personnel who know basics of the application and common
bugs
Level 2--Experts in selected topics
Level 3--Development personnel or top-gun experts
The idea behind the multilevel approach is to minimize the time
requirements of developers and experts, while providing as much useful
information as possible in what is hopefully an efficient manner.
As mentioned in a number of places in this book, the majority of customer
service calls and e-mails are due to poor quality and excessive numbers
of bugs. Therefore, more sophisticated development approaches such as
using Team Software Process (TSP), formal inspections, static analysis,
automated testing, and the like will not only reduce development costs and
schedules, but will also reduce maintenance and customer support costs.
It is interesting to consider how one of the J.D. Power award recipi-
ents, IBM Rochester, goes about customer support:
"There is a strong focus on support responsiveness, in terms of both time
to response as well as the ability to provide solutions. When customers
call in, there is a target that within a certain amount of time (a minute or
a couple of minutes), the call must be answered. IBM does not want long
hold times where customers spend >10 minutes just waiting for the phone
to be answered.
When problems/defects are reported, the formal fix may take some time.
Before the formal fix is available, the team will provide a temporary solu-
tion as soon as possible, and a key metric used is "time to first relief."
The first-relief temporary repairs may take less than 24 hours for some
new problems, and even less if the problem is already known.
When formal fixes are provided, a key metric used by IBM Rochester is
the quality of the fixes: percent of defective fixes. Rochester's defective-
fix rate is the lowest among the major platforms in IBM. (Since the
industry average for bad-fix injection is about 7%, it is commendable that
IBM addresses this issue.)
The IBM Rochester support center also conducts a "trailer survey." This is
a survey of customer satisfaction about the service or fix. These surveys
are based on samples of problem records that are closed. IBM Rochester's
trailer survey satisfaction is in the high 90s in terms of percentages of
satisfied customers.
Another IBM Rochester factor could be called the "cultural factor." IBM as
a corporation and Rochester as a lab both have a long tradition of focus on
quality (e.g., winning the Malcolm Baldrige quality award). Because cus-
tomer satisfaction correlates directly with quality, the IBM Rochester prod-
ucts have long had a reputation for excellence (IBM System/34, System/36,
System/38, AS/400, System i, etc.). IBM and Rochester employees are proud
of the quality that they deliver for both products and services."
For major customer problems, teams (support, development, test, etc.)
work together to come up with solutions. Customer feedback has long
been favorable for IBM Rochester, which explains their multiyear award
for customer support excellence. Often when surveyed customers men-
tion explicitly and favorably the amount of support and problem solving
that they receive from the IBM Rochester site.
Demographics In the software world, in-house customer support staffed
by actual employees is in rapid decline due to the recession. Probably a
few hundred large companies still provide such support, but as layoffs
and downsizing continue to escalate, their numbers will be reduced.
However, for small companies that have never employed full-time
customer support personnel, no doubt the software engineers will still
continue to field customer calls and respond to e-mails. There are prob-
ably 10,000 or more U.S. organizations with between 1 and 50 employees
where customer support tasks are performed informally by software
engineers or programmers.
For commercial software organizations, outsourcing of customer sup-
port to specialized support companies is now the norm. While some of
these support companies are domestic, there are also dozens of customer
support organizations in other countries with lower labor costs than
the United States or Europe. However, as the recession continues, labor
costs will decline in the United States, which now has large pools of
unemployed software technical personnel. Customer support, mainte-
nance, and other labor-intensive tasks may well start to move back to
the United States.
Project size The average size of applications where formal customer
support is close to being mandatory is about 10,000 function points. Of
course, for any size application, customers will have questions and need
to report bugs. But applications in the 10,000-function point range usu-
ally have many customers. In addition, these large systems are always
released with thousands of latent bugs.
Productivity rates Productivity rates for customer support are not mea-
sured using function points, but rather numbers of customers assisted.
Typically, one tier-1 customer support person on a telephone support
desk can talk to about 30 people per day, which translates into each call
taking about 16 minutes.
For tier 2 and tier 3 customer support, where experts are used, the
work of talking to customers is probably not full time. However, for
problems serious enough to reach tier 2, expect each call to take about
70 minutes. For problems that reach tier 3, there will no doubt be mul-
tiple calls back and forth and probably some internal research. Expect
tier 3 calls to take about 240 minutes.
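The call-rate arithmetic above can be sketched in a few lines of Python; the 8-hour (480-minute) working day is an assumption, and the helper name is illustrative:

```python
# Support-call throughput sketch. Assumes an 8-hour working day;
# the per-call minutes (16, 70, 240) are the figures from the text.

MINUTES_PER_DAY = 8 * 60  # 480 working minutes (assumption)

def calls_per_day(minutes_per_call: float) -> float:
    """Approximate calls one support person can handle per day."""
    return MINUTES_PER_DAY / minutes_per_call

tier1 = calls_per_day(16)   # tier 1: 30 calls per day
tier2 = calls_per_day(70)   # tier 2: roughly 7 calls per day
tier3 = calls_per_day(240)  # tier 3: about 2 calls per day
```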
If a customer is reporting a new bug that has not been identified or fixed,
then days or even weeks may be required. (The author worked as an expert
witness in a lawsuit where the time required to fix one bug in a financial
application was more than nine calendar months. In the course of fixing
this bug, the first four attempts each took about two months. They not only
failed to fix the original bug, but added new bugs in each fix.)
Schedules The primary schedule issue for customer support is the wait
or hold time before speaking to a live support person. Today, in 2009,
reaching a live person can take from 10 minutes to more than 60
minutes of hold time. Needless to say, this is very frustrating to clients.
Improving quality should also reduce wait times. Assuming constant
support staffing, every reduction of ten severity 1 or 2 defects released
in software should reduce wait times by about 30 seconds.
Quality Customer support calls are directly proportional to the number
of released defects or bugs in software. It is theoretically possible that
releasing software with zero defects might reduce the number of cus-
tomer support calls to zero, too. In today's world, where defect removal
efficiency only averages 85 percent and hundreds or thousands of seri-
ous bugs are routinely still present when software is released, there will
be hundreds of customer support calls and e-mails.
It is interesting that some open-source and freeware applications such
as Linux, Firefox, and Avira seem to have better quality levels than equiv-
alent applications released by established vendors such as Microsoft and
Symantec. In part this may be due to the skills of the developers, and in
part it may be due to routinely using tools such as static analysis prior
to release.
Specialization The role of tier-1 customer support is very specialized.
Effective customer support requires a good personality when dealing
with crabby customers plus fairly sophisticated technical skills. Of these
two, the criterion for technical skill is easier to satisfy than the criterion for
a good personality when dealing with angry or outraged customers. That
being said, there are few studies to date that deal with either personal-
ity or technical skills in support organizations.
In addition to customer support provided by vendors of software, some
user associations and nonprofit groups provide customer support on
a volunteer basis. Many freeware and open-source applications have
user groups that can answer technical questions. Even for commercial
software, it is sometimes easier to get an informed response to a ques-
tion from an expert user than it is from the company that built the
software.
Cautions and counter indications The main caution about customer
support work is that it tends to lock personnel into narrow careers, some-
times limited to discussing a single application such as Oracle or SAP
for a period of years. There is little chance of career growth or knowledge
expansion.
Another caution is that improving customer support via automation
and expert systems is technically feasible, but many existing patents
cover such topics. As a result, attempts to develop improved customer
support automation may require licensing of intellectual property.
Conclusions The literature on customer support is dominated by two
very different forms of information. The Information Technology
Infrastructure Library (ITIL) contains more than 30 volumes and more
than 5,000 pages of information on every aspect of customer support.
However, the ITIL library is aimed primarily at in-house customer sup-
port and is not used very much by commercial software vendors.
For commercial software customer support, some trade books are
available, but the literature tends to be dominated by white papers
and monographs published by customer support outsource companies.
Although these tend to be marketing texts, some of them do provide
useful information about the mechanics of customer support. There
are also interesting reports available from companies that provide cus-
tomer-support automation, which is both plentiful and seems to cover
a wide range of features.
Given the fact that customer support is a critical activity of the
software industry in 2009, a great deal more research and study are
indicated. Research is needed on the relationship between quality and
customer support, on the role of user associations and volunteer groups,
and on the potential automation that might improve customer support.
In particular, research is needed on providing customer support for deaf
and hard-of-hearing customers, blind customers, and those with other
physical challenges.
Software Test Organizations
There are ten problems with discussing software test organizations that
need to be highlighted:
1. There are more than 15 different kinds of software testing.
2. Many kinds of testing can be performed either by developers, by
in-house test organizations, by outsource test organizations, or by
quality assurance teams based on company test strategies.
3. With Agile teams and with hierarchical organizations, testers will
probably be embedded with developers and not have separate
departments.
4. In matrix organizations, testers would probably be in a separate
testing organization reporting to a skill manager, but assigned to
specific projects as needed.
5. Some test organizations are part of quality assurance organizations
and therefore have several kinds of specialists besides testing.
6. Some quality assurance organizations collect data on test results,
but do no testing of their own.
7. Some testing organizations are called "quality assurance" and per-
form only testing. These may not perform other QA activities such
as moderating inspections, measuring quality, predicting quality,
teaching quality, and so on.
8. For any given software application, the number of separate kinds
of testing steps ranges from a low of 1 form of testing to a high of
17 forms of testing based on company test strategies.
9. For any given software application, the number of test and/or qual-
ity assurance organizations that are part of its test strategy can
range from a low of one to a high of five, based on company quality
strategies.
10. For any given defect removal activity, including testing, as many as
11 different kinds of specialists may take part.
As can perhaps be surmised from the ten points just highlighted,
there is no standard way of testing software applications in 2009. Not
only is there no standard way of testing, but there are no standard
measures of test coverage or defect removal efficiency, although both
are technically straightforward measurements.
The most widely used form of test measurement is that of test cover-
age, which shows the amount of code actually executed by test cases.
Test coverage measures are fully automated and therefore easy to do.
This is a useful metric, but much more useful would be to measure
defect removal efficiency as well.
Defect removal efficiency is more complicated and not fully auto-
mated. To measure the defect removal efficiency of a specific test stage
such as unit test, all defects found by the test are recorded. After unit
test is finished, all other defects found by all other tests are recorded,
as are defects found by customers in the first 90 days. When all defects
have been totaled, then removal efficiency can be calculated.
Assume unit test found 100 defects, function test and later test stages
found 200 defects, and customers reported 100 defects in the first
90 days of use. The total number of defects found was 400. Since unit
test found 100 out of 400 defects, in this example, its efficiency is 25
percent, which is actually not far from the 30 percent average value of
defect removal efficiency for unit test.
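A minimal sketch of this calculation, using the numbers from the example (the function name is illustrative, not from any measurement tool):

```python
def removal_efficiency(found_by_stage: int, found_later: int) -> float:
    """Defect removal efficiency of one stage: defects found by the
    stage divided by all defects eventually found (later stages plus
    customer reports in the first 90 days)."""
    return found_by_stage / (found_by_stage + found_later)

# Unit test found 100 defects; later test stages found 200 and
# customers reported 100 in the first 90 days of use:
# 100 / 400 = 25 percent removal efficiency for unit test.
unit_test_efficiency = removal_efficiency(100, 200 + 100)
```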
(A quicker but less reliable method for determining defect removal
efficiency is that of defect seeding. For example, if 100 known bugs were
seeded into the software discussed in the previous paragraph and 25
were found, then the defect removal efficiency level of 25 percent could
be calculated immediately. However, there is no guarantee that the
"tame" bugs that were seeded would be found at exactly the same rate
as "wild" bugs that are made by accident.)
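The seeding estimate works the same way; a hedged sketch, with an illustrative function name:

```python
def seeded_efficiency(bugs_seeded: int, seeded_found: int) -> float:
    """Estimate a stage's defect removal efficiency from the fraction
    of deliberately seeded ('tame') bugs that the stage found."""
    return seeded_found / bugs_seeded

# 100 known bugs seeded, 25 found by the test stage: estimated
# removal efficiency of 25 percent, available immediately.
estimate = seeded_efficiency(100, 25)
```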
It is an unfortunate fact that most forms of testing are not very effi-
cient and find only about 25 percent to 40 percent of the bugs that are
actually present, although the range is from less than 20 percent to
more than 70 percent.
It is interesting that there is much debate over black box testing,
which lacks information on internals; white box testing, with full vis-
ibility of internal code; and gray box testing, which has visibility of
internals but tests at the external level.
So far as can be determined, the debate is theoretical, and few experi-
ments have been performed to measure the defect removal efficiency
levels of black, white, or gray box testing. When measures of efficiency
are taken, white box testing seems to have higher levels of defect
removal efficiency than black box testing.
Because many individual test stages such as unit test are so low
in efficiency, it can be seen why several different kinds of testing are
needed. The term cumulative defect removal efficiency refers to the
overall efficiency of an entire sequence of tests or defect removal
operations.
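Cumulative defect removal efficiency can be approximated by assuming each stage removes its stated fraction of whatever defects remain; a sketch under that independence assumption (the stage values are illustrative):

```python
def cumulative_efficiency(stage_efficiencies) -> float:
    """Overall fraction of defects removed by a sequence of stages,
    assuming each stage removes its fraction of the defects still
    present when it runs (stages treated as independent)."""
    remaining = 1.0
    for efficiency in stage_efficiencies:
        remaining *= (1.0 - efficiency)
    return 1.0 - remaining

# Three stages of 30 percent each leave 0.7 * 0.7 * 0.7 of the bugs,
# so the cumulative efficiency is only about 66 percent -- which is
# why long sequences of removal operations are needed.
sequence = cumulative_efficiency([0.30, 0.30, 0.30])
```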
As a result of lack of testing standards and lack of widespread test-
ing effectiveness measurements, testing by itself does not seem to be a
particularly cost-effective approach for achieving high levels of quality.
Companies that depend purely upon testing for defect removal almost
never top 90 percent in cumulative defect removal, and often are below
75 percent.
The newer forms of testing such as test-driven development (TDD)
use test cases as a form of specification and create the test cases first,
before the code itself is created. As a result, the defect removal efficiency
of TDD is higher than many forms of testing and can top 85 percent.
However, even with TDD, bad-fix injection needs to be factored into the
equation. About 7 percent of attempts to fix bugs accidentally include
new bugs in the fixes.
If TDD is combined with other approaches such as formal inspection
of the test cases and static analysis of the code, then defect removal
efficiency can top 95 percent.
There is some ambiguity in the data that deals with automatic testing
versus manual testing. In theory, automatic testing should have higher
defect removal efficiency than manual testing in at least 70 percent
of trials. For example, manual unit testing averages about 30 percent
in terms of defect removal efficiency, while automatic testing may top
50 percent. However, testing skills vary widely among software engi-
neers and programmers, and automatic testing also varies widely. More
study of this topic is indicated.
The poor defect removal efficiency of normal testing brings up an
important question: If testing is not very effective in finding and remov-
ing bugs, what is effective? This is an important question, and it is
also a question that should be answered in a book entitled Software
Engineering Best Practices.
The answer to the question of "What is effective in achieving high
levels of quality?" is that a combination of defect prevention and mul-
tiple forms of defect removal is needed for optimum effectiveness.
Defect prevention refers to methods and techniques that can lower
defect potentials from U.S. averages of about 5.0 per function point.
Examples of methods that have demonstrated effectiveness in terms
of defect prevention include the higher levels of the capability matu-
rity model integration (CMMI), joint application design (JAD), qual-
ity function deployment (QFD), root-cause analysis, Six Sigma for
software, the Team Software Process (TSP), and also the Personal
Software Process (PSP).
For small applications, the Agile method of having an embedded user
as part of the team can also reduce defect potentials. (The caveat with
embedded users is that for applications with more than about 50 users,
one person cannot speak for the entire set of users. For applications with
thousands of users, having a single embedded user is not adequate. In such
cases, focus groups and surveys of many users are necessary.)
As it happens, formal inspections of requirements, design, and code
serve double duty and are very effective in terms of defect prevention as
well as being very effective in terms of defect removal. This is because
participants in formal inspections spontaneously avoid making the
same mistakes that are found during the inspections.
The combination of methods that have been demonstrated to raise
defect removal efficiency levels includes formal inspections of require-
ments, design, code, and test materials; static analysis of code prior to
testing; and then a test sequence that includes at least eight forms of
testing: (1) unit test, (2) new function test, (3) regression test, (4) per-
formance test, (5) security test, (6) usability test, (7) system test, and
(8) some form of external test with customers or clients, such as beta
test or acceptance test.
Such a combination of pretest inspections, static analysis, and at least
eight discrete test stages will usually approach 99 percent in terms of
cumulative defect removal efficiency levels. Not only does this combination
raise defect removal efficiency levels, but it is also very cost-effective.
Projects that top 95 percent in defect removal efficiency levels usually
have shorter development schedules and lower costs than projects that
skimp on quality. And, of course, they have much lower maintenance
and customer support costs, too.
Testing is a teachable skill, and there are a number of for-profit and
nonprofit organizations that offer seminars, classes, and several flavors
of certification for test personnel. While there is some evidence that
certified test personnel do end up with higher levels of defect removal
efficiency than uncertified test personnel, the poor measurement and
benchmark practices of the software industry make that claim some-
what anecdotal. It would be helpful if test certification included a
learning segment on how to measure defect removal efficiency.
Following in Table 5-3 are examples of a number of different forms
of software inspection, static analysis, and testing, with the probable
organization that performs each activity indicated.
TABLE 5-3   Forms of Software Defect Removal Activities

Pretest Removal Inspections                          Performed by
 1. Requirements                                     Analysts
 2. Design                                           Designers
 3. Code                                             Programmers
 4. Test plans                                       Testers
 5. Test cases                                       Testers
 6. Static analysis                                  Programmers
General Testing
 7. Subroutine test                                  Programmers
 8. Unit test                                        Programmers
 9. New function test                                Testers or programmers
10. Regression test                                  Testers or programmers
11. System test                                      Testers or programmers
Special Testing
12. Performance testing                              Performance specialists
13. Security testing                                 Security specialists
14. Usability testing                                Human factors specialists
15. Component testing                                Testers
16. Integration testing                              Testers
17. Nationalization testing                          Foreign language experts
18. Platform testing                                 Platform specialists
19. SQA validation testing                           Software quality assurance
20. Lab testing                                      Hardware specialists
External Testing
21. Independent testing                              External test company
22. Beta testing                                     Customers
23. Acceptance testing                               Customers
Special Activities
24. Audits                                           Auditors, SQA
25. Independent verification and validation (IV&V)   IV&V contractors
26. Ethical hacking                                  Hacking consultants
Table 5-3 shows 26 different kinds of defect removal activity carried
out by a total of 11 different kinds of internal specialists, 3 specialists
from outside companies, and also by customers. However, only very large
and sophisticated high-technology companies would have such a rich
mixture of specialization and would utilize so many different kinds of
defect removal.
Smaller companies would either have the testing carried out by software
engineers or programmers (who often are not well trained), or they would
have a testing group staffed primarily by testing specialists. Testing can
also be outsourced, although as of 2009, this activity is not common.
At this point, it is useful to address three topics that are not well
covered in the testing literature:
1. How many testers are needed for various kinds of testing?
2. How many test cases are needed for various kinds of testing?
3. What is the defect removal efficiency of various kinds of testing?
Table 5-4 shows the approximate staffing levels for the 17 forms of
testing that were illustrated in Table 5-3. Note that this information is
only approximate, and there are wide ranges for each form of testing.
Because testing executes source code, the information in Table 5-4
is based on source code counts rather than on function points. With
more than 700 programming languages ranging from assembly through
TABLE 5-4   Test Staffing for Selected Test Stages

Application language:   Java
Application code size:  50,000
Application KLOC:       50
Function points:        1,000

General Testing                  Assignment Scope    Test Staff
 1. Subroutine test                    10,000           5.00
 2. Unit test                          10,000           5.00
 3. New function test                  25,000           2.00
 4. Regression test                    25,000           2.00
 5. System test                        50,000           1.00
Special Testing
 6. Performance testing                50,000           1.00
 7. Security testing                   50,000           1.00
 8. Usability testing                  25,000           2.00
 9. Component testing                  25,000           2.00
10. Integration testing                50,000           1.00
11. Nationalization testing           150,000           0.33
12. Platform testing                   50,000           1.00
13. SQA validation testing             75,000           0.67
14. Lab testing                        50,000           1.00
External Testing
15. Independent testing                 7,500           6.67
16. Beta testing                       25,000           2.00
17. Acceptance testing                 25,000           2.00
modern languages such as Ruby and E, the same application illustrated
in Table 5-4 might vary by more than 500 percent in terms of source
code size. Java is the language used in Table 5-4 because it is one of the
most common languages in 2009.
The column labeled "Assignment Scope" illustrates the amount of
source code that one tester will probably be responsible for testing.
Note that there are very wide ranges in assignment scopes based on the
experience levels of test personnel, on the cyclomatic complexity of the
code, and to a certain extent, on the specific language or combination of
languages in the application being tested.
Because the testing shown in Table 5-4 involves a number of differ-
ent people with different skills who probably would be from different
departments, the staffing breakdown for all 17 tests would include
5 developers through unit test; 2 test specialists for integration and
system test; 3 specialists for security, nationalization, and usability
test; 1 SQA specialist; 7 outside specialists from other companies; and
2 customers: 20 people in all.
Of course, it is unlikely that any small application of 1,000 function
points or 50 KLOC (thousands of lines of code) would use (or need) all
17 of these forms of testing. The most probable sequence for a 50-KLOC
Java application would be 6 kinds of testing performed by 5 developers,
2 test specialists, and 2 users, for a total of 9 test personnel in all.
In Table 5-5, data from the previous tables is used as the base for
staffing, but the purpose of Table 5-5 is to show the approximate num-
bers of test cases produced for each test stage, and then the total number
of test cases for the entire application. Here, too, there are major varia-
tions, so the data is only approximate.
The code defect potential for the 50 KLOC code sample of the Java
application would be about 1,500 total bugs, which is equal to 1.5 code
bugs per function point, or 30 bugs per KLOC. (Note that earlier bugs
in requirements and design are excluded and assumed to have been
removed before testing begins.)
If all 17 of the test stages were used, they would probably detect about 95
percent of the total bugs present, or 1,425 in all. That would leave 75 bugs
latent when the application is delivered. Assuming both the numbers for
potential defects and the numbers for test cases are reasonably accurate
(a questionable assumption), then it takes an average of 1.98 test cases to
find 1 bug.
Of course, since only about 6 out of the 17 test stages are usually per-
formed, the removal efficiency would probably be closer to 75 percent, which
is why additional nontest methods such as inspections and static analysis
are needed to achieve really high levels of defect removal efficiency.
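The delivered-defect arithmetic above can be sketched as follows; the 30-bugs-per-KLOC potential and the efficiency percentages are the figures from the text, and the names are illustrative:

```python
def latent_defects(defect_potential: float, removal_efficiency: float) -> int:
    """Bugs still present at delivery: total defect potential times the
    fraction NOT removed before release."""
    return round(defect_potential * (1.0 - removal_efficiency))

potential = 50 * 30  # 50 KLOC at 30 code bugs per KLOC = 1,500 bugs

# All 17 test stages at roughly 95 percent cumulative efficiency
with_all_stages = latent_defects(potential, 0.95)  # about 75 latent bugs

# The usual 6 stages at roughly 75 percent cumulative efficiency
with_six_stages = latent_defects(potential, 0.75)  # about 375 latent bugs
```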
If even this small 50-KLOC example uses more than 2,800 test cases,
it is obvious that corporations with hundreds of software applications
TABLE 5-5   Test Cases for Selected Test Stages

Application language:   Java
Application code size:  50,000
Application KLOC:       50
Function points:        1,000

                                  Test    Test Cases   Total Test   Test Cases
General Testing                   Staff    per KLOC      Cases      per Person
 1. Subroutine test               5.00      12.00          600        120.00
 2. Unit test                     5.00      10.00          500        100.00
 3. New function test             2.00       5.00          250        125.00
 4. Regression test               2.00       4.00          200        100.00
 5. System test                   1.00       3.00          150        150.00
Special Testing
 6. Performance testing           1.00       1.00           50         50.00
 7. Security testing              1.00       3.00          150        150.00
 8. Usability testing             2.00       3.00          150         75.00
 9. Component testing             2.00       1.50           75         37.50
10. Integration testing           1.00       1.50           75         75.00
11. Nationalization testing       0.33       0.50           25         75.76
12. Platform testing              1.00       2.00          100        100.00
13. SQA validation testing        0.67       1.00           50         74.63
14. Lab testing                   1.00       1.00           50         50.00
External Testing
15. Independent testing           6.67       4.00          200         29.99
16. Beta testing                  2.00       2.00          100         50.00
17. Acceptance testing            2.00       2.00          100         50.00

TOTAL TEST CASES                                          2825
TEST CASES PER KLOC                                      56.50
TEST CASES PER PERSON (20 TESTERS)                      141.25
will eventually end up with millions of test cases. Once created, test cases
have residual value for regression test purposes. Fortunately, a number
of automated tools can be used to store and manage test case libraries.
The existence of such large test libraries is a necessary overhead
of software development and maintenance. However, this topic needs
additional study. Creating reusable test cases would seem to be of value.
Also, there are often errors in test cases, which is why inspections of test
plans and test cases are useful.
With hundreds of different people creating test cases in large com-
panies and government agencies, there is a good chance that duplicate
tests will accidentally be created. In fact, this does occur, and a study at
IBM noted about 30 percent redundancy or duplicates in one software
lab's test library.
The final Table 5-6 in this section shows defect removal efficiency
levels against six sources of error: requirements defects, design defects,
coding defects, security defects, defects in test cases, and performance
defects.
Table 5-6 is complicated by the fact that not every defect removal
method is equally effective against each type of defect. In fact, many
TABLE 5-6   Defect Removal Efficiency by Defect Type

                               Req.     Des.     Code     Sec.     Test     Perf.
                             defects  defects  defects  defects  defects  defects
Pretest Removal Inspections:
 1. Requirements              85.00%
 2. Design                             85.00%            25.00%
 3. Code                                        85.00%   40.00%            15.00%
 4. Test plans                                                    85.00%
 5. Test cases                                                    85.00%
 6. Static analysis                    30.00%   87.00%   25.00%            20.00%
General Testing
 7. Subroutine test                             35.00%                     10.00%
 8. Unit test                                   30.00%                     10.00%
 9. New function test                  15.00%   35.00%                     10.00%
10. Regression test                             15.00%
11. System test               10.00%   20.00%   25.00%    7.00%            25.00%
Special Testing
12. Performance testing                 5.00%   10.00%                     70.00%
13. Security testing                                     65.00%
14. Usability testing         10.00%   10.00%
15. Component testing                  10.00%   25.00%
16. Integration testing                10.00%   30.00%
17. Nationalization testing                      3.00%
18. Platform testing                            10.00%
19. SQA validation testing     5.00%    5.00%   15.00%
20. Lab testing                        10.00%   10.00%   10.00%            20.00%
External Testing
21. Independent testing                 5.00%   30.00%    5.00%    5.00%   10.00%
22. Beta testing              30.00%   25.00%   10.00%                     15.00%
23. Acceptance testing        30.00%   20.00%    5.00%                     15.00%
Special Activities
24. Audits                    15.00%   10.00%
25. Independent verification
    and validation (IV&V)     10.00%   10.00%   10.00%
26. Ethical hacking                                      85.00%
Software Team Organization and Specialization
337
forms of defect removal have 0 percent efficiency against security flaws.
Coding defects are the easiest type of defect to remove; requirements
defects, security defects, and defects in test materials are the most dif-
ficult to eliminate.
Historically, formal inspections have the highest levels of defect
removal efficiency against the broadest range of defects. The more
recent method of static analysis has a commendably high level of defect
removal efficiency against coding defects, but currently operates only on
about 15 programming languages out of more than 700.
The data in Table 5-6 has a high margin of error, but the table itself
shows the kind of data that needs to be collected in much greater volume
to improve software quality and raise overall levels of defect removal effi-
ciency across the software industry. In fact, every software application
larger than 1,000 function points in size should collect this kind of data.
One important source of defects is not shown in Table 5-6: bad-fix
injection. About 7 percent of bug repairs contain a fresh
bug in the repair itself. Assume that unit testing found and removed
100 bugs in an application. Then there is a high probability that about 7 new
bugs would be accidentally injected into the application due to errors
in the fixes themselves. (Bad-fix injections greater than 25 percent may
occur with error-prone modules.)
Bad-fix injection is a very common source of defects in software, but it
is not well covered either in the literature on testing or in the literature
on software quality assurance.
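The arithmetic compounds, because fixes for the injected bugs can themselves inject bugs. A sketch of the resulting geometric tail, using the average 7 percent rate cited above (function names are illustrative):

```python
def injected_bugs(initial_fixes, injection_rate=0.07):
    """Bugs injected by each successive repair cycle: each cycle's fixes
    inject injection_rate new bugs per fix, which must then be fixed too."""
    waves = []
    new_bugs = initial_fixes * injection_rate
    while new_bugs >= 0.01:          # stop once the tail is negligible
        waves.append(new_bugs)
        new_bugs *= injection_rate
    return waves

waves = injected_bugs(100)           # 100 bugs fixed during unit test
print([round(w, 2) for w in waves])  # roughly [7.0, 0.49, 0.03]
print(round(sum(waves), 1))          # about 7.5 extra bugs in total
```

At a 25 percent injection rate, as seen with error-prone modules, the same 100 fixes generate roughly 33 extra bugs, which helps explain why such modules resist repair.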
Another quality issue that is not well covered is that of error-
prone modules. As mentioned elsewhere in this book, bugs are not
randomly distributed, but tend to clump in a small number of very
buggy modules.
If an application contains one or more error-prone modules, then
defect removal efficiency levels against those modules may be only half
of the values shown in Table 5-6, and bad-fix injection rates may top
25 percent. This is why error-prone modules can seldom be repaired, but
need to be surgically removed and replaced by a new module.
In spite of the long history of testing and the large number of test per-
sonnel employed by the software industry, a great deal more research
is needed. Some of the topics that need research are automatic genera-
tion of test cases from specifications, developing reusable test cases,
better predictions of test case numbers and removal efficiency, and
much better measurement of test results in terms of defect removal
efficiency levels.
Demographics In the software world, testing has long been one of the
major development activities, and test personnel are among the largest
software occupation groups. But to date there is no accurate census of
test personnel, due in part to the fact that so many different kinds of
specialists get involved in testing.
Because testing is on the critical path for releasing software, there
is a tendency for software project managers or even senior executives
to put pressure on test personnel to truncate testing when schedules
are slipping. Having test organizations report to separate skill
managers, as opposed to project or application managers, adds a
measure of independence.
However, testing is such an integral part of software development that
test personnel need to be involved essentially from the first day that
development begins. Whether testers report to skill managers or are
embedded in project teams, they need early involvement during require-
ment and design. This is especially true with test-driven development
(TDD), where test cases are an integral part of the requirements and
design processes.
Project size The minimum size of applications where formal testing
is mandatory is about 100 function points. As a rule, the larger the
application, the more kinds of pretest defect removal activities and
more forms of testing are needed to succeed, or even to finish the
application at all.
For large systems of up to 10,000 function points, inspections, static
analysis, security analysis, and about ten forms of testing are needed
to achieve high levels of defect removal efficiency. Unfortunately, many
companies skimp on both testing and nontest activities, so U.S. average
results are embarrassingly bad: about 85 percent cumulative defect
removal efficiency, essentially flat from 1996 through 2009.
Productivity rates There are no effective productivity rates for testing,
in part because there are no effective size metrics for test cases. At a macro level, testing
productivity can be measured by using "work hours per function point"
or the reciprocal "function points per staff month," but those measures
are abstract and don't really capture the essence of testing.
Measures such as "test cases created per month" or "test cases exe-
cuted per month" send the wrong message, because they might encour-
age extra testing simply to puff up the results and not raise defect
removal efficiency.
Measures such as "defects detected per month" are unreliable, because
for really top-gun developers, there may not be very many defects to
find. The "cost per defect" metric is also unreliable for the same reason.
Testers will still run many test cases whether an application has any
bugs or not. As a result, cost per defect rises as defect quantities go
down; hence the cost per defect metric penalizes quality.
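The distortion is easy to demonstrate with hypothetical numbers: if preparing and executing a test stage costs a fixed $50,000 no matter how many bugs it finds, the metric rises exactly as quality improves:

```python
def cost_per_defect(fixed_test_cost, defects_found):
    # Test preparation and execution costs are largely fixed, so the
    # metric rises as quality improves and fewer defects remain to find.
    return fixed_test_cost / defects_found

for defects in (500, 50, 5):
    print(defects, f"${cost_per_defect(50_000, defects):,.0f}")
# prints: 500 $100 / 50 $1,000 / 5 $10,000
```

The high-quality application looks 100 times worse by this metric, even though its total test cost is identical; this is the sense in which cost per defect penalizes quality.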
Schedules The primary schedule issues for test personnel are those of
test case creation and test case execution. But testing schedules depend
more upon the number of bugs found and the time it takes to repair the
bugs than on test cases.
One factor that is seldom measured but also delays test schedules
is bugs or defects in test cases themselves. A study done some years
ago by IBM found more bugs in test cases than in the applications
being tested. This topic is not well covered by the testing literature.
(This was the same study that had found about 30 percent redun-
dant or duplicate test cases in test libraries.) Running duplicate test
cases adds to testing costs and schedules, but not to defect removal
efficiency levels.
When testing starts on applications with high volumes of defects, the
entire schedule for the project is at risk, because testing schedules will
extend far beyond their planned termination. In fact, testing delays due
to excessive defect volumes are the main reason for software schedule
delays.
The most effective way to minimize test schedules is to have very few
defects present because pretest inspections and static analysis found
most of them before testing began. Defect prevention methods such as the
Team Software Process (TSP) or joint application design (JAD) can also
speed up test schedules.
For the software industry as a whole, delays in testing due to excessive
bugs are a major cause of application cost and schedule overruns
and also of project cancellations. Because long delays and cancellations
trigger a great deal of litigation, high defect potentials and low levels
of defect removal efficiency are causative factors in breach of contract
lawsuits.
Quality Testing by itself has not been efficient enough in finding bugs to
be the only form of defect removal used on major software applications.
Testing alone almost never tops 85 percent defect removal efficiency,
with the exception of the newer test-driven development (TDD), which
can hit 90 percent.
Testing combined with formal inspections and static analysis achieves
higher levels of defect removal efficiency, shorter schedules, and lower
costs than testing alone. Moreover, these savings not only benefit devel-
opment, but also lower the downstream costs of customer support and
maintenance.
Readers who are executives and qualified to sign contracts are
advised to consider 95 percent as the minimum acceptable level of
defect removal efficiency. Every outsource contract, every internal qual-
ity plan, and every license with a software vendor should require proof
that the development organization will top 95 percent in defect removal
efficiency.
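As a sketch of how such proof might be computed: defect removal efficiency is conventionally the fraction of all defects found before release, with defects reported during roughly the first 90 days of production use counted as escapes. The numbers below are hypothetical:

```python
def defect_removal_efficiency(internal_defects, field_defects_90_days):
    """Defects found before release divided by all defects found,
    counting customer-reported defects from the first ~90 days of use."""
    total = internal_defects + field_defects_90_days
    return internal_defects / total if total else 1.0

# Hypothetical release: 950 defects removed in-house, 50 escaped to customers.
print(f"{defect_removal_efficiency(950, 50):.0%}")  # 95%
```

Note that the metric can only be confirmed some months after release, which is why contracts should require the measurement itself, not just a promise.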
Specialization Testing specialization covers a wide range of skills.
However, for many small companies with a generalist philosophy, soft-
ware developers may also serve as software testers even though they
may not be properly trained for the role.
For large companies, a formal testing department staffed by testing
specialists will give better results than development testing by itself.
For very large multinational companies and for companies that build
systems and embedded software, test and quality assurance specialists
will be numerous and have many diverse skills.
There are several forms of test certification available. Testers who go
to the trouble of achieving certification are to be commended for taking
their work seriously. However, there is not a great deal of empirical
data that compares the defect removal efficiency levels of tests carried
out by certified testers versus the same kind of testing performed by
uncertified testers.
Cautions and counter indications The main caution about testing is that
it does not find very many bugs or defects. For more than 50 years, the
software industry has routinely delivered large software applications
with hundreds of latent bugs, in spite of extensive testing.
A second caution about testing is that testing cannot find require-
ments errors such as the famous Y2K problem. Once an error becomes
embedded in requirements and is not found via inspections, quality func-
tion deployment (QFD), or some other nontest approach, all that testing
will accomplish is to confirm the error. This is why correct requirements
and design documents are vital for successful testing. This also explains
why formal inspections of requirements and design documents raise
testing efficiency by about 5 percent per test stage.
Conclusions The literature on testing is extensive but almost totally
devoid of quantitative data that deals with defect removal efficiency,
with testing costs, with test staffing, with test specialization, with
return on investment (ROI), or with the productivity of test personnel.
However, there are dozens of books and hundreds of web sites with
information on testing.
Several nonprofit organizations are involved with testing, such as the
Association for Software Testing (AST) and the American Society for
Quality (ASQ). There is also a Global Association for Software Quality
(GASQ).
There are local and regional software quality organizations in many
cities. There are also for-profit test associations that hold a number of
conferences and workshops, and also offer certification exams.
Given the central role of testing over the past 50 years of software
engineering, the gaps in the test literature are surprising and dismaying.
A technical occupation that has no clue about the most efficient and cost-
effective methods for preventing or removing serious errors is not qualified
to be called "engineering."
Some of the newer forms of testing such as test-driven development
(TDD) are moving in a positive direction by shifting test case develop-
ment to earlier in the development cycle, and by joining test cases with
requirements and design. These changes in test strategy result in higher
levels of defect removal efficiency coupled with lower costs as well.
But to achieve really high levels of quality in a cost-effective manner,
testing alone has always been insufficient and remains insufficient in
2009. A synergistic combination of defect prevention and a multiphase
suite of defect removal activities that combine inspections, static analysis,
automated testing, and manual testing provides the best overall results.
For the software industry as a whole, defect potentials have been far
too high, and defect removal efficiency far too low for far too many years.
This unfortunate combination has raised development costs, stretched
out development schedules, caused many failures and also litigation,
and raised maintenance and customer support costs far higher than
they should be.
Defect prevention methods such as Team Software Process (TSP),
quality function deployment (QFD), Six Sigma for software, joint appli-
cation design (JAD), participation in inspections, and certified reusable
components have the theoretical potential of lowering defect potentials
by 80 percent or more compared with 2009. In other words, defect poten-
tials could drop from about 5.0 per function point down to about 1.0 per
function point or lower.
Defect removal combinations that include formal inspections, static
analysis, test-driven development, using both automatic and manual
testing, and certified reusable test cases could raise average defect
removal efficiency levels from today's approximate average of about
85 percent in 2009 up to about 97 percent. Levels that approach
99.9 percent could even be achieved in many cases.
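The combined effect of these two levers is simple arithmetic. The following sketch uses the 2009 averages cited above applied to a hypothetical 10,000-function point system:

```python
def delivered_defects(size_fp, potential_per_fp, removal_efficiency):
    """Latent defects at release: total defect potential times the
    fraction of defects NOT removed before delivery."""
    return size_fp * potential_per_fp * (1.0 - removal_efficiency)

SIZE = 10_000  # function points (hypothetical large system)
print(round(delivered_defects(SIZE, 5.0, 0.85)))  # ~2009 average: about 7,500
print(round(delivered_defects(SIZE, 1.0, 0.97)))  # improved: about 300
```

Moving from a 5.0 defect potential at 85 percent removal to a 1.0 potential at 97 percent removal cuts latent defects by a factor of roughly 25, which is why prevention and removal must be attacked together.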
Effective combinations of defect prevention and defect removal
activities are available in 2009 but seldom used except by a few very
sophisticated organizations. What is lacking is not so much the tech-
nologies that improve quality, but awareness of how effective the best
combinations really are. Also lacking is awareness of how ineffective
testing alone can be. It is lack of widespread quality measurements
and lack of quality benchmarks that are delaying improvements in
software quality.
Also valuable are predictive estimating tools that can predict both
defect potentials and the defect removal efficiency levels of any com-
bination of review, inspection, static analysis, automatic test stage,
and manual test stage. Such tools exist in 2009 and are marketed by
companies such as Software Productivity Research (SPR), SEER, Galorath,
and Price Systems. Even more sophisticated tools that can predict the
damages that latent defects cause to customers exist in prototype form.
The final conclusion is that until the software industry can routinely
top 95 percent in average defect removal efficiency levels, and hit 99
percent for critical software applications, it should not even pretend
to be a true engineering discipline. The phrase "software engineering"
without effective quality control is a hoax.
Software Quality Assurance (SQA)
Organizations
The author of this book worked for five years in IBM's Software Quality
Assurance organizations in Palo Alto and Santa Teresa, California. As a
result, the author may have a residual bias in favor of SQA groups that
function along the lines of IBM's SQA groups.
Within the software industry, there is some ambiguity about the role
and functions of SQA groups. Among the author's clients (primarily
Fortune 500 companies), following is an approximate distribution of
how SQA organizations operate:
In about 50 percent of companies, SQA is primarily a testing orga-
nization that performs regression tests, performance tests, system
tests, and other kinds of testing that are used for large systems as
they are integrated. The SQA organization reports to a vice president
of software engineering, to a CIO, or to local development managers
and is not an independent organization. There may be some respon-
sibility for measuring quality, but testing is the main focus. These
SQA organizations tend to be quite large and may employ more than
25 percent of total software engineering personnel.
In about 35 percent of companies, SQA is a focal point for estimating
and measuring quality and ensuring adherence to local and national
quality standards. But the SQA group is separate from testing orga-
nizations, and performs only limited and special testing such as stan-
dards adherence. To have an independent view, the SQA organization
reports to its own vice president of quality and is not part of the devel-
opment or test organizations. (This is the form of SQA that IBM had
when the author worked there.) These organizations tend to be fairly
small and employ between 1 percent and 3 percent of total software
engineering personnel.
About 10 percent of companies have a testing organization but no
SQA organization at all. The testing group usually reports to the CIO
or to a vice president or senior software executive. In such situations,
testing is the main focus, although there may be some measurement
of quality. While the testing organization may be large, the staffing
for SQA is zero.
In about 5 percent of companies, there is a vice president of SQA and
possibly one or two assistants, but nobody else. In this situation, SQA
is clearly nothing more than an act that can be played when custom-
ers visit. Such organizations may have testing groups that report
to various development managers. The so-called SQA organizations
where there are executives but no SQA personnel employ less than
one-tenth of one percent of total software engineering personnel.
Because software quality assurance (SQA) is concerned with more than
testing, it is interesting to look at the activities and roles of "traditional"
SQA groups that operate independently from test organizations.
1. Collecting and measuring software quality during development and
after release, including analyzing test results and test coverage. In
some organizations such as IBM, defect removal efficiency levels
are also calculated.
2. Predicting software quality levels for major new applications,
including construction of special quality estimating tools.
3. Performing statistical studies of quality or carrying out root-cause
analysis.
4. Examining and teaching quality methods such as quality function
deployment (QFD) or Six Sigma for software.
5. Participating in software inspections as moderators or recorders,
and also teaching inspections.
6. Ensuring that local, national, and international quality standards
are followed. SQA groups are important for achieving ISO 9000
certification, for example.
7. Monitoring the activities associated with the various levels of the
capability maturity model integration (CMMI). SQA groups play a
major part in software process improvements and ascending to the
higher levels of the CMMI.
8. Performing specialized testing such as standards adherence.
9. Teaching software quality topics to new employees.
10. Acquiring quality benchmark data from external organizations
such as the International Software Benchmarking Standards
Group (ISBSG).
A major responsibility of IBM's SQA organization was determining
whether the quality level of new applications was likely to be good
enough to ship the application to customers. The SQA organization
could stop delivery of software that was felt to have insufficient quality
levels.
Development managers could appeal an SQA decision to stop the
release of questionable software, and the appeal would be decided by
IBM's president or by a senior vice president. This did not happen often,
but when it did, the event was taken very seriously by all concerned.
The fact that the SQA group was vested with this power was a strong
incentive for development managers to take quality seriously.
Obviously, for SQA to have the power to stop delivery of a new applica-
tion, the SQA team had to have its own chain of command and its own
senior vice president independent of the development organization. If
SQA had reported to a development executive, then threats or coercion
might have made the SQA role ineffective.
One unique feature of the IBM SQA organization was a formal "SQA
research" function, which provided time and resources for carrying out
research into topics that were beyond the state of the art currently avail-
able. For example, IBM's first quality estimation tool was developed
under this research program. Researchers could submit proposals for
topics of interest, and those selected and approved would be provided
with time and with some funding if necessary.
Several companies encourage SQA and other software engineering
personnel to write technical books and articles for outside journals such
as CrossTalk (the U.S. Air Force software journal) or some of the IEEE
journals.
One company, ITT, as part of its software engineering research lab,
allowed articles to be written during business hours and even provided
assistance in creating camera-ready copy for books. It is a significant
point that authors should be allowed to keep the royalties from the
technical books that they publish.
It is an interesting phenomenon that almost every company with
defect removal efficiency levels that average more than 90 percent has
a formal and active SQA organization. Although formal and active SQA
groups are associated with better-than-average quality, the data is not
sufficient to assert that SQA is the primary cause of high quality.
The reason is that most organizations that have low software quality
don't have any measurements in place, and their poor quality levels only
show up if they commission a special assessment, or if they are sued
and end up in court.
It would be nice to say that organizations with formal SQA teams aver-
age greater than 90 percent in defect removal efficiency and that similar
companies doing similar software that lack formal SQA teams average
less than 80 percent in defect removal efficiency. But the unfortunate fact
is that only the companies with formal SQA teams are likely to know
what their defect removal efficiency levels are. In fact, quality measure-
ment practices are so poor that even some companies that do have an
SQA organization do not know their defect removal efficiency levels.
Demographics In the software world, SQA is not large numerically,
but has been a significant source of quality innovation. There are per-
haps 5,000 full-time SQA personnel employed in the United States as
of 2009.
SQA organizations are very common in companies that build sys-
tems software, embedded software, or commercial software, such as SAP,
Microsoft, Oracle, and the like. SQA organizations are less common in
IT groups such as banks and finance companies, although they do occur
within the larger companies.
Many cities have local SQA organizations, and there are also national
and international quality associations.
There is one interesting anomaly with SQA support of software appli-
cations. Development teams that use the Team Software Process (TSP)
have their own internal equivalent of SQA and also collect extensive
data on bugs and quality. Therefore, TSP teams normally do not have
any involvement from corporate SQA organizations. They of course
provide data to the SQA organization for corporate reporting purposes,
but they don't have embedded SQA personnel.
Project size Normally, SQA involvement is mandatory for large
applications above about 2,500 function points. While SQA involvement
might be useful for smaller applications, small applications tend to have
better quality than large ones. Since SQA resources are limited,
concentrating on large applications is perhaps the best use of SQA
personnel.
Productivity rates There are no effective productivity rates for SQA
groups. However, it is an interesting and important fact that produc-
tivity rates for software applications that do have SQA involvement,
and which manage to top 95 percent in defect removal efficiency,
are usually much better than applications of the same size that
lack SQA.
Even if SQA productivity itself is ambiguous, measuring the quality
and productivity of the applications that are supported by SQA teams
indicates that SQA has significant business value.
Schedules The primary schedule issues for SQA teams are the overall
schedules for the applications that they support. As with productivity
and quality, there is evidence that an SQA presence on an application
tends to prevent schedule delays.
Indeed if SQA is successful in introducing formal inspections, sched-
ules can even be shortened.
The most effective way to shorten software development schedules is
to have very few defects due to defect prevention, and to remove most
of them prior to testing due to pretest inspections and static analysis.
Since SQA groups push hard for both defect prevention and early defect
removal, an effective SQA group will benefit development schedules--and
especially so for large applications, which typically run late.
For the software industry as a whole, delays due to excessive bugs
are a major cause of application cost and schedule overruns and also of
project cancellations. Effective SQA groups can minimize the endemic
problems.
It is a proven fact that an effective SQA organization can lead to
significant cost reductions and significant schedule improvements for
software projects. Yet because the top executives in many companies do
not understand the economic value of high quality and regard quality
as a luxury rather than a business necessity, SQA personnel are among
the first to be let go during a recession.
Quality The roles of SQA groups center on quality, including quality
measurement, quality predictions, and long-range quality improvement.
SQA groups also have a role in ISO standards and the CMMI. SQA
organizations also teach quality courses and assist in the deployment
of methods such as quality function deployment (QFD) and Six Sigma
for software. In fact, it is not uncommon for many SQA personnel to be
Six Sigma black belts.
There is some uncertainty in 2009 about the role of SQA groups when
test-driven development (TDD) is utilized. Because TDD is fairly new,
the intersection of TDD and SQA is still evolving.
As already mentioned in the testing section of this chapter, read-
ers who are executives and qualified to sign contracts are advised to
consider 95 percent as the minimum acceptable level of defect removal
efficiency. Every outsource contract, every internal quality plan, and
every license with a software vendor should require proof that the devel-
opment organization will top 95 percent in defect removal efficiency.
There is one troubling phenomenon that needs more study. Large
systems above 10,000 function points are often released with hundreds
of latent bugs in spite of extensive testing and sometimes in spite of
large SQA teams. Some of these large systems ended up in lawsuits
where the author happened to be an expert witness. It usually happened
that the advice of the SQA teams was not taken, and that the project
manager skimped on quality control in a misguided attempt to compress
schedules.
Specialization SQA specialization covers a wide range of skills that can
include statistical analysis, function point analysis, and also testing.
Other special skills include Six Sigma, complexity analysis, and root-
cause analysis.
Cautions and counter indications The main caution about SQA is that it
is there to help, and not to hinder. Dogmatic attitudes are counterpro-
ductive for effective cooperation with development and testing groups.
Conclusions An effective SQA organization can benefit not only quality,
but also schedules and costs. Unfortunately, during recessions, SQA
teams are among the first to be affected by layoffs and downsizing. As
the recession of 2009 stretches out, it causes uncertainty about the
future of SQA in U.S. business.
Because quality benefits costs and schedules, it is urgent for SQA
teams to take positive steps to include measures of defect removal effi-
ciency and measures of the economic value of quality as part of their
standard functions. If SQA could expand the number of formal quality
benchmarks brought in to companies, and collect data for submission
to benchmark groups, the data would benefit both companies and the
software industry.
Several nonprofit organizations are involved with SQA, such as the
American Society for Quality (ASQ). There is also a Global Association
for Software Quality (GASQ).
Local and regional software quality organizations exist in many
cities. Also, for-profit SQA associations such as the Quality Assurance
Institute (QAI) hold a number of conferences and workshops, and also
offer certification exams.
SQA needs to assist in introducing a synergistic combination of defect
prevention and a multiphase suite of defect removal activities that
combine inspections, static analysis, automated testing, and manual
testing. There is no silver bullet for quality, but fusions of a variety
of quality methods can be very effective. SQA groups are the logical
place to provide information and training for these effective hybrid
methods.
Effective combinations of defect prevention and defect removal
activities are available in 2009, but seldom used except by a few very
sophisticated organizations. As mentioned in the testing section of
this chapter, what is lacking is not so much the technologies that
improve quality, but awareness of how effective the best combinations
really are. It is lack of widespread quality measurements and lack
of quality benchmarks that are delaying improvements in software
quality.
Also valuable are predictive estimating tools that can predict both defect
potentials and the defect removal efficiency levels of any combination of
review, inspection, static analysis, automatic test stage, and manual test
stage. Normally, SQA groups will have such tools and use them frequently.
In fact, the industry's first software quality prediction tool was developed
by the IBM SQA organization in 1973 in San Jose, California.
The final conclusion is that SQA groups need to keep pushing until
the software industry can routinely top 95 percent in average defect
removal efficiency levels, and hit 99 percent for critical software applica-
tions. Any results less than these are insufficient and unprofessional.
Summary and Conclusions
Fred Brooks, one of the pioneers of software at IBM, observed in his clas-
sic book The Mythical Man-Month that software was strongly affected
by organization structures. Not long after Fred published, the author
of this book, who also worked at IBM, noted that large systems tended
to be decomposed to fit existing organization structures. In particular,
some major features were artificially divided to fit standard eight-
person departments.
This book only touches the surface of organizational issues. Deeper
study is needed on the relative merits of small teams versus large teams.
In addition, the "average" span of control of eight employees reporting
to one manager may well be in need of revision. Studies of the effective-
ness of various team sizes found that raising the span of control from
8 up to 12 would allow marginal managers to return to technical work
and would minimize managerial disputes, which tend to be endemic.
Further, since software application sizes are increasing, larger spans of
control might be a better match for today's architecture.
Another major topic that needs additional study is that of really large
software teams that may include 500 or more personnel and dozens
of specialists. There is very little empirical data on the most effective
methods for dealing with such large groups with diverse skills. If such
teams are geographically dispersed, that adds yet another topic that is
in need of additional study.
More recently Dr. Victor Basili, Nachiappan Nagappan, and Brendan
Murphy studied organization structures at Microsoft and concluded
that many of the problems with Microsoft Vista could be traced back to
organizational structure issues.
However, in 2009, the literature on software organization structures
and their impact is sparse compared with other topics that influence
software engineering such as methods, tools, programming languages,
and testing.
Formal organization structures tend to be territorial because manag-
ers are somewhat protective of their spheres of influence. This tends
to narrow the focus of teams. Newer forms of informal organizations
that support cross-functional communication are gaining in popularity.
Cross-functional contacts also increase the chances of innovation and
problem solving.
Software organization structures should be dynamic and change with
technology, but unfortunately, they often are a number of years behind
where they should be.
As the recession of 2009 continues, it may spur additional research
into organizational topics. For example, new subjects that need to be
examined include wiki sites, virtual departments that communicate
using virtual reality, and the effectiveness of home offices to minimize
fuel consumption.
A very important topic with almost no literature is that of dealing
with layoffs and downsizing in the least disruptive way. That topic is
discussed in Chapters 1 and 2 of this book, but few additional citations
exist. Because companies tend to get rid of the wrong people, layoffs
often damage operational efficiency levels for years afterwards.
Another important topic that needs research, given the slow develop-
ment schedules for software, is the study of global organizations located
in separate time zones eight hours apart. Such organizations could shift
software applications and work products around the globe from team
to team, permitting 24-hour development instead of 8-hour development.
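The coverage argument behind 24-hour development can be verified with a short calculation. The sketch below assumes three hypothetical sites whose local workdays are offset by eight hours (the site names and offsets are illustrative), and checks that their combined shifts cover the full clock:

```python
# Hypothetical "follow the sun" rotation: each site works one 8-hour
# shift, then hands its work products to the next site eight time
# zones away.
sites = [("Site A", 0), ("Site B", 8), ("Site C", 16)]  # UTC hour each shift starts

def coverage(sites, workday_hours=8):
    """Return the sorted set of UTC hours covered by the sites' shifts."""
    hours = set()
    for _, start in sites:
        for h in range(workday_hours):
            hours.add((start + h) % 24)
    return sorted(hours)

print(len(coverage(sites)))  # 24 distinct hours: round-the-clock development
```

Three sites at eight-hour offsets cover all 24 hours with no overlap; two sites would leave an 8-hour gap each day, which is why the paragraph above specifies time zones eight hours apart.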
A final organizational topic that needs additional study is that of the
optimum organizations for creating reusable modules and other reusable
deliverables, and then constructing software applications from reusable
components rather than coding them on a line-by-line basis.
Readings and References
Brooks, Fred. The Mythical Man-Month. Reading, MA: Addison Wesley, 1995.
Charette, Bob. Software Engineering Risk Analysis and Management. New York:
McGraw-Hill, 1989.
Crosby, Philip B. Quality is Free. New York: New American Library, Mentor Books, 1979.
DeMarco, Tom. Controlling Software Projects. New York: Yourdon Press, 1982.
DeMarco, Tom, and Timothy Lister. Peopleware: Productive Projects and Teams. New York:
Dorset House, 1999.
Glass, Robert L. Software Creativity, Second Edition. Atlanta: developer.* Books, 2006.
Glass, Robert L. Software Runaways: Lessons Learned from Massive Software Project Failures.
Englewood Cliffs, NJ: Prentice Hall, 1998.
Humphrey, Watts. Managing the Software Process. Reading, MA: Addison Wesley, 1989.
Humphrey, Watts. PSP: A Self-Improvement Process for Software Engineers. Upper
Saddle River, NJ: Addison Wesley, 2005.
Humphrey, Watts. TSP: Leading a Development Team. Boston: Addison Wesley, 2006.
Humphrey, Watts. Winning with Software: An Executive Strategy. Boston: Addison
Wesley, 2002.
Jones, Capers. Applied Software Measurement, Third Edition. New York: McGraw-Hill,
2008.
Jones, Capers. Estimating Software Costs. New York: McGraw-Hill, 2007.
Jones, Capers. Software Assessments, Benchmarks, and Best Practices. Boston: Addison
Wesley Longman, 2000.
Kan, Stephen H. Metrics and Models in Software Quality Engineering, Second Edition.
Boston: Addison Wesley Longman, 2003.
Kuhn, Thomas. The Structure of Scientific Revolutions. Chicago: University of Chicago
Press, 1996.
Nagappan, Nachiappan, B. Murphy, and V. Basili. The Influence of Organizational
Structure on Software Quality. Microsoft Technical Report MSR-TR-2008-11.
Microsoft Research, 2008.
Pressman, Roger. Software Engineering: A Practitioner's Approach, Sixth Edition. New
York: McGraw-Hill, 2005.
Strassmann, Paul. The Squandered Computer. Stamford, CT: Information Economics
Press, 1997.
Weinberg, Gerald M. Becoming a Technical Leader. New York: Dorset House, 1986.
Weinberg, Gerald M. The Psychology of Computer Programming. New York: Van
Nostrand Reinhold, 1971.
Yourdon, Ed. Outsource: Competing in the Global Productivity Race. Upper Saddle River,
NJ: Prentice Hall PTR, 2005.
Yourdon, Ed. Death March: The Complete Software Developer's Guide to Surviving
"Mission Impossible" Projects. Upper Saddle River, NJ: Prentice Hall PTR, 1997.