Domain Engineering:
An Empirical Study
William Frakes
C. David Harris, Jr.
Department of Computer Science
Virginia Polytechnic Institute and
December 21, 2006
Abstract: This paper presents a summary and
analysis of data gathered from thirteen domain engineering projects, participant
surveys, and demographic information. Taking a failure modes approach, project
data is compared to an ideal model of the DARE methodology, revealing valuable
insights into points of failure in the domain engineering process. This study suggests
that success is a function of the domain analyst’s command of a specific set of
domain engineering concepts and skills, the time invested in the process, and
persistence in difficult areas. We conclude by presenting strategies to avoid
points of failure in future domain engineering projects.
1. Introduction
Domain engineering has emerged as an important topic in software engineering
research and practice. While many methods have been developed to support domain
engineering [2], there has been little empirical analysis of the domain
engineering process. This paper presents an empirical analysis of thirteen domain
engineering projects in a university setting. Taking a failure modes approach, it
analyzes the collected project data and project outcomes to identify points of
improvement. Sources of information include the DARE books, participants’
demographic information, and survey data taken during and after the domain
engineering exercises. These artifacts were produced over the course of three
years as part of a graduate-level, advanced software engineering course at
Virginia Tech.
These resources present important opportunities to discover individual
strategies for domain engineering. The data alone is a contribution to domain
engineering research; in addition, this report’s analysis and conclusions provide
insights for improving the craft of domain engineering.
2. Domain Engineering
2.1. Context
Software developers have always practiced software reuse. Operating system
libraries and programming language syntactic elements are designed to be reused
and have been successful in standardizing computer programming languages and
methods. There are many types of reuse [Frakes and Terry 96]. For example, a
programmer may reuse software by selecting lines of code from preexisting
projects and copying them into new applications. He or she may also reuse
software functions or libraries. However, developing a framework to
systematically engineer domain-specific reusable software components requires an
entirely different process and methodology.
The term software engineering was first popularized at a 1968 NATO conference
in the hope of applying structured engineering principles to the process of
creating software [9]. As a discipline, software engineering is an evolutionary
leap from the craft of computer programming. Traditionally, successful
development has been the result of intuition, hard work, and individual talent.
But as commercial demands for software increase, it becomes impossible to
continue traditional ad hoc, single-system development strategies [10]. Current
software engineering research aspires to apply a more formal methodology to
software development.
Software engineering models strive to manage the software development process.
Improved requirements gathering, design, quality, cost, and time estimation are
all goals of the various software engineering processes. The numerous software
engineering processes (waterfall, iterative, capability maturity model, etc.) all
share the goal of improving software development. An emerging goal of software
engineering is to design software assets so that they can be reused easily.
Creating reusable software components in a particular domain is a goal of domain
engineering.
2.2. Design for Reuse
Like software engineering, the field of software reuse has its origins in the
late sixties. During this time, Doug McIlroy of Bell Laboratories proposed
software reuse as an area of study and recommended basing the software industry
upon reusable components [7]. Other research contributions include Parnas’
notion of program families and Neighbors’ idea of domain analysis [2]. Since
then, software reuse has been an active area of research.
Domain engineering recognizes that software development communities create
software in an application area or a set of systems that share design decisions
[2]. Commercial groups create software products in a given market area. Problem
domains such as software metrics or vocabulary analysis inspire research,
publications, and software from multiple sources.
The process of domain engineering has two phases. Domain analysis systematically
analyzes preexisting software, architectures, and documents to capture their
commonalities and variabilities and to develop a generic architecture and
templates. Domain implementation develops software artifacts that can be used to
produce new systems within a domain.
2.3. Benefits
A core benefit of domain engineering is that development with reusable software
assets (libraries, code generators, and domain-specific languages) contributes to
improved software quality and productivity. Software that is reused is often
tested more frequently than software that has a single use or function. Using
reusable assets typically decreases development costs, because reuse requires
less time, less testing, and less new documentation. In addition, reusable assets
often improve the accuracy of initial development estimates of time and cost [7].
2.4. Literature Review
Several domain engineering methodologies exist, including Family-Oriented
Abstraction, Specification, and Translation (FAST) [8], Feature-Oriented Domain
Analysis (FODA) [9], Organization Domain Modeling (ODM), and the Domain
Engineering Method for Reusable Algorithmic Libraries (DEMRAL). This report uses
the Domain Analysis and Reuse Environment (DARE) methodology [6].
2.5. Domain Analysis and Reuse Environment
The Domain Analysis and Reuse Environment (DARE) methodology is a
multidisciplinary approach that has two phases: domain analysis and domain
implementation.
Domain Analysis
In domain analysis, a domain
analyst, with the help of a domain expert, collects domain source code,
documents, and architectures. Through a
systematic study of these domain sources, the domain analyst acquires an
understanding of the commonalities and variabilities of domain systems. Specific activities include the
identification and organization of important vocabularies, the development of feature
tables, and the creation of facet tables and generic architectures. These products of analysis provide the
material for domain implementation.
Domain Implementation
Software reuse can be divided into two types: parts-based and formal language
based. Parts-based reuse is better known. It typically introduces reusable
software “parts” into a language. Examples of parts-based reuse include the reuse
of programming functions, classes, and libraries. Java, for example, provides a
large collection of reusable libraries packaged as Java Archive (JAR) files.
Parts-based reuse can also include language frameworks (interacting collections
of objects), such as the C++ Standard Template Library or the Microsoft .NET
Framework.
The second type of reuse, formal language based, involves creating a general,
formal language that has the reusable information embedded in it [8]. Formal
language based reuse can be further divided into two types: programming language
based and generative.
Domain-specific languages that are created for a small problem area are called
little languages [Bentley]. R, for example, is a statistics language that readily
processes data sets and produces summary statistics and graphs with a few
commands. Other examples of little languages include Lex (lexical analysis),
Yacc (Yet Another Compiler Compiler), Csound for dealing with sound, Make for
managing software project compilation, Extensible Stylesheet Language
Transformations (XSLT) for transforming XML documents, and Ant (Another Neat
Tool) for managing Java projects [20, 21].
In the generative approach, domain knowledge is built into a generator so that
an application can be generated from a few specifications. AndroMDA, for example,
is a model-driven architecture framework that translates UML into software that
integrates with several common architectures, such as Spring, EJB, Struts,
Hibernate, JSF, and Java. Other examples of language-specific application
generators are Sun’s NetBeans and Microsoft Visual Studio. Both of these
development environments contain a collection of default project types that can
be generated automatically. They also facilitate adding commonly used application
components and provide ready access to software libraries.
Systematic Reuse and Product Line Engineering
The idea of applying a product line approach to systematic software reuse has
its origins in manufacturing, where products are produced in a structured way.
This formal creation takes full advantage of the knowledge of the commonalities
and variabilities of each release. The Software Engineering Institute defines a
software product line as “… a set of software-intensive systems that share a
common, managed set of features satisfying the specific needs of a particular
market segment or mission and that are developed from a common set of core assets
in a prescribed way.”
2.6. A Unique
Measures of software reuse are difficult to estimate. In the past, individual
organizations such as NASA and Motorola have instituted reuse programs and
reported reuse rates improving from ~20% to 79% [26] and from 14% to 85% [27]. An
industry-wide perspective, however, is difficult to come by. Research indicates
that there are great differences between reuse practices among different
industries, and that the factors that influence the success and failure of
software reuse practices are complex [29].
This study explored this complex failure space by capturing the efforts of
thirteen individuals’ first domain engineering exercises. Surveys of the
participants indicate that, on average, they reuse 30% of the lifecycle objects
across their organizations and 28% of the lifecycle objects they personally
create. In addition, none had a workplace program for software reuse education or
a means to measure levels of reuse [Appendix C].
From a failure modes perspective, this study offers insight into the factors that
contribute to both success and failure in domain engineering. The collection of
data alone provides valuable information, but it is hoped that its analysis and
the conclusions of this report provide helpful guidance for future domain
engineering efforts and will contribute to higher reuse rates in the workplace.
3. The Study
3.1. Demographics
In this study, thirteen subjects completed domain engineering projects in several
domains using DARE-COTS [1] as part of an advanced graduate course in software
engineering. Projects were completed in 14-week periods from January to April of
2005 and January to April of 2006. None of the subjects had prior knowledge of
domain engineering. Demographic information on the subjects is reported in
Table 1.
Demographic information includes the subject’s degree level, the degree area the
subject was pursuing (computer science or information systems), the subject’s
years of industrial experience, the subject’s programming languages in order of
familiarity, the subject’s chosen domain, and the language used in the project’s
domain implementation.
Subject | Level | Degree | Experience (years) | Languages in order of familiarity | Domain | Implementation Language |
1 | M.S. | MIS | 0 | NA | Conflation | Java |
2 | M.S. | CS | 9.5 | VB.NET, C#, VB6, PL/SQL, Java, C++ | AHLTA Longitudinal Domain Book | NA |
3 | M.S. | CS | 5 | Java, PL/SQL, Perl, C | Open source Java metrics | Java |
4 | M.S. | CS | 3 | C++ | Symmetric encryption | C |
5 | M.S. | CS | 15 | C++, C, Fortran, Assembler, Java | Simple metrics | Perl |
6 | M.S. | CS | 10 | C++, C#, Java, Perl | Sentence Alignment Systems | C# |
7 | Ph.D. | CS | 1 | Java, C++, C | Conflation | Java |
8 | | CS | 13 | C++, C | Conflation | Perl |
9 | M.S. | CS | 3 | Java, Ruby, C++ | Blog domain | NA |
10 | M.S. | CS | 7 | Visual Basic, C++, C | Personal information management systems | Visual Basic |
11 | M.S. | MIS | | | Conflation | NA |
12 | M.S. | CS | 1 | C++, Pascal, C | Static code metrics | C |
13 | M.S. | CS | 6 | Java, SQL, C# | Object Oriented Software Metric Tools | Java |
Table 1: Participant Demographics
3.2. Assignment
Students were given the following
semester-long assignment:
3.3. DARE Book Structure
The completed DARE book contained the following sections:
DARE (Domain Analysis and Reuse Environment) [1]
The DARE book provides a detailed specification of the domain:
  Domain sources
  Vocabulary analysis
  Architecture analysis, code analysis
  Summary information, glossary, bibliography, index, and appendix
DARE Book
  Domain sources
    Source documentation
    Source code
    Source notes
    System description
    System architectures
    System feature tables
  Vocabulary
    Basic vocabulary, facets, synonyms, templates, thesaurus
  Architecture
    Generic architecture
    Generic features
  Code structure
  Reusable components
  Summary
  Domain glossary
  Bibliography
  Index
  Appendix
3.4. The Paper Outline
The remainder of this paper is organized as follows. Section 4 defines an
idealized model of the DARE methodology for domain engineering. Data collected
from the thirteen projects is presented in section 5 and then analyzed for points
of failure in section 6. These observations are summarized in section 7’s themes
of failure, followed by section 8’s principles of success. Sections 9, 10, and 11
then suggest practical implications of these observations for future DARE
projects and future research.
4. DARE Methodology Model
4.1. Domain Analysis
Domain engineering’s first phase, domain analysis, is the process of analyzing a
given domain to derive domain models. Sources for this analysis include exemplar
code, exemplar text, and domain expert information. Examples of models include
generic architectures, facet tables, feature tables, and templates.
4.1.1. Scoping the Domain
Domain analysis begins by selecting a domain that can be bounded and scoped to a
manageable level. With the help of a domain expert, the domain analyst scopes a
domain and creates a formal domain model. Scoping the domain involves gathering
suitable domain exemplars, documents, and notes, then describing the domain
verbally. Set notation may be necessary to state clearly what is in and what is
not in the domain [25]. The success of this step depends, in part, on the degree
to which a chosen domain is suitable for analysis.
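As an illustration of how set notation can make a scope statement precise, the following minimal sketch models a domain boundary with Python sets. The domain name echoes one of the project domains in this study, but the feature names and the membership rule are hypothetical and are not taken from any participant's DARE book.

```python
# Hypothetical scope statement for a "static code metrics" domain.
# All feature names below are illustrative assumptions.
IN_SCOPE = {"lines of code", "comment ratio", "cyclomatic complexity"}
OUT_OF_SCOPE = {"runtime profiling", "test coverage"}


def in_domain(system_features: set) -> bool:
    """A system is in the domain if it offers at least one in-scope
    feature and none of the explicitly excluded features."""
    has_in_scope = bool(system_features & IN_SCOPE)
    has_excluded = bool(system_features & OUT_OF_SCOPE)
    return has_in_scope and not has_excluded


candidate_a = {"lines of code", "cyclomatic complexity"}
candidate_b = {"lines of code", "runtime profiling"}

print(in_domain(candidate_a))  # True
print(in_domain(candidate_b))  # False
```

Writing the exclusions as a set of their own mirrors the requirement to state both what is in and what is not in the domain.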
4.1.2. Domain Suitability
The suitability of a domain for analysis is a function of several factors. First
among these is the availability of at least three exemplar systems. The
suitability of exemplar systems for domain analysis is a function of system
complexity, stability, size, and formality. System complexity increases with the
number of system types and the complexity of the system source code.
System stability is related to the quality of the project. Elements of stability
of an exemplar system include code quality, maintenance support, release history,
experience of authors, comment-to-code ratio, coding style, and complexity
metrics.
The size of a system can be a determining factor in whether the system can be
scoped to a manageable size for domain analysis. For instance, the domain of
software metrics might better be scoped to static complexity analysis.
A system with high formality will contain documentation, architectures, and ample
software comments. An exemplar system produced with a formal methodology
contributes to stability.
4.1.3. Vocabulary Analysis
The vocabulary analysis steps include gathering the domain exemplar code,
exemplar text, and domain expert information, and compiling an initial word set.
Then, using automated processes and domain knowledge, this set of words is
reduced to a manageable set of key words. Diagram 1 represents the vocabulary
analysis methodology. Vocabulary analysis methods include the use of stop lists,
stemming, conflation, clustering, and frequency analysis.
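The reduction from an initial word set to key words can be sketched as follows. This is a minimal illustration only: the stop list, the crude suffix-stripping stemmer, and the sample text are assumptions made for the example, and they stand in for whatever tools a domain analyst actually uses.

```python
import re
from collections import Counter

# Assumed stop list; a real analysis would use a much larger one.
STOP_WORDS = {"the", "a", "of", "and", "to", "is", "in"}


def crude_stem(word: str) -> str:
    """Strip a few common suffixes so that word variants are conflated.
    This is a rough stand-in for a real stemming algorithm."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def key_words(text: str, top_n: int = 5):
    """Tokenize, drop stop words, stem, and rank stems by frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(stems).most_common(top_n)


SAMPLE = ("The tool counts lines and counts comments. "
          "Counting a comment is a metric; metrics count lines.")

for stem, frequency in key_words(SAMPLE, top_n=4):
    print(stem, frequency)
```

In this sample, "counts", "counting", and "count" are conflated into one stem, which is the effect stemming is meant to have before frequency analysis.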
Cluster tables group words together around a point of commonality. The points of
commonality become the column titles of the facet table, with the cluster words
becoming the rows, or points of variability. The facet table maps directly to a
template in which descriptive words describe the generic system and the facet
table’s columns map to variables. This template is then used in the creation of
the reusable asset.
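The mapping from facet table to template can be pictured with a small sketch. The facet names, terms, and template sentence below are hypothetical and chosen only for illustration; they do not come from any of the studied DARE books.

```python
# Hypothetical facet table: each key is a column title (a point of
# commonality) and each list holds the terms in that column (the
# points of variability).
FACET_TABLE = {
    "action": ["count", "measure"],
    "item": ["lines", "comments"],
    "language": ["Java", "C"],
}

# Template: fixed descriptive words plus one variable per facet column.
TEMPLATE = "A tool that can {action} {item} in {language} source code."


def instantiate(template: str, **choices: str) -> str:
    """Fill the template after checking each choice against its facet column."""
    for facet, value in choices.items():
        if value not in FACET_TABLE[facet]:
            raise ValueError(f"{value!r} is not a term of facet {facet!r}")
    return template.format(**choices)


print(instantiate(TEMPLATE, action="count", item="comments", language="Java"))
# A tool that can count comments in Java source code.
```

Each combination of facet terms yields one description of a system in the domain, which is why the template can serve as a starting point for a reusable asset.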
Diagram 1: Vocabulary Analysis
The cluster tables, facet table, and template are all derived from the key word
set. At each step of the process, the domain analyst’s knowledge of the systems
plays a key role in refining the tables.
4.1.4. Architectural Analysis
The domain analyst, along with the
domain expert, creates a system feature table for each exemplar system. System
feature tables describe the attributes of each of the systems.
System architectures are
architectures available in the documentation of the exemplar systems.
A generic architecture is formed, in
large part, by an analysis of the set of available system architectures and the
template produced by the vocabulary analysis.
4.1.5. Software Analysis
The exemplar code is analyzed using a variety of techniques to understand how it
can be incorporated into a reusable asset. Software components, such as classes,
functions, and libraries, can be identified and incorporated into a parts-based
reusable asset. Software metrics can be used to identify areas of complexity or
maintainability concern, which can guide the redesign of a reusable component.
This process is aided by analysts who possess domain expertise, programming
language knowledge, and an understanding of software analysis metrics.
4.2. Domain Implementation
Reusable assets are derived by domain implementation based on domain models.
Assets can be parts-based components, domain-specific languages, or application
generators. In a parts-based implementation, domain knowledge gained from
software analysis can contribute to creating reusable language components, such
as classes and libraries. These parts can be incorporated and reused in future
software applications. In a formal language implementation, the template and
generic architectures provide frameworks for creating domain-specific languages
or application generators.
5. Project Data
The following section presents data collected from thirteen DARE projects. The
data is organized by measures of time, measures of size, domain scope, vocabulary
analysis, architectural analysis, and DARE book tables. Discussion of the
implications can be found in section 6.
5.1. Activities Time Log
Participants were instructed to provide a log of time spent on each stage of the
process. All times are given in hours except where noted.
Table 2. Activities Time Log Entries
Project Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Book creation | | 12 | | 20 | | | | 2 | | | | | |
Finding good tools | | 16.25 | 50 | | | | | | | | | | |
Gathering source information | 8 | 2 | | 6 | | | 1 … 2 weeks | 13 | | | | 38 | |
Documents | | | | | | | | | | | 9 | | 2 |
Source code | | | | | | | | | | | 4 | | 0.5 |
System descriptions | | | | | | | | 0.5 | | | | | |
System architectures | | | | | | | 2 … 3 weeks | 8 | | | 8 | 16 | |
System feature tables | | | | | | 2 | 3 days | 0.5 | | | | 20 | 3 |
Source notes | | | | | | | | | | | | | 1 |
Expert systems | | | | | | 12 | | 5 | | | | | 0.5 |
Study of domain | | | | 46 | | | | 3 | | | 9 | | |
Domain scope | 1 | | | | | | 3 weeks | 3 | | | 2 | | |
Vocabulary analysis | 4 | 21 | 20 | 12 | 30 | 13 | 2 weeks | 0 | 0 | 0 | 3 | | |
Basic vocabulary | | | | | 1 | | | | | | | | 1 |
Frequency analysis | | | | | | 4 | | | | | | | |
Cluster analysis | | | | | | 8 | | | | | | | 2 |
Facet table | | | | | 1 | 2 | 4 days | 2 | | | | 20 | 0.5 |
Synonym table | | | | | 1 | 0.5 | | | | | | | 0.5 |
Template | | | | | 1 | 1 | | | | | | | 0.5 |
Thesaurus | | | | | 1 | 0.5 | | | | | | | 0.5 |
Vocabulary notes | | | | | 1 | | | 1 | | | | | |
Code analysis | 30 | | 3 | 2 | 2 | 4 | 5 days | 1 | | | 3 | | |
Source notes | | | | | | | | 3 | | | | | |
Architectural analysis | 5 | 14 | 60 | 10 | 2 | | | | | | | | |
Generic architecture | | | | | 2 | 4 | | 4 | | | 6 | 16 | 0.5 |
Generic feature table | | | | | 2 | | 1.5 days | 1 | | | 2 | 16 | |
Architecture notes | | | | | 2 | | 1.5 days | 3 | | | | | |
Implementation/reusable component | 5 | | 55 | 80 | 4 | 24 | 5 weeks | 5 | | 55 | 3 | | 0.5 |
Reusable algorithm | | | | | | | | 62 | | | | | |
Glossary | | | | | | | | 5 | | | | | 2 |
Testing | | | | | | | | | | 15 | | | 0.5 |
5.1.1. Number of time log entries for each participant
This table considers the number of time log entries made by each participant. The
variability of these measures may play a role in understanding different project
outcomes.
Table 3. Number of Time Log Entries
Entries | 3 | 5 | 5 | 6 | 6 | 7 | 10 | 10 | 12 | 13 | 14 | 19 | 3 |
Statistical Summary 1: Number of Time Log Entries
Sample Size, n: | 13 | Range | 16 |
Mean | 8.69 | Minimum | 3 |
Median | 7 | 1st Quartile | 5 |
Midrange | 11 | 2nd Quartile | 7 |
Variance | 23.06 | 3rd Quartile | 12 |
Standard Deviation | 4.80 | Maximum | 19 |
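For readers who want to reproduce the statistical summaries, the sketch below recomputes Statistical Summary 1 from the Table 3 values using Python's standard `statistics` module. The paper does not state which quartile convention or variance formula was used; the "inclusive" quartile method and the sample (n - 1) variance are assumptions that happen to match the reported figures for this data set.

```python
import statistics

# Number of time log entries, as listed in Table 3.
entries = [3, 5, 5, 6, 6, 7, 10, 10, 12, 13, 14, 19, 3]


def summarize(values):
    """Compute the descriptive statistics reported in the summaries."""
    # Assumed quartile convention: "inclusive" interpolation.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "midrange": (max(values) + min(values)) / 2,
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "stdev": statistics.stdev(values),
        "quartiles": (q1, q2, q3),
    }


summary = summarize(entries)
for name, value in summary.items():
    print(name, value)
```

The same function can be applied to the other single-row tables in this section after dropping the NA entries.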
5.1.2. Total time creating the DARE project
This table considers the total time invested in each DARE project. Time
investment may be an indicator of project success.
Table 4. Total Time of DARE project
Project | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Hours | 53 | 65.25 | 188 | 176 | 65 | 75 | NA | 122 | NA | 70 | 49 | 126 | 13 |
Statistical Summary 2: Total Book Time
Sample Size, n: | 11 | Range | 175 |
Mean | 91.11 | Minimum | 13 |
Median | 70 | 1st Quartile | 53 |
Midrange | 100.5 | 2nd Quartile | 70 |
Variance | 3014.79 | 3rd Quartile | 126 |
Standard Deviation | 54.9 | Maximum | 188 |
5.1.3. Time Preparing the Domain
For purposes of analysis, the report groups together the time log steps of
Table 2 from “finding good tools” to “domain scope”. These steps comprise an
important phase of the DARE book, referred to in this paper as “preparing the
domain”.
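The grouping can be sketched as a sum over the relevant Table 2 rows. The snippet below does this for four of the projects, using the hours logged in Table 2; the dictionary layout and function name are illustrative choices, and blank cells are simply omitted.

```python
# Hours logged in Table 2 between "finding good tools" and "domain scope"
# for four of the projects. Blank cells are omitted.
PREPARATION_LOG = {
    1: {"gathering source information": 8, "domain scope": 1},
    2: {"finding good tools": 16.25, "gathering source information": 2},
    8: {
        "gathering source information": 13,
        "system descriptions": 0.5,
        "system architectures": 8,
        "system feature tables": 0.5,
        "expert systems": 5,
        "study of domain": 3,
        "domain scope": 3,
    },
    13: {
        "documents": 2,
        "source code": 0.5,
        "system feature tables": 3,
        "source notes": 1,
        "expert systems": 0.5,
    },
}


def preparation_time(project: int) -> float:
    """Total hours a project logged while preparing the domain."""
    return sum(PREPARATION_LOG[project].values())


for project in PREPARATION_LOG:
    print(project, preparation_time(project))
```

For these four projects the sums are 9, 18.25, 33, and 7 hours, which agree with the corresponding entries of Table 5.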
Table 5. Total time preparing the domain
Project | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Hours | 9 | 18.25 | 50 | 52 | 15 | 14 | NA | 33 | NA | NA | 32 | 74 | 7 |
Statistical Summary 3: Domain Preparation Time
Sample Size, n: | 10 | Range | 67 |
Mean | 30.22 | Minimum | 7 |
Median | 25.125 | 1st Quartile | 12 |
Midrange | 40.5 | 2nd Quartile | 25.12 |
Variance | 498.83 | 3rd Quartile | 50 |
Standard Deviation | 22.3 | Maximum | 74 |
5.1.4. Time invested in vocabulary analysis
In this report, vocabulary analysis encompasses the time log steps of Table 2
from “vocabulary analysis” to “vocabulary notes”. Diagram 1 of section 4.1
graphically shows the collection of steps involved in vocabulary analysis.
Table 6. Total time spent on vocabulary analysis
Project | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Hours | 4 | 21 | 20 | 12 | 36 | 29 | NA | 3 | NA | NA | 3 | 20 | 5 |
Statistical Summary 4: Vocabulary Analysis Time
Sample Size, n: | 10 | Range | 33 |
Mean | 15 | Minimum | 3 |
Median | 16 | 1st Quartile | 4 |
Midrange | 19.5 | 2nd Quartile | 16 |
Variance | 129.55 | 3rd Quartile | 21 |
Standard Deviation | 11.38 | Maximum | 36 |
5.1.5. Time Implementing Reusable Assets
The time for implementing reusable assets is derived from the Table 2 time log
entries “implementation/reusable component”, “reusable algorithm”, and “testing”.
Table 7. Time for Domain Implementation
Project | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Implementation/reusable component | 5 | | 55 | 80 | 4 | 24 | 5 weeks | 5 | | 55 | 3 | | 0.5 |
Reusable algorithm | | | | | | | | 62 | | | | | |
Testing | | | | | | | | | | 15 | | | |
Total | 5 | | 55 | 80 | 4 | 24 | | 67 | | 70 | 3 | | 0.5 |
Statistical Summary 5: Time Implementing Reusable Assets
Sample Size, n: | 9 | Range | 79.5 |
Mean | 34.2 | Minimum | 0.5 |
Median | 24 | 1st Quartile | 4 |
Midrange | 40.25 | 2nd Quartile | 24 |
Variance | 1108.19 | 3rd Quartile | 67 |
Standard Deviation | 33.28 | Maximum | 80 |
5.1.6. Time Groups
In the category of implementing reusable assets, there were two distinct groups:
those who spent 50 hours or more, and those who spent 5 hours or fewer. The data
points break down as follows:
Table 8. Time spent implementing Reusable Assets
<= 5 hours | 0.5, 3, 4, 5 |
>= 50 hours | 55, 67, 70, 80 |
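The grouping can be reproduced with a short partition of the nine implementation totals from Table 7. This is a minimal sketch; the variable names are illustrative, and the thresholds are the ones stated in the text.

```python
# Implementation totals (hours) from Table 7 for the nine projects
# that logged a numeric total.
totals = [5, 55, 80, 4, 24, 67, 70, 3, 0.5]

low = sorted(t for t in totals if t <= 5)       # 5 hours or fewer
high = sorted(t for t in totals if t >= 50)     # 50 hours or more
between = sorted(t for t in totals if 5 < t < 50)

print(low)      # [0.5, 3, 4, 5]
print(high)     # [55, 67, 70, 80]
print(between)  # [24]
```

One logged total, 24 hours, falls between the two groups and therefore does not appear in Table 8.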
5.1.7. Time Outliers
Outliers of the time entry log (Table 2) are listed below. Although there is a
good deal of variation in the time entry log, outliers in the following
categories are noteworthy. The table lists areas of the project where students
experienced difficulty completing the task.
Time outliers (hours)
Finding good tools | 50 |
Gathering source information | 38 |
Study of domain | 46 |
Architectural analysis | 60 |
5.2. Book size
Book size was measured by the number of words, number of lines, and number of
pages.
There was considerable variation in book size due to the different individual
approaches to book construction. Book style ranged from those who included domain
sources with their projects (the system exemplar code, vocabulary analysis
results, code analysis, etc.) to those whose books were little more than an
outline with references.
Two items that greatly increased the size of books were the inclusion of system
source code and intermediary vocabulary analysis tables. In one project, for
example, a 422-page book was reduced to 20 pages after removing 376 pages of
exemplar source code and 26 pages of vocabulary analysis.
Book Size | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Words | 12966 | 6448 | 3101 | 6334 | 3218 | 3350 | 10396 | 1416 | 2962 | 74939 | 28068 | 3396 | 1083 |
Lines | 4021 | 4757 | 457 | 3047 | 1202 | 683 | 3636 | 613 | 428 | 26648 | 6628 | 1076 | 227 |
Pages | 81 | 68 | 17 | 60 | 17 | 19 | 65 | 32 | 19 | 422 | 125 | 20 | 8 |
Words minus source code | 10060 | 6185 | 3101 | 6334 | 3218 | 3350 | 3282 | 1416 | 2962 | 9104 | 17314 | 3396 | 1083 |
Lines minus source code | 3103 | 4494 | 457 | 3047 | 1202 | 683 | 1518 | 613 | 428 | 6653 | 3658 | 1076 | 227 |
Pages minus source code | 62 | 62 | 17 | 60 | 17 | 19 | 28 | 32 | 19 | 46 | 78 | 20 | 8 |
Words minus vocabulary frequency analysis | 10060 | 3060 | 3101 | 6334 | 3218 | 3350 | 10396 | 1416 | 2962 | 68869 | 24008 | 3396 | 1083 |
Lines minus vocabulary frequency analysis | 3103 | 1257 | 457 | 3047 | 1202 | 683 | 3636 | 613 | 428 | 20598 | 4598 | 1076 | 227 |
Pages minus vocabulary frequency analysis | 62 | 41 | 17 | 60 | 17 | 19 | 65 | 32 | 19 | 379 | 91 | 20 | 8 |
Words minus source and vocabulary | 1060 | 2787 | 3101 | 6334 | 3218 | 3350 | 3282 | 1416 | 2962 | 3036 | 13252 | 3396 | 1083 |
Lines minus source and vocabulary | 3103 | 993 | 457 | 3047 | 1202 | 683 | 1518 | 613 | 428 | 604 | 1627 | 1076 | 227 |
Pages minus source and vocabulary | 62 | 35 | 17 | 60 | 17 | 19 | 28 | 32 | 19 | 11 | 44 | 20 | 8 |
The following charts display DARE book size by word count, line count, and page
count.
5.2.1. Book Word Count
Chart 6 shows the total word count for DARE books. These measurements include
exemplar source code and vocabulary analysis.
Table 6. Word Count in DARE book
Project | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Words | 12966 | 6448 | 3101 | 6334 | 3218 | 3350 | 10396 | 1416 | 2962 | 74939 | 28068 | 3396 | 1083 |
Statistical Summary 6: DARE Book Word Count
Sample Size, n: | 13 | Range | 73856 |
Mean | 12129 | Minimum | 1083 |
Median | 3396 | 1st Quartile | 3101 |
Midrange | 38011 | 2nd Quartile | 3396 |
Variance | 4.086476e+8 | 3rd Quartile | 10396 |
Standard Deviation | 20215.03 | Maximum | 74939 |
5.2.2. Word Count without exemplar code and vocabulary analysis
This table shows the total word count for all projects with exemplar source code
and vocabulary analysis removed. With these removed, the measurements have less
variance.
Table 7. DARE book word count with exemplar code and vocabulary frequency analysis removed
Project | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Words | 1060 | 2787 | 3101 | 6334 | 3218 | 3350 | 3282 | 1416 | 2962 | 3036 | 1924 | 3396 | 1083 |
Statistical Summary 7: Book Words minus source and vocabulary
Sample Size, n: | 13 | Range | 5274 |
Mean | 2842.23 | Minimum | 1060 |
Median | 3036 | 1st Quartile | 1924 |
Midrange | 3697 | 2nd Quartile | 3036 |
Variance | 1.863497e+6 | 3rd Quartile | 3282 |
Standard Deviation | 1365.1 | Maximum | 6334 |
5.2.3. Line Count minus code and vocabulary
Chart 8 displays the total line count for all projects with exemplar source code
and vocabulary analysis removed.
Table 8. DARE book line count without exemplar source code and vocabulary frequency analysis
Project | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Lines | 3103 | 993 | 457 | 3047 | 1202 | 683 | 1518 | 613 | 428 | 604 | 438 | 1076 | 227 |
Statistical Summary 8: Line Count minus code and vocabulary
Sample Size, n: | 13 | Range | 2876 |
Mean | 1106.84 | Minimum | 227 |
Median | 683 | 1st Quartile | 457 |
Midrange | 1665 | 2nd Quartile | 683 |
Variance | 893486.8 | 3rd Quartile | 1202 |
Standard Deviation | 945.24 | Maximum | 3103 |
5.2.4. Page Count minus code and vocabulary
This table shows the total page count for all projects with exemplar source code
and vocabulary analysis removed.
Table 9. DARE book page count without exemplar source code and vocabulary frequency analysis
Project | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Pages | 62 | 35 | 17 | 60 | 17 | 19 | 28 | 32 | 19 | 11 | 17 | 20 | 8 |
Statistical Summary 9: Page Count minus code and vocabulary
Sample Size, n: | 13 | Range | 54 |
Mean | 26.53 | Minimum | 8 |
Median | 19 | 1st Quartile | 17 |
Midrange | 35 | 2nd Quartile | 19 |
Variance | 291.26 | 3rd Quartile | 32 |
Standard Deviation | 17.06 | Maximum | 62 |
5.3. Domain Scope
Domain analysts are to describe their domains verbally, describing what is in and
what is not in the domain, using set notation if necessary. These measurements
capture this domain scoping statement by measuring the number of words and sets
in the statement.
5.3.1. Number of Words
Chart 10 displays the number of words present in each project’s domain scope
statement.
Table 10.a. Domain Scope Word Count
Project | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Words | 64 | NA | 178 | 1075 | 141 | 688 | 160 | 229 | NA | 255 | NA | NA | 18 |
Statistical Summary 10: Domain Scope Word Count
Sample Size, n: | 9 | Range | 1057 |
Mean | 312 | Minimum | 18 |
Median | 178 | 1st Quartile | 141 |
Midrange | 546.5 | 2nd Quartile | 178 |
Variance | 118990.5 | 3rd Quartile | 255 |
Standard Deviation | 344.95 | Maximum | 1075 |
5.3.2. Mathematical Model using Set Notation
For those projects that had domain statements, Table 10.b indicates whether the
project used set notation.
Table 10.b. Presence of set notation
Project | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Set notation (1 = used, 0 = not used) | 1 | NA | 0 | 1 | 0 | 0 | 0 | 1 | NA | 0 | NA | NA | 0 |
Of the nine projects that had a section devoted to scoping the domain, three
(33%) used a set-based mathematical model to describe their domains.
Two individuals who did not use set notation used a different notation. One
participant used pseudocode in the scope statement, and the other used a top-down
function and connector architecture.
5.3.3. Number of sets
For those projects that used set notation, Table 10.c displays how many sets were
used in their model.
Table 10.c. Number of sets in domain scope
Project | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
Sets | 4 | NA | NA | 20 | NA | NA | NA | 8 | NA | NA | NA | NA | NA |
Of the three projects that used set notation to scope their domains, the numbers
of sets used in the models were 4, 8, and 20.
5.3.4.
Domain Sources
A key component to domain analysis
is the selection of domain sources.
These exemplars include source code, documents, and architectures.
5.3.4.1.
Count of Exemplar Sources
Chart 11 shows the number of examples of domain source code used for each project’s domain analysis.
Table 11, Number of
Exemplar sources |
||||||||||||
3 |
7 |
3 |
3 |
3 |
4 |
3 |
8 |
4 |
3 |
3 |
3 |
3 |
Statistical Summary, Exemplar Count
Statistical Summary 11: Exemplar
Count |
|||
Sample
Size, n: |
13 |
Range |
5 |
Mean |
3.86 |
Minimum |
3 |
Median |
3 |
1st
Quartile |
3 |
Midrange |
5.5 |
2nd
Quartile |
3 |
Variance |
2.80 |
3rd
Quartile |
4 |
Standard
Deviation |
1.67 |
Maximum |
8 |
5.3.4.2.
Count
of Exemplar Documents
Chart 12
displays the number of domain documents chosen for the DARE book analysis.
Table 12a Number of Domain Documents |
||||||||||||
5 |
3 |
3 |
7 |
3 |
4 |
4 |
19 |
3 |
3 |
7 |
3 |
3 |
Statistical Summary 12a, Exemplar
Document Count |
|||
Sample
Size, n: |
13 |
Range |
16 |
Mean |
5.15 |
Minimum |
3 |
Median |
3 |
1st
Quartile |
3 |
Midrange |
11 |
2nd
Quartile |
3 |
Variance |
19.47 |
3rd
Quartile |
5 |
Standard
Deviation |
4.41 |
Maximum |
19 |
5.3.5.
System Descriptions Count
Each
project contained system descriptions. This table shows the number of system
descriptions.
Table 12.b ,System
Descriptions Number and Type |
||||||||||||
3
fictitious |
NA |
3 |
5 |
3 |
2 |
3 |
6 |
3
paragraphs |
3
paragraphs |
3
paragraphs |
3
paragraphs |
3 |
Twelve of the thirteen projects, or 92%, contained system descriptions. A system template is presented in Appendix D. Of this set, three projects did not follow the system template; rather, these projects described the system verbally in a paragraph or less.
5.4.
Architectural Analysis
As part of the domain analysis, participants undertook an architectural analysis of their domains. Three data points were taken from each analysis: the number of system architectures in each DARE book, the types of architectures in this set, and the word count of the architectural analysis section of the domain book.
5.4.1.
Number of system architecture
This
table contains the number of architectural images contained in each individual’s
DARE book.
Table 13a, Number of System Architectures |
||||||||||||
9 |
14 |
3 |
26 |
4 |
3 |
3 |
6 |
0 |
3 |
5 |
6 |
3 |
Statistical Summary 13: System
Architecture Count |
|||
Sample
Size, n: |
13 |
Range |
26 |
Mean |
6.53 |
Minimum |
0 |
Median |
4 |
1st
Quartile |
3 |
Midrange |
13 |
2nd
Quartile |
4 |
Variance |
46.26 |
3rd
Quartile |
6 |
Standard
Deviation |
6.80 |
Maximum |
26 |
5.4.2.
Types of Architecture
This
table lists the architectural type for each architectural image in the DARE
book.
Table 13.b, Types of Architecture |
||||||
1 |
2 |
3 |
4 |
5 |
6 |
7 |
3
flow chart 3 top down, 3 use case |
7 activity, 7 module |
Object
relationships. Class
diagram, |
Functional
flow and their actions |
functions
and connectors |
Pseudo
code (1), function and connectors (3) |
function
and connectors |
8 |
9 |
10 |
11 |
12 |
13 |
|
Flow (functions, connectors, data) |
class diagrams |
flow (functions and connectors) |
flow (functions, connectors) |
data flow, data flow, data flow,
top down, batch sequential |
class diagrams |
|
5.4.3.
Word Count
This table shows the number of words in each project’s architectural analysis section.
Table 14.a, Word count
architectural analysis |
||||||||||||
852 |
115 |
0 |
1767 |
179 |
0 |
0 |
0 |
0 |
0 |
30 |
126 |
0 |
Chart 14
Statistical Summary 14:
Architectural Analysis word count |
|||
Sample
Size, n: |
13 |
Range |
1767 |
Mean |
236.07 |
Minimum |
0 |
Median |
0 |
1st
Quartile |
0 |
Midrange |
883.5 |
2nd
Quartile |
0 |
Variance |
265476.2 |
3rd
Quartile |
126 |
Standard
Deviation |
515.24 |
Maximum |
1767 |
Seven projects included only architecture diagrams, with no descriptive text.
Six projects included descriptive text. The number of words ranged from 30 to 1767.
5.4.4.
System Feature Tables
Each analyst was to create feature tables of his/her domain system exemplars. The table below shows the number of tables included in each analyst’s system feature table section.
5.4.4.1.
Number of Tables
Table 14.b Number of System
Feature Tables |
||||||||||||
1 |
1 |
3 |
3 |
3 |
1 |
3 |
6 |
1 |
1 |
1 (3 combined) |
3 |
1 (3) |
5.4.4.2.
Number of Features
This
table shows the number of features in each feature table.
Table 14.c, Number of Features in System Feature Tables |
||||||||||||
4 |
11 |
5,11,5 |
16x3,17x3,20x3 |
11,11,6 |
9x4, |
10,10,10 |
6 (6x1 tables) |
12 |
38x3 |
8 |
13,14,18 |
7 |
There was a great variety of
interpretations of what constituted a feature table.
5.5.
Vocabulary Analysis
The activity logs included the vocabulary analysis activities. Analysts varied widely in the labels they used and in what they counted as vocabulary analysis. It is therefore helpful to redisplay those elements that constitute the vocabulary analysis phase. Given this variety, discussing the entries individually is impractical, but examining the total time spent on vocabulary analysis is informative.
5.5.1.
Time
invested in vocabulary analysis
Table
4, Total Time Spent on Vocabulary Analysis |
||||||||||||
4 |
21 |
56 |
12 |
36 |
26 |
NA |
3.00 |
NA |
NA |
3 |
20 |
5 |
Statistical Summary 4: Vocabulary
Analysis Time |
|||
Sample
Size, n: |
10 |
Range |
33 |
Mean |
15 |
Minimum |
3 |
Median |
16 |
1st
Quartile |
4 |
Midrange |
19.2 |
2nd
Quartile |
16 |
Variance |
129.55 |
3rd
Quartile |
21 |
Standard
Deviation |
11.38 |
Maximum |
36 |
Of this group there is one outlier: an individual who spent 56 hours on vocabulary analysis.
These times can be viewed as two groups: those who spent 12 hours or more and those who spent five hours or less.
Table 4.b, Total time spent on vocabulary analysis |
<=5 | 3, 3, 4, 5 |
>=12 | 12, 20, 21, 36 |
5.5.2.
Manual or Automatic
Vocabulary analysis consists of a combination of automated and manual processes. The following table displays whether each analyst used automatic methods, manual methods, or a combination of both.
Table 14.c, Vocabulary Analysis:
manual, automatic, both |
||||||||||||
Manual |
Automatic |
automatic |
both |
Automated |
both |
NA |
both |
both |
both |
both |
both |
manual |
5.5.3.
Original Set
Analysts begin with a raw set of
domain words to be analyzed. For those
who reported their original set, the figures are recorded below.
Table 14.c, Original Set Word
Count |
||||||||||||
|
|
thousands |
thousands |
thousands |
thousands |
thousands |
|
|
|
10790 |
|
|
5.5.4.
Number of words in the key word set
Following a series of automatic and manual vocabulary analysis methods, analysts derive a key word set with which to create facet tables. The size of each project’s key word set is shown in the following table.
Table 15, Number of words in key word set |
||||||||||||
39 |
939 |
650 |
34 |
4681 |
39 |
48 |
65 |
14 |
4117 |
32 |
NA |
188 |
Chart 15
Statistical Summary 15: Key Word Set Size |
Sample Size, n: | 12 | Range | 4667 |
Mean | 903.83 | Minimum | 14 |
Median | 56.5 | 1st Quartile | 36.5 |
Midrange | 2347.5 | 2nd Quartile | 56.5 |
Variance | 2.764544e+6 | 3rd Quartile | 794.5 |
Standard Deviation | 1662.692 | Maximum | 4681 |
5.5.5.
Key word set, minus outliers
Four individuals’ key word sets contained more than 600 words. These outliers skew the statistics for the group that derived a manageable vocabulary set. The key word sets with the outliers removed are listed below.
Table 16.a, Key word set |
||||||||||||
39 |
OUTLIER |
OUTLIER |
34 |
OUTLIER |
39 |
48 |
65 |
14 |
OUTLIER |
32 |
NA |
188 |
Chart 16
Statistical Summary 16: Key Word Set Size, Outliers Removed |
Sample Size, n: | 8 | Range | 174 |
Mean | 57.375 | Minimum | 14 |
Median | 39 | 1st Quartile | 33 |
Midrange | 101 | 2nd Quartile | 39 |
Variance | 2993.696 | 3rd Quartile | 56.5 |
Standard Deviation | 54.71 | Maximum | 188 |
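The outlier removal described in this subsection can be sketched as a simple threshold filter over the key word set sizes of Table 15. The code below is an illustration only: the threshold of 600 comes from the text, while the variable names and the use of `None` for the NA entry are assumptions.

```python
import statistics

# Key word set sizes from Table 15; project 12 reported NA.
key_word_sets = [39, 939, 650, 34, 4681, 39, 48, 65, 14, 4117, 32, None, 188]
OUTLIER_THRESHOLD = 600  # cut-off stated in the text

reported = [size for size in key_word_sets if size is not None]
kept = [size for size in reported if size <= OUTLIER_THRESHOLD]
outliers = [size for size in reported if size > OUTLIER_THRESHOLD]

# Recompute the basic summary on the reduced set.
summary = {
    "n": len(kept),
    "minimum": min(kept),
    "maximum": max(kept),
    "range": max(kept) - min(kept),
    "mean": statistics.mean(kept),
    "median": statistics.median(kept),
}

print(outliers)
print(summary)
```

A fixed threshold is the simplest possible rule; it mirrors the description above rather than any formal outlier test.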
5.5.6.
Presence
of Word Clusters
Table 16.b indicates which projects used word clusters in their vocabulary analysis (1 = yes, 0 = no).
Table 16.b, Presence of word
clusters |
||||||||||||
0 |
0 |
0 |
0 |
0 |
1 |
1 |
0 |
0 |
0 |
1 |
0 |
0 |
Out of
the 13 projects, 3 or 23% arranged keywords in clusters.
5.5.7.
Facet
Table
Vocabulary analysis involves organizing the domain’s key words into facets of commonality and variability. This is displayed with a facet table. This section shows measurements of the facet table.
5.5.7.1.
Number
of facets
The key word set, ideally, is used to create a single facet table; this word set represents the commonalities and variabilities of the set of domain systems being analyzed. Below is the number of facets in each project’s facet table.
Table 17.a, Number of Facets |
||||||||||||
7 |
8 |
3 |
4 |
3 |
7 |
10 |
6 |
6 |
6 |
4 |
NA |
4 |
Chart 17
Statistical Summary 17: Facet
table facet count |
|||
Sample
Size, n: |
12 |
Range |
7 |
Mean |
5.66 |
Minimum |
3 |
Median |
6 |
1st
Quartile |
4 |
Midrange |
6.5 |
2nd
Quartile |
6 |
Variance |
4.60 |
3rd
Quartile |
7 |
Standard
Deviation |
2.14 |
Maximum |
10 |
5.5.7.2.
Variability
A facet table’s variability is the
number of columns in a facet table. The following table shows the number of
variable elements in each domain’s facet table.
Table 17.b, Facet Table variable
count for each column |
||||||
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8.2.
variability |
NA |
3,3,6,6,8,6,4 |
7,2,3,2,2,2,2,2 |
6,14,3 |
3,3,4 |
3,3,3,4,4,3,3 |
8 |
9 |
10 |
11 |
12 |
13 |
|
3,3,3,4,4,3,3 |
2,2,4,3,6 |
23,23,9,12,10,18 |
2,3,2,11,4,3 |
3,5,7,3,2,3 |
2,2,5,2 |
|
5.5.7.3.
Facet table complexity - Product
A facet
table’s commonalities and variabilities are a representation of the number of
feature choices available for a given generic system. This complexity can be measured by the product
of each facet table’s variable elements.
Table 18.a, Facet Table Complexity
(Product) |
|||||||||||||
62208 |
1344 |
252 |
9216 |
36 |
3888 |
288 |
10283760 |
1584 |
1890 |
40 |
NA |
504 |
|
Table 18.a contains two outliers, in projects one and eight. Table 18.b and chart 18, below, display the facet complexity with these two outliers removed.
Table 18.b Facet Table Complexity
(Product) |
||||||||||||
OUTLIER |
1344 |
252 |
9216 |
36 |
3888 |
288 |
OUTLIER |
1584 |
1890 |
40 |
NA |
504 |
Statistical Summary 18: Facet
Complexity by Product |
|||
Sample
Size, n: |
10 |
Range |
9180 |
Mean |
1904.2 |
Minimum |
36 |
Median |
924 |
1st
Quartile |
252 |
Midrange |
4626 |
2nd
Quartile |
924 |
Variance |
8.009142e+6 |
3rd
Quartile |
1890 |
Standard
Deviation |
2830.043 |
Maximum |
9216 |
5.5.7.4.
Facet table complexity - Sum
A second way to capture each facet
table’s complexity is to study the sum of each column’s variability count.
Table 19, Facet Table Complexity
(Sum) |
||||||||||||
36 |
22 |
23 |
38 |
10 |
23 |
17 |
95 |
25 |
23 |
11 |
NA |
20 |
Chart 19
Statistical Summary 19: Facet
Table Complexity (Sum) |
|||
Sample
Size, n: |
12 |
Range |
85 |
Mean |
28.58 |
Minimum |
10 |
Median |
23 |
1st
Quartile |
18.5 |
Midrange |
52.5 |
2nd
Quartile |
23 |
Variance |
506.08 |
3rd
Quartile |
30.5 |
Standard
Deviation |
22.49 |
Maximum |
95 |
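Both complexity measures follow directly from a facet table’s per-facet variability counts: the product counts the possible feature combinations, and the sum counts the individual variable elements. The sketch below illustrates this with two of the variability lists from Table 17.b; the function and variable names are hypothetical.

```python
from math import prod


def facet_complexity(variability_counts):
    """Return (product, sum) of the per-facet variability counts."""
    return prod(variability_counts), sum(variability_counts)


# Per-facet variability counts for two facet tables listed in Table 17.b.
seven_facet_table = [3, 3, 6, 6, 8, 6, 4]
six_facet_table = [23, 23, 9, 12, 10, 18]

print(facet_complexity(seven_facet_table))
print(facet_complexity(six_facet_table))
```

These two lists yield products of 62208 and 10283760 and sums of 36 and 95, which are the values reported for projects one and eight in Tables 18.a and 19. The product grows multiplicatively with each added facet, which is why it produces outliers far more readily than the sum.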
5.5.8.
Template
The template is derived, ideally, from the facet table. The template is composed of descriptive words and variables.
5.5.8.1.
Words
Chart 20, below, displays the number of words in each template description.
Table 20, Word Count of the
Template Description |
||||||||||||
52 |
0 |
66 |
95 |
29 |
224 |
104 |
32 |
55 |
88 |
15 |
NA |
83 |
Statistical Summary 20: Number of
Template Words |
|||
Sample
Size, n: |
12 |
Range |
224 |
Mean |
70.25 |
Minimum |
0 |
Median |
60.5 |
1st
Quartile |
30.5 |
Midrange |
112 |
2nd
Quartile |
60.5 |
Variance |
3418.56 |
3rd
Quartile |
91.5 |
Standard
Deviation |
58.46 |
Maximum |
224 |
5.5.8.2.
Template
Variable Word Count
This section considers the number of variables in the template description.
Table 21, Variable count of
template |
||||||||||||
7 |
0 |
12 |
12 |
3 |
32 |
10 |
6 |
6 |
20 |
3 |
NA |
16 |
Chart 21
Statistical Summary 21: Template
Variable Count |
|||
Sample
Size, n: |
12 |
Range |
32 |
Mean |
10.58 |
Minimum |
0 |
Median |
8.5 |
1st
Quartile |
4.5 |
Midrange |
16 |
2nd
Quartile |
8.5 |
Variance |
78.44 |
3rd
Quartile |
14 |
Standard
Deviation |
8.85 |
Maximum |
32 |
5.6.
Code
Analysis
Of those projects that had software
analysis, the following sets of data describe the time invested and the methods
used.
5.6.1.
Time
Log Entries for Software Analysis
Table
21.b redisplays the projects’ time log entries for code analysis.
Table 21.b, Time Log Entry of Code
Analysis |
||||||||||||
30 |
NA |
3 |
2 |
2 |
4 |
5 days |
1 |
NA |
NA |
3 |
NA |
NA |
5.6.2.
Presence
of Source Analysis
The following table contains a 1 for those whose project contained code analysis and a 0 for those whose project did not.
Table 21.c Was there code analysis? |
||||||||||||
1 |
0 |
0 |
1 |
0 |
1 |
1 |
1 |
0 |
1 |
0 |
0 |
0 |
5.6.3.
Method
This
table indicates if the code analysis was performed manually or with some
automated tools.
Table 21.d Code analysis: manual (M) or automated (A)? |
||||||||||||
M |
NA |
M |
A |
NA |
A |
A |
A |
NA |
A |
NA |
NA |
NA |
5.7.
Generic Architectures
Part of the DARE process is to derive, from the collection of system architectures, a small number of generic architectures that represent the domain.
5.7.1.
Number of words
This table contains the number of
words in the DARE book devoted to the generic architecture section.
Table 22 Number of words in
Generic Architecture section |
||||||||||||
349 |
0 |
403 |
91 |
0 |
0 |
136 |
0 |
300 |
24 |
463 |
75 |
0 |
Chart 22
Statistical Summary 22: Word Count of Generic Architecture |
|||
Sample
Size, n: |
13 |
Range |
688 |
Mean |
155 |
Minimum |
0 |
Median |
77 |
1st
Quartile |
0 |
Midrange |
344 |
2nd
Quartile |
77 |
Variance |
42133 |
3rd
Quartile |
171 |
Standard
Deviation |
205.26 |
Maximum |
688 |
5.7.2.
Diagram count
This table contains the number of generic architectures in the DARE book representing the domain.
Table 23.a, Architectural Diagram
Count |
||||||||||||
1 |
4 |
1 |
2 |
1 |
1 |
1 |
1 |
2 |
1 |
2 |
2 |
1 |
Statistical Summary 23: Number of
Generic Architectures |
|||
Sample
Size, n: |
13 |
Range |
3 |
Mean |
1.53 |
Minimum |
1 |
Median |
1 |
1st
Quartile |
1 |
Midrange |
2.5 |
2nd
Quartile |
1 |
Variance |
0.76 |
3rd
Quartile |
2 |
Standard
Deviation |
0.87 |
Maximum |
4 |
5.7.3.
Type of Architecture
This
table shows the architectural type in each project’s Generic Architecture
section
Table 23.b, Type of Generic
Architecture |
||||||
1 |
2 |
3 |
4 |
5 |
6 |
7 |
functions and connectors … 13
functions (depth 4) |
3 module, 1 activity |
api , state and function |
See note |
functions and connectors (12
functions depth 3) |
functions and connectors |
Flow (function connectors data) |
8 |
9 |
10 |
11 |
12 |
13 |
|
flow (function connectors data) |
abstract flow |
function and connectors |
NA |
top down |
NA |
|
5.8.
Generic
Feature Table
The generic feature table combines
the system feature tables into one table.
Columns represent the systems, and rows enumerate the system features.
Each feature contains points of variability.
The following three tables consider these measurements for project
generic feature tables.
5.8.1.
Presence
of a generic feature table.
This table shows which projects contain a generic feature table (1 = yes, 0 = no). Nine of the thirteen projects, or 69%, contained a generic feature table.
Table 23.c, Presence of a generic feature table |
||||||||||||
1 |
0 |
1 |
0 |
0 |
1 |
1 |
1 |
1 |
1 |
1 |
1 |
0 |
5.8.2.
Number
of features
This table shows the number of
features contained in each project’s feature table.
Table 23.d, number of features |
||||||||||||
3 |
0 |
18 |
3 |
3 |
9 |
8 |
5 |
4 |
8 |
8 |
25 |
NA |
5.8.3.
Points
of variability
The
generic feature table is comprised of points of variability for each feature. Where
possible a discrete number is displayed.
Other projects use verbal descriptions.
Table 23.3, Generic feature table
points of variability |
||||||
1 |
2 |
3 |
4 |
5 |
6 |
7 |
Variability
described as "characteristics" difficult to measure. |
NA |
yes/no |
No table
just a sentence, The Fiestal Network pattern can be generalized to allow for
variable block size, key size, number of rounds, and the particular actions
in the function f. Variability is hard
to measure as the function f is not tightly defined |
verbal
description of each feature, not able to measure |
4 |
3 |
8 |
9 |
10 |
11 |
12 |
13 |
|
yes/no |
yes/no |
2 … 8 |
yes/no |
yes/no |
NA |
|
5.9.
Domain
Implementation
Phase two of domain engineering is
domain implementation. The following tables consider the number of projects
that had a domain implementation section and the types of reusable assets
created.
5.9.1.
Presence
of domain implementation.
This table places a 1 where a
reusable component was developed and a 0 where it was not.
Table
24.a ,Was it developed |
||||||||||||
1 |
0 |
1 |
1 |
1 |
1 |
1 |
1 |
0 |
1 |
0 |
1 |
1 |
Ten of the thirteen projects, or 77%, contained a reusable asset of some kind.
5.9.1.1.
Parts
Based Reusable Asset(s)
This table places a 1 where a parts
based reusable asset was developed and a 0 where it was not.
Table 24.b, Reusable Asset(s) |
||||||||||||
1 |
NA |
1 |
1 |
1 |
1 |
1 |
1 |
NA |
1 |
NA |
1 |
1 |
Ten of the thirteen projects, or 77%, implemented a parts-based reusable component.
5.9.1.2.
Code
Generator
This table places a 1 where a code
generator was developed and a 0 where it was not.
Table 24.c, Code Generator |
||||||||||||
0 |
NA |
0 |
1 |
0 |
0 |
1 |
1 |
NA |
1 |
NA |
0 |
0 |
Four of the ten projects that had a domain implementation section, or 40%, created a domain-specific code generator.
5.9.1.3.
Little
Language
This table places a 1 where a little
language was developed and a 0 where it was not.
Table 24.d, Domain Specific Formal
Language (Little Language) |
||||||||||||
1 |
NA |
0 |
0 |
0 |
0 |
1 |
1 |
NA |
0 |
NA |
0 |
0 |
Three of the ten projects, or 30%, developed a little language.
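The shares quoted in this section can be recomputed from the 1/0/NA flags in Tables 24.a, 24.c, and 24.d. The following is a minimal sketch; the helper name `share` and the use of `None` for NA entries are assumptions made for illustration.

```python
# 1 = present, 0 = absent, None = NA (no domain implementation section).
developed = [1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1]                  # Table 24.a
code_generator = [0, None, 0, 1, 0, 0, 1, 1, None, 1, None, 0, 0]    # Table 24.c
little_language = [1, None, 0, 0, 0, 0, 1, 1, None, 0, None, 0, 0]   # Table 24.d


def share(flags):
    """Return (count of 1s, number of non-NA entries, rounded percentage)."""
    known = [flag for flag in flags if flag is not None]
    count = sum(known)
    return count, len(known), round(100 * count / len(known))


for name, flags in [
    ("developed", developed),
    ("code generator", code_generator),
    ("little language", little_language),
]:
    print(name, share(flags))
```

Dropping the NA entries before dividing makes the denominator explicit: thirteen projects for the first table, and the ten projects with a domain implementation section for the other two.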
5.9.1.4.
Language
This table contains the language of each project’s reusable asset.
Table 24.e, Language |
||||||||||||
java |
NA |
java |
C |
perl |
.NET |
java |
perl |
java |
vb |
NA |
NA |
java |
5.10.
Tables
Each DARE
book contains glossary, synonym, thesaurus, bibliography and index tables. The
following section considers the size of each of these tables and provides
statistical summaries.
5.10.1.
Glossary
Word Count
Table 25.a, Glossary – Number of Elements |
||||||||||||
20 |
117 |
10 |
30 |
0 |
9 |
0 |
0 |
0 |
4 |
0 |
5 |
0 |
Percent
Contained a Glossary: 54%
Statistical Summary 25.a :
Glossary Elements Count |
|||
Sample
Size, n: |
7 |
Range |
113 |
Mean |
27.85 |
Minimum |
4 |
Median |
10 |
1st
Quartile |
5 |
Midrange |
60.5 |
2nd
Quartile |
10 |
Variance |
1629.81 |
3rd
Quartile |
30 |
Standard
Deviation |
40.37 |
Maximum |
117 |
5.10.2.
Synonym
Word Count
Table 25.b, Synonym Table – Number
of Elements |
||||||||||||
11 |
0 |
3 |
0 |
3 |
13 |
0 |
4 |
0 |
4 |
0 |
0 |
8 |
Percent
Contained a Synonym Table: 54%
Of those who had synonym tables:
Statistical Summary 25.b: Synonym Table |
|||
Sample
Size, n: |
7 |
Range |
10 |
Mean |
6.5 |
Minimum |
3 |
Median |
4 |
1st
Quartile |
3 |
Midrange |
8 |
2nd
Quartile |
4 |
Variance |
16.95 |
3rd
Quartile |
11 |
Standard
Deviation |
4.11 |
Maximum |
13 |
5.10.3.
Thesaurus
Word Count
Table 25.c, Thesaurus – Number of
Elements |
||||||||||||
12 |
0 |
12 |
0 |
0 |
14 |
0 |
22 |
0 |
0 |
2 |
0 |
8 |
Percent
contained a thesaurus: 46%
Statistical Summary 25.c:
Thesaurus Elements Count |
|||
Sample
Size, n: |
6 |
Range |
20 |
Mean |
11.66 |
Minimum |
2 |
Median |
12 |
1st
Quartile |
8 |
Midrange |
12 |
2nd
Quartile |
12 |
Variance |
43.86 |
3rd
Quartile |
22 |
Standard
Deviation |
6.62 |
Maximum |
22 |
5.10.4.
Bibliography
Element Count
Table 25.d, Bibliography – Number of Elements |
||||||||||||
11 |
0 |
7 |
7 |
0 |
6 |
6 |
20 |
|
3 |
9 |
4 |
0 |
Percent Contained a Bibliography: 69%
Statistical Summary 25.d: Bibliography
Elements Count |
|||
Sample
Size, n: |
9 |
Range |
17 |
Mean |
8.11 |
Minimum |
3 |
Median |
7 |
1st
Quartile |
6 |
Midrange |
11.5 |
2nd
Quartile |
7 |
Variance |
25.61 |
3rd
Quartile |
8 |
Standard
Deviation |
5.06 |
Maximum |
20 |
5.10.5.
Index
Word Count
Table 25.e, Index entry count |
||||||||||||
11 |
0 |
37 |
0 |
0 |
55 |
35 |
6 |
0 |
0 |
0 |
0 |
0 |
Five of the thirteen projects, or 38%, had an index in the book; the remaining eight had none.
6.
Failure Point Observations
The following discussion highlights
failure points observed in the domain engineering projects. These comments are
based on the data of section 5, the surveys, and participant demographic
information. Individual project examples
are given throughout to provide a broader understanding of the challenges faced
by the domain analysts.
6.1.
Variation
As noted in section 3, all participants in this study undertook domain engineering and the creation of a DARE book for the first time. Given this, approaches to the project understandably varied a good deal. Quantitative variations include the size of the DARE book and the time spent on each section of the process.
6.1.1.
Book
Size
Measuring the size of the DARE book proved difficult, as there was no consensus on whether the source code and/or the data of the vocabulary analysis phase was expected to be in the book. The two extremes were the all-inclusive DARE book, where all data relevant to the project went in the book, and the outline form, where all of the book’s sections contained references to artifacts and data in outside locations. In one case, a DARE book measuring 422 pages was reduced to 47 pages after removing the source code and the vocabulary analysis of the exemplar system. The book was further reduced to 20 pages after removing a source code analysis section that listed all the exemplar systems’ classes and functions. This does not necessarily have any implications for the overall success of the project, but in terms of establishing a reusable methodology, the variation makes comparable measurements of size difficult. Comparing and contrasting book size as an indicator of success would be an appropriate area for further study.
6.1.2.
Activity
Log
Participants varied widely in how they interpreted the activities to be recorded and in the titles they gave those activities. One individual recorded his/her time in days rather than hours. Two individuals omitted time recording altogether. The number of elements entered varied from 3 to 19, with a median of 8.5.
Finding general themes is difficult if one looks only at individual time recordings.
It is valuable, however, to look at the total time spent on the DARE project. Grouping subsections together also proved helpful. These include:
1.
Domain
Preparation: selecting a domain, gathering sources, and scoping.
2.
Vocabulary
analysis: gathering all domain sources
to construct a general vocabulary, conducting analysis to derive a key word
set, deriving cluster tables, facet tables, and template.
3.
Domain
implementation: creating a reusable software component or generator.
6.1.3.
Omission
of Book Sections
As highlighted in the various tables of section 5, the completeness of the DARE books varied considerably.
6.2.
Time
invested in the project
Individuals who spent less than 70 hours on their projects achieved less with them; their domains were either poorly scoped or too small. Those who spent less than 5 hours creating reusable assets did not create any truly useful product. In addition, those who spent five hours or less on vocabulary analysis either used no automatic tools and did not select vocabulary based upon system knowledge, or used only automated tools; in both cases, they did not create a useful key word set.
Total time spent on DARE project (hours) |
13 | 49 | 53 | 65 | 65.25 | 70 | 75 | 122 | 126 | 176 | 188 |
Total time spent on vocabulary analysis |
<=5 | 3, 3, 4, 5 |
>=12 | 12, 20, 21, 36 |
Time spent implementing Reusable Assets |
<=5 | 5, 4, 5, 5, 3, 0.5 |
>=50 | 55, 55, 67, 80 |
6.3.
Domain
Preparation
Those projects that had difficulty
scoping the domain, creating feature tables, and deriving a template through
vocabulary analysis tended to choose domains that were either too big or too
small. Many of those projects failed to clearly scope the domain, which
resulted in a “vague” domain.
6.3.1.
Domain
Choice and Scope
Small domains have few facets and points
of variability. This leads to an inability to construct generic architectures
with adequate features, resulting in trivially small reusable components.
One project focused on a single large software system in its chosen domain; the system exemplars were in fact modules of that one application. The domain scope was therefore artificial. This small domain choice constrained the exercise for the rest of the project. System architectures, feature tables, and facet tables all had very little variability. The result was a project with a generic architecture of very little utility.
A domain that is too big will have too many facets and points of variability. There will also be many more possible exemplar systems to include in the analysis. Large domains may also be very complex, making the analysis phases more difficult.
Subjectively, none of the projects
sampled chose domains that were too big to be scoped. Some projects, however, did have difficulty
scoping the domain to a manageable level.
One project refined the domain’s scope iteratively during the analysis phase. Beginning with the domain “Symmetric Encryption Algorithms”, the author considered the complexity of the subject along with the set of more than 30 algorithms, and refined the domain to “Feistel Block” symmetric algorithms. This scope was later further restricted by discarding consideration of the cryptographic strength of the algorithm and focusing on capturing the commonalities and variabilities of the Feistel Block architecture.
If a domain is not clearly scoped,
it can affect the rest of the domain analysis process.
Four of the thirteen projects had no domain scope statement. Of the nine that did have domain scope sections, only three, or 33%, used set notation to describe their domains. Others relied on verbal descriptions. One used verbal descriptions and pseudo code.
Those projects that had no domain scope section tended to be less successful in deriving facets and tables in the vocabulary analysis. Their reusable assets also tended to be smaller or nonexistent.
One project chose the domain of web based logging software, or the “Blog Domain”. Although there were exemplar systems to analyze, the project contained no domain scope section. Later, in the system analysis section, the author commented, “the domain is not very complex but rather young and undiscovered. Since the domain is young, a lot of the components developed by applications have not been completed and released which will limit the understanding of the domain…” Whether the result of poor domain scope or of a domain that is too new to capture, the end result is a domain that is difficult to analyze.
6.3.2.
Exemplar
Choices
6.3.2.1.
Finding
Exemplars
One problem observed by several
subjects was that they were unable to locate exemplar systems. Reasons for this
included trouble searching the web and lack of domain familiarity.
One
individual said, “It took me a long time to decide on which systems to analyze
because most of the open source systems I found on Internet do not have proper
design and architecture information.
After performing me research for several weeks, I went to Dr. Frakes for
help with picking the systems, and he pointed out the ccount system, which has
proper documentations and architecture design description. I picked the other two systems based on their
implementation language, C. I have spent
about 35-40 s in researching and reading over the materials and debating on
whether or not to choose the systems for domain analysis.”
This
individual recorded 38 hours gathering exemplar sources. With a total project
time of 126 hours, this means this project invested 30% of its time finding
exemplar sources.
Class surveys reveal that this is a recurring theme in the workplace as well. Responses to the question “What problems have you had when trying to reuse software?” were as follows:
o “Haven’t established a well organized reuse library. Hard to find codes to reuse, need experienced programmer to point out the existence of similar code.”
o “Finding it”
o “I have not reused software before.”
o “Not knowing where to find reusable components”
o “No pre-existing components”
Domain analysis is based on the
extraction of information from previously built systems. If these exemplars are
inadequate, then the resultant domain models and implementations will be poor.
One of the projects in the software
metrics domain was based on three open source systems found at Sourceforge. The
domain analyst reported that these systems were not good exemplars because they
were poorly coded and had no maintenance support. The comments found in the following project
notations reveal much about the quality of these projects, and hold good lessons
for future analysis.
The author
of one project “…explicitly disclaims future support saying that (the project)
was merely an artifact or byproduct of his academic research. “
Another
stated the project “started as a VC++ implementation produced in my spare time
over a 2 week period …".
Incidentally, this project was removed from SourceForge sometime during
the 14 week period after it was downloaded for the project.
With regard
to selecting these sources, the domain analyst reported: “Three software metrics projects were
selected from SourceForge http://sourceforge.net/ as candidates.
Selection criteria included their self-declaration as a software metric
application, relatively recent age (none are over five years old).”
Each of the System Descriptions of the SourceForge exemplars listed 5 years as the years of experience of the authors. One exemplar’s description stated that it was developed “… to scratch his own itch and learn a bit of Java at the same time.”
6.3.2.2.
Finding
Analysis Tools
Some analysts expressed difficulty finding analysis tools to use in the project. This problem falls in the same category as finding exemplars. It is, however, particular to the analysis stages of the DARE process: vocabulary analysis, architectural analysis, and software analysis.
One project in particular reported spending 50 hours looking for “good tools”. Considering that the total project time was 193 hours, 26% of the time was spent searching for tools. Another project, totaling 65.25 hours, spent 16.5 hours looking for tools, or 25% of the project time.
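The time shares quoted in section 6.3.2 are simple ratios of task hours to total project hours. The sketch below recomputes them from the hours reported in the text; the function name and the case labels ("project A", "project B") are hypothetical.

```python
def share_of_project_time(task_hours, total_hours):
    """Return the task's share of total project time as a whole percentage."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return round(100 * task_hours / total_hours)


# (hours spent searching, total project hours) as reported in the text.
cases = {
    "exemplar search": (38, 126),
    "tool search, project A": (50, 193),
    "tool search, project B": (16.5, 65.25),
}

for label, (task, total) in cases.items():
    print(label, share_of_project_time(task, total))
```

In all three cases roughly a quarter or more of the project time went to locating material rather than analyzing it.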
6.4.
Source
Analysis
Of the 13 projects, 6 or 46% had
code analysis sections in their project.
Recorded hourly times of software
analysis were 1, 2,2,3,3,4,30, and “5 days.”
6.4.1.
No
Analysis
One project did not do analysis
because of the poor exemplar quality.
“Due to the differing intents of the authors, their differing Java skill
levels, and their differing use of program and GUI generators, we find it
difficult to perform a direct code analysis of the three programs and wouldn't
use any of them in a real-world application”.
Another stated that he/she did not perform code analysis because the source code for two of the exemplars could not be found.
6.4.2.
Analysis
Problems
Of those
who performed some degree of software analysis, problems came from several
sources.
Lack of good tools for software analysis caused problems for some individuals. One project indicated that it did not analyze all the sources because it used a trial version of an application that limited its use.
Yet another
did not find satisfactory analysis tools, and concluded the exercise was
fruitless. As recorded in the DARE book:
” Both the Vanilla and Align CMU
tools were written in C/C++. The
“I do not believe this phase of the
Domain Analysis gave me a deeper understanding of the domain systems
commonalities and variabilities. This may be largely due to the fact that the
systems are simple from a structural point of view (call and return) and they
are small size (~ 1 KLOC).”
One subject did not use any tools, but manually inspected the source code, looking at architectural similarity, code style and quality, location of modules, and reusability potential.
Of the five
individuals using software analysis tools,
Source code analysis was, in summary, one of the weaker stages of the DARE process. Possible reasons for this, and solutions, are left to section 7.
6.5. Vocabulary Analysis
Vocabulary analysis encompasses the entire process of analyzing domain source vocabularies to select a key word set. Techniques in this stage include lexical analysis methods such as conflation, stemming, and frequency analysis, as well as creating word clusters to produce a facet table and template.
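The lexical techniques named above can be illustrated with a minimal Python sketch. It is not the tooling used in any of the projects: the stop list, the suffix list, and the function names `stem` and `key_word_candidates` are assumptions made for illustration, and the suffix stripping is a deliberately crude stand-in for a real stemmer.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop list; a real analysis would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "that"}

# Crude suffix list standing in for a real stemming algorithm.
SUFFIXES = ("ing", "er", "ed", "s")


def stem(word):
    """Repeatedly strip known suffixes, keeping at least three characters."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if suffix == "s" and word.endswith("ss"):
                continue  # do not turn "process" into "proces"
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                changed = True
                break
    return word


def key_word_candidates(text, top_n=5):
    """Return (stem, frequency, conflated word forms) for the most frequent stems."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    frequency = Counter()
    forms = defaultdict(set)
    for word in words:
        root = stem(word)
        frequency[root] += 1      # frequency analysis on the stem
        forms[root].add(word)     # conflation: group word forms under one stem
    return [(root, count, sorted(forms[root])) for root, count in frequency.most_common(top_n)]
```

Applied to a sentence containing "engine", "engineer", "engineering", and "engineers", all four forms are conflated under the single stem "engine". An analyst would still need to review the candidates by hand before placing them in a cluster or facet table.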
6.5.1. Deriving the key word set
Vocabulary analysis strategies varied across the projects.
In summary, the sampled projects show much room for improvement in teaching analysts to use lexical techniques properly.
6.5.2. Cluster Tables
Three projects incorporated cluster tables in their analysis. In these projects the exercise appears to have been relevant and to have contributed to the creation of a facet table.
These three successes stand in contrast to the remaining 77% of the projects, which did not include cluster tables in their book.
6.5.3. Facet Tables
Regardless of methodology, all 13 projects presented facet tables. Within this set, validity and quality vary.
One project produced one facet table for each of its three exemplars, failing to combine the systems into a single facet table. This represents a misunderstanding of the purpose of a facet table.
Another project, whose domain was a single large program, did not derive a facet table, but rather reproduced the generic feature table. In this case a facet table may not have been possible because of the constricting domain.
Facet Count | 7 | 8 | 3 | 4 | 3 | 7 | 10 | 6 | 6 | 6 | 4 | NA | 7 | 8 |
6.5.4. Templates
Eleven of the thirteen projects, or 85%, had templates. The table below shows the number of words and the number of variables in each template.
Word Count | 52 | NA | 66 | 95 | 29 | 52 | 104 | 32 | 55 | 88 | 15 | NA | 83 | 52 |
Variables | 7 | NA | 12 | 12 | 3 | 8 | 10 | 6 | 6 | 20 | 3 | NA | 16 | 7 |
Two templates that have only three variables might be considered too small. Two templates, with 16 and 20 points of variability, might be too complex.
What is valuable in terms of observing points of failure is to compare the number of facets with the number of variables in the templates. Comparing the two tables, one can see that four of the projects more than doubled their number of variables between the facet table and the template. One increased the variable count by one, and another decreased the variable count by one.
Facet Count | 7 | 8 | 3 | 4 | 3 | 7 | 10 | 6 | 6 | 6 | 4 | NA | 7 |
Variables | 7 | NA | 12 | 12 | 3 | 8 | 10 | 6 | 6 | 20 | 3 | NA | 16 |
This reveals that, in the minds of the domain analysts, there is not necessarily a one-to-one mapping of the facet table to the template, though there should be.
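The comparison can be reproduced from the two rows of counts. The sketch below assumes that the two rows pair up by position (one column per project) and that "NA" marks a missing artifact; the function name `compare` and the category labels are illustrative only.

```python
# Facet counts and template variable counts, paired by position as listed in
# the comparison rows above; None stands for an "NA" entry.
facets = [7, 8, 3, 4, 3, 7, 10, 6, 6, 6, 4, None, 7]
variables = [7, None, 12, 12, 3, 8, 10, 6, 6, 20, 3, None, 16]


def compare(facet_counts, variable_counts):
    """Classify each project by how its template variable count relates to its facet count."""
    summary = {
        "equal": 0,
        "more_than_doubled": 0,
        "other_increase": 0,
        "decrease": 0,
        "incomplete": 0,
    }
    for facet_count, variable_count in zip(facet_counts, variable_counts):
        if facet_count is None or variable_count is None:
            summary["incomplete"] += 1
        elif variable_count == facet_count:
            summary["equal"] += 1
        elif variable_count > 2 * facet_count:
            summary["more_than_doubled"] += 1
        elif variable_count > facet_count:
            summary["other_increase"] += 1
        else:
            summary["decrease"] += 1
    return summary


print(compare(facets, variables))
```

Under these assumptions, five projects show a one-to-one count, four more than double it, one increases by one, one decreases by one, and two cannot be compared because an entry is missing.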
6.6. System Analysis
6.6.1. System Feature Tables
There was little consensus as to how to represent the system and generic feature tables of a domain. As seen in the tables below, although all participants had three or more exemplars, 7 of the 13 had one system feature table and four of the 13 did not create a generic feature table.
Number of exemplars | 3 | 7 | 3 | 3 | 3 | 4 | 3 | 8 | 4 | 3 | 3 | 3 | 3 |
Number of system tables | 1 | 1 | 3 | 3 | 3 | 0 | 3 | 6 | 1 | 1 | 1 | 3 | 1 |
Generic Feature Table | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
The descriptive table below groups the analysts by how they created system and generic feature tables.
Analyst | System Feature Tables | Generic Feature Tables |
3, 8, 12 | Three tables, each of a different system | One table with systems as columns |
5, 7 | Three tables, each of a different system | One table, features and variability (no system names) |
4 | Three tables, each of a different system | None. A descriptive sentence. |
10 | One table with systems as columns | One table, features and variability (no system names) |
2, 9, 13 | One table with systems as columns | None |
11 | One table with systems as columns | One table with systems as columns |
1 | One table, features and specification (no system names) | One table, features and specification (no system names) |
6 | None. Referenced the generic feature table | One table with systems as columns |
With the exception of group (3, 8, 12), analysts generally misunderstood that 1) system feature tables are distinct tables, each representing one system of the domain, and 2) the generic feature table combines these systems into one table, with a column header for each system. Group (5, 7) was correct on system feature tables, but reduced the generic feature table to a single table without system names. Groups 10 and (2, 9, 13) produced what was in effect a correct generic feature table, but labeled it as a system feature table, leaving them with nothing for the generic feature table.
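The intended relationship between the two kinds of table can be sketched in a few lines of Python. The system names, features, and values here are hypothetical, as are the function names; the point is only that each system gets its own table, and the generic table is derived from them with one column per system.

```python
# Hypothetical system feature tables: one mapping per exemplar system,
# from feature name to the value that system exhibits.
system_feature_tables = {
    "System A": {"Input format": "plain text", "Stemming": "yes"},
    "System B": {"Input format": "XML", "Stemming": "yes"},
    "System C": {"Input format": "plain text", "Stemming": "no", "GUI": "yes"},
}


def generic_feature_table(system_tables):
    """Combine system tables into rows of features with one column per system."""
    systems = list(system_tables)
    features = []
    for table in system_tables.values():
        for feature in table:
            if feature not in features:
                features.append(feature)
    # "-" marks a feature that a system does not have.
    rows = {f: [system_tables[s].get(f, "-") for s in systems] for f in features}
    return systems, rows


def format_table(systems, rows):
    """Render the generic table in the pipe-delimited style used in this report."""
    lines = ["Feature | " + " | ".join(systems)]
    for feature, values in rows.items():
        lines.append(feature + " | " + " | ".join(values))
    return "\n".join(lines)


systems, rows = generic_feature_table(system_feature_tables)
print(format_table(systems, rows))
```

Keeping the system names as column headers is what lets a reader see commonalities (rows with identical values) and variabilities (rows with differing values) at a glance.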
6.7. Architectural Analysis
System Architecture Count | 9 | 14 | 3 | 26 | 4 | 3 | 3 | 6 | 0 | 3 | 5 | 6 | 3 | 9 |
Generic Architecture Count | 1 | 4 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 2 | 2 | 1 | 1 |
Architectural analysis, at both the system level and the generic level, was represented in all projects. Subjectively, this seems to be one of the more successful aspects of the DARE book. There may be some points of failure in terms of architectural validity or failures to map architectures to implementation, but these are left to a later study.
6.8. Supplementary Material
Although the majority of the projects contained a glossary, synonym table, thesaurus, and index, these elements were most likely an afterthought rather than an important part of the domain analysis. Reviewing the statistical summaries of these elements, it is evident that as a group their priority was low. Only project 1 and project 6 included all the tables in their project.
Glossary – Number of Elements
20 | 117 | 10 | 30 | 0 | 9 | 0 | 0 | 0 | 4 | 0 | 5 | 0 |
Percent that contained a glossary: 54%
Average number of elements among those that had glossaries: 27.8
Minimum: 4
Maximum: 117
Median: 10
Range: 113
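The summary lines for each supplementary element can be recomputed from the listed counts. The following sketch does so for the glossary row, assuming that a count of zero means the project had no glossary; the function name `summarize` is illustrative.

```python
from statistics import mean, median

# Glossary element counts per project, as listed above (0 = no glossary).
glossary_elements = [20, 117, 10, 30, 0, 9, 0, 0, 0, 4, 0, 5, 0]


def summarize(counts):
    """Summarize only the projects that actually contain the element."""
    present = [c for c in counts if c > 0]
    return {
        "percent_with_element": round(100 * len(present) / len(counts)),
        "average": round(mean(present), 1),
        "median": median(present),
        "minimum": min(present),
        "maximum": max(present),
        "range": max(present) - min(present),
    }


print(summarize(glossary_elements))
```

The mean of the seven non-zero counts is 195 / 7, which rounds to 27.9; the 27.8 reported above is the same value truncated rather than rounded. The same function can be applied to the other rows as a consistency check.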
Synonym Table – Number of Elements
11 | 0 | 3 | 0 | 3 | 13 | 0 | 4 | 0 | 4 | 0 | 0 | 8 |
Percent that contained a synonym table: 54%
Of those that had synonym tables:
Average: 6.14
Median: 4
Minimum: 3
Maximum: 13
Range: 10
Thesaurus – Number of Elements
12 | 0 | 12 | 0 | 0 | 14 | 0 | 22 | 0 | 0 | 2 | 0 | 8 |
Percent that contained a thesaurus: 46%
Average: 10.14
Median: 12
Minimum: 2
Maximum: 22
Range: 20
Index – Number of Elements
11 | 0 | 37 | 0 | 0 | 55 | 35 | 6 | 0 | 0 | 0 | 0 | 0 |
Percent that contained an index: 38%
Average: 28.8
Median: 35
Minimum: 6
Maximum: 55
Range: 49
Bibliography – Number of Elements
11 | 0 | 7 | 7 | 0 | 6 | 6 | 20 |  | 3 | 9 | 4 | 0 |
Percent that contained a bibliography: 69%
Average: 8.1
Median: 7
Minimum: 3
Maximum: 20
Range: 17
6.9. Domain Implementation
Ten of the thirteen projects implemented some sort of reusable asset. All ten created a reusable component. Four of the ten implemented a code generator, and three created some aspect of a little language.
Why specific projects failed to create reusable assets is uncertain and requires further study.
6.9.1. Those with no programming experience, but otherwise excellent analyses, failed to produce working code.
Conflation (1)
· Had well-established exemplars and documentation; sought expert advice, contacting the author of one project for clarification; used paragraph explanations for system architectures; domain scope consisted of verbal descriptions and set notation.
· Code analysis consisted of a verbal description of each step of the algorithm, class diagrams, activity diagrams, and static metrics.
· Architectural analysis consisted of pseudo code for each algorithm, top-down functional diagrams, and a use case diagram.
· Implementation consisted of three parts-based functions designed to perform simple string manipulations. This was not working code, but represented a best effort from a non-programmer with a good grasp of the domain.
· The little language was likewise not operational; rather, it was a BNF representation of the syntactical elements of a little language.
Conflation (11)
· Well-established exemplars and documentation.
· Complete system architectures.
· Combined system feature tables into one table, listing systems in columns.
· Domain scope consisted of word descriptions and set notation.
· Implemented word clusters.
· Correct facet table.
· Correct template.
· No code analysis.
· Two good generic architectures: a top-down functional architecture and a decision flow.
· Correct generic feature table.
· Implementation consisted of eight conflation functions described in pseudo code.
6.9.2. Those who had great difficulty in analysis either had no code or created relatively simple functions
AHLTA Longitudinal Domain (2)
· Chose a domain that was actually a large program,
· Systems were program modules,
· Had no domain scope,
· Confused the facet table with a system table,
· Had no template …
· No implementation.
Simple Metrics (5)
· Fifteen years of experience,
· Chose exemplars of poor quality produced by authors with less than 5 years of experience,
· Failed to do any code analysis,
· Had a generic feature table with only 3 features,
· Architectural analysis was lacking, as the exemplars had no architectural diagrams.
· Generated a reusable asset that counted lines of code and comments.
Blog Domain (9)
· Picked a new domain with little standardization,
· Immature exemplars,
· Chose vocabulary manually,
· No code analysis (had five different languages in the domain),
· No system architecture,
· Minimal generic architecture,
· Implementation consisted of a high-level description of an abstract data type.
Static Code Metrics (12)
· One year of programming experience,
· Spent 38 hours searching for exemplar systems. Turned to a system expert for help.
· Had three exemplars and three sets of documents.
· Exemplars did not have architectures, so studied algorithms and created architectures manually. Produced a top-down functional architecture and data flow diagrams.
· Misunderstood facet tables and created three facet tables.
· Misunderstood templates and created three templates.
· Created a reusable component.
7. Failure Points Summary
Looking at the previous collection of specific failure points, it is possible to make a few general statements about the categories of failure.
7.1. A new process leading to misunderstood concepts
For all the participants, this was their first exercise in both domain engineering and creating a DARE book. As such, subjects were learning both the domain of domain engineering and their chosen domain. Many of the concepts key to domain analysis were new and were prone to misunderstanding.
7.2. Shortcomings of skill set
Irrespective of an analyst’s experience with the DARE methodology, domain engineering is a multi-disciplinary process, and any one individual may not have all the skills required to complete all its tasks. Some individuals with MIS backgrounds had a very thorough analysis phase and then came up short on implementation. Conversely, some CS individuals, who might rather be coding, shortchanged many of the analysis steps but produced relatively well-developed reusable assets.
7.3. Interdependent Stages
The following failure mode diagram captures many of the failure points discussed. This fish-bone diagram illustrates how failings upstream in the process affected subsequent stages.
8. Implications for Success
Having taken a failure modes approach to the DARE exercises, we now highlight principles of success observed in the data to provide a more complete picture. Suggestions for improvement in section 9 are based on both failures and successes.
8.1. Success is a function of Time
We observed that those who spent less than 70 hours on their project had weaker projects, whereas those who spent more than 120 hours all found success.
Total time spent on DARE project (hours)
<= 75 | 13, 49, 53, 65, 65.25, 75 |
> 120 | 122, 126, 176, 188 |
Domain Preparation
Considering the set of operations from selecting domain sources through scoping the domain, we find a similar pattern. Those who spent more time found success, both in this first stage of the project and in the overall project.
Preparing the domain (choosing, finding exemplars, scoping), in hours
< 20 | 7, 9, 14, 15, 18.25 |
> 30 | 33, 50, 52, 74 |
Vocabulary Analysis
Vocabulary analysis is a large portion of a project. The journey from documentation and software vocabulary to a template involves many skills, and as such this was the section with the most errors. That said, those with stronger overall projects also invested more time in vocabulary analysis.
Vocabulary analysis, in hours
< 10 | 3, 3, 4, 5 |
>= 20 | 20, 21, 29, 36 |
Reusable Asset
It has already been observed that some individuals had no programming experience. Excluding these from consideration, and comparing the reusable assets of those who reported programming skills, more substantial reusable assets were developed by those who spent more than 20 hours.
Time spent implementing reusable assets, in hours
<= 5 | 3, 4, 5 |
>= 24 | 24, 55, 67, 70, 80 |
8.3. Success is a function of persistence
As noted earlier, there was great variety in how time was recorded. It is assumed that if someone recorded a time, it represents a significant investment of time on the project.
Viewing the outliers therefore adds weight to their importance. Outliers may indicate areas where individuals struggled through a new area or, in the case of programmers, a predisposition to programming. What is significant is that the projects that contain time outliers were among the more successful projects. The clear message is that success comes to those who persist through difficult areas.
Time outliers (hours; one column per project)
Finding good tools |  | 50 |  |  |  |
Gathering Source Information | 38 |  |  |  |  |
Study of domain |  |  | 46 |  |  |
Architectural Analysis |  | 60 |  |  |  |
Vocabulary Analysis |  |  |  | 30 |  |
Development of reusable asset |  | 55 | 80 | 67 |  |
Total time preparing domain |  |  |  |  | 74 |
Total DARE Time | 126 | 188 | 176 | 122 | 126 |
8.2. Success is a function of knowledge
Domain expertise, coupled with an understanding of the various components of analysis, was a clear indicator of project success.
One of the strongest projects of the group came from an individual who did not score highly on any of the outlier tests and was only within the 2nd quartile of total project time. This 75-hour project spent 14 hours preparing the domain, 29 hours on vocabulary analysis, and 24 hours developing a reusable asset. This project’s success factors were starting the project with knowledge of the domain, strong analysis tools, and a good grasp of architectural analysis.
8.2.1. Those with a great deal of programming experience created reusable assets, regardless of the quality of their analysis.
Sentence Alignment Systems (6)
· Combined system feature tables into one table,
· Found code analysis fruitless; it did not help to better understand the domain,
· Manually picked out domain vocabulary,
· Created a functioning reusable component program.
Personal Information Managers (10)
· Chose a small, manageable domain,
· Chose toy project exemplars,
· Manually selected vocabulary based upon the graphical interface.
· Code analysis consisted of listing all functions of the exemplars.
· Created a project generator and little language.
Open Source Java Metrics (3)
· Domain scope contained no sets,
· Manual code analysis,
· Simplistic architecture,
· Produced Master’s Thesis level code reuse architecture.
8.2.2. Those with strong domain knowledge and programming experience excelled.
Conflation (7)
· Described the domain in pseudo code,
· Template contained five templates capturing the five different facet tables,
· Extracted vocabulary manually,
· Very strong architectural analysis …
· Produced a functioning reusable component program and a little language.
Conflation (8)
· Thirteen years of experience in C and C++,
· Strong analysis,
· Implementation: eight reusable parts-based functions, an application generator, and a little language.
Symmetric Encryption (4)
· Three years of experience in C++,
· Manually picked out vocabulary,
· No generic system table,
· No thesaurus, no synonym table, no cluster tables,
· Template had twice as many variables as the facet table …
· Created a reusable component program and a code generator.
8.2.3. Success is an iterative process
One project spent 46 of its 176 hours studying the domain. Considering that it invested 80 hours in developing the reusable asset, this means the analyst spent 48% of the analysis time (46 of the remaining 96 hours) studying the domain. This project, first based upon symmetric encryption, revised its scope and the subsequent facet tables, templates, and generic architecture three times. As the domain and software exemplars were explored, the scope was narrowed to a generic encryption algorithm using the “Feistel Network”.
9. Recommendations
Running throughout all the examples of this report is the central message that Domain Engineering and the DARE methodology must be understood by the analyst before proceeding. Regardless of the analyst’s chosen domain, the DARE book will only be as good as the analyst’s knowledge of the DARE method. Future DARE efforts, therefore, will be more successful if they begin with a preliminary skill assessment, so that analysts can address the areas where they are weak.
9.1. Essential Vocabulary
Domain engineering is multi-disciplinary and is based on a set of skills and concepts. It requires an understanding of such concepts as “system”, “component”, “induction” and “empirical”. These are not always taught in today’s computer science and information science curricula. Appendix A contains the vocabulary necessary for an understanding of domain engineering.
9.2. Essential Skills
The abilities to find source material and to understand software architecture are examples of skills that are fundamental to successful domain engineering. The following list contains an outline of these skills. Analysts in future domain analysis exercises might begin by evaluating their own skill set.
· Research techniques
  o Discovering appropriate domains
  o Finding source code
  o Finding exemplar documents and architectures
· Proficiency programming in one language
  o Compiling source material
  o Software development
· Mathematical
  o First order predicate calculus
  o Set notation
  o Discrete Mathematics
· Software Architecture Styles
  § Main Program/Subroutine
  § Batch-Sequential
  § Pipe and Filter
  § Layered
  § Client/Server
  § Object Oriented
  § UML
    · Class diagrams
    · Use Case
    · Sequence
    · Flow
· Software Analysis
  o Lines of Code
  o Systematic Complexity
  o Essential Complexity
  o Memory Use
  o Algorithmic analysis
    § Speed/Efficiency
· Vocabulary Analysis
  o Stop List
  o Stemming
  o Conflation
  o Frequency Analysis
9.3. Domain Engineering Skills
The various tables and diagrams that are produced during domain analysis must be clearly understood by the analysts.
9.4. DARE Book Creation
Variations in the time recordings of the DARE books suggest that it might be beneficial to standardize future time logging practices. This may include more explicit direction.
9.4.1. Formalize the time recording requirements
Future DARE research will be improved if the time recordings are standardized. This will improve the correctness of the recordings, reduce the likelihood of individuals omitting some parts, and standardize the vocabulary.
This will serve several purposes. It will improve future research efforts such as this one. It will remind analysts of the mandatory steps of the process. Time recordings such as “learning the domain”, “searching for exemplar sources”, “reviewing vocabulary, essential skills, and domain engineering skills” will paint a more complete picture of the process.
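One way to standardize time logging is to restrict entries to a fixed list of categories and to report which categories have no recorded time. The sketch below is only an illustration of that idea: the category list, the `TimeEntry` class, and the helper functions are hypothetical and are not part of the DARE method.

```python
from dataclasses import dataclass

# Hypothetical fixed category list, loosely based on the stage names used in this report.
CATEGORIES = (
    "learning the domain",
    "searching for exemplar sources",
    "scoping the domain",
    "vocabulary analysis",
    "architectural analysis",
    "software analysis",
    "developing reusable asset",
)


@dataclass(frozen=True)
class TimeEntry:
    category: str
    hours: float

    def __post_init__(self):
        # Reject free-form labels so that logs stay comparable across projects.
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        if self.hours <= 0:
            raise ValueError("hours must be positive")


def totals(entries):
    """Sum hours per category; categories with no entries are reported as 0."""
    result = {category: 0.0 for category in CATEGORIES}
    for entry in entries:
        result[entry.category] += entry.hours
    return result


def missing_categories(entries):
    """List the categories for which no time was recorded."""
    return [category for category, hours in totals(entries).items() if hours == 0]
```

A log built this way makes omissions visible: a category that appears in `missing_categories` either was skipped or was not recorded, and the analyst can be prompted to say which.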
9.4.2. Provide research information regarding time expectations.
It would be beneficial to analysts, especially on their first project, to be given some perspective on the time and effort required for success. Data from this report may play an important part in that. The following table provides time investment summaries of successful projects.
Time of most successful projects (hours; the last column is the average)
Total DARE Time | 75 | 122 | 126 | 176 | 188 | 137.4 |
Preparing the Domain | 14 | 33 | 74 | 52 | 50 | 44.6 |
Vocabulary Analysis | 29 | 3 | 20 | 12 | 20 | 16.8 |
Domain Implementation | 24 | 67 |  | 80 | 55 | 45.2 |
Given this small sample, these times are by no means conclusive, but they do give a general sense of how much time someone should anticipate investing if they wish to be successful.
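The averages in the table can be checked with a short calculation. The sketch below assumes that the blank Domain Implementation cell means no time was recorded, and shows the two ways such a cell can be treated; the function name `average` and the `blank_as_zero` flag are illustrative.

```python
# Hours for the five most successful projects, as listed in the table above.
# None stands for the blank Domain Implementation cell.
successful_project_hours = {
    "Total DARE Time": [75, 122, 126, 176, 188],
    "Preparing the Domain": [14, 33, 74, 52, 50],
    "Vocabulary Analysis": [29, 3, 20, 12, 20],
    "Domain Implementation": [24, 67, None, 80, 55],
}


def average(values, blank_as_zero):
    """Average a row, either counting blanks as zero or skipping them."""
    if blank_as_zero:
        filled = [0 if v is None else v for v in values]
    else:
        filled = [v for v in values if v is not None]
    return round(sum(filled) / len(filled), 1)


for stage, hours in successful_project_hours.items():
    print(stage, average(hours, blank_as_zero=True), average(hours, blank_as_zero=False))
```

The reported Domain Implementation average of 45.2 corresponds to counting the blank cell as zero (226 / 5). Averaging only the four recorded values gives 56.5, which may be the more useful expectation for an analyst who intends to implement a reusable asset.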
9.5. DARE book project check list
The fish-bone chart of section 7.3 can serve as the starting point for a DARE book check list. At each step of the process (selecting exemplars, scoping the domain, etc.), analysts can evaluate their book against the failure points provided.
10. Further Research
These projects contained many data points, and not all were explored. Areas of analysis not addressed in this report include 1) the quality of the exemplar sources, 2) the quality of the architectural diagrams and the subsequent architectural analysis, and 3) an evaluation of the reusable assets, both in terms of their quality and in terms of whether they satisfy the models produced from domain analysis.
Although the data used in this study has provided a unique opportunity to study the strategies of individuals implementing the DARE methodology, the small sample size of this report means that more examples are needed. This report’s suggestions must be evaluated against repeated examples.
Survey data taken during and after the projects provides a wealth of information which can be further explored (Appendix C). During this report’s research a new survey was conducted, opening responses to the current set of project authors as well as a new class of first-time domain engineers (Appendix B). This survey, focusing on vocabulary analysis, suggests these respondents had more success than observed in the sample set of this project. Gathering data from these individuals and exploring the reasons for this improvement would provide the basis for another report.
11. References
[1] Frakes, W., R. Prieto-Diaz and C.
Fox (1998) DARE: Domain Analysis and Reuse Environment. Annals of Software
Engineering, v. 5, pp. 125 -141.
[2] Frakes, W.B., and Kang, K.
"Software Reuse: Status and Future", IEEE TSE, 2005.
[3] Frakes, W. (2000, Nov. 6-9). "A
Method for Bounding Domains". In IASTED International Conference Software
Engineering and Applications 2000 (SEA 2000),
[4] Frakes, W.B., (Ed.): (2000) Software
Reuse: Advances in Software Reusability, 6th International Conference, ICSR-6
[5] Frakes, William and Sadahiro Isoda.
“Success Factors of Systematic Reuse.” Introduction to special issue on reuse
IEEE Software (September 1994): V11, n5, pp. 14-19.
[6] Frakes, W. “Software Reuse
Introduction and Basic Concepts”, http://frakes.cs.vt.edu/SEportalReuse.htm
[7] D. Weiss and R. Lai, Software
Product Line Engineering, Addison-Wesley, 1999.
[8] K. Kang, et al., Feature-Oriented
Domain Analysis(FODA) Feasibility Study, tech. report CMU/SEI-90-TR-21, Software Eng.
Inst., 1990.
[9] P. Naur
and B. Randell, (Eds.). Software Engineering: Report of a conference sponsored by the NATO Science Committee,
Garmisch, Germany, 7-11 Oct. 1968, Brussels, Scientific Affairs Division, NATO
(1969) 231pp. http://homepages.cs.ncl.ac.uk/brian.randell/NATO/
[10] D. Garlan and M. Shaw, “An Introduction to Software Architecture”, Advances in Software Engineering and Knowledge Engineering, Volume 1, World Scientific Publishing Company, 1993.
[11] Biggerstaff T, "A Perspective
of Generative Reuse," in Annals of Software Engineering, William Frakes
(ed.), Vol. 5, 1998
[12] K. Czarnecki, U.W. Eisenecker, “Generative Programming – Methods, Tools, and Applications”, 11-12-2001, http://www.generative-programming.org
[13] Biggerstaff T. J.,Software
Generators, 2006, http://www.softwaregenerators.com
[14] Generative Programming and Component
Engineering: ACM SIGPLAN/SIGSOFT Conference, GPCE 2002,
[15] J Clements, P Graunke, S
Krishnamurthi, M Felleisen, “Little Languages and their Programming
Environments”,
[16] Bentley, "Little languages", Communications
of the ACM, 29(8):711-21, August 1986. Stroustrup, B. (1996)
[17] A. Deursen,P. Klint J. Visser, Domain-Specific Languages:An
Annotated Bibliography CWI, P.O. Box 94079, 1090 GB Amsterdam, The
Netherlands
http://www.cwi.nl/ arie,paulk,jvisser/
[18] W Frakes, T Pole, An Empirical Study
of Representational Methods for Reusable Software Components, IEEE Transactions
on Software Engineering, Vol. 20, No 8, Aug. 1994
[19] Csound, Dr. R Boulanger, 11-12-2006,http://www.cSounds.com
[20] Make, Gnu Project and Free Software
Foundation (FSF), 12-2006 http://www.gnu.org/software/make
[21] D. Clarke, T. Wrigstad, “A Fistful
of Languages”, 12-2006,http://www.littlelanguages.net
[22] H. Mili, F. Mili, A. Mili, “Reusing Software: Issues and Research Directions”, IEEE Transactions on Software Engineering, Vol. 21, No. 6, June 1995.
[23] C. McClure,
“The three R s of software automation: re-engineering, re-pository, Reusability, ”
Prentice-Hall, 1992.
[24] S.C. Cheung, and J. Kramer, “Enhancing compositional reachability G. Gruman, “Early reuse practice lives up to its promise,” IEEE Software, pp. 87-91, Nov. 1988.
[25] Frakes, W. (2000, Nov. 6-9). "A
Method for Bounding Domains". In IASTED
International Conference Software Engineering and Applications 2000
(SEA 2000),
[26] Basili, V., McGarry, F., Page, G.,
Pajerski, R., Waligora, S., Zelkowitz, M., "Software Process Improvement
in the NASA Software Engineering Laboratory," Technical Report
CMU/SEI-94-TR-22,
[27] Joos, R., "Software Reuse at
Motorola," IEEE Software, September, 1994, pp. 42-47
[28] Lanergan, R.G.,
[29] W. Frakes, C Fox, “Sixteen Questions
about Software Reuse”, Communications of the ACM, June 1995, Vol. 38, No. 6.
12. Appendix A Essential Vocabulary (Domain Engineering Key Word Set)
Domain Engineering: The entire process of reusing domain knowledge in the production of new systems. It comprises Domain Analysis and Domain Implementation; information about a class of similar systems is identified, captured, and organized for the purpose of systematic reuse. Also called product line engineering. [1]
Domain Analysis: Process of identifying domains, bounding them, and discovering and documenting the commonalities and variabilities within related software systems. Leads to Domain Implementation. [1]
Architecture: Set of components, the interactions between those components, the patterns that guide their composition, and the constraints on these patterns. [1]
Architecture Components: Such things as Client, Server, Databases, Filters, Pipes, and Layers in a hierarchy.
Architecture Interactions: Can be complex protocols like Client Server or just shared data.
Examples: Data flow or Pipe and Filter, Use case, Layered, Blackboard, Interface, Translation, Abstraction / Object Oriented, Top Down, Data abstraction, Client Server.
Data: A collection of facts from which conclusions may be drawn; “statistical data” [syn: information, fact]. Knowledge differs from data or information in that new knowledge may be created from existing knowledge using logical inference. If information is data plus meaning, then knowledge is information plus processing.
Inference: Deriving new knowledge from knowledge or facts. For example: Tom is David’s father and David is John’s father. With these two points of information, one can infer that Tom is John’s grandfather.
Inductive: Of reasoning; proceeding from particular facts to a general conclusion; “inductive reasoning”. Not scientifically provable, although statistically demonstrable. It is the basis of the scientific model. For example: 100 clowns sampled have red noses. Bob is a clown. You then cannot prove that Bob has a red nose, but it is statistically probable.
Deduction: Involves inferences from general principles. Can be used in logic proofs. For example: All apples are fruit. All fruits grow on trees. Therefore all apples grow on trees.
Examples: Predicate Calculus, Propositional Logic
This is the same as knowledge
Index: Controlled, Classification (like this outline)
Entity: Something that exists as a particular and discrete unit.
Object: An entity that has attributes and may exhibit behaviors.
Element: A fundamental, irreducible constituent. A member of a set.
Constituent: One of a set of elements into which a construction or component may be divided by analysis. Part of a whole.
Component: A constituent element, as of a system.
Module: A uniform structural component used repeatedly in building.
System: A group of interacting, interrelated, or interdependent elements forming a complex whole.
Stemming: An attempt to reduce a word to its stem or root form.
Conflation: To relate word forms (engine, engineer, engineering).
Facet Table: Table listing the commonalities and variabilities of a set of systems. Column labels represent the commonalities and table rows the variabilities.
System Feature Table: Table listing the categories and features of a single system in a domain.
Generic Feature Table: Table highlighting the features of a set of system feature tables. Columns represent individual systems; rows contain features of the systems.
Commonalities: Features in common among a set of systems in a given domain.
Variability: Features differing in a set of systems in a domain.
Generic: Representing a combination of systems.
Domain: An application area or a set of systems that share design decisions. [2]
Software Reuse: The reusable asset created as a product of domain implementation. There are two types of reusable assets: 1) parts based (from scratch, reengineered, purchased, freeware) and 2) formal language (little language or application generator).
13. Appendix B : Recent Survey
Which automated vocabulary analysis tools did you use?
Frequency Analysis | (64%) |
Stemming | (73%) |
Conflation | (9%) |
Cluster Tools | (27%) |
Lex | (18%) |
None | (0%) |
Using domain knowledge to manually pick out words is more useful than using automated vocabulary tools
Strongly agree | (9%) |
Agree | (45%) |
Disagree | (27%) |
Strongly disagree | (9%) |
No opinion | (0%) |
They are equally useful | (9%) |
Vocabulary analysis was helpful in developing the reusable software asset.
Strongly agree | (9%) |
Agree | (55%) |
Disagree | (27%) |
Strongly disagree | (9%) |
No opinion | (0%) |
Which of the following best describes your experience finding automated tools for vocabulary analysis?
Successful | (36%) |
Somewhat successful | (45%) |
Minimally successful | (9%) |
Not successful | (9%) |
Did not look | (0%) |
In my vocabulary analysis I differentiated between nouns and verbs.
True | (45%) |
False | (55%) |
At the time of the project the following terms were unclear.
Facet Table | (55%) |
Template | (9%) |
Feature Table | (27%) |
Cluster Table | (27%) |
None | (27%) |
Other | (0%) |
In my project, my reusable asset was based on the template.
True | (18%) |
False | (18%) |
Not Sure | (55%) |
No answer | (9%) |
Which of the DARE phases did you have trouble with?
Finding Domain Source | 18% |
Selecting exemplars | 9% |
Scoping the domain | 18% |
Describing the domain using set notation | 36% |
Developing system feature tables | 9% |
Performing Architectural Analysis | 18% |
Vocabulary Analysis | 27% |
Developing a reusable asset | 45% |
Other | 27% |
What suggestions do
you have to make improvements in future DARE efforts? |
Emphasize the automated lexical analysis as an aid rather
than an end in itself. It is far too facile to think that automated tools
that simply look for words are sophisticated enough to obviate human thought
and analysis of the material at the semantic level. Yet this was the emphasis
in class. This method offers less than Abbott's original idea of identifying
nouns and verbs in a prose description of system requirements (although he
was not doing domain analysis for reuse, the idea is the same). |
From this project experience, it's clear why subject
matter experts are important. If I did this for a work project, I would make
sure I had a couple SME sources. |
A clear example would be very helpful to understand the
concepts in the beginning of the process. |
More concrete examples. Instructor review of each
component in a section before proceeding to next section and well before
project is due. |
The vocab. analysis was extremely useful for developing
the glossary. Glossary was an extremely valuable asset by itself. Vocab
analysis did not yield any reusable components. That was derived from the
arch. analysis. That was a very manual process. |
I thought it was a good project and can not think of any
improvements at this time. |
Provide a unified DARE framework to aid in the process.
Many of the early lectures/papers hinted at such a framework, which in the
end, did not exist. Too much time was spent finding disparate and only
marginally useful tools (just to fulfill the requirements of the DARE process
... most of which did not aid in the process of developing a generic
architecture). |
I found the DARE process to be effective in helping us to
define our specification. I see it as one of those things that gets easier
and more effective with practice. The versatility (and value) of the results became clearer
to me at the end of the process, with respect to the breadth of development
efforts it enabled for programmers, sw architects, engineers, managers,
requirements analysis, etc. |
What general comments do you have about the DARE process? |
This response does not reflect a
full-blown DARE project, but a toy example for a class. So my answers may not
fit exactly with what you are looking for. Under "Which of the DARE
phases did you had [sic] trouble with?" we did not find domain sources,
select exemplars, use set notation, nor develop a reusable asset as part of
our project. Please feel free to contact me further at jseigle@vt.edu |
It was a learning experience and
it's something you need to do multiple times before you get accurate and
consistent results. It's not something you do one time and have it mastered. As far as an automated means for
performing domain analysis the process worked well, but it would be nice to
have an integrated tool to assemble the book. Using COTS is doable, but you
can't count on everyone having all the software packages on their system. For
instance, when I tried to open the ins files in the DARE example, my system
thought they were some type of configuration file and not Inspiration files. |
I think result was very useful. |
Process was informative and I
learned something. |
Very interesting. I would use it
again. |
The most useful part of the
process for me was having to document system architectures. This helped me to
discover commonalities between the systems which were not apparent before. |
I believe there should be a
stronger emphasis on inclusion of domain modeling techniques for
the generic architecture and subsystems. I see these as invaluable keys to
communicating a detailed specification of the domain. Incorporating the DARE book itself
in a defined, consistent and flexible format (like OneNote) might help
promote understanding, would encourage commonality in communication, and
ultimately reduce analysis time. |
14. Appendix C: Survey of present sample.
Using the following scale, for each question below, please circle the number that best reflects your opinion about reuse activities in your organization. |
0 | 1 | 2 | 3 | 4 | 5 |
Not Applicable | Never | | Sometimes | | Always |
Statement | Responses | Average |
We design our software for reuse | 1,3,3,3,3,3,4 | 2.86 |
We get parts from a reuse library | 1,1,3,3,3,4,4 | 2.71 |
We have a major subsystem supplied to us by other organizations | 1,2,3,4,4,5,5 | 3.43 |
We informally reuse pieces of previous systems in new systems. | 3,4,4,4,4,5,5 | 4.14 |
We practice reuse by reusing or adapting parts we’ve previously created | 2,3,3,4,4,4,4 | 3.43 |
Reuse is practiced by individuals who adapt their own parts | 3,3,3,4,4,4,5 | 3.71 |
We practice software reuse. | 2,3,3,3,3,4,4 | 3.14 |
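The averages in the table above are arithmetic means of the listed responses on the 0 to 5 scale. As a minimal sketch, they can be recomputed and ranked as follows; the dictionary keys are shortened labels invented for this example, not the survey wording.

```python
from statistics import mean

# Responses transcribed from the table above (0-5 scale).
# Keys are abbreviated labels, introduced here for brevity.
responses = {
    "design for reuse": [1, 3, 3, 3, 3, 3, 4],
    "parts from reuse library": [1, 1, 3, 3, 3, 4, 4],
    "subsystem from other organizations": [1, 2, 3, 4, 4, 5, 5],
    "informal reuse": [3, 4, 4, 4, 4, 5, 5],
    "reuse or adapt own earlier parts": [2, 3, 3, 4, 4, 4, 4],
    "individuals adapt their own parts": [3, 3, 3, 4, 4, 4, 5],
    "practice software reuse": [2, 3, 3, 3, 3, 4, 4],
}

# Round to two decimals, matching the precision used in the table.
averages = {label: round(mean(vals), 2) for label, vals in responses.items()}

# List the statements from most to least frequently practiced.
for label, avg in sorted(averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{avg:.2f}  {label}")
```

Ranked this way, informal reuse of pieces of previous systems has the highest average and obtaining parts from a reuse library the lowest.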
What percent of the lifecycle objects your organization creates are typically composed of reused parts? |
Lifecycle object | Responses | Average |
Requirements | 0,15,35,45,45 | 24.17 |
Designs | 15,15,15,25,25,55 | 25.00 |
Code | 5,15,25,45,45,75 | 35.00 |
Test Plans | 15,65,0,0,75,75 | 39.29 |
Test Cases | 0,0,15,45,75,75 | 36.43 |
User Documentation | 5,15,25,45,65 | 24.29 |
Other (Please Specify) | | |
Average All | 30.69 |
What percent of lifecycle objects you personally create are composed of reused parts? |
Lifecycle object | Responses | Average |
Requirements | 35,45,0,5,45,15 | 18.33 |
Designs | 25,55,25,15,15,15 | 25.83 |
Code | 25,75,45,5,45,15 | 39.29 |
Test Plans | 15,65,0,0,75,75 | 38.57 |
Test Cases | 15,55,5,85,85,25 | 35.00 |
User Documentation | 25,45,0,0,15,15 | 14.29 |
Other (Please Specify) | | |
Average All | 28.55 |
What percentage of the parts you reuse are from external sources? |
Responses | 0 | 25 | 20-30 | 25 | 50 | 50 | 60 | 80 |
Average | 38.75% - 40% |
Descriptive Statistics |
Mean | 38.75 |
Median | 37.5 |
Midrange | 40 |
Variance | 662.5 |
St Dev | 25.74 |
Range | 80 |
Minimum | 0 |
1st Quartile | 22.5 |
2nd Quartile | 37.5 |
3rd Quartile | 55 |
Maximum | 80 |
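The descriptive statistics above can be recomputed from the eight responses. The sketch below assumes that the range response "20-30" was entered as its lower bound, 20, and that quartiles were taken as medians of the lower and upper halves of the sorted data; neither choice is stated in the survey, but together they reproduce the reported figures, including the sample (n - 1) variance.

```python
from statistics import mean, median, stdev, variance

# Responses from the table above. The "20-30" response is entered as 20,
# an assumption that reproduces the reported mean of 38.75.
responses = [0, 25, 20, 25, 50, 50, 60, 80]


def quartiles(values):
    """Return (Q1, Q2, Q3) using medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered), median(ordered[-half:])


q1, q2, q3 = quartiles(responses)
stats = {
    "mean": mean(responses),
    "median": median(responses),
    "midrange": (min(responses) + max(responses)) / 2,
    "variance": variance(responses),        # sample variance, divisor n - 1
    "st_dev": round(stdev(responses), 2),   # sample standard deviation
    "range": max(responses) - min(responses),
    "q1": q1,
    "q2": q2,
    "q3": q3,
}

for name, value in stats.items():
    print(f"{name}: {value}")
```

Entering "20-30" as its midpoint, 25, would instead give a mean of 39.375, which falls inside the "38.75% - 40%" span shown in the Average row.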
Using the following scale, for each question below circle the number that best reflects your opinion. |
0 | 1 | 2 | 3 | 4 | 5 |
Not Applicable | Disagree | | Agree somewhat | | Always |
Statement | Average |
I feel I know how to reuse software | 2.866 |
Reuse is economically feasible in my organization | 3.14 |
I believe reuse works | 3.71 |
Software developed elsewhere meets our standards | 3.00 |
Project time constraints allow time for reuse. | 2.57 |
It’s more fun to write my own software than to try to reuse | 3.43 |
I’m inhibited by the possible legal problems. | 1.71 |
I’ve had good experiences with the quality of reusable software | 3.00 |
CASE tools have promoted reuse across projects in our organization | 1.00 |
A common software development process has promoted reuse across projects in our organization. | 1.57 |
Reuse of parts not designed for reuse will never be cost effective. | 2.43 |
Reuse is a process of adapting or modifying existing software to meet new requirements. | 3.00 |
Using the scale below, please indicate the frequency of the following situations. |
0 | 1 | 2 | 3 | 4 | 5 |
Not Applicable | Never | | Sometimes | | Always |
Situation | Average |
The part I needed existed and was available | 3.00 |
The parts were probably around somewhere, but I couldn’t find them | 2.71 |
There was a library to look in for the part. | 2.29 |
I found the part, and understood it sufficiently to reuse it | 3.14 |
I found and understood the part, and it was good enough to reuse | 3.14 |
I couldn’t integrate the part into my system without extensive modification | 2.71 |
If you couldn’t integrate it, the reason was |
Language Incompatibilities | 3.17 |
Improper Form | 3.67 |
It was too slow | 1.50 |
It took too much memory | 1.50 |
Other (please specify) | |
Using the following scale, for each type of reuse below, please write the number that best reflects your opinion. |
0 | 1 | 2 | 3 | 4 | 5 |
Never Used | Not Valuable | | Somewhat Valuable | | Very Valuable |
Type of reuse | Responses | Average |
Booch | 0,0,3,0,0,0,0,0 | 0.38 |
4 GL | 0,3,0,0,0,0,0,0 | 0.38 |
Unix | 4,3,2,4,3,4,0,3 | 2.88 |
Grace | 0,0,0,0,0,0,0,0 | 0 |
X-widgets | 4,0,0,0,4,0,0,0 | 1 |
Document Templates | 5,3,3,3,0,4,3,5 | 3.25 |
FORTRAN Libraries | 0,0,0,4,0,0,0,0 | 0.5 |
Program templates | 4,4,0,3,2,4,0,3 | 2.5 |
| 0,0,0,0,4,0,0,0 | 0.5 |
Cosmic | 0,0,0,0,0,0,0,0 | 0 |
Other (please specify) | | |
What is your primary job? |
Programmer | 3 |
Software Engineer | 3 |
Systems Engineer | 0 |
Manager | 1 |
How many years of software engineering experience have you had? |
Responses | Average |
0,1,3,5,7,9,13,15 | 5.63 |
On how many different projects have you worked? |
Responses | Average |
0,2,5,8,8,10,10,11 | 6.00 |
For how many different organizations have you worked? |
Responses | Average |
1,2,3,3,4,5,6 | 3.43 |
What is your highest degree? | Responses |
High School | |
BS | 6 |
MS | 1 |
Ph.D | |
BA | 1 |
In what area is your highest degree? | Responses |
CS/IS | 5 |
EE | 1 |
Math | |
Other (please specify) | Education - Psychology |
I was educated about software reuse in school | Responses |
Yes | 3 |
No | 4 |
Don’t Know | 1 |
My organization maintains one or more reusable repositories | Responses |
Yes | 2 |
No | 4 |
Don’t Know | 1 |
My organization rewards reuse as follows | Responses |
Recognition | |
Cash bonuses | |
No rewards | |
My company has a reuse organization | Responses |
Yes | 1 |
No | 5 |
Planning One | |
Don’t Know | 1 |
My organization has an education program about software reuse | Responses |
Yes | |
No | 6 |
Planning One | |
Don’t Know | 1 |
My organization has a program in place to measure level of reuse | Responses |
Yes | |
No | 6 |
Planning One | |
Don’t Know | 1 |
My organization has a program in place to measure software quality. | Responses |
Yes | 3 |
No | 2 |
Planning One | 1 |
Don’t Know | |
My organization has a program in place to measure software productivity | Responses |
Yes | 2 |
No | 4 |
Planning One | |
Don’t Know | |
How many employees are in … | Responses |
Your company | 400 | 3,000 | 3,500 | 125 | 50 | 10,000 |
Your division | 80 | 40 | 200 | 10 | 5 | 200 |
Your project | 21 | 3 | 1 | 3 | 1 | 5 |
What is the primary business of your company? | Responses |
Aerospace | |
Telecommunications | 1 |
Manufacturing | |
Software | 2 |
Other (please specify) | biotech, military, power |
Please rank order the languages that are used in your company (1 = most common). | Responses |
Pascal | 1,2 |
C | 2,3,3,3 |
Lisp | |
C++ | 1,1,1,1,2,2 |
COBOL | |
FORTRAN | 3 |
Smalltalk | |
PL-1 | |
Assembler | 4 |
Jovial | |
Other: Java | 1,1 |
Other: PL/SQL | 2 |
Other: Perl | 3 |
Other: C | 4 |
Other: Visual Basic | 1 |
What problems have you had when
trying to reuse software? |
Since we do not have a systematic
reuse program, we sometimes have compatibility problems. |
Difficult to understand the
module. Performance issue |
Haven’t established a well
organized reuse library. Hard to find codes to reuse, need experienced
programmer to point out the existence of similar code. |
Code bloat, “almost but not quite”
syndrome, corner cases, not pre-existing components, written/prototyped in
matlab |
How to modify similar software |
Finding it |
I have not reused software before |
Not knowing where to find reusable
components |
What other comments about reuse do you have? |
Reuse is good but also important as integration of reused modules. |
Works in some situations, but not all. |
Savings on testing and verification |
How would you increase reuse in your environment? |
By having a systematic reuse program |
Make reuse repository and use external reusable components. |
Engage management in importance of reuse and benefit conversations and need to
establish a team to coord. on research. |
Easier search/discovery |
Create a reuse library |
15. Appendix D: System Description
DARE System Description
Domain:
System Name:
EXPERT INFORMATION
Expert Name:
Expert Position (choose one): | Engineer | Tester | Manager | Other | Systems Analyst |
Years of Experience:
Domain Familiarity (choose one): | Novice | Expert | Specialist | Other |
SYSTEM CHARACTERISTICS
Implementation languages used to build this system: | X | C | C++ | Cobol | Fortran | PL/I | Other |
Hardware used to build this system: | Mainframe | Minicomputer | Workstation | PC | Mac | Other |
Operating system used to build this system: | DOS | Windows | Windows-95 | MacOS | UNIX | VM | VMS | Other |
System Overview:
Architectural Style:
Dataflow Systems |
    Batch sequential |
    Pipes and Filters |
Call and Return Systems |
    Main program and subroutine |
    OO Systems |
    Hierarchical layers |
Independent Components |
    Communicating processes |
    Event systems |
Virtual Machines |
    Interpreters |
    Rule-based systems |
Repository Centered Systems |
    Database |
    Hypertext |
    Blackboard |
Other: (specify) |
Users: | Programmer | Manager | Engineer | Other | Technical Support | Staff | End User |
Tasking: | Single Thread | Multi-Thread | Other |
Analysis and Design: | SA/SD | SADT | JSP | Other | JSD | OOA/OOD | Func. Decomp. | None |
QA Methods: | Requirements Reviews | Design Reviews | Formal Inspections | Walkthroughs | Unit Test | Integration Test | System Test | Regression Test | Coverage Analysis | Program Proofs | Cleanroom | Statistical Process Control | Process Audits | Other | None |
Change Strategy: | Evolve Single System | Evolve Multiple Systems | Develop New Systems | Port Uniform System | Port Customized Systems | Other | None |
Expected Changes:
Support new languages (presently, only English-German and English-French have been tested)
------------------------------------------------------------------------
Copyright © 1995-98 Software Engineering Guild and Reuse Inc. All Rights
Reserved.