Domain Engineering:
An Empirical Study
William Frakes
C. David Harris, Jr.
Department of Computer Science
Virginia Polytechnic Institute and
December 21, 2006
Abstract: This paper presents a summary and
analysis of data gathered from thirteen domain engineering projects, participant
surveys, and demographic information. Taking a failure modes approach, project
data is compared to an ideal model of the DARE methodology, revealing valuable
insights into points of failure in the domain engineering process. This study suggests
that success is a function of the domain analyst’s command of a specific set of
domain engineering concepts and skills, the time invested in the process, and
persistence in difficult areas. We conclude by presenting strategies to avoid
points of failure in future domain engineering projects.
1. Introduction
Domain engineering has emerged as an important topic in software engineering research and practice. While many methods have been developed to support domain engineering [2], there has been little empirical analysis of the domain engineering process. This paper presents an empirical analysis of thirteen domain engineering projects completed in a university setting. Taking a failure modes approach, it analyzes the collected project data and project outcomes to identify points of improvement. Sources of information include the DARE books, participants’ demographic information, and survey data taken during and after the domain engineering exercises. These artifacts were produced over the course of three years as part of a graduate-level, advanced software engineering course at Virginia Tech.
These resources present important opportunities to discover individual strategies for domain engineering. The data alone is a contribution to domain engineering research; the analysis and conclusions of this report likewise provide insights for improving the craft of domain engineering.
2. Domain Engineering
2.1. Context
Software developers have always practiced software reuse. Operating system libraries and programming language syntactic elements are designed to be reused and have been successful in standardizing computer programming languages and methods. There are many types of reuse [Frakes and Terry 96]. For example, a programmer may reuse software by selecting lines of code from preexisting projects and copying them into new applications. He/she may also reuse software functions or libraries. However, developing a framework to systematically engineer domain-specific reusable software components requires an entirely different process and methodology.
The term software engineering was first popularized at a 1968 NATO conference with the hope of applying structured engineering principles to the process of creating software [9]. As a discipline, software engineering is an evolutionary leap from the craft of computer programming. Traditionally, successful development has been the result of intuition, hard work, and individual talent. But as commercial demands for software increase, it becomes impossible to continue traditional ad hoc, single-system development strategies [10]. Current software engineering research aspires to apply a more formal methodology to software development.
Software engineering models strive to manage the software development process. Improved requirements gathering, design, quality, cost, and time estimation are all goals of the various software engineering processes. The numerous software engineering processes (waterfall, iterative, capability maturity model, etc.) all have the common goal of improving software development. An emerging goal of software engineering is to design software assets so that they can be reused easily. Creating reusable software components in a particular domain is a goal of domain engineering.
2.2. Design for Reuse
Like software engineering, the field of software reuse has its origins in the late 1960s. During this time, Doug McIlroy of Bell Laboratories proposed software reuse as an area of study and recommended basing the software industry on reusable components [7]. Other research contributions include Parnas’ notion of program families and Neighbors’ idea of domain analysis [2]. Since then, software reuse has been an active area of research.
Domain engineering recognizes that software development communities create software in an application area or a set of systems that share design decisions [2]. Commercial groups create software products in a given market area. Problem domains such as software metrics or vocabulary analysis inspire research, publications, and software from multiple sources.
The process of domain engineering has two phases. Domain analysis systematically analyzes preexisting software, architectures, and documents to capture their commonalities and variabilities and to develop a generic architecture and templates. Domain implementation develops software artifacts that can be used to produce new systems within a domain.
2.3. Benefits
A core benefit of domain engineering is that development with reusable software assets (libraries, code generators, domain-specific languages) contributes to improved software quality and productivity. Software that is reused is often tested more frequently than software that has a single use or function. Using reusable assets typically decreases development costs, because reuse requires less time, less testing, and less new documentation. In addition, reusable assets often improve the accuracy of initial development estimates of time and cost [7].
2.4. Literature Review
Several domain engineering methodologies exist, including Family-Oriented Abstraction, Specification, and Translation (FAST) [8], Feature-Oriented Domain Analysis (FODA) [9], Organization Domain Modeling (ODM), and the Domain Engineering Method for Reusable Algorithmic Libraries (DEMRAL). This report uses the Domain Analysis and Reuse Environment (DARE) methodology [6].
2.5. Domain Analysis and Reuse Environment
The Domain Analysis and Reuse Environment
(DARE) methodology is a multidisciplinary approach that has two phases: domain
analysis and domain implementation.
Domain Analysis
In domain analysis, a domain
analyst, with the help of a domain expert, collects domain source code,
documents, and architectures. Through a
systematic study of these domain sources, the domain analyst acquires an
understanding of the commonalities and variabilities of domain systems. Specific activities include the
identification and organization of important vocabularies, the development of feature
tables, and the creation of facet tables and generic architectures. These products of analysis provide the
material for domain implementation.
Domain Implementation
Software reuse can be divided into two types: parts-based and formal-language-based. Parts-based reuse is better known. It typically introduces reusable software “parts” into a language. Examples of parts-based reuse include the reuse of programming functions, classes, and libraries. Java, for example, provides a large collection of reusable libraries packaged as Java Archive (JAR) files. Parts-based reuse can also include frameworks, that is, interacting collections of objects, such as the C++ Standard Template Library or the Microsoft .NET Framework.
The second type of reuse, formal-language-based, involves creating a general, formal language that has the reusable information embedded in it [8]. Formal-language-based reuse can be further divided into two types: programming-language-based and generative.
Domain-specific languages that are created for a small problem area are called little languages [Bentley]. R, for example, is a statistics language that readily processes data sets and produces summary statistics and graphs with a few commands. Other examples of little languages include Lex (lexical analysis), Yacc (Yet Another Compiler Compiler), Csound for dealing with sound, Make for managing software project compilation, Extensible Stylesheet Language Transformations (XSLT) for transforming XML documents, and Ant (Another Neat Tool) for managing Java projects [20, 21].
In the generative approach, domain knowledge is built into a generator so that an application can be generated from a few specifications. AndroMDA, for example, is a model-driven architecture framework that translates UML models into software that integrates with several common technologies, such as Spring, EJB, Struts, Hibernate, JSF, and Java. Other examples of language-specific application generators are Sun’s NetBeans and Microsoft Visual Studio. Both of these development environments contain a collection of default project types that can be generated automatically. They also facilitate adding commonly used application components and provide ready access to software libraries.
Systematic Reuse and Product Line Engineering
The idea of applying a product line approach to systematic software reuse has its origins in manufacturing, where products are produced in a structured way. This structured production takes full advantage of knowledge of the commonalities and variabilities of each release. The Software Engineering Institute defines a software product line as “… a set of software-intensive systems that share a common, managed set of features satisfying the specific needs of a particular market segment or mission and that are developed from a common set of core assets in a prescribed way.”
2.6. A Unique
Measures of software reuse are difficult to estimate. In the past, individual organizations such as NASA and Motorola have instituted reuse programs and reported improved reuse rates, from ~20% to 79% [26] and from 14% to 85% [27]. An industry-wide figure, however, is difficult to come by. Research indicates that reuse practices differ greatly among industries, and that the factors influencing the success and failure of software reuse practices are complex [29].
This study explored this complex failure space by capturing thirteen individuals’ first domain engineering exercises. Surveys of the participants in this study indicate that average reuse is 30% across all the lifecycle objects in their organizations and 28% for the lifecycle objects they personally create. In addition, none had a workplace program for software reuse education or a means to measure levels of reuse [Appendix C].
From a failure modes perspective, this study offers insight into the factors that contribute to both success and failure in domain engineering. The collected data alone provides valuable information, but it is hoped that its analysis and the conclusions of this report will provide helpful guidance for future domain engineering efforts and contribute to higher reuse rates in the workplace.
3. The Study
3.1. Demographics
In this study, thirteen subjects completed domain engineering projects in several domains using DARE-COTS [1] as part of an advanced graduate course in software engineering. Projects were completed in a 14-week period from January to April of 2005 and January to April of 2006. None of the subjects had prior knowledge of domain engineering. Demographic information on the subjects is reported in Table 1.
Demographic information includes the subject’s degree level, the degree area the subject was pursuing (computer science or information systems), the subject’s years of industrial experience, the subject’s programming languages in order of familiarity, the subject’s chosen domain, and the language used in the project’s domain implementation.
| Subject | Level | Degree | Experience (years) | Languages in order of familiarity | Domain | Implementation Language |
|---|---|---|---|---|---|---|
| 1 | M.S. | MIS | 0 | NA | Conflation | Java |
| 2 | M.S. | CS | 9.5 | VB.NET, C#, VB6, PL/SQL, Java, C++ | AHLTA Longitudinal Domain Book | NA |
| 3 | M.S. | CS | 5 | Java, PL/SQL, Perl, C | Open source Java metrics | Java |
| 4 | M.S. | CS | 3 | C++ | Symmetric encryption | C |
| 5 | M.S. | CS | 15 | C++, C, Fortran, Assembler, Java | Simple metrics | Perl |
| 6 | M.S. | CS | 10 | C++, C#, Java, Perl | Sentence Alignment Systems | C# |
| 7 | Ph.D. | CS | 1 | Java, C++, C | Conflation | Java |
| 8 | | CS | 13 | C++, C | Conflation | Perl |
| 9 | M.S. | CS | 3 | Java, Ruby, C++ | Blog domain | NA |
| 10 | M.S. | CS | 7 | Visual Basic, C++, C | Personal information management systems | Visual Basic |
| 11 | M.S. | MIS | | | Conflation | NA |
| 12 | M.S. | CS | 1 | C++, Pascal, C | Static code metrics | C |
| 13 | M.S. | CS | 6 | Java, SQL, C# | Object Oriented Software Metric Tools | Java |
Table 1: Participant Demographics
3.2. Assignment
Students were given the following
semester-long assignment:
3.3. DARE Book Structure
The completed DARE book contained the following sections:
DARE
Definition: Domain Analysis and Reuse Environment [1]
The DARE book provides a detailed specification of the domain
Domain Sources
Vocabulary Analysis
Architecture Analysis, code analysis
Summary information, glossary, bibliography index, and
appendix
DARE Book
Domain sources
Source documentation
Source code
Source notes
System description
System architectures
System feature tables
Vocabulary
Basic vocabulary, facets, synonyms, templates, thesaurus
Architecture
Generic architecture
Generic features
Code structure
Reusable Components
Summary
Domain Glossary
Bibliography
Index
Appendix
3.4. The Paper Outline
The remainder of this paper is organized as follows. Section 4 defines an idealized model of the DARE methodology for domain engineering. Data collected from the thirteen projects is presented in section 5 and analyzed for points of failure in section 6. These observations are summarized in section 7’s themes of failure, followed by section 8’s principles of success. Sections 9, 10, and 11 then suggest practical implications of these observations for future DARE projects and future research.
4. DARE Methodology Model
4.1. Domain Analysis
Domain engineering’s first phase, domain analysis, is the process of analyzing a given domain to derive domain models. Sources for this analysis include exemplar code, exemplar text, and domain expert information. Examples of models include generic architectures, facet tables, feature tables, and templates.
4.1.1. Scoping the Domain
Domain analysis begins by selecting a domain that can be bounded and scoped to a manageable level. With the help of a domain expert, the domain analyst scopes the domain and creates a formal domain model. Scoping the domain involves gathering suitable domain exemplars, documents, and notes, then describing the domain verbally. Set notation may be necessary to state clearly what is in and what is not in the domain [25]. The success of scoping depends, in part, on the degree to which the chosen domain is suitable for analysis.
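As an illustration of a set-based scope statement, the following minimal Python sketch records what is in and what is not in a domain as two disjoint sets. The domain and feature names are hypothetical and are not taken from any project in this study; the function name `classify` is likewise an assumption made for the example.

```python
# Hypothetical scope statement for a "static code metrics" domain.
# The feature names are illustrative only.
IN_DOMAIN = frozenset({"lines of code", "comment-to-code ratio", "cyclomatic complexity"})
NOT_IN_DOMAIN = frozenset({"runtime profiling", "test coverage"})

# A well-formed scope never places a feature on both sides of the boundary.
assert IN_DOMAIN.isdisjoint(NOT_IN_DOMAIN)


def classify(feature: str) -> str:
    """Return 'in', 'out', or 'undecided' for a candidate feature."""
    if feature in IN_DOMAIN:
        return "in"
    if feature in NOT_IN_DOMAIN:
        return "out"
    return "undecided"  # a prompt to consult the domain expert


if __name__ == "__main__":
    for candidate in ("cyclomatic complexity", "test coverage", "code clones"):
        print(candidate, "->", classify(candidate))
```

Features that fall in neither set are reported as undecided, which mirrors the point in the text that scoping is done with the help of a domain expert.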
4.1.2. Domain Suitability
The suitability of a domain for analysis is a function of several factors. First among these is the availability of at least three exemplar systems. The suitability of exemplar systems for domain analysis is a function of system complexity, stability, size, and formality. System complexity increases with the number of system types and with the complexity of the system source code.
System stability is related to the quality of the project. Elements of the stability of an exemplar system include code quality, maintenance support, release history, experience of the authors, comment-to-code ratio, coding style, and complexity metrics.
The size of a system can be a determining factor in whether the system can be scoped to a manageable size for domain analysis. For instance, the domain of software metrics might better be scoped to static complexity analysis.
A system with high formality will contain documentation, architectures, and ample software comments. An exemplar system produced with a formal methodology contributes to stability.
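One of the stability indicators named above, the comment-to-code ratio, can be approximated with a few lines of code. The sketch below is a simplification under stated assumptions: it treats a line as a comment only when it begins with a single-line comment prefix, and it ignores trailing and block comments. The function name and the sample source are hypothetical.

```python
def comment_to_code_ratio(source: str, comment_prefix: str = "#") -> float:
    """Ratio of comment lines to code lines, ignoring blank lines.

    Simplification (assumption): a line counts as a comment only if it
    starts with the prefix; trailing and block comments are not detected.
    """
    comment_lines = 0
    code_lines = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank lines count as neither comment nor code
        if stripped.startswith(comment_prefix):
            comment_lines += 1
        else:
            code_lines += 1
    return comment_lines / code_lines if code_lines else 0.0


# Hypothetical exemplar fragment: two comment lines and two code lines.
SAMPLE = """\
# add two numbers
def add(a, b):
    # no overflow handling
    return a + b
"""

if __name__ == "__main__":
    print(comment_to_code_ratio(SAMPLE))
```

For exemplars written in C, C++, or Java, the same function could be called with `comment_prefix="//"`, with the same caveat about block comments.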
4.1.3. Vocabulary Analysis
The vocabulary analysis steps include gathering the domain exemplar code, exemplar text, and domain expert information, and compiling an initial word set. Then, using automated processes and domain knowledge, this set of words is reduced to a manageable set of key words. Diagram 1 represents the vocabulary analysis methodology. Vocabulary analysis methods include the use of stop lists, stemming, conflation, clustering, and frequency analysis.
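The reduction from an initial word set to key words can be sketched as a small pipeline. The example below is illustrative only: the stop list is tiny, the `stem` function is a crude suffix stripper standing in for a real stemmer, and none of the names come from the DARE tools.

```python
import re
from collections import Counter

# Illustrative stop list; a real one would be much larger.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "is", "in", "for"}


def stem(word: str) -> str:
    """Very crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def key_words(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Tokenize, drop stop words, conflate by stem, and rank by frequency."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(stems).most_common(top_n)


if __name__ == "__main__":
    sample = "The stemmer removes suffixes. Stemmers conflate words; stemming of a word is fast."
    print(key_words(sample, top_n=2))
```

In this sketch, conflation happens implicitly: words that share a stem are counted together, so "stemmer" and "stemmers" contribute to one entry. As the text notes, the analyst's domain knowledge is still needed to prune and correct the automated result.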
Cluster tables group words together
around a point of commonality. The
points of commonality become the column titles of the facet table with the
cluster words becoming the rows or points of variability. The facet table maps directly to a template whereby
descriptive words describe the generic system with the facet table’s columns
mapping to variables. This template then is used in the creation of the
reusable asset.

Diagram 1: Vocabulary Analysis
The cluster tables, facet table, and template are all derived from the key word set. At each step of the process, the domain analyst’s system knowledge plays a key role in refining the tables.
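The mapping from facet table to template can be made concrete with a short sketch. The facet table and the template sentence below are hypothetical examples for a conflation domain; the facet names become the template's variables, and each combination of terms describes one possible system.

```python
from itertools import product

# Hypothetical facet table for a conflation (stemming) domain.
# Keys are facets (points of commonality); values are the terms that vary.
FACET_TABLE = {
    "method": ["affix removal", "table lookup"],
    "language": ["English", "French"],
}

# The template's variables are the facet names.
TEMPLATE = "A stemmer that conflates {language} words by {method}."


def instantiate(template: str, facets: dict[str, list[str]]) -> list[str]:
    """Fill the template with every combination of facet terms."""
    names = list(facets)
    return [
        template.format(**dict(zip(names, combo)))
        for combo in product(*(facets[name] for name in names))
    ]


if __name__ == "__main__":
    for description in instantiate(TEMPLATE, FACET_TABLE):
        print(description)
```

Enumerating every combination is only one way to read a template; in practice the analyst would select the combinations that correspond to real or planned systems.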
4.1.4. Architectural Analysis
The domain analyst, along with the
domain expert, creates a system feature table for each exemplar system. System
feature tables describe the attributes of each of the systems.
System architectures are
architectures available in the documentation of the exemplar systems.
A generic architecture is formed, in
large part, by an analysis of the set of available system architectures and the
template produced by the vocabulary analysis.
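One simple way to move from system feature tables toward a generic view is to separate features shared by every exemplar from those that vary. The sketch below assumes each system feature table can be reduced to a set of feature names; the systems and features are hypothetical, and this set-based reading is an illustration rather than a procedure prescribed by DARE.

```python
# Hypothetical system feature tables: system name -> set of features.
SYSTEM_FEATURES = {
    "System A": {"stop list", "stemming", "frequency count"},
    "System B": {"stemming", "frequency count", "clustering"},
    "System C": {"stemming", "frequency count"},
}


def generic_feature_table(systems: dict[str, set[str]]) -> dict[str, list[str]]:
    """Split features into commonalities (in every system) and variabilities."""
    feature_sets = list(systems.values())
    if not feature_sets:
        return {"common": [], "variable": []}
    common = set.intersection(*feature_sets)
    variable = set.union(*feature_sets) - common
    return {"common": sorted(common), "variable": sorted(variable)}


if __name__ == "__main__":
    print(generic_feature_table(SYSTEM_FEATURES))
```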
4.1.5. Software Analysis
The exemplar code is analyzed using a variety of techniques to understand how it can be incorporated into a reusable asset. Software components, such as classes, functions, and libraries, can be identified and incorporated into a parts-based reusable asset. Software metrics can be used to identify areas of high complexity or poor maintainability, which can inform the design of a reusable component. This process is aided by analysts who possess domain expertise, programming language knowledge, and an understanding of software analysis metrics.
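As a small illustration of identifying candidate parts in exemplar code, the sketch below uses Python's standard `ast` module to list the classes, functions, and imported libraries in a piece of Python source. This is an assumption-laden example: the projects in this study analyzed exemplars in several languages, and the sample source and the function name `list_components` are hypothetical.

```python
import ast


def list_components(source: str) -> dict[str, list[str]]:
    """List class names, function names, and imported modules in Python source."""
    tree = ast.parse(source)
    found: dict[str, list[str]] = {"classes": [], "functions": [], "imports": []}
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            found["classes"].append(node.name)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            found["functions"].append(node.name)
        elif isinstance(node, ast.Import):
            found["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            found["imports"].append(node.module or "")
    return found


# Hypothetical exemplar source, used only for illustration.
EXEMPLAR = '''
import re
from collections import Counter

class Stemmer:
    def stem(self, word):
        return word

def tokenize(text):
    return re.findall(r"[a-z]+", text)
'''

if __name__ == "__main__":
    print(list_components(EXEMPLAR))
```

Running the same listing over each exemplar gives the analyst a starting inventory of parts to compare across systems.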
4.2. Domain Implementation
Reusable assets are derived in domain implementation, based on the domain models. Assets can be parts-based components, domain-specific languages, or application generators. In a parts-based implementation, domain knowledge gained from software analysis can contribute to creating reusable language components, such as classes and libraries. These parts can be incorporated and reused in future software applications. In a formal-language implementation, the template and generic architectures provide frameworks for creating domain-specific languages or application generators.
5. Project Data
The following section presents data collected from thirteen DARE projects. The data are organized by measures of time, measures of size, domain scope, vocabulary analysis, architectural analysis, and DARE book tables. Discussion of their implications can be found in section 6.
5.1. Activities Time Log
Participants were instructed to
provide a log of time spent on each stage of the process.
All times are given in hours except
where noted.
Table 2. Activities Time Log Entries
| Project Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Book creation | | 12 | | 20 | | | | 2 | | | | | |
| Finding good tools | | 16.25 | 50 | | | | | | | | | | |
| Gathering source information | 8 | 2 | | 6 | | | 1 to 2 weeks | 13 | | | | 38 | |
| Documents | | | | | | | | | | | 9 | | 2 |
| Source code | | | | | | | | | | | 4 | | 0.5 |
| System descriptions | | | | | | | | 0.5 | | | | | |
| System architectures | | | | | | | 2 to 3 weeks | 8 | | | 8 | 16 | |
| System feature tables | | | | | | 2 | 3 days | 0.5 | | | | 20 | 3 |
| Source notes | | | | | | | | | | | | | 1 |
| Expert systems | | | | | | 12 | | 5 | | | | | 0.5 |
| Study of domain | | | | 46 | | | | 3 | | | 9 | | |
| Domain scope | 1 | | | | | | 3 weeks | 3 | | | 2 | | |
| Vocabulary analysis | 4 | 21 | 20 | 12 | 30 | 13 | 2 weeks | 0 | 0 | 0 | 3 | | |
| Basic vocabulary | | | | | 1 | | | | | | | | 1 |
| Frequency analysis | | | | | | 4 | | | | | | | |
| Cluster analysis | | | | | | 8 | | | | | | | 2 |
| Facet table | | | | | 1 | 2 | 4 days | 2 | | | | 20 | 0.5 |
| Synonym table | | | | | 1 | 0.5 | | | | | | | 0.5 |
| Template | | | | | 1 | 1 | | | | | | | 0.5 |
| Thesaurus | | | | | 1 | 0.5 | | | | | | | 0.5 |
| Vocabulary notes | | | | | 1 | | | 1 | | | | | |
| Code analysis | 30 | | 3 | 2 | 2 | 4 | 5 days | 1 | | | 3 | | |
| Source notes | | | | | | | | 3 | | | | | |
| Arch analysis | 5 | 14 | 60 | 10 | 2 | | | | | | | | |
| Generic architecture | | | | | 2 | 4 | | 4 | | | 6 | 16 | 0.5 |
| Generic feature table | | | | | 2 | | 1.5 days | 1 | | | 2 | 16 | |
| Architecture notes | | | | | 2 | | 1.5 days | 3 | | | | | |
| Implementation/reusable component | 5 | | 55 | 80 | 4 | 24 | 5 weeks | 5 | | 55 | 3 | | 0.5 |
| Reusable algorithm | | | | | | | | 62 | | | | | |
| Glossary | | | | | | | | 5 | | | | | 2 |
| Testing | | | | | | | | | | 15 | | | 0.5 |
5.1.1. Number of time log entries for each participant
This table considers the number of time log entries made by each participant. The variability of these measures may play a role in understanding different project outcomes.
Table 3. Number of Time Log Entries
| 3 | 5 | 5 | 6 | 6 | 7 | 10 | 10 | 12 | 13 | 14 | 19 | 3 |
Statistical Summary 1: Number of Time Log Entries
| Sample Size, n | 13 | Range | 16 |
| Mean | 8.69 | Minimum | 3 |
| Median | 7 | 1st Quartile | 5 |
| Midrange | 11 | 2nd Quartile | 7 |
| Variance | 23.06 | 3rd Quartile | 12 |
| Standard Deviation | 4.80 | Maximum | 19 |
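The figures in Statistical Summary 1 can be recomputed from the Table 3 values with Python's standard `statistics` module. The sketch below rests on two assumptions that the paper does not state: the variance is the sample variance (dividing by n - 1), and the quartiles follow the "inclusive" interpolation method, which reproduces the reported values for this data set.

```python
import statistics

# Number of time log entries per participant, copied from Table 3.
ENTRIES = [3, 5, 5, 6, 6, 7, 10, 10, 12, 13, 14, 19, 3]


def summarize(values: list[float]) -> dict[str, object]:
    """Compute the descriptive statistics reported in the summary tables."""
    # Assumption: "inclusive" quartiles; the paper does not name its method.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": round(statistics.mean(values), 2),
        "median": statistics.median(values),
        "midrange": (min(values) + max(values)) / 2,
        "variance": round(statistics.variance(values), 2),  # sample variance
        "stdev": round(statistics.stdev(values), 2),         # sample std. dev.
        "range": max(values) - min(values),
        "quartiles": (q1, q2, q3),
    }


if __name__ == "__main__":
    for name, value in summarize(ENTRIES).items():
        print(f"{name}: {value}")
```

Other quartile conventions can give slightly different first and third quartiles on small samples, so any comparison with the other summaries in this section should keep that in mind.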
5.1.2. Total time creating the DARE project
This table considers the total time invested in the DARE project. Time investment may be an indicator of project success.
Table 4. Total Time of DARE Project
| 53 | 65.25 | 188 | 176 | 65 | 75 | NA | 122 | NA | 70 | 49 | 126 | 13 |
Statistical Summary 2: Total Book Time
| Sample Size, n | 11 | Range | 175 |
| Mean | 91.11 | Minimum | 13 |
| Median | 70 | 1st Quartile | 53 |
| Midrange | 100.5 | 2nd Quartile | 70 |
| Variance | 3014.79 | 3rd Quartile | 126 |
| Standard Deviation | 54.9 | Maximum | 188 |
5.1.3. Time Preparing the Domain
For purposes of analysis, this report groups together the time log steps of Table 2 from “finding good tools” to “domain scope”. These steps comprise an important phase of the DARE book, referred to in this paper as “preparing the domain”.
Table 5. Total time preparing the domain
| 9 | 18.25 | 50 | 52 | 15 | 14 | NA | 33 | NA | NA | 32 | 74 | 7 |
Statistical Summary 3: Domain Preparation Time
| Sample Size, n | 10 | Range | 67 |
| Mean | 30.22 | Minimum | 7 |
| Median | 25.125 | 1st Quartile | 12 |
| Midrange | 40.5 | 2nd Quartile | 25.12 |
| Variance | 498.83 | 3rd Quartile | 50 |
| Standard Deviation | 22.3 | Maximum | 74 |
5.1.4. Time invested in vocabulary analysis
In this report, vocabulary analysis encompasses the time log steps of Table 2 from “vocabulary analysis” to “vocabulary notes”. Diagram 1 of section 4.1 graphically shows the collection of steps involved in vocabulary analysis.
Table 6. Total time spent on vocabulary analysis
| 4 | 21 | 20 | 12 | 36 | 29 | NA | 3 | NA | NA | 3 | 20 | 5 |
Statistical Summary 4: Vocabulary Analysis Time
| Sample Size, n | 10 | Range | 33 |
| Mean | 15 | Minimum | 3 |
| Median | 16 | 1st Quartile | 4 |
| Midrange | 19.5 | 2nd Quartile | 16 |
| Variance | 129.55 | 3rd Quartile | 21 |
| Standard Deviation | 11.38 | Maximum | 36 |
5.1.5. Time Implementing Reusable Assets
The time for implementing reusable assets is derived from the Table 2 time log entries “implementation/reusable component”, “reusable algorithm”, and “testing”.
Table 7. Time for Domain Implementation
| Implementation/reusable component | 5 | | 55 | 80 | 4 | 24 | 5 weeks | 5 | | 55 | 3 | | 0.5 |
| Reusable algorithm | | | | | | | | 62 | | | | | |
| Testing | | | | | | | | | | 15 | | | |
| Total | 5 | | 55 | 80 | 4 | 24 | | 67 | | 70 | 3 | | 0.5 |
Statistical Summary 5: Time Implementing Reusable Assets
| Sample Size, n | 9 | Range | 79.5 |
| Mean | 34.2 | Minimum | 0.5 |
| Median | 24 | 1st Quartile | 4 |
| Midrange | 40.25 | 2nd Quartile | 24 |
| Variance | 1108.19 | 3rd Quartile | 67 |
| Standard Deviation | 33.28 | Maximum | 80 |
5.1.6. Time Groups
In the category of implementing reusable assets, there were two distinct groups: those who spent 50 hours or more, and those who spent 5 hours or fewer. The data points break down as follows:
Table 8. Time spent implementing Reusable Assets
| <= 5 | 0.5, 3, 4, 5 |
| >= 50 | 55, 67, 70, 80 |
5.1.7. Time Outliers
Outliers of the time entry log (Table 2) are listed below. Although there is a good deal of variation in the time entry log, the outliers in the following categories are noteworthy. The table lists areas of the project where students experienced difficulty completing the task.
Time outliers
| Finding good tools | 50 |
| Gathering Source Information | 38 |
| Study of domain | 46 |
| Architectural Analysis | 60 |
5.2. Book size
Book size was measured by the number of words, number of lines, and number of pages.
There was considerable variation in book size due to different individual approaches to book construction. Book styles ranged from those that included domain sources with the project (the system exemplar code, vocabulary analysis results, code analysis, etc.) to those that were little more than an outline with references.
Two items that greatly increased the size of books were the inclusion of system source code and of intermediary vocabulary analysis tables. In one project, for example, a 422-page book was reduced to 20 pages after removing 376 pages of exemplar source code and 26 pages of vocabulary analysis.
| Book Size | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Words | 12966 | 6448 | 3101 | 6334 | 3218 | 3350 | 10396 | 1416 | 2962 | 74939 | 28068 | 3396 | 1083 |
| Lines | 4021 | 4757 | 457 | 3047 | 1202 | 683 | 3636 | 613 | 428 | 26648 | 6628 | 1076 | 227 |
| Pages | 81 | 68 | 17 | 60 | 17 | 19 | 65 | 32 | 19 | 422 | 125 | 20 | 8 |
| Words minus source code | 10060 | 6185 | 3101 | 6334 | 3218 | 3350 | 3282 | 1416 | 2962 | 9104 | 17314 | 3396 | 1083 |
| Lines minus source code | 3103 | 4494 | 457 | 3047 | 1202 | 683 | 1518 | 613 | 428 | 6653 | 3658 | 1076 | 227 |
| Pages minus source code | 62 | 62 | 17 | 60 | 17 | 19 | 28 | 32 | 19 | 46 | 78 | 20 | 8 |
| Words minus vocabulary frequency analysis | 10060 | 3060 | 3101 | 6334 | 3218 | 3350 | 10396 | 1416 | 2962 | 68869 | 24008 | 3396 | 1083 |
| Lines minus vocabulary frequency analysis | 3103 | 1257 | 457 | 3047 | 1202 | 683 | 3636 | 613 | 428 | 20598 | 4598 | 1076 | 227 |
| Pages minus vocabulary frequency analysis | 62 | 41 | 17 | 60 | 17 | 19 | 65 | 32 | 19 | 379 | 91 | 20 | 8 |
| Words minus source and vocabulary | 1060 | 2787 | 3101 | 6334 | 3218 | 3350 | 3282 | 1416 | 2962 | 3036 | 13252 | 3396 | 1083 |
| Lines minus source and vocabulary | 3103 | 993 | 457 | 3047 | 1202 | 683 | 1518 | 613 | 428 | 604 | 1627 | 1076 | 227 |
| Pages minus source and vocabulary | 62 | 35 | 17 | 60 | 17 | 19 | 28 | 32 | 19 | 11 | 44 | 20 | 8 |
The following charts display DARE
book size by word count, line count, and page count.
5.2.1. Book Word Count
Chart 6 shows the total word count for DARE books. These measurements include exemplar source code and vocabulary analysis.
Table 6. Word Count in DARE book
| 12966 | 6448 | 3101 | 6334 | 3218 | 3350 | 10396 | 1416 | 2962 | 74939 | 28068 | 3396 | 1083 |
Statistical Summary 6: DARE Book Word Count
| Sample Size, n | 13 | Range | 73856 |
| Mean | 12129 | Minimum | 1083 |
| Median | 3396 | 1st Quartile | 3101 |
| Midrange | 38011 | 2nd Quartile | 3396 |
| Variance | 4.086476e+8 | 3rd Quartile | 10396 |
| Standard Deviation | 20215.03 | Maximum | 74939 |
5.2.2. Word Count without exemplar code and vocabulary analysis
The following table shows the total word count for all projects with exemplar source code and vocabulary analysis removed. With these removed, the measurements show less variance.
Table 7. DARE book word count with exemplar code and vocabulary frequency analysis removed
| 1060 | 2787 | 3101 | 6334 | 3218 | 3350 | 3282 | 1416 | 2962 | 3036 | 1924 | 3396 | 1083 |
Statistical Summary 7: Book Words minus source and vocabulary
| Sample Size, n | 13 | Range | 5274 |
| Mean | 2842.23 | Minimum | 1060 |
| Median | 3036 | 1st Quartile | 1924 |
| Midrange | 3697 | 2nd Quartile | 3036 |
| Variance | 1.863497e+6 | 3rd Quartile | 3282 |
| Standard Deviation | 1365.1 | Maximum | 6334 |
5.2.3. Line Count without code and vocabulary
Chart 8 displays the total line count for all projects with exemplar source code and vocabulary analysis removed.
Table 8. DARE book line count without exemplar source code and vocabulary frequency analysis
| 3103 | 993 | 457 | 3047 | 1202 | 683 | 1518 | 613 | 428 | 604 | 438 | 1076 | 227 |
Statistical Summary 8: Line Count minus code and vocabulary
| Sample Size, n | 13 | Range | 2876 |
| Mean | 1106.84 | Minimum | 227 |
| Median | 683 | 1st Quartile | 457 |
| Midrange | 1665 | 2nd Quartile | 683 |
| Variance | 893486.8 | 3rd Quartile | 1202 |
| Standard Deviation | 945.24 | Maximum | 3103 |
5.2.4. Page Count without code and vocabulary
The following table shows the total page count for all projects with exemplar source code and vocabulary analysis removed.
Table 9. DARE book page count without exemplar source code and vocabulary frequency analysis
| 62 | 35 | 17 | 60 | 17 | 19 | 28 | 32 | 19 | 11 | 17 | 20 | 8 |
Statistical Summary 9: Page Count minus code and vocabulary
| Sample Size, n | 13 | Range | 54 |
| Mean | 26.53 | Minimum | 8 |
| Median | 19 | 1st Quartile | 17 |
| Midrange | 35 | 2nd Quartile | 19 |
| Variance | 291.26 | 3rd Quartile | 32 |
| Standard Deviation | 17.06 | Maximum | 62 |
5.3. Domain Scope
Domain analysts are to describe their domains verbally, stating what is in and what is not in the domain, using set notation if necessary. These measurements characterize the domain scoping statement by counting the number of words and sets in the statement.
5.3.1. Number of Words
Chart 10 displays the number of words present in each project’s domain scope statement.
Table 10.a. Domain Scope Word Count
| 64 | NA | 178 | 1075 | 141 | 688 | 160 | 229 | NA | 255 | NA | NA | 18 |
Statistical Summary 10: Domain Scope Word Count
| Sample Size, n | 9 | Range | 1057 |
| Mean | 312 | Minimum | 18 |
| Median | 178 | 1st Quartile | 141 |
| Midrange | 546.5 | 2nd Quartile | 178 |
| Variance | 118990.5 | 3rd Quartile | 255 |
| Standard Deviation | 344.95 | Maximum | 1075 |
5.3.2. Mathematical Model using Set Notation
For those projects that had domain scope statements, Table 10.b indicates whether the project used set notation (1 = yes, 0 = no).
Table 10.b. Presence of set notation
| 1 | NA | 0 | 1 | 0 | 0 | 0 | 1 | NA | 0 | NA | NA | 0 |
Of the nine projects that had a section devoted to scoping the domain, three (33%) used a set-based mathematical model to describe their domains.
Two individuals who did not use set notation used a different notation. One participant used pseudocode in the scope statement, and the other used a top-down function and connector architecture.
5.3.3. Number of sets
Table 10.c displays, for those projects that used set notation, how many sets were used in their model.
Table 10.c. Number of sets in domain scope
| 4 | NA | NA | 20 | NA | NA | NA | 8 | NA | NA | NA | NA | NA |
Of the three projects that used set
notation to scope their domains, the number of sets used in the model was 4, 8,
and 20.
5.3.4. Domain Sources
A key component of domain analysis is the selection of domain sources. These exemplars include source code, documents, and architectures.
5.3.4.1. Count of Exemplar Sources
Chart 11 shows the number of examples of domain source code used for each project's domain analysis.
Table 11, Number of Exemplar sources
3 | 7 | 3 | 3 | 3 | 4 | 3 | 8 | 4 | 3 | 3 | 3 | 3

Statistical Summary, Exemplar Count
Statistical Summary 11: Exemplar Count
Sample Size, n | 13 | Range | 5
Mean | 3.86 | Minimum | 3
Median | 3 | 1st Quartile | 3
Midrange | 5.5 | 2nd Quartile | 3
Variance | 2.80 | 3rd Quartile | 4
Standard Deviation | 1.67 | Maximum | 8
5.3.4.2. Count of Exemplar Documents
Chart 12
displays the number of domain documents chosen for the DARE book analysis.
Table 12a, Number of Domain Documents
5 | 3 | 3 | 7 | 3 | 4 | 4 | 19 | 3 | 3 | 7 | 3 | 3

Statistical Summary 12a: Exemplar Document Count
Sample Size, n | 13 | Range | 16
Mean | 5.15 | Minimum | 3
Median | 3 | 1st Quartile | 3
Midrange | 11 | 2nd Quartile | 3
Variance | 19.47 | 3rd Quartile | 5
Standard Deviation | 4.41 | Maximum | 19
5.3.5. System Descriptions Count
Most projects contained system descriptions. Table 12.b shows the number and type of system descriptions in each project.
Table 12.b, System Descriptions Number and Type
3 fictitious | NA | 3 | 5 | 3 | 2 | 3 | 6 | 3 paragraphs | 3 paragraphs | 3 paragraphs | 3 paragraphs | 3
Twelve of the thirteen projects (92%) contained system descriptions. A system template is presented in Appendix D. Of this set, three projects did not follow the system template; rather, these projects described each system verbally in a paragraph or less.
5.4. Architectural Analysis
As part of the domain analysis, participants undertook an architectural analysis of their domains. Three data points were taken from each analysis: the number of system architectures in each DARE book, the types of architectures in this set, and the word count of the architectural analysis section of the domain book.
5.4.1. Number of System Architectures
Table 13a contains the number of architectural images contained in each individual's DARE book.
Table 13a, Number of System Architectures
9 | 14 | 3 | 26 | 4 | 3 | 3 | 6 | 0 | 3 | 5 | 6 | 3

Statistical Summary 13: System Architecture Count
Sample Size, n | 13 | Range | 26
Mean | 6.53 | Minimum | 0
Median | 4 | 1st Quartile | 3
Midrange | 13 | 2nd Quartile | 4
Variance | 46.26 | 3rd Quartile | 6
Standard Deviation | 6.80 | Maximum | 26
5.4.2. Types of Architecture
Table 13.b lists the architectural types of the architectural images in each DARE book.
Table 13.b, Types of Architecture
Project | Types of architecture
1 | 3 flow chart, 3 top down, 3 use case
2 | 7 activity, 7 module
3 | Object relationships, class diagram
4 | Functional flow and their actions
5 | Functions and connectors
6 | Pseudo code (1), function and connectors (3)
7 | Function and connectors
8 | Flow (functions, connectors, data)
9 | Class diagrams
10 | Flow (functions and connectors)
11 | Flow (functions, connectors)
12 | Data flow, data flow, data flow, top down, batch sequential
13 | Class diagrams
5.4.3. Word Count
Table 14.a shows the number of words in each project's architectural analysis section.
Table 14.a, Word count architectural analysis
852 | 115 | 0 | 1767 | 179 | 0 | 0 | 0 | 0 | 0 | 30 | 126 | 0

Chart 14
Statistical Summary 14: Architectural Analysis word count
Sample Size, n | 13 | Range | 1767
Mean | 236.07 | Minimum | 0
Median | 0 | 1st Quartile | 0
Midrange | 883.5 | 2nd Quartile | 0
Variance | 265476.2 | 3rd Quartile | 126
Standard Deviation | 515.24 | Maximum | 1767
Seven projects included only the architecture diagrams, with no descriptive text.
Six projects included descriptive text, ranging from 30 to 1767 words.
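The split between projects with and without descriptive text follows from the word counts in Table 14.a. This is a minimal sketch, assuming a word count of 0 means the section held diagrams only; the variable names are illustrative.

```python
# Word counts of the architectural analysis sections, from Table 14.a.
word_counts = [852, 115, 0, 1767, 179, 0, 0, 0, 0, 0, 30, 126, 0]

# Projects with at least one word of descriptive text.
with_text = [count for count in word_counts if count > 0]
diagram_only = len(word_counts) - len(with_text)

print(f"diagram only: {diagram_only}")
print(f"with text: {len(with_text)} (from {min(with_text)} to {max(with_text)} words)")
```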
5.4.4. System Feature Tables
Each analyst was to create feature tables of his/her domain system exemplars. Table 14.b shows the number of tables included in each analyst's system feature table section.
5.4.4.1. Number of Tables
Table 14.b, Number of System Feature Tables
1 | 1 | 3 | 3 | 3 | 1 | 3 | 6 | 1 | 1 | 1 (3 combined) | 3 | 1 (3)
5.4.4.2. Number of Features
Table 14.c shows the number of features in each feature table.
Table 14.c, Number of Features in System Feature Tables
4 | 11 | 5,11,5 | 16x3,17x3,20x3 | 11,11,6 | 9x4 | 10,10,10 | 6 (6x1 tables) | 12 | 38x3 | 8 | 13,14,18 | 7
There was a great variety of
interpretations of what constituted a feature table.
5.5. Vocabulary Analysis
The log of activity included the vocabulary analysis activities. Analysts varied greatly in the labels they used and in what they considered to be vocabulary analysis. It is therefore helpful to redisplay those elements that constitute the vocabulary analysis phase. Because of this variety, discussing the elements individually is impractical, but examining the total time spent on vocabulary analysis is informative.
5.5.1. Time invested in vocabulary analysis
Table 4, Total Time Spent on Vocabulary Analysis (hours)
4 | 21 | 56 | 12 | 36 | 26 | NA | 3 | NA | NA | 3 | 20 | 5

Statistical Summary 4: Vocabulary Analysis Time
Sample Size, n | 10 | Range | 33
Mean | 15 | Minimum | 3
Median | 16 | 1st Quartile | 4
Midrange | 19.2 | 2nd Quartile | 16
Variance | 129.55 | 3rd Quartile | 21
Standard Deviation | 11.38 | Maximum | 36
Of this group there is one outlier: an individual who spent 56 hours on vocabulary analysis.
These times can be viewed as two groups: those who spent 12 hours or more and those who spent five hours or less.
Table 4.b, Total time spent on vocabulary analysis
<=5 | 3, 3, 4, 5
>=12 | 12, 20, 21, 36
5.5.2. Manual or Automatic
Vocabulary analysis consists of a combination of automated and manual processes. The following table displays whether each analyst used automatic methods, manual methods, or a combination of both.
Table 14.c, Vocabulary Analysis: manual, automatic, both
manual | automatic | automatic | both | automatic | both | NA | both | both | both | both | both | manual
5.5.3. Original Set
Analysts begin with a raw set of domain words to be analyzed. For those who reported their original set, the figures are recorded below.
Table 14.c, Original Set Word Count (NA = not reported)
NA | NA | thousands | thousands | thousands | thousands | thousands | NA | NA | NA | 10790 | NA | NA
5.5.4. Number of words in key word set
Following a series of automatic and manual vocabulary analysis methods, analysts derive a key word set with which to create facet tables. The size of each project's key word set is shown in the following table.
Table 15, Number of words in keyword set
39 | 939 | 650 | 34 | 4681 | 39 | 48 | 65 | 14 | 4117 | 32 | NA | 188

Chart 15
Statistical Summary 15: Key Word Set Size
Sample Size, n | 12 | Range | 4667
Mean | 903.83 | Minimum | 14
Median | 56.5 | 1st Quartile | 36.6
Midrange | 2347.5 | 2nd Quartile | 56.5
Variance | 2.764544e+6 | 3rd Quartile | 794.5
Standard Deviation | 1662.692 | Maximum | 4681
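The large gap between the mean and the median of the key word set sizes reflects a few very large sets. This is a minimal sketch that recomputes both figures from Table 15, treating NA as missing; the variable names are illustrative.

```python
from statistics import mean, median

# Key word set sizes from Table 15; None stands for NA.
keyword_set_sizes = [39, 939, 650, 34, 4681, 39, 48, 65, 14, 4117, 32, None, 188]
sizes = [size for size in keyword_set_sizes if size is not None]

# A mean far above the median indicates a strongly right-skewed sample.
print(f"n = {len(sizes)}")
print(f"mean = {mean(sizes):.2f}")
print(f"median = {median(sizes)}")
```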
5.5.5. Key word set, minus outliers
Four individuals' key word sets contained more than 600 words. These outliers skewed the statistics for the group that derived a manageable vocabulary set. The set with the outliers marked is listed below.
|
Table 16.a, Key word set |
||||||||||||
|
39 |
OUTLIER |
|||||||||||