Domain Engineering:

An Empirical Study

William Frakes and C. David Harris, Jr.

Virginia Tech

Department of Computer Science
Virginia Polytechnic Institute and State University
Blacksburg, Virginia 24061

December 21, 2006

Abstract:  This paper presents a summary and analysis of data gathered from thirteen domain engineering projects, participant surveys, and demographic information. Taking a failure modes approach, project data is compared to an ideal model of the DARE methodology, revealing valuable insights into points of failure in the domain engineering process. This study suggests that success is a function of the domain analyst’s command of a specific set of domain engineering concepts and skills, the time invested in the process, and persistence in difficult areas. We conclude by presenting strategies to avoid points of failure in future domain engineering projects. 

 

1.       Introduction

 

Domain engineering has emerged as an important topic in software engineering research and practice. While many methods have been developed to support domain engineering [2], there has been little empirical analysis of the domain engineering process. This paper presents an empirical analysis of thirteen domain engineering projects in a university setting. Taking a failure modes approach, it analyzes the collected project data and project outcomes to identify points of improvement. Sources of information include the DARE books, participants’ demographic information, and survey data taken during and after the domain engineering exercises. These artifacts were produced over the course of three years as part of a graduate-level, advanced software engineering course at Virginia Tech.

 

These resources present important opportunities to discover individual strategies for domain engineering. The data alone constitute a contribution to domain engineering research; the analysis and conclusions of this report likewise provide insights for improving the craft of domain engineering.

 

2.       Domain Engineering

 

2.1.             Context

 

Software developers have always practiced software reuse. Operating system libraries and programming language syntactic elements are designed to be reused, and they have helped standardize computer programming languages and methods.

 

There are many types of reuse [Frakes and Terry 96]. For example, a programmer may reuse software by selecting lines of code from preexisting projects and copying them into new applications, or by reusing software functions or libraries. However, developing a framework to systematically engineer domain-specific reusable software components requires an entirely different process and methodology.

 

The term software engineering was first popularized at a 1968 NATO conference in the hope of applying structured engineering principles to the process of creating software [9]. As a discipline, software engineering is an evolutionary leap from the craft of computer programming. Traditionally, successful development has been the result of intuition, hard work, and individual talent. But as commercial demands for software increase, it becomes impossible to continue traditional ad hoc, single-system development strategies [10]. Current software engineering research aspires to apply more formal methodologies to software development.

 

Software engineering models strive to manage the software development process. Improved requirements gathering, design, quality, cost estimation, and time estimation are all goals of the various software engineering processes. The numerous processes (waterfall, iterative, the Capability Maturity Model, and so on) all share the common goal of improving software development. An emerging goal of software engineering is to design software assets so that they can be reused easily. Creating reusable software components in a particular domain is the goal of domain engineering.

 

2.2.             Design for Reuse

 

Like software engineering, the field of software reuse has its origins in the late sixties. During this time, Doug McIlroy of Bell Laboratories proposed software reuse as an area of study and recommended basing the software industry upon reusable components [7]. Other research contributions include Parnas’s notion of program families and Neighbors’s idea of domain analysis [2]. Since then, software reuse has been an active area of research.

 

Domain engineering recognizes that software development communities create software in an application area, or in a set of systems that share design decisions [2]. Commercial groups create software products in a given market area. Problem domains such as software metrics or vocabulary analysis inspire research, publications, and software from multiple sources.

 

The process of domain engineering has two phases. Domain analysis systematically analyzes preexisting software, architectures, and documents to capture their commonalities and variabilities and to develop a generic architecture and templates. Domain implementation develops software artifacts that can be used to produce new systems within the domain.

 

2.3.             Benefits

 

A core benefit of domain engineering is that development with reusable software assets (libraries, code generators, domain-specific languages) contributes to improved software quality and productivity. Software that is reused is often tested more frequently than software that has a single use or function. Using reusable assets typically decreases development costs, since reuse requires less time, less testing, and less new documentation. In addition, reusable assets often improve the accuracy of initial development estimates of time and cost [7].

 

2.4.             Literature Review

 

Several domain engineering methodologies exist, including Family-Oriented Abstraction, Specification, and Translation (FAST) [8], Feature-Oriented Domain Analysis (FODA) [9], Organization Domain Modeling (ODM), and the Domain Engineering Method for Reusable Algorithmic Libraries (DEMRAL). This report uses the Domain Analysis and Reuse Environment (DARE) methodology [6].

 

2.5.             Domain Analysis and Reuse Environment (DARE)

 

The Domain Analysis and Reuse Environment (DARE) methodology is a multidisciplinary approach that has two phases: domain analysis and domain implementation.

 

Domain Analysis

 

In domain analysis, a domain analyst, with the help of a domain expert, collects domain source code, documents, and architectures.  Through a systematic study of these domain sources, the domain analyst acquires an understanding of the commonalities and variabilities of domain systems.  Specific activities include the identification and organization of important vocabularies, the development of feature tables, and the creation of facet tables and generic architectures.  These products of analysis provide the material for domain implementation.

 

Domain Implementation

 

Software reuse can be divided into two types: parts-based and formal language based.

Parts-based reuse is the better known of the two. It typically introduces reusable software “parts” into a language. Examples of parts-based reuse include the reuse of programming functions, classes, and libraries. Java, for example, provides a large collection of reusable libraries in the form of Java Archive (JAR) packages. Parts-based reuse can also include language frameworks (interacting collections of objects), such as the C++ Standard Template Library or the Microsoft .NET Framework.
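As a minimal illustration of parts-based reuse (our sketch, not drawn from the study projects), the following Java fragment reuses a sorting component shipped with the JDK instead of writing one by hand:

```java
import java.util.Arrays;

public class ReuseDemo {
    public static void main(String[] args) {
        int[] data = {5, 3, 8, 1};
        // Parts-based reuse: Arrays.sort is a prebuilt, tested library
        // part, so the application does not rewrite a sorting algorithm.
        Arrays.sort(data);
        System.out.println(Arrays.toString(data)); // prints [1, 3, 5, 8]
    }
}
```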

 

The second type of reuse, formal language based, involves creating a general, formal language that has the reusable information embedded into it [8]. Formal language based reuse can be further divided into two types: programming language based and generative.

 

Domain-specific languages created for a small problem area are called little languages [Bentley]. R, for example, is a statistics language that readily processes data sets and produces summary statistics and graphs with a few commands. Other examples of little languages include Lex (lexical analysis), Yacc (Yet Another Compiler Compiler), Csound for dealing with sound, Make for managing software project compilation, Extensible Stylesheet Language Transformations (XSLT) for translating XML documents, and Ant (Another Neat Tool) for managing Java projects [20,21].
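Regular expressions are another classic little language; the sketch below (ours, not from the study) embeds one in Java, where a single line of regex notation replaces a hand-written scanner:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LittleLanguageDemo {
    public static void main(String[] args) {
        // The pattern string is a program in the regex little language.
        Pattern ident = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
        Matcher m = ident.matcher("int count = oldCount + 1;");
        while (m.find()) {
            System.out.println(m.group()); // prints int, count, oldCount
        }
    }
}
```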

 

In the generative approach, domain knowledge is built into the generator so that an application can be generated from a few specifications. AndroMDA, for example, is a model-driven architecture tool that translates UML into software that integrates with several common architectures, such as Spring, EJB, Struts, Hibernate, JSF, and Java. Other examples of language-specific application generators are Sun’s NetBeans and Microsoft Visual Studio. Both of these programming environments contain a collection of default project types that can be generated automatically; they also facilitate adding commonly used application components and provide ready access to software libraries.
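As a toy sketch of the generative approach (a deliberately minimal stand-in for tools like AndroMDA; all names here are hypothetical), the generator below emits the source text of a complete Java class from a one-line specification:

```java
// Toy application generator: domain knowledge (naming conventions,
// class layout) lives in the template; the "specification" is just
// a type name and a field list.
public class TinyGenerator {
    static String generate(String type, String... fields) {
        StringBuilder src = new StringBuilder("public class " + type + " {\n");
        for (String f : fields) {
            src.append("    public String ").append(f).append(";\n");
        }
        return src.append("}\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(generate("Customer", "name", "address"));
    }
}
```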

 

Systematic Reuse and Product Line Engineering

 

The idea of applying a product line approach to systematic software reuse has its origins in manufacturing, where products are produced in a structured way that takes full advantage of knowledge of the commonalities and variabilities of each release. The Software Engineering Institute defines a software product line as “… a set of software-intensive systems that share a common, managed set of features satisfying the specific needs of a particular market segment or mission and that are developed from a common set of core assets in a prescribed way.”

 

2.6.             A Unique Opportunity

 

Measures of software reuse are difficult to estimate. In the past, individual organizations like NASA and Motorola have instituted reuse programs and reported improved reuse rates, from roughly 20% to 79% [26] and from 14% to 85% [27]. An industry-wide perspective, however, is a difficult statistic to come by. Research indicates that there are great differences in reuse practices among industries, and the factors that influence the successes and failures of software reuse practices are complex [29].

 

This study explored this complex failure space by capturing the efforts of thirteen individuals’ first domain engineering exercises. Surveys of the participants indicate that, on average, reuse accounts for 30% of the lifecycle objects in their organizations and 28% of the lifecycle objects they personally create. In addition, none had a workplace program for software reuse education or a means to measure levels of reuse (Appendix C).

 

From a failure modes perspective, this study offers insight into the factors that contribute to both success and failure in domain engineering. The collection of data alone provides valuable information, but it is hoped that the analysis and conclusions of this report will provide helpful guidance for future domain engineering efforts and contribute to higher reuse rates in the workplace.

 


 

3.       The Study             

 

3.1.             Demographics

 

In this study, thirteen subjects completed domain engineering in several domains using DARE-COTS [1] as part of an advanced graduate course in software engineering. Projects were completed in 14-week periods from January to April 2005 and from January to April 2006. None of the subjects had prior knowledge of domain engineering. Demographic information on the subjects is reported in Table 1.

 

Demographic information includes each subject’s degree level, the degree area he/she was pursuing (computer science or information systems), years of industrial experience, primary programming languages, chosen domain, and the language used in the project’s domain implementation.

 

| Subject | Level | Degree | Experience | Languages in order of familiarity | Domain | Implementation Language |
| 1 | M.S. | MIS | 0 | NA | Conflation | Java |
| 2 | M.S. | CS | 9.5 | VB.NET, C#, VB6, PL/SQL, Java, C++ | AHLTA Longitudinal Domain Book | NA |
| 3 | M.S. | CS | 5 | Java, PL/SQL, Perl, C | Open source Java metrics | Java |
| 4 | M.S. | CS | 3 | C++ | Symmetric encryption | C |
| 5 | M.S. | CS | 15 | C++, C, Fortran, Assembler, Java | Simple metrics | Perl |
| 6 | M.S. | CS | 10 | C++, C#, Java, Perl | Sentence alignment systems | C# |
| 7 | Ph.D. | CS | 1 | Java, C++, C | Conflation | Java |
| 8 |  | CS | 13 | C++, C | Conflation | Perl |
| 9 | M.S. | CS | 3 | Java, Ruby, C++ | Blog domain | NA |
| 10 | M.S. | CS | 7 | Visual Basic, C++, C | Personal information management systems | Visual Basic |
| 11 | M.S. | MIS |  |  | Conflation | NA |
| 12 | M.S. | CS | 1 | C++, Pascal, C | Static code metrics | C |
| 13 | M.S. | CS | 6 | Java, SQL, C# | Object-oriented software metric tools | Java |

 

Table 1: Participant Demographics


3.2.             Assignment

 

Students were given the following semester-long assignment:

 

  • Domain engineering using DARE for one of the following domains: software metrics, conflation algorithms, or one of your choice.
  • Collect process metrics for your project, such as:
    • Time for each step
    • Log of what you did
    • Size/complexity metrics produced
  • Create one or more of the following for your domain: a reusable component, a little language, and/or an application generator.

 

3.3.             DARE Book Structure

 

The completed DARE book contained the following sections:

 

DARE (Domain Analysis and Reuse Environment) [1]: the DARE book provides a detailed specification of the domain.

  • Domain sources
    • Source documentation, source code, source notes
    • System descriptions, system architectures, system feature tables
  • Vocabulary: basic vocabulary, facets, synonyms, templates, thesaurus
  • Architecture: generic architecture, generic features, code structure
  • Reusable components
  • Summary, domain glossary, bibliography, index, appendix

 

 


3.4.             The Paper Outline

 

The remainder of this paper is organized as follows: Section 4 defines an idealized model of the DARE methodology for domain engineering. Data collected from the thirteen projects is presented in section 5, and that data is analyzed for points of failure in section 6. These observations are summarized in section 7’s themes of failure, followed by section 8’s principles of success. Sections 9, 10, and 11 then suggest practical implications of these observations for future DARE projects and future research.

 

 

4.       DARE Methodology Model

 

4.1.             Domain Analysis

 

Domain engineering’s first phase, domain analysis, is the process of analyzing a given domain to derive domain models. Sources for this analysis include exemplar code, exemplar text, and domain expert information. Examples of models include generic architectures, facet tables, feature tables, and templates.

 

4.1.1.                    Scoping the Domain  

 

Domain analysis begins by selecting a domain that can be bounded and scoped to a manageable level. With the help of a domain expert, the domain analyst scopes the domain and creates a formal domain model. Scoping the domain involves gathering suitable domain exemplars, documents, and notes, and then describing the domain verbally. Set notation may be necessary to state clearly what is and what is not in the domain [25]. The success of this step depends, in part, on the degree to which the chosen domain is suitable for analysis.
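As an illustration (ours, not drawn from any of the thirteen books), a scope statement for a conflation domain might be written in set notation along these lines:

```latex
% Hypothetical scope statement for a conflation (term normalization) domain
\begin{align*}
W &= \{\, w \mid w \text{ is an English word variant} \,\} \\
T &= \{\, t \mid t \text{ is a canonical stem} \,\} \\
D &= \{\, s \mid s \text{ is a system computing a map } f_s : W \to T \,\}
\end{align*}
% Out of scope: systems that require part-of-speech tagging or translation.
```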

 

4.1.2.                    Domain Suitability

 

The suitability of a domain for analysis is a function of several factors. First among these is the availability of at least three exemplar systems. The suitability of exemplar systems for domain analysis is a function of system complexity, stability, size, and formality. System complexity increases with the number of system types and the complexity of the system source code.

System stability is related to the quality of the project. Elements of stability of an exemplar system include code quality, maintenance support, release history, experience of the authors, comment-to-code ratio, coding style, and complexity metrics.

 

The size of a system can determine whether the system can be scoped to a manageable size for domain analysis. For instance, the domain of software metrics might better be scoped to static complexity analysis. A system with high formality will contain documentation, architectures, and ample source comments. An exemplar system produced with a formal methodology contributes to stability.

 

 

4.1.3.                    Vocabulary Analysis

 

The vocabulary analysis steps include gathering the domain exemplar code, exemplar text, and domain expert information, and compiling an initial word set. Then, using automated processes and domain knowledge, this set of words is reduced to a manageable set of key words. Diagram 1 represents the vocabulary analysis methodology. Vocabulary analysis methods include the use of stop lists, stemming, conflation, clustering, and frequency analysis.
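A minimal sketch of the automated portion of this step, assuming plain-text domain sources and a hand-picked stop list (both hypothetical), is:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class VocabAnalysis {
    public static void main(String[] args) {
        // Hypothetical domain source text and stop list.
        String source = "the stemmer removes the suffix and the stemmer removes the prefix";
        Set<String> stopList = Set.of("the", "and", "a", "of");

        // Count word frequencies, skipping stop words.
        Map<String, Integer> freq = new HashMap<>();
        for (String word : source.toLowerCase().split("\\W+")) {
            if (!word.isEmpty() && !stopList.contains(word)) {
                freq.merge(word, 1, Integer::sum);
            }
        }

        // Keep words above a frequency threshold as the key word set.
        List<String> keyWords = new ArrayList<>();
        freq.forEach((w, n) -> { if (n >= 2) keyWords.add(w); });
        System.out.println(keyWords); // prints [removes, stemmer] (order may vary)
    }
}
```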

 

Cluster tables group words together around a point of commonality. The points of commonality become the column titles of the facet table, with the cluster words becoming the rows, or points of variability. The facet table maps directly to a template in which descriptive words describe the generic system and the facet table’s columns map to variables. This template is then used in the creation of the reusable asset.
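For example (a hypothetical illustration, not taken from any of the study books), a two-facet table for the conflation domain and the template derived from it might be:

| Algorithm | Input unit |
| affix removal | word |
| table lookup | token |

Template: “A conflation system applies <Algorithm> to each <Input unit> and returns its canonical form.” Each facet column becomes a slot in the template, which a particular system fills with one value from that column.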

 

 

Diagram 1: Vocabulary Analysis

 

 

The cluster tables, facet table, and template are all derived from the key word set. At each step of the process, the domain analyst’s system knowledge plays a key role in refining the tables.

 

4.1.4.                    Architectural Analysis

 

The domain analyst, along with the domain expert, creates a system feature table for each exemplar system. System feature tables describe the attributes of each system.

System architectures are the architectures available in the documentation of the exemplar systems. A generic architecture is formed, in large part, by an analysis of the set of available system architectures and the template produced by the vocabulary analysis.

 

4.1.5.                    Software Analysis

 

The exemplar code is analyzed using a variety of techniques to understand how it can be incorporated into a reusable asset. Software components, such as classes, functions, and libraries, can be identified and incorporated into a parts-based reusable asset. Software metrics can be used to identify areas of complexity or poor maintainability that can inform a redesigned, reusable component. This process is aided by analysts who possess domain expertise, programming language knowledge, and an understanding of software analysis metrics.
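A minimal sketch of one such metric, the comment-to-code ratio mentioned in section 4.1.2, computed over a single exemplar source file (the file path is hypothetical):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class CommentRatio {
    public static void main(String[] args) throws Exception {
        // Hypothetical exemplar file drawn from a domain source system.
        List<String> lines = Files.readAllLines(Path.of("Exemplar.java"));
        long code = 0, comments = 0;
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty()) continue;
            // Crude classification: counts only full-line comments.
            if (line.startsWith("//") || line.startsWith("/*") || line.startsWith("*")) {
                comments++;
            } else {
                code++;
            }
        }
        System.out.printf("comment-to-code ratio: %.2f%n",
                code == 0 ? 0.0 : (double) comments / code);
    }
}
```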

 

 

4.2.             Domain Implementation

 

 

Reusable assets are derived through domain implementation, based on the domain models. Assets can be parts-based components, domain-specific languages, or application generators. In a parts-based implementation, domain knowledge gained from software analysis contributes to creating reusable language components, such as classes and libraries; these parts can be incorporated and reused in future software applications. In a formal language implementation, the template and generic architectures provide frameworks for creating domain-specific languages or application generators.
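In the conflation domain chosen by several participants, for example, a parts-based asset might take the shape sketched below: a generic interface capturing the domain commonality, with each variability (the particular algorithm) supplied as an implementation. The names are ours, not those of any study project:

```java
// Commonality: every conflation system maps a word to a canonical form.
interface Conflator {
    String conflate(String word);
}

// One variability: a trivial suffix-stripping stemmer (illustrative only).
class SuffixStemmer implements Conflator {
    @Override
    public String conflate(String word) {
        if (word.endsWith("ing")) return word.substring(0, word.length() - 3);
        if (word.endsWith("s"))   return word.substring(0, word.length() - 1);
        return word;
    }
}

public class ConflatorDemo {
    public static void main(String[] args) {
        Conflator stemmer = new SuffixStemmer();
        System.out.println(stemmer.conflate("parsing")); // prints "pars"
    }
}
```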


5.       Project Data

 

The following section presents data collected from the thirteen DARE projects. The data are organized by measures of time, measures of size, domain scope, vocabulary analysis, architectural analysis, and DARE book tables. Discussion of their implications can be found in section 6.

 

5.1.             Activities Time Log

 

Participants were instructed to provide a log of time spent on each stage of the process.

All times are given in hours except where noted.

 

Table 2. Activities Time Log Entries (hours except where noted; columns are projects 1-13)

| Step | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| Book creation |  | 12 |  | 20 |  |  |  | 2 |  |  |  |  |  |
| Finding good tools |  | 16.25 | 50 |  |  |  |  |  |  |  |  |  |  |
| Gathering source information | 8 | 2 |  | 6 |  |  | 1-2 weeks | 13 |  |  |  | 38 |  |
| Documents |  |  |  |  |  |  |  |  |  |  | 9 |  | 2 |
| Source code |  |  |  |  |  |  |  |  |  |  | 4 |  | 0.5 |
| System descriptions |  |  |  |  |  |  |  | 0.5 |  |  |  |  |  |
| System architectures |  |  |  |  |  |  | 2-3 weeks | 8 |  |  | 8 | 16 |  |
| System feature tables |  |  |  |  |  | 2 | 3 days | 0.5 |  |  |  | 20 | 3 |
| Source notes |  |  |  |  |  |  |  |  |  |  |  |  | 1 |
| Expert systems |  |  |  |  |  | 12 |  | 5 |  |  |  |  | 0.5 |
| Study of domain |  |  |  | 46 |  |  |  | 3 |  |  | 9 |  |  |
| Domain scope | 1 |  |  |  |  |  | 3 weeks | 3 |  |  | 2 |  |  |
| Vocabulary analysis | 4 | 21 | 20 | 12 | 30 | 13 | 2 weeks | 0 | 0 | 0 | 3 |  |  |
| Basic vocabulary |  |  |  |  | 1 |  |  |  |  |  |  |  | 1 |
| Frequency analysis |  |  |  |  |  | 4 |  |  |  |  |  |  |  |
| Cluster analysis |  |  |  |  |  | 8 |  |  |  |  |  |  | 2 |
| Facet table |  |  |  |  | 1 | 2 | 4 days | 2 |  |  |  | 20 | 0.5 |
| Synonym table |  |  |  |  | 1 | 0.5 |  |  |  |  |  |  | 0.5 |
| Template |  |  |  |  | 1 | 1 |  |  |  |  |  |  | 0.5 |
| Thesaurus |  |  |  |  | 1 | 0.5 |  |  |  |  |  |  | 0.5 |
| Vocabulary notes |  |  |  |  | 1 |  |  | 1 |  |  |  |  |  |
| Code analysis | 30 |  | 3 | 2 | 2 | 4 | 5 days | 1 |  |  | 3 |  |  |
| Source notes |  |  |  |  |  |  |  | 3 |  |  |  |  |  |
| Architecture analysis | 5 | 14 | 60 | 10 | 2 |  |  |  |  |  |  |  |  |
| Generic architecture |  |  |  |  | 2 | 4 |  | 4 |  |  | 6 | 16 | 0.5 |
| Generic feature table |  |  |  |  | 2 |  | 1.5 days | 1 |  |  | 2 | 16 |  |
| Architecture notes |  |  |  |  | 2 |  | 1.5 days | 3 |  |  |  |  |  |
| Implementation/reusable component | 5 |  | 55 | 80 | 4 | 24 | 5 weeks | 5 |  | 55 | 3 |  | 0.5 |
| Reusable algorithm |  |  |  |  |  |  |  | 62 |  |  |  |  |  |
| Glossary |  |  |  |  |  |  |  | 5 |  |  |  |  | 2 |
| Testing |  |  |  |  |  |  |  |  |  | 15 |  |  | 0.5 |


5.1.1.                    Number of time log entries for each participant.

 

This table considers the number of time log entries made by each participant. The variability of these measures may play a role in understanding different project outcomes.

 

 

Table 3. Number of Time Log Entries

3, 5, 5, 6, 6, 7, 10, 10, 12, 13, 14, 19, 3

 

 

 

 

 


Statistical Summary 1: Number of Time Log Entries

| Sample Size, n | 13 | Range | 16 |
| Mean | 8.69 | Minimum | 3 |
| Median | 7 | 1st Quartile | 5 |
| Midrange | 11 | 2nd Quartile | 7 |
| Variance | 23.06 | 3rd Quartile | 12 |
| Standard Deviation | 4.80 | Maximum | 19 |
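For reference, the midrange reported in these summaries is the average of the minimum and maximum values; for the time log entries above:

```latex
\text{midrange} = \frac{\min + \max}{2} = \frac{3 + 19}{2} = 11
```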

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5.1.2.                    Total time creating the DARE project

 

This table reports the total time invested in each DARE project. Time investment may be an indicator of project success.

 

Table 4. Total time of DARE project (hours)

53, 65.25, 188, 176, 65, 75, NA, 122, NA, 70, 49, 126, 13

 

 

 

 

 

Statistical Summary 2: Total Book Time

| Sample Size, n | 11 | Range | 175 |
| Mean | 91.11 | Minimum | 13 |
| Median | 70 | 1st Quartile | 53 |
| Midrange | 100.5 | 2nd Quartile | 70 |
| Variance | 3014.79 | 3rd Quartile | 126 |
| Standard Deviation | 54.9 | Maximum | 188 |

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5.1.3.                    Time Preparing the Domain

 

For purposes of analysis, this report groups together the time log steps of table 2 from “finding good tools” through “domain scope”. These steps comprise an important phase of the DARE book, referred to in this paper as “preparing the domain”.

 

 

Table 5. Total time preparing the domain

9, 18.25, 50, 52, 15, 14, NA, 33, NA, NA, 32, 74, 7

 

 

 

 

Statistical Summary 3: Domain Preparation Time

| Sample Size, n | 10 | Range | 67 |
| Mean | 30.22 | Minimum | 7 |
| Median | 25.125 | 1st Quartile | 12 |
| Midrange | 40.5 | 2nd Quartile | 25.12 |
| Variance | 498.83 | 3rd Quartile | 50 |
| Standard Deviation | 22.3 | Maximum | 74 |

 

 

 

 

 

 

 


5.1.4.                    Time invested in vocabulary analysis

 

In this report, vocabulary analysis encompasses the time log steps of table 2 from “vocabulary analysis” through “vocabulary notes”. Diagram 1 of section 4.1 graphically shows the collection of steps involved in vocabulary analysis.

 

 

Table 6. Total time spent on vocabulary analysis

4, 21, 20, 12, 36, 29, NA, 3, NA, NA, 3, 20, 5

 

 

 

Statistical Summary 4: Vocabulary Analysis Time

| Sample Size, n | 10 | Range | 33 |
| Mean | 15 | Minimum | 3 |
| Median | 16 | 1st Quartile | 4 |
| Midrange | 19.2 | 2nd Quartile | 16 |
| Variance | 129.55 | 3rd Quartile | 21 |
| Standard Deviation | 11.38 | Maximum | 36 |

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5.1.5.                    Time Implementing Reusable Assets:

 

The time for implementing reusable assets is derived from the time log entries of table 2 for “implementation/reusable component”, “reusable algorithm”, and “testing”.

 

 

Table 7. Time for Domain Implementation (hours except where noted; columns are projects 1-13)

| Step | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| Implementation/reusable component | 5 |  | 55 | 80 | 4 | 24 | 5 weeks | 5 |  | 55 | 3 |  | 0.5 |
| Reusable algorithm |  |  |  |  |  |  |  | 62 |  |  |  |  |  |
| Testing |  |  |  |  |  |  |  |  |  | 15 |  |  |  |
| Total | 5 |  | 55 | 80 | 4 | 24 |  | 67 |  | 70 | 3 |  | 0.5 |

 

 

Statistical Summary 5: Time Implementing Reusable Assets

| Sample Size, n | 9 | Range | 79.5 |
| Mean | 34.2 | Minimum | 0.5 |
| Median | 24 | 1st Quartile | 4 |
| Midrange | 40.25 | 2nd Quartile | 24 |
| Variance | 1108.19 | 3rd Quartile | 67 |
| Standard Deviation | 33.28 | Maximum | 80 |

 

5.1.6.                    Time Groups

 

In the category of implementing reusable assets, there were two distinct groups: those who spent 50 hours or more and those who spent 5 hours or fewer. The data points break down as follows:

 

 

Table 8. Time spent implementing reusable assets (hours)

| <= 5 | 0.5, 3, 4, 5 |
| >= 50 | 55, 67, 70, 80 |

 

 

 

 

5.1.7.                    Time Outliers

 

Outliers of the time entry log, table 2, are listed below. Although there is a good deal of variation in the time entry log, outliers in the following categories are noteworthy. The following table lists areas of the project where students experienced difficulty completing the task.

 

Time outliers (hours)

| Finding good tools | 50 |
| Gathering source information | 38 |
| Study of domain | 46 |
| Architectural analysis | 60 |

 


5.2.             Book size

 

Book size was measured by the number of words, number of lines, and number of pages.

There was considerable variation in book size due to the different individual approaches to book construction. Book styles ranged from those that included the domain sources with the project (the system exemplar code, vocabulary analysis results, code analysis, and so on) to those that were little more than an outline with references.

 

Two areas that greatly increased the size of books were the inclusion of system source code and intermediary vocabulary analysis tables. In one project, for example, a 422-page book was reduced to 20 pages after removing 376 pages of exemplar source code and 26 pages of vocabulary analysis.

 

Book Size (columns are projects 1-13)

| Measure | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
| Words | 12966 | 6448 | 3101 | 6334 | 3218 | 3350 | 10396 | 1416 | 2962 | 74939 | 28068 | 3396 | 1083 |
| Lines | 4021 | 4757 | 457 | 3047 | 1202 | 683 | 3636 | 613 | 428 | 26648 | 6628 | 1076 | 227 |
| Pages | 81 | 68 | 17 | 60 | 17 | 19 | 65 | 32 | 19 | 422 | 125 | 20 | 8 |
| Words minus source code | 10060 | 6185 | 3101 | 6334 | 3218 | 3350 | 3282 | 1416 | 2962 | 9104 | 17314 | 3396 | 1083 |
| Lines minus source code | 3103 | 4494 | 457 | 3047 | 1202 | 683 | 1518 | 613 | 428 | 6653 | 3658 | 1076 | 227 |
| Pages minus source code | 62 | 62 | 17 | 60 | 17 | 19 | 28 | 32 | 19 | 46 | 78 | 20 | 8 |
| Words minus vocabulary frequency analysis | 10060 | 3060 | 3101 | 6334 | 3218 | 3350 | 10396 | 1416 | 2962 | 68869 | 24008 | 3396 | 1083 |
| Lines minus vocabulary frequency analysis | 3103 | 1257 | 457 | 3047 | 1202 | 683 | 3636 | 613 | 428 | 20598 | 4598 | 1076 | 227 |
| Pages minus vocabulary frequency analysis | 62 | 41 | 17 | 60 | 17 | 19 | 65 | 32 | 19 | 379 | 91 | 20 | 8 |
| Words minus source and vocabulary | 1060 | 2787 | 3101 | 6334 | 3218 | 3350 | 3282 | 1416 | 2962 | 3036 | 13252 | 3396 | 1083 |
| Lines minus source and vocabulary | 3103 | 993 | 457 | 3047 | 1202 | 683 | 1518 | 613 | 428 | 604 | 1627 | 1076 | 227 |
| Pages minus source and vocabulary | 62 | 35 | 17 | 60 | 17 | 19 | 28 | 32 | 19 | 11 | 44 | 20 | 8 |


 

The following tables display DARE book size by word count, line count, and page count.

 

5.2.1.                    Book Word Count

 

Table 6 shows the total word count for the DARE books. These measurements include exemplar source code and vocabulary analysis.

 

Table 6. Word count in DARE books

12966, 6448, 3101, 6334, 3218, 3350, 10396, 1416, 2962, 74939, 28068, 3396, 1083

 

 

 

 

Statistical Summary 6: DARE Book Word Count

| Sample Size, n | 13 | Range | 73856 |
| Mean | 12129 | Minimum | 1083 |
| Median | 3396 | 1st Quartile | 3101 |
| Midrange | 38011 | 2nd Quartile | 3396 |
| Variance | 4.086476e+8 | 3rd Quartile | 10396 |
| Standard Deviation | 20215.03 | Maximum | 74939 |

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5.2.2.                    Word Count without Exemplar Code and Vocabulary Analysis

This table shows the total word count for all projects with exemplar source code and vocabulary analysis removed. These measurements show less variance once source code and vocabulary analysis are removed.

 

Table 7. DARE book word count with exemplar code and vocabulary frequency analysis removed

1060, 2787, 3101, 6334, 3218, 3350, 3282, 1416, 2962, 3036, 1924, 3396, 1083

 

 

 

 

Statistical Summary 7: Book Words minus Source and Vocabulary

| Sample Size, n | 13 | Range | 5274 |
| Mean | 2842.23 | Minimum | 1060 |
| Median | 3036 | 1st Quartile | 1924 |
| Midrange | 3697 | 2nd Quartile | 3036 |
| Variance | 1.863497e+6 | 3rd Quartile | 3282 |
| Standard Deviation | 1365.1 | Maximum | 6334 |

 

 

 

 

 

 

 

 

 

5.2.3.                    Line Count without Exemplar Code and Vocabulary Analysis

Table 8 displays the total line count for all projects with exemplar source code and vocabulary analysis removed.

 

 

 

Table 8. DARE book line count without exemplar source code and vocabulary frequency analysis

3103, 993, 457, 3047, 1202, 683, 1518, 613, 428, 604, 438, 1076, 227

 

 

 

 

Statistical Summary 8: Line Count minus Code and Vocabulary

| Sample Size, n | 13 | Range | 2876 |
| Mean | 1106.84 | Minimum | 227 |
| Median | 683 | 1st Quartile | 457 |
| Midrange | 1665 | 2nd Quartile | 683 |
| Variance | 893486.8 | 3rd Quartile | 1202 |
| Standard Deviation | 945.24 | Maximum | 3103 |

 

 

 


 

 

5.2.4.                    Page Count without Exemplar Code and Vocabulary Analysis

This table shows the total page count for all projects with exemplar source code and vocabulary analysis removed.

 

Table 9. DARE book page count without exemplar source code and vocabulary frequency analysis

62, 35, 17, 60, 17, 19, 28, 32, 19, 11, 17, 20, 8

 

 

Statistical Summary 9: Page Count minus Code and Vocabulary

| Sample Size, n | 13 | Range | 54 |
| Mean | 26.53 | Minimum | 8 |
| Median | 19 | 1st Quartile | 17 |
| Midrange | 35 | 2nd Quartile | 19 |
| Variance | 291.26 | 3rd Quartile | 32 |
| Standard Deviation | 17.06 | Maximum | 62 |

 

 

 

 

 


5.3.             Domain Scope

 

Domain analysts are to describe their domains verbally, stating what is and what is not in the domain, and using set notation if necessary. These measurements capture the domain scoping statement by counting the number of words and the number of sets in the statement.

 

5.3.1.                    Number of Words

 

Table 10.a displays the number of words present in each project’s domain scope statement.

 

 

Table 10.a. Domain Scope Word Count

64, NA, 178, 1075, 141, 688, 160, 229, NA, 255, NA, NA, 18

 

 

Statistical Summary 10: Domain Scope Word Count

| Sample Size, n | 9 | Range | 1057 |
| Mean | 312 | Minimum | 18 |
| Median | 178 | 1st Quartile | 141 |
| Midrange | 546.5 | 2nd Quartile | 178 |
| Variance | 118990.5 | 3rd Quartile | 255 |
| Standard Deviation | 344.95 | Maximum | 1075 |

 

 

 

 

 

 

 

5.3.2.                    Mathematical Model using Set Notation

 

For those projects that had domain scope statements, table 10.b indicates whether the project used set notation.

 

Table 10.b. Presence of set notation (1 = used, 0 = not used)

1, NA, 0, 1, 0, 0, 0, 1, NA, 0, NA, NA, 0

 

 

Of the nine projects that had a section devoted to scoping the domain, three (33%) used a set-based mathematical model to describe their domains.

 

Two individuals who did not use set notation used other notations: one used pseudo code in the scope statement, and the other used a top-down function-and-connector architecture.

 

 

 

5.3.3.                    Number of sets

 

For those projects that used set notation, table 10.c displays how many sets were used in the model.

 

Table 10.c. Number of sets in domain scope

4, NA, NA, 20, NA, NA, NA, 8, NA, NA, NA, NA, NA

 

Of the three projects that used set notation to scope their domains, the number of sets used in the model was 4, 8, and 20.

 

 

 

 

 


5.3.4.                    Domain Sources

 

A key component of domain analysis is the selection of domain sources. These exemplars include source code, documents, and architectures.

 

5.3.4.1.              Count of Exemplar Sources

 

Table 11 shows the number of exemplar source code systems used in each project’s domain analysis.

 

 

Table 11. Number of exemplar sources

3, 7, 3, 3, 3, 4, 3, 8, 4, 3, 3, 3, 3

 

 

 

Statistical Summary 11: Exemplar Count

| Sample Size, n | 13 | Range | 5 |
| Mean | 3.86 | Minimum | 3 |
| Median | 3 | 1st Quartile | 3 |
| Midrange | 5.5 | 2nd Quartile | 3 |
| Variance | 2.80 | 3rd Quartile | 4 |
| Standard Deviation | 1.67 | Maximum | 8 |

 

 

 

 

 

5.3.4.2.              Count of Exemplar Documents

 

Table 12a displays the number of domain documents chosen for each DARE book analysis.

 

Table 12a. Number of domain documents

5, 3, 3, 7, 3, 4, 4, 19, 3, 3, 7, 3, 3

 

 

 

 

 

 

 

Statistical Summary 12a: Exemplar Document Count

| Sample Size, n | 13 | Range | 16 |
| Mean | 5.15 | Minimum | 3 |
| Median | 3 | 1st Quartile | 3 |
| Midrange | 11 | 2nd Quartile | 3 |
| Variance | 19.47 | 3rd Quartile | 5 |
| Standard Deviation | 4.41 | Maximum | 19 |

 

 

 

 

 

 

 

 

 

 

 

 

 

 

5.3.5.                    System Descriptions Count

 

Most projects contained system descriptions. This table shows the number and type of system descriptions.

 

Table 12.b. System descriptions, number and type

3 fictitious; NA; 3; 5; 3; 2; 3; 6; 3 paragraphs; 3 paragraphs; 3 paragraphs; 3 paragraphs; 3

 

Ninety-two percent of the projects contained system descriptions. A system template is presented in Appendix D. Of this set, three projects did not follow the system template; rather, these projects described each system verbally in a paragraph or less.

 


5.4.             Architectural Analysis

 

As part of the domain analysis, participants undertook an architectural analysis of their domains. Three points of data were collected: the number of system architectures in each DARE book, the types of architectures in this set, and the word count of the architectural analysis section of the domain book.

 

5.4.1.                    Number of System Architectures

 

This table contains the number of architectural images contained in each individual’s DARE book.

 

 

Table 13a. Number of system architectures

9, 14, 3, 26, 4, 3, 3, 6, 0, 3, 5, 6, 3

 

 

Statistical Summary 13: System Architecture Count

| Sample Size, n | 13 | Range | 26 |
| Mean | 6.53 | Minimum | 0 |
| Median | 4 | 1st Quartile | 3 |
| Midrange | 13 | 2nd Quartile | 4 |
| Variance | 46.26 | 3rd Quartile | 6 |
| Standard Deviation | 6.80 | Maximum | 26 |

 

 

 

5.4.2.                    Types of Architecture

 

This table lists the architectural type of each architecture image in the DARE books.

 

Table 13.b, Types of Architecture

| Project | Architecture types |
| 1 | 3 flow chart, 3 top down, 3 use case |
| 2 | 7 activity, 7 module |
| 3 | Object relationships, class diagram |
| 4 | Functional flow and their actions |
| 5 | Functions and connectors |
| 6 | Pseudo code (1), function and connectors (3) |
| 7 | Function and connectors |
| 8 | Flow (functions, connectors, data) |
| 9 | Class diagrams |
| 10 | Flow (functions and connectors) |
| 11 | Flow (functions, connectors) |
| 12 | Data flow, data flow, data flow, top down, batch sequential |
| 13 | Class diagrams |

 

 


5.4.3.                    Word Count

 

This table shows the number of words in each project’s architectural analysis section.

 

Table 14.a. Word count of architectural analysis

852, 115, 0, 1767, 179, 0, 0, 0, 0, 0, 30, 126, 0

 

 

Statistical Summary 14: Architectural Analysis Word Count

| Sample Size, n | 13 | Range | 1767 |
| Mean | 236.07 | Minimum | 0 |
| Median | 0 | 1st Quartile | 0 |
| Midrange | 883.5 | 2nd Quartile | 0 |
| Variance | 265476.2 | 3rd Quartile | 126 |
| Standard Deviation | 515.24 | Maximum | 1767 |

 

 

 

Seven projects included only architecture diagrams, with no descriptive text. Six projects included descriptive text, ranging from 30 to 1767 words.

 

5.4.4.                    System Feature Tables

 

Each analyst was to create feature tables for his/her domain system exemplars. The table below shows the number of tables included in each system feature table section.

 

5.4.4.1.              Number of Tables

 

Table 14.b. Number of system feature tables

1; 1; 3; 3; 3; 1; 3; 6; 1; 1; 1 (3 combined); 3; 1 (3)

 

5.4.4.2.              Number of Features

This table shows the number of features in each feature table.

 

Table 14.c. Number of features in system feature tables

4; 11; 5, 11, 5; 16x3, 17x3, 20x3; 11, 11, 6; 9x4; 10, 10, 10; 6 (6x1 tables); 12; 38x3; 8; 13, 14, 18; 7

 

 

There was a great variety of interpretations of what constituted a feature table.


5.5.             Vocabulary Analysis

 

 

The log of activity included the vocabulary analysis activities. There was a great deal of variety in the labels used and in what was considered vocabulary analysis. It is therefore helpful to redisplay the elements that constitute the vocabulary analysis phase. Due to this variety, discussing the entries individually is impractical, but examining the total time spent on vocabulary analysis is instructive.

 

5.5.1.                    Time invested in vocabulary analysis           

 

Table 4. Total time spent on vocabulary analysis (hours)

4, 21, 56, 12, 36, 26, NA, 3, NA, NA, 3, 20, 5

 

 

 

Statistical Summary 4: Vocabulary Analysis Time

| Sample Size, n | 10 | Range | 33 |
| Mean | 15 | Minimum | 3 |
| Median | 16 | 1st Quartile | 4 |
| Midrange | 19.2 | 2nd Quartile | 16 |
| Variance | 129.55 | 3rd Quartile | 21 |
| Standard Deviation | 11.38 | Maximum | 36 |

 

Of this group there is one outlier, whereby an individual spent 56 hours on vocabulary analysis.

 

These times can be viewed as two groups: those who spent 12 hours or more and those who spent 5 hours or less.

 

 

Table 4.b. Total time spent on vocabulary analysis, grouped (hours)

| <= 5 | 3, 3, 4, 5 |
| >= 12 | 12, 20, 21, 36 |

 

5.5.2.                    Manual or Automatic

 

Vocabulary analysis consists of a combination of automated and manual processes. The following table displays whether each analyst used automated methods, manual methods, or a combination of both.

 

Table 14.c. Vocabulary analysis method: manual, automatic, or both

manual, automatic, automatic, both, automated, both, NA, both, both, both, both, both, manual

 

 

5.5.3.                    Original Set

 

Analysts begin with a raw set of domain words to be analyzed.  For those who reported their original set, the figures are recorded below.

 

Table 14.c. Original set word count

NA, NA, thousands, thousands, thousands, thousands, thousands, NA, NA, NA, 10790, NA, NA


5.5.4.                    Number of Words in Key Word Set

 

Following a series of automated and manual vocabulary analysis methods, analysts derive a key word set with which to create facet tables. The size of each project’s key word set is shown in the following table.

 

 

Table 15. Number of words in key word set

39, 939, 650, 34, 4681, 39, 48, 65, 14, 4117, 32, NA, 188

 

 

Statistical Summary 15: Key Word Set Size

| Sample Size, n | 12 | Range | 4667 |
| Mean | 903.83 | Minimum | 14 |
| Median | 56.5 | 1st Quartile | 36.5 |
| Midrange | 2347.5 | 2nd Quartile | 56.5 |
| Variance | 2.764544e+6 | 3rd Quartile | 794.5 |
| Standard Deviation | 1662.692 | Maximum | 4681 |