
The Role of Data Modeling

03 Dec 1997




Why Model Data?
What is a "Model"?
What is Data Modeling?
What are the benefits of Data Modeling?
Where else to go from here


Why Model Data?

(see also Why Build a Model?)

For an information system to be useful, reliable, adaptable, and economic, it must be based first on sound data modeling, and only secondarily on process analysis. We can make this categorical statement because the structure of data is inherently about truth, whereas process is about technique.

We know that steel can be made by any of a number of processes: Bessemer, open-hearth, basic oxygen, etc. All will yield steel, albeit of varying quality, quantity, and cost. Yet steel is always made from inputs of iron and carbon with the addition of other "alloy" raw materials. Thus among the unvarying structural truths about steel is that, regardless of what process is used to make it, steel is made from some proportions of several raw materials. Our interest in steel may not include the raw materials or proportions, but their existence is undeniable and universal. A complete data structure designed to describe steel must make provision for recording the materials used.
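As a rough illustration of that last point, the sketch below records the materials as part of the structure and treats the process as just another value. The names SteelBatch and MaterialInput, the batch identifier, and the proportions are all hypothetical, chosen only for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MaterialInput:
    material: str      # e.g. "iron", "carbon"
    proportion: float  # fraction of the batch by weight (illustrative unit)


@dataclass
class SteelBatch:
    batch_id: str
    process: str  # the technique used; it can vary without changing the structure
    inputs: List[MaterialInput] = field(default_factory=list)

    def total_proportion(self) -> float:
        """Sum of the recorded proportions, useful as a completeness check."""
        return sum(item.proportion for item in self.inputs)


# Made-up sample data: the numbers are placeholders, not metallurgy.
batch = SteelBatch(
    "B-001",
    "basic oxygen",
    [MaterialInput("iron", 0.98), MaterialInput("carbon", 0.02)],
)
```

Switching the process value to "open-hearth" changes nothing about the shape of the record, which is the distinction between structure and technique drawn above.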

In another example familiar to many readers, people also have data properties whose values vary but whose structure is universal. All people have gender, height, weight, eye color, etc. And for many years bureaucracies have also deemed that people are single or married. Recently, however, many institutions have been forced to recognize that marital status reflects social processes, rather than human physiology. A person may be unmarried yet in a permanent relationship with another of the same or different sex.

Not only does our society struggle to adjust to unconventional forms of personal relationships but in a smaller way information systems are stretched and twisted to record unanticipated facts which were, for example, mistakenly attached to people when those facts belonged instead to relationships.

This human example demonstrates a common flaw of information systems which fail to differentiate a process-defined view from the underlying structure of things. If the original designers of those information systems had exercised data modeling disciplines, they would have discovered the appropriate structural location for each kind of data, and thus avoided the painful task of later remodeling a system already in production.
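A minimal sketch of the alternative, under our own assumptions: the names Person, Relationship, and partners_of are hypothetical, as are the attribute choices and sample values. It shows the status recorded as a fact about a pairing of two people rather than as a property of either person.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Person:
    # Facts that belong to the person alone.
    person_id: int
    name: str
    eye_color: str


@dataclass(frozen=True)
class Relationship:
    # Facts that belong to the pairing, not to either person.
    person_a: int
    person_b: int
    kind: str  # e.g. "married", "domestic partnership" (illustrative values)


def partners_of(person_id: int, relationships: List[Relationship]) -> List[int]:
    """Return the ids of everyone related to person_id."""
    found = []
    for rel in relationships:
        if rel.person_a == person_id:
            found.append(rel.person_b)
        elif rel.person_b == person_id:
            found.append(rel.person_a)
    return found


people = [
    Person(1, "Pat", "brown"),
    Person(2, "Lee", "green"),
    Person(3, "Sam", "blue"),
]
relationships = [Relationship(1, 2, "domestic partnership")]
```

A new kind of relationship then requires only a new value of kind, not a change to the structure of Person.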

To better understand the virtue of modeling structure versus process, consider your house. When designing, or even purchasing, a house to be one's home, one should have a fairly sound idea of the life style(s) which one is likely to lead while residing there. Swinging singles stay out all night, party hard, and never mow the lawn. Young parents with two careers need room to grow without overcommitting to housekeeping chores. Mature folks like me seldom cook and are eager to downsize before the kids move back in. All these process considerations are part of our life style and should be accommodated.

What if the young bachelor gets both wife and steady job (since they frequently arrive together)? He often sells the house and, with his new wife, finds another more suited to their new life style. This works for people who don't mind moving often. However, when we translate that experience into a business enterprise, it means throwing out the old systems at great cost and disruption, then waiting anxiously and interminably for the arrival of the new.

Some people designing houses, and some organizations designing information systems, have learned that thought invested in planning a structure flexible enough to suit changing process possibilities yields immense dividends in terms of work not thrown away and opportunities not lost waiting for something better to be built.

Furthermore, when the time comes that the house, or information system, must be remodeled (not scrapped), then the cost to proceed is high and the prognosis for success is low if there is no reliable blueprint.

House = System
Blueprint = Data Model
Life Style = Process Model

The lesson, then, is that a house built on a good blueprint can be used for partying, raising children, conducting gourmet cooking lessons, running a home office, throwing pottery, or reading old books. An information system built on a good data model can likewise accommodate new ways of doing business, new lines of business, even new businesses - without throwing out the system.

From these and hundreds of similar examples over the years we observe the first law of data design: data structure in an information system need not be complete but it should be accurate. Data modeling is the means to achieve this end.

Note: Current reliable estimates indicate that 5% to 15% of all information systems projects are ever completed - not under budget and on time, but ever! So your organization's wait for your next information system is probably going to be longer than you think.


What is a "Model"?

A model is a symbolic or abstract representation of something real or imagined. One often builds a model, as of a building or theme park, in order to visualize the design before the real thing is constructed. For a city architect, a computer simulated helicopter ride 200 feet above planned streets may reveal potential traffic snarls and unusable parks. For a graduate mathematician, a model expressed in half a page of Greek alphabet may immediately convey what could not be explained in three hours' discussion.

Likewise a data model helps us visualize data structures to gauge how completely and accurately they reflect our information system problem space. Of course we greatly prefer improving a design before any system is built since design changes generally cost less than one-fourth as much as code changes.

But of equal importance is a model's ability to present our designs revealingly. Data modeling concisely represents a large body of dry, highly repetitive material which tends to obscure the more subtle and powerful design facts of a complex system. We can discover more truthful structures and anticipate less obvious uses of a design when we can see it compactly as an integrated whole, rather than as voluminous text and numeric listings of individual elements.

The essence of a model lies in efficient representation, achieved by eliminating uninteresting detail and substituting symbols for bulkier components of the subject. Thus a model need not be simply a smaller copy of the real thing; it may use words, pictures, numbers, or any combination of media. So a data model drawn on a few pages of diagrams can represent the structure of a database which occupies mega- or gigabytes of database storage.

While the fundamental concepts of data modeling are now well established, new tools and techniques continue to evolve. Some of them attempt to improve on the ease of use or flexibility of the tools; others extend the concepts of data modeling to include additional information about the structure of data. Thus data modeling itself follows the second law of data design: data structures will expand over time, as we continually learn more about a subject.


What is Data Modeling?

(see also What are Data Models?)

  Note that the diagram above is not to be taken literally. It shows not so much the steps in sequence as the categories of results in a CASE approach.

We differentiate classical (i.e., obsolete) passive data modeling from modern active data modeling. The former typically included steps 1 through 4 above but stopped at the point of pretty pictures on the war room walls. Without producing actual coded results, this was incomplete, ineffective, and frustrating.

In the early rounds of CASE, through the mid and late 1980s, most organizations attempting large scale data modeling through such passive tools became disillusioned and abandoned their attempts. In our work with hundreds of enterprise clients on five continents we see over 90% of the early CASE products sitting unused in IS closets, representing hundreds of thousands of dollars per site in product, equipment, facilities, training, and lost opportunity.

However, with the advent of active CASE tools in about the same period, the passive discovery and recording process could be implemented directly through generated code - at least DDL and in some cases much more.
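To make "generated DDL" concrete, here is a toy sketch of the idea, not a description of any actual CASE product. The Model type, the generate_ddl function, and the convention that the first attribute is the primary key are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory model: entity name -> list of (attribute, SQL type).
# Assumption: the first attribute of each entity is its primary key.
Model = Dict[str, List[Tuple[str, str]]]


def generate_ddl(model: Model) -> List[str]:
    """Turn each entity in the model into one CREATE TABLE statement."""
    statements = []
    for entity, attributes in model.items():
        key_name = attributes[0][0]
        columns = [f"    {name} {sql_type}" for name, sql_type in attributes]
        columns.append(f"    PRIMARY KEY ({key_name})")
        body = ",\n".join(columns)
        statements.append(f"CREATE TABLE {entity} (\n{body}\n);")
    return statements


model = {
    "customer": [("customer_id", "INTEGER"), ("name", "VARCHAR(80)")],
}
ddl = generate_ddl(model)
```

The point of the active approach is that the model, not the generated text, is what gets maintained: a change to the model is followed by regeneration rather than by hand editing.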

Please see also our "Cone over the Volcano".



What are the benefits of Data Modeling?


The act of abstraction expresses a concept in its minimum, most universal set of properties. A well abstracted data model will be economical and flexible to maintain and enhance since it will utilize few symbols to represent a large body of design. If we can make a general design statement which is true for a broad class of situations, then we do not need to recode that point for each instance. We save repetitive labor; minimize multiple opportunities for human error; and enable broad scale, uniform change of behavior by making central changes to the abstract definition.

In data modeling, strong methodologies and tools provide several powerful techniques (discussed further below) which support abstraction. For example, a symbolic relationship between entities need not specify details of foreign keys since they are merely a function of their relationship. Entity sub-types enable the model to reflect real world hierarchies with minimum notation. Automatic resolution of many-to-many relationships into the appropriate tables allows the modeler to focus on business meaning and solutions rather than technical implementation.
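The derivation of foreign keys from a relationship can be sketched as follows. This is an illustration under stated assumptions, not the behavior of any particular tool: the EntityRelationship class, the resolve function, the cardinality labels, and the convention that every entity has a single key column named <entity>_id are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EntityRelationship:
    # A symbolic relationship: only the two entities and the cardinality.
    left: str
    right: str
    cardinality: str  # "1:N" or "M:N"


def resolve(rel: EntityRelationship) -> List[str]:
    """Derive foreign key placement from a relationship.

    Assumes every entity has a single key column named <entity>_id.
    """
    left_key = f"{rel.left}_id"
    right_key = f"{rel.right}_id"
    if rel.cardinality == "1:N":
        # The "many" side carries a foreign key to the "one" side.
        return [
            f"ALTER TABLE {rel.right} ADD COLUMN {left_key} INTEGER "
            f"REFERENCES {rel.left}({left_key});"
        ]
    if rel.cardinality == "M:N":
        # A many-to-many relationship becomes its own associative table.
        table = f"{rel.left}_{rel.right}"
        return [
            f"CREATE TABLE {table} (\n"
            f"    {left_key} INTEGER REFERENCES {rel.left}({left_key}),\n"
            f"    {right_key} INTEGER REFERENCES {rel.right}({right_key}),\n"
            f"    PRIMARY KEY ({left_key}, {right_key})\n"
            f");"
        ]
    raise ValueError(f"unsupported cardinality: {rel.cardinality}")


statements = resolve(EntityRelationship("student", "course", "M:N"))
one_to_many = resolve(EntityRelationship("customer", "purchase", "1:N"))
```

The modeler states only that students take courses; the associative table and its keys follow mechanically from that statement.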


Transparency is the property of being intuitively clear and understandable from any point of view. A good data model enables its designer to perceive truthfulness of design by presenting an understandable picture of inherently complex ideas. The data model can reveal inaccurate grouping of information (normalization of data items), incorrect relationships between objects (entities), and contrived attempts to force data into preconceived processing arrangements.

It is not sufficient for a data model to exist merely as a single global diagram with all content smashed into little boxes. To provide transparency a data model needs to enable examination in several dimensions and views: diagrams by functional area and by related data structures; lists of data structures by type and groupings; context-bound explosions of details within abstract symbols; data based queries into the data describing the model.


An effective data model does the right job - the one for which it was commissioned - and does the job right - accurately, reliably, and economically. It is tuned to enable acceptable performance at an affordable operating cost.

To generate an effective data model the tools and techniques must not only capture a sound conceptual design but also translate into a workable physical database schema. At that level a number of implementation issues (e.g., reducing insert and update times; minimizing joins on retrieval without limiting access; simplifying access with views; enforcing referential integrity) which are implicit or ignored at the conceptual level must be addressed.
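Two of those physical-level issues, enforcing referential integrity and simplifying access with views, can be shown with Python's standard sqlite3 module. The table names, view name, and sample rows below are invented for the illustration; only the sqlite3 calls and SQL statements are real.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        amount REAL NOT NULL
    );
    -- A view hides the join from anyone who only needs totals per customer.
    CREATE VIEW customer_total AS
        SELECT c.name AS name, SUM(p.amount) AS total
        FROM customer c JOIN purchase p ON p.customer_id = c.customer_id
        GROUP BY c.customer_id;
    """
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO purchase VALUES (10, 1, 25.0)")
conn.execute("INSERT INTO purchase VALUES (11, 1, 75.0)")

totals = conn.execute("SELECT name, total FROM customer_total").fetchall()

# A purchase for a customer that does not exist is rejected by the database.
try:
    conn.execute("INSERT INTO purchase VALUES (12, 99, 5.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Neither the view nor the foreign key rule appears in a conceptual diagram of customers and purchases, yet both have to be decided before the schema is usable.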

An effective data model is durable; that is, it ensures that a system built on its foundation will meet unanticipated processing requirements for years to come. A durable data model is sufficiently complete that the system does not need constant reconstruction to accommodate new business requirements and processes.

Furthermore, as additional data structures are defined over time, an effective data model is easily maintained and adapted because it reflects permanent truths about the underlying subjects rather than temporary techniques for dealing with those subjects.

These are the same goals sought by object-oriented techniques!

We have known for twenty years that, regardless of the fad of the moment, all aspects of information systems design benefit from the same traits. Designs should be clear so we can determine their correctness and suitability. Code and structure which are repeated should be defined centrally and shared as widely as possible. Code and structures should be designed to a deliberate target on the features/cost/performance curve. And we want to be able to use, reuse, and even abuse our designs for years, not months, between overhauls.


Where else to go from here

Data Modeling Methodologies
Data Modeling - A Fresh and Simple Look at The Fundamentals
Logical Modeling

Copyright 1997-2000 Applied Information Science