I would like to know how the data representation is exposed in slide 7 of information hiding:
- Modifying an exposed data representation propagates to all code which directly accesses that representation
  - Perhaps the best example of the impact this can have is the Year 2000 problem
  - Legacy software for applications as diverse as nuclear power station control, air traffic control, finance, and the military was coded using exposed data representations
  - To ensure the software would work correctly come 2000, every single place that used the date representation needed to be changed to store years using 4 digits rather than 2
  - The cost of this conversion has been estimated in billions of dollars!
- Exposed data representation leads to change propagation … affects maintenance costs … big trouble
I would also really like to know whether information hiding means ‘hiding the data using visibility identifiers (public, private)’ or ‘hiding the data representation’. What does data representation actually mean? Could anyone help me?
When talking about a system (or a module, class, or structure), there are two representations of data: internal and external (I don’t want to call them public and private, so as not to mix their meaning with OOP). The external representation is the one used by everything that uses the system. The risk is that changing this external representation will require changing everything that depends on it, and you rarely have full control over all of those things. The internal representation, on the other hand, lives in only one place and is accessed by a controlled set of functions (not to be confused with functions in a programming language). When this representation changes, only those functions have to change. So changing the internal representation is much easier (and thus cheaper) than changing the external one.
The Y2K problem was caused by one simple fact: the internal representation was also the external one. Every piece of code relied on the fact that the year is represented as two digits, always implicitly preceded by 19. If, on the other hand, there had been a clear separation, the internal representation could have used two digits while the external one used the full year (all four digits). Then the only things that would need to change would be the internal representation and the functions that convert between the internal and external representations.
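A minimal Java sketch of that separation, assuming hypothetical names (Year, TwoDigitYear, FourDigitYear) that are not from the slides: callers only ever see full years through getYear(), so replacing the two-digit internal representation with a four-digit one touches a single class and none of its callers.

```java
// Hypothetical external representation: callers always work with full years.
interface Year {
    int getYear(); // always a full year such as 1999 or 2000
}

// Hypothetical internal representation that stores only two digits and
// assumes the 1900s. This is the Y2K-style assumption, but it is confined
// to this one class instead of being spread across every caller.
final class TwoDigitYear implements Year {
    private final int yy; // 0..99, implicitly 19yy

    TwoDigitYear(int fullYear) {
        this.yy = fullYear % 100; // the century is lost here
    }

    @Override
    public int getYear() {
        return 1900 + yy; // conversion from internal to external representation
    }
}

// Hypothetical replacement that stores the full year directly.
// Only this class changed; the Year interface that callers use did not.
final class FourDigitYear implements Year {
    private final int year;

    FourDigitYear(int fullYear) {
        this.year = fullYear;
    }

    @Override
    public int getYear() {
        return year;
    }
}
```

With TwoDigitYear, a value constructed from 2000 reads back as 1900, which is the Y2K failure in miniature; the difference is that the fix is swapping in FourDigitYear rather than editing every caller.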
Let me start by answering your questions:
- Data representation is the storage type of the data and the meaning of each of its possible values. For example, a Boolean value may be stored as a single bit, where the bit being 1 means truth and 0 means falsehood.
- Data representation is exposed by making something that has a type (a variable or a class member) visible to outside code.
- Information hiding means using language constructs to prevent the inner workings of a module from being visible to other code. “Inner workings” can mean anything that makes the module tick, such as the fact that a piece of data exists, its representation, and how its values are determined (stored, calculated, or both).
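To make the single-bit Boolean example concrete, here is a rough Java sketch with a hypothetical Permissions class (not from the question). Three Boolean flags are stored as individual bits of one private int, which is the data representation; callers only see boolean getters and setters, so that representation stays hidden.

```java
// Hypothetical class: three Boolean flags stored as one bit each inside a
// private int. Callers see only booleans, never the bit layout.
final class Permissions {
    // Bit positions are part of the hidden representation.
    private static final int READ = 1;         // bit 0
    private static final int WRITE = 1 << 1;   // bit 1
    private static final int EXECUTE = 1 << 2; // bit 2

    private int bits; // a 1 bit means true, a 0 bit means false

    public boolean canRead()    { return (bits & READ) != 0; }
    public boolean canWrite()   { return (bits & WRITE) != 0; }
    public boolean canExecute() { return (bits & EXECUTE) != 0; }

    public void setRead(boolean on)    { set(READ, on); }
    public void setWrite(boolean on)   { set(WRITE, on); }
    public void setExecute(boolean on) { set(EXECUTE, on); }

    // Turns one bit on or off without disturbing the others.
    private void set(int mask, boolean on) {
        bits = on ? (bits | mask) : (bits & ~mask);
    }
}
```

Because bits is private, the class could later switch to three separate boolean fields without any caller noticing.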
Here’s a practical example that should answer the question I think you’re really asking based on some of your comments:
Let’s say you’re developing an application that needs a class to represent a date. You come up with something that makes no effort to hide information, where all members are public and there are no methods to manipulate them:
class Date { // Irrelevant parts left out for brevity
    ...
    public int month; // Month of year, 1 = January
    ...
}
Your team goes on to write many thousands of lines of code that use Date, all of which assume that the first month of the year, January, is represented by the value 1 and reach directly into the structure to access it:
Date foo;
...
foo.month = 1; // January
...
One day, someone discovers that a critical function would be much more efficient if it were given zero-based months (0 for January, 1 for February, etc.) instead of having to convert from one-based values every time it reads or writes the month member of a Date. Management decides the business case for the change is a good one and says to make it happen.
The critical function is rewritten to use zero-based months, and the month member now stores them that way. That means every other line of code in the application that deals with months is now broken. It may still compile, because the value is still an integer, but the values it uses are no longer correct. What was January (1) is now February, and what was December (12) is no longer a valid value, because zero-based months end at 11.
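A minimal Java sketch of that breakage, assuming a hypothetical monthName() helper in a hypothetical CriticalFunction class that stands in for the rewritten critical function. The legacy line foo.month = 1; // January still compiles, but the zero-based reader now sees February, and a legacy write of 12 for December runs off the end of the table.

```java
// Exposed representation: any code can read or write month directly.
class Date {
    public int month; // originally documented as: Month of year, 1 = January
}

// Hypothetical stand-in for the rewritten critical function,
// which now assumes zero-based months (0 = January).
final class CriticalFunction {
    private static final String[] NAMES = {
        "January", "February", "March", "April", "May", "June",
        "July", "August", "September", "October", "November", "December"
    };

    static String monthName(Date d) {
        return NAMES[d.month]; // valid only for 0..11
    }
}
```

Neither failure is caught by the compiler: the first silently yields the wrong month, and the second only fails at run time with an ArrayIndexOutOfBoundsException.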
(This example is a bit contrived, because good practice would dictate having the class define constants for well-known values like Date.MONTH_JANUARY, but bear with me. The principle still applies.)
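As a rough sketch of why constants soften the blow without removing it, assume the exposed Date gains hypothetical MONTH_JANUARY and MONTH_DECEMBER constants that are updated to the zero-based values. Code that only uses the constants survives, but any code that still compares against raw literals (the hypothetical isDecember() below) silently breaks, because the field itself is still public.

```java
// Hypothetical variant of the exposed Date with named constants.
// The constants now describe the zero-based representation (0 = January).
class Date {
    public static final int MONTH_JANUARY = 0;   // was 1 before the change
    public static final int MONTH_DECEMBER = 11; // was 12 before the change

    public int month; // still public, so raw values are still exposed
}

// Hypothetical callers written before the change.
final class Callers {
    // Survives the change: it only uses the named constant.
    static void setToJanuary(Date d) {
        d.month = Date.MONTH_JANUARY;
    }

    // Breaks silently: it compares against the literal 12,
    // which is no longer a valid month value.
    static boolean isDecember(Date d) {
        return d.month == 12;
    }
}
```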
Correcting the broken code gives you a lot to do:
- Identify every use of a Date in the entire application
- Check each of those uses to determine whether or not it makes any assumptions about the value of the month member
- Change those that do to use the new rules
- Re-test all of it
- Get on your knees and pray you didn’t miss anything
The costs of this effort can be huge, both in the labor needed to make and test the changes and in the risk to the business if a mistake (or the correction of one) results in lost revenue.
Had Date been developed using information hiding in the first place, there would be ways to manipulate the month that are independent of how it’s stored:
class Date {
    ...
    private int month; // Month of year, 1 = January
    ...
    // These methods operate on the month as a 1-based value and
    // are the ones all of the existing code uses.
    public void set_month(int new_month) { month = new_month; }
    public int get_month() { return month; }
}
Making the change to zero-based storage makes your to-do list a lot more pleasant and a lot less risky:
- Re-comment the month member to indicate that it’s now zero-based
- Change set_month() and get_month() to convert between one-based and zero-based values on the way in and out
- Test the changed methods for correctness
- Add and test methods to do zero-based operations on month so the new version of the critical function can use them
The class now looks like this:
class Date {
    ...
    private int month; // Month of year, 0 = January
    ...
    // These methods operate on the month as a 1-based value and
    // are the ones all of the existing code uses.
    public void set_month(int new_month) { month = new_month - 1; }
    public int get_month() { return month + 1; }
    // New methods that operate on the month as a 0-based value
    public void set_month_zero(int new_month) { month = new_month; }
    public int get_month_zero() { return month; }
}
The key here is, again, that nothing outside the class knows anything about how the month is stored, or even whether it’s stored at all. All outside code sees is two methods, set_month() and get_month(), that must be used to store or retrieve the one-based value. As long as the new implementations of those methods can be shown to behave as they always did, there’s no need to touch the thousands of lines of code that use them. The critical function can use the new set_month_zero() and get_month_zero() methods to work with the month as a zero-based value.
How? Well, by making it public and writable.
The data representation in this case is an int value for each Date component, but you’ll notice that this isn’t a particularly fitting choice: day numbers do not range from -2,147,483,648 to 2,147,483,647; they range from 1 to 31. The point is not so much to keep it secret from API clients that the day number is stored as an int, but to prevent them from invoking the full representational power of int when manipulating a Date.
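A minimal Java sketch of narrowing the int range, assuming a hypothetical Date with a private dayOfMonth field and setter (names chosen to match the answer's dayOfMonth). The field is still stored as an int, but clients can only set values from 1 to 31.

```java
// Hypothetical Date with a private day field. The setter rejects anything
// outside 1..31, so clients cannot use the full range an int allows.
class Date {
    private int dayOfMonth = 1;

    public int getDayOfMonth() { return dayOfMonth; }

    public void setDayOfMonth(int day) {
        if (day < 1 || day > 31) {
            throw new IllegalArgumentException("day out of range: " + day);
        }
        dayOfMonth = day;
    }
}
```

A per-field check like this still accepts 31 in February, so on its own it only covers the first kind of invalid value.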
With public fields, every user of the class has as much power over the values as the writer of the class: absolute power. In particular, they can set nonsensical values (e.g. dayOfMonth = 50), values that are fine in themselves but conflict with other values (dayOfMonth = 31, month = 2), or values that are even more subtly wrong (dayOfMonth = 29, month = 2, year = 2001, which is not a leap year). Restricting the possible values and combinations of values is exactly the job a class is suited for, but it can only do that job correctly if nobody else has the same power. That is why it’s a good idea to have an advanceDay() method instead of publicly writable fields.
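Here is a hedged sketch of what such a class might look like in Java. Only advanceDay() is named in the answer; the constructor checks, the daysInMonth() and isLeapYear() helpers, and the getters are assumptions for illustration. Because the fields are private and change only through the validating constructor and advanceDay(), none of the three bad cases above can ever be stored.

```java
// Hypothetical Date whose fields change only through the validating
// constructor and advanceDay(), so invalid combinations never exist.
final class Date {
    private int year;
    private int month;      // 1 = January
    private int dayOfMonth; // 1..daysInMonth(year, month)

    Date(int year, int month, int dayOfMonth) {
        if (month < 1 || month > 12) {
            throw new IllegalArgumentException("bad month: " + month);
        }
        if (dayOfMonth < 1 || dayOfMonth > daysInMonth(year, month)) {
            throw new IllegalArgumentException("bad day: " + dayOfMonth);
        }
        this.year = year;
        this.month = month;
        this.dayOfMonth = dayOfMonth;
    }

    // Gregorian leap-year rule.
    private static boolean isLeapYear(int year) {
        return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    }

    private static int daysInMonth(int year, int month) {
        switch (month) {
            case 2: return isLeapYear(year) ? 29 : 28;
            case 4: case 6: case 9: case 11: return 30;
            default: return 31;
        }
    }

    // Moves to the next day, rolling over the month and year as needed.
    public void advanceDay() {
        dayOfMonth++;
        if (dayOfMonth > daysInMonth(year, month)) {
            dayOfMonth = 1;
            month++;
            if (month > 12) {
                month = 1;
                year++;
            }
        }
    }

    public int getYear() { return year; }
    public int getMonth() { return month; }
    public int getDayOfMonth() { return dayOfMonth; }
}
```

For real code, java.time.LocalDate already provides validated dates with plusDays(), so a hand-written class like this is mainly useful for illustrating the principle.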