Changing code

May 28, 2008

Reading list: Design by contract, by Example

Filed under: Reading list, Software design — Roberto Liffredo @ 7:49 pm

Design by Contract, by Example is a rather nice book.
For me, it completely failed at its goal of pushing me towards design by contract, and yet it was very useful: it stresses immutable classes and some sound design principles and guidelines, and it clarifies a language (contracts for functions and objects) that is becoming more and more common because it expresses responsibilities in interfaces so well.

I bought it because I was interested in software contracts.
Software contracts, as defined by Meyer, are simply an enhanced (and somewhat object-oriented) version of the preconditions and postconditions used in algorithm analysis since Donald Knuth.
The promise is exceptional: bug-free components, able to avoid even bugs like the one that caused the Ariane 5 crash. Moreover, contracts should keep the documentation always up to date, because they are an effective part of the object interface.

Unfortunately, the book does little to really sell this concept. Instead, it makes it quite clear that you have to use the Eiffel language to unleash the full power of software contracts.
Well, sort of.
Dynamic languages allow for several extensions to the language without the need for preprocessors. It is thus possible to add some DBC, like the Pre-Post-Conditions in the Python Decorator Library.
Unfortunately, although not as cumbersome as a preprocessor, these methods are not exactly a complete DBC implementation; moreover, unit tests offer similar confidence in the software, while being much easier to introduce as a development practice.
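To give an idea of what such a decorator-based approach looks like, here is a minimal sketch. It is not the recipe from the Python Decorator Library: the `contract` decorator and the `square_root` function are hypothetical names made up for this illustration, and the checks are plain callables.

```python
import functools


def contract(pre=None, post=None):
    """Attach an optional precondition and postcondition to a function.

    Hypothetical helper: `pre` receives the call arguments, `post` receives
    the result followed by the call arguments. Both return True when the
    contract holds.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if pre is not None and not pre(*args, **kwargs):
                raise AssertionError(f"precondition of {func.__name__} violated")
            result = func(*args, **kwargs)
            if post is not None and not post(result, *args, **kwargs):
                raise AssertionError(f"postcondition of {func.__name__} violated")
            return result
        return wrapper
    return decorator


@contract(
    pre=lambda x: x >= 0,
    post=lambda result, x: abs(result * result - x) <= 1e-9 * max(1.0, x),
)
def square_root(x):
    # The caller must supply a non-negative number (precondition); in return
    # the function promises a value whose square is x (postcondition).
    return x ** 0.5
```

Calling `square_root(-1.0)` fails at the precondition, before the body runs, which puts the blame on the caller. What a sketch like this does not cover, for instance, are class invariants and the inheritance of contracts.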

That said, I would really recommend the book to anyone willing to improve their design skills.
DBC requires a completely different design perspective: a programming style that, by separating queries from commands, puts particular stress on avoiding side effects, and that can easily allow for parallel processing. If this seems too abstract, the authors show it in a very concrete way, with a complete set of principles and guidelines, and full examples of how to apply them.
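As a small illustration of that style (my own sketch in Python, not an example taken from the book), consider a hypothetical `Stack` class where queries only return information and never change state, while state-changing operations return nothing and state their contracts as assertions:

```python
class Stack:
    """Hypothetical stack: queries have no side effects, commands return nothing."""

    def __init__(self):
        self._items = []

    # Queries: safe to call any number of times, also inside assertions.
    def count(self):
        return len(self._items)

    def top(self):
        assert self.count() > 0, "precondition: the stack must not be empty"
        return self._items[-1]

    # Commands: change the state and express the change through the queries.
    def push(self, item):
        old_count = self.count()
        self._items.append(item)
        assert self.count() == old_count + 1, "postcondition: one more item"
        assert self.top() is item, "postcondition: the new item is on top"

    def remove_top(self):
        assert self.count() > 0, "precondition: the stack must not be empty"
        old_count = self.count()
        self._items.pop()
        assert self.count() == old_count - 1, "postcondition: one item less"
```

Compare this with the usual `pop()`, which both returns the top item and removes it: because it changes state, it could not be used inside a contract without altering the object being checked.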

The book is also a very good introduction to a terminology (software contracts) that proves very effective for expressing concepts and designs, or for finding flaws during a review.
Actually, I have found this second aspect even more compelling than the first. Once the basic idea of a software contract is well defined, analysis of interfaces, APIs and designs is definitely easier; and although these concepts are widely used nowadays, I still think that an operational definition (even by means of techniques and languages not used in the project) helps a lot in using and communicating them.

Design by Contract, by Example
ISBN: 0201634600
ISBN-13: 9780201634600

April 18, 2008

Developer tests

Filed under: Coding, Software design — Roberto Liffredo @ 1:24 am


Everybody knows what unit testing is.
Well, maybe not. Actually, in most environments, “unit test” is a kind of empty “enterprise” buzzword.

Unit tests, in their “purest” form, should test a small unit of code (a function, an object, a file, a package) in perfect isolation; usually, this requires the use of stubs or other kinds of fake objects.
Of course, we have to be pragmatic; sometimes a unit is somewhat bigger than it should be, and often the unit will embed a complex system, like a database.
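A minimal sketch of such an isolated test, using Python's standard `unittest` module. The `order_total` function and the `StubPriceRepository` class are hypothetical names invented for this example; the stub stands in for what would be a database-backed object in a real project.

```python
import unittest


class StubPriceRepository:
    """Fake object replacing a real, database-backed price repository."""

    def __init__(self, prices):
        self._prices = prices

    def price_of(self, item):
        return self._prices[item]


def order_total(repository, items):
    """Unit under test: sums the prices of the ordered items."""
    return sum(repository.price_of(item) for item in items)


class OrderTotalTest(unittest.TestCase):
    def test_total_sums_item_prices(self):
        repository = StubPriceRepository({"pen": 2, "book": 10})
        self.assertEqual(order_total(repository, ["pen", "pen", "book"]), 14)

    def test_empty_order_costs_nothing(self):
        repository = StubPriceRepository({})
        self.assertEqual(order_total(repository, []), 0)
```

Saved in a file, the tests can be run with `python -m unittest`. Since no database is involved, a failure can only come from `order_total` itself.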

In any case, the main advantage of a unit test is that it pinpoints the source of the error exactly, without the need for complex analysis or debugging sessions. This advantage obviously decreases with the complexity of the system.

That said, unit testing is only one of the possible test practices. With some simplification, we may classify them in the following way:

  • Unit tests
    Focus is on a development unit and its interface.
  • Integration tests
    Focus is on the interaction between components (like a class and a database).
  • Functional tests
    Focus is on complete functionalities of the program, from a user perspective.

All of these tests have one trait in common: they are developer tests, or tests written by software developers.
We need functional tests, in order to check a feature in its “real” environment.
We need integration tests, in order to work out all possible issues deriving from integrations.
We need unit tests, because they are able to pinpoint an error exactly, in a very small unit of code.

All those tests help in raising software quality, and for this reason we should go for all of them, whenever possible.
Of course, this approach may be quite expensive; therefore, we should always have a clear idea of testing scope, in particular what we should test, and when we should run a test.
The JUnit website makes it quite clear: we should test what could reasonably break, and we should run tests every time code is changed.
How this translates into actual projects depends on several factors; my own suggestion is to take it as literally as possible, because it will greatly enhance developer confidence in the final code, and ultimately lower the total effort for software development and maintenance.


December 20, 2007

Scaling performance

Filed under: Software design — Roberto Liffredo @ 7:18 pm

It is quite common to see software that does not scale well.
Software that works well on a developer machine, and suddenly starts crawling when using real data.
I have seen it in my teams, and usually it escalates to a critical emergency: all other development stops until a solution is found, if there is one.

Use real-size test data for development

Development is usually performed on ad-hoc databases, with a few items created just for basic testing.
While this may be a useful tool for correctness tests, it will hardly ever reveal performance issues. OK, you may anticipate them and add indexes to the database at will, but there is no general recipe for performance; and without a good data set, such issues will appear sooner or later.
Only testing with real-size data allows for performance findings.
Note that real data also means real use cases, with target performance: good performance is only useful as long as the customer appreciates it; otherwise it is a wasted effort.
Therefore, once equipped with real data sets and scenarios, start profiling. Measure the performance, and make a plan. Where is the bottleneck? Is it possible to optimize that part? What is the current gap in comparison to the target?
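As a rough sketch of what “start profiling” can look like in practice, here is an example based on `cProfile` and `pstats` from the Python standard library. The `find_duplicates` function, the `profile` helper and the generated data are all made up for the illustration; in a real project the data would be the real-size data set discussed above.

```python
import cProfile
import io
import pstats


def find_duplicates(items):
    """Deliberately naive: compares every pair, so the cost grows quadratically."""
    duplicates = []
    for index, first in enumerate(items):
        for second in items[index + 1:]:
            if first == second and first not in duplicates:
                duplicates.append(first)
    return duplicates


def profile(func, *args):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # the five most expensive entries
    return result, stream.getvalue()


# Stand-in for a realistic data set: 2000 items, 500 of which appear twice.
data = [n % 1500 for n in range(2000)]
result, report = profile(find_duplicates, data)
print(report)
```

The report lists the functions that take most of the time, which is the starting point for answering the questions above: where the bottleneck is, and how far the measured time is from the target.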

Assess the complexity of every class and method

While the former approach is correct, I think we should also proactively take a small step back, to our programming courses, where we were taught about algorithm correctness and complexity, and how to assess them. Yep, this is a kind of lost knowledge: with the notable exception of the STL, it is quite uncommon to see such analysis in the code or in its documentation.
This should not be an obstacle though, and a very useful tool for tracking complexity is the so-called “Big-O notation”.
Wikipedia has quite a good explanation, but roughly speaking, this notation gives an idea of algorithm complexity as a function of the size of the input data. For instance, a linear algorithm will take about twice the time to complete for twice the data, while a quadratic one will take four times as long, and a constant one will return a result in constant time, no matter the size or the kind of the input data.
It is an invaluable tool for assessing algorithm complexity; unfortunately it is not always so easy to use: a good analysis may involve the calculation of series and careful reasoning about the statistical distribution of the data; but in many cases, even a rough, pessimistic estimate is already a good starting point.
Of the various complexity classes (again, take a look at the Wikipedia article for a complete listing) the most important are (from “faster” to “slower”):
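The doubling behaviour is easy to check by counting steps instead of measuring time. The two functions below are toy examples written for this purpose; they do no useful work and only count how many times their innermost statement runs.

```python
def linear_steps(n):
    """Steps taken by a single pass over n items."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps


def quadratic_steps(n):
    """Steps taken when every item is compared with every item."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps


for n in (100, 200, 400):
    print(n, linear_steps(n), quadratic_steps(n))
# 100 100 10000
# 200 200 40000
# 400 400 160000
```

Each time the input doubles, the linear count doubles while the quadratic count is multiplied by four.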

  • Constant
  • Logarithmic
  • Linear
  • Quadratic
  • Polynomial
  • Exponential

Generally speaking, you should try to avoid everything more complex than linear. Of course, this is not always possible, but then you should beware of large sets of data.
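Often the step from quadratic to linear is just a matter of choosing a different data structure. The following sketch uses two hypothetical functions that compute the same result; the second assumes the items are hashable, so that they can be stored in a set.

```python
def common_items_quadratic(first, second):
    """For each item of first, scan the whole second list: O(n * m)."""
    return [item for item in first if item in second]


def common_items_linear(first, second):
    """Build a set once, then do constant-time lookups: O(n + m) on average."""
    lookup = set(second)
    return [item for item in first if item in lookup]


first = list(range(0, 3000, 2))   # even numbers
second = list(range(0, 3000, 3))  # multiples of three
assert common_items_quadratic(first, second) == common_items_linear(first, second)
```

With a few thousand items the difference is barely noticeable; with real-size data it may be the difference between a usable and an unusable feature.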

Be humble

When your application is slow, don’t blame the operating system, the language or third-party libraries.
Don’t even say “it is a heavy calculation, hence it must be slow”.
You can say whatever you want, but your application will still be slow.
Instead, assess performance, ask for a target, and start profiling. Analyze and then try to remove unnecessary complexities.
Be humble, and fix your errors.

December 14, 2007

Memory management in python

Filed under: Coding, Software design — Tags: — Roberto Liffredo @ 6:03 pm

It is quite a common pitfall in Python: trying to directly apply knowledge gathered from other programming languages, like Java or .NET.
And it usually ends with something like “Puah, I don’t like Python”.

For instance, garbage collection.

Mainly, Python memory management is implemented through reference counting: as soon as the number of references to an object reaches zero, it is deleted. What “delete” means, then, depends on the actual implementation of the Python VM.
CPython, for instance, deletes the object immediately: this means that the destructor of an extension type (or, in the case of Python objects, the __del__ method) is called, and the memory may be released (or may return to a common pool handled directly by the Python memory manager, in the case of “simple” objects like integers); in other words, CPython uses deterministic finalizers.
However, this behavior is not guaranteed on other implementations, like Jython or IronPython, because of the different underlying memory model.
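A small sketch of this deterministic behavior. The `Resource` class is a made-up example, and the results in the comments are what I expect on CPython only; other implementations may run the finalizer later, or not at all before exit.

```python
class Resource:
    """Hypothetical class that records when its instances are finalized."""

    released = []

    def __init__(self, name):
        self.name = name

    def __del__(self):
        Resource.released.append(self.name)


resource = Resource("a")
alias = resource                    # a second reference to the same object

del resource                        # one reference left: nothing happens
still_alive = list(Resource.released)   # [] on CPython

del alias                           # last reference gone: __del__ runs right away
after_last_del = list(Resource.released)  # ['a'] on CPython
```

Note that `del` removes a name, not the object: the object is finalized only when the last reference to it disappears.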

Reference counting has several advantages: it is easy to implement, fast, and predictable; on the other hand, it is not able to handle some cases, in particular circular references.
Circular references happen, for example, when items in a container maintain a reference to the container itself; in this particular situation, when the container goes out of scope its reference count will still remain higher than zero, which would cause a memory leak.
For this reason, since Python 2.0, there is a module called gc that performs some garbage collection.
Its main and sole purpose is to handle those particular situations, and it does not change anything in standard memory management.
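The container case described above can be reproduced with a few lines. `Container` and `Node` are hypothetical classes written for this sketch, a weak reference is used only to observe whether the container still exists, and the comments describe the behavior I expect on CPython.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.container = None


class Container:
    def __init__(self):
        self.items = []

    def add(self, node):
        node.container = self       # the item points back to its container
        self.items.append(node)


gc.disable()                        # keep automatic collection out of the way
container = Container()
container.add(Node())
probe = weakref.ref(container)      # observes the object without keeping it alive

del container                       # the cycle keeps the reference count above zero
alive_after_del = probe() is not None       # True on CPython

gc.collect()                        # the cycle detector finds and frees the pair
alive_after_collect = probe() is not None   # False
gc.enable()
```

Without the back-reference in `add`, the container would be deleted by the `del` statement alone, and the gc module would have nothing to do.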

Back to our initial problem, this is quite a big difference from, for instance, C#, where calling GC.Collect() will effectively deallocate all pending objects. In Python, gc.collect() will simply run a “check” for circular references and deallocate, if necessary, all pending objects in such a state.

This is the reason why, in case of problems with memory deallocation, calling gc.collect() in Python is in most cases almost useless.
In Python, such problems are symptoms of flaws in the design, and blaming the language because it does not behave like others certainly does not help in fixing them.

December 7, 2007

KISS

Filed under: Software design — Roberto Liffredo @ 8:00 pm

Software design should always follow the KISS rule: Keep It Simple, Stupid.

  1. Trying to guess future business cases is a job for Program Managers, not developers.
  2. The simplest design that fits the solution is usually the best you can conceive.
  3. Refactoring is a developer’s best friend; if future requirements call for an extension of the design, simply do it.
  4. Software can (and should) change over time.

Nevertheless, this is something difficult to achieve.
How many times have we used a singleton, simply because we liked the pattern?
How many times have we tried to anticipate the needs of a library, spending a great deal of effort on developing a flexible and extensible architecture that no one but us would use?
We keep creating something flexible and powerful, which in most cases is simply a waste of time; moreover, it leads to over-complicated designs that perform worse and are difficult to maintain.
And very often all that flexibility is not even used at all!

I think that, like many Agile practices, it is a way of thinking.
Because we all know that interfaces are the sacred cows of design, we want them to be right at the first iteration, and never to change them.
Because we are going to design a piece of software that will stay there in the core of the system, we want to anticipate as many use cases as possible.
Because we love what we do, we want to think that it will be so clever that no one will attempt to (radically) change it.

And we forget that interfaces may be changed (or added), that refactoring is our best friend and that in general software is so easy to change.
