API Backward Compatibility

Every year a project grows more complex, the difficulties of managing backward compatibility grow with it.
 
The strain imposed on APIs as time passes is tremendous. Developers want APIs to be extended to extract every ounce of functionality. Bugs require fixing, often forcing APIs to be modified, and sometimes forcing the deprecation or outright removal of a subset of the API. Sometimes the changes are to internal behavior, so that the original expectations no longer hold as described.
 
This increased complexity causes strain between the organizations designing and publishing these APIs and those consuming them. Every time a change occurs it affects consumers and may break their applications.
 
The old-school methodology for dealing with API compatibility was founded on statically linking an API provider to an API consumer. Thus, there was never a conflict between different versions of an API, since the consumer simply bundled the exact API version it needed.
 
Later, dynamic linking came along; using metadata and versioning schemes, consumers could describe the specific versions (even ranges) of APIs which would suitably satisfy a dependency. Dynamic loading tricks allowed the "system" to match API providers with consumers. Often these tricks were as simple as file system paths and file naming conventions, and this remains a very common method to this day.
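For instance, the long-standing Unix shared-library naming convention encodes the version directly in file names and symlinks (libfoo here is hypothetical):

    libfoo.so       -> libfoo.so.1.2.3   (link-time name used by the compiler)
    libfoo.so.1     -> libfoo.so.1.2.3   (runtime name: "any compatible 1.x")
    libfoo.so.1.2.3                      (the actual library file)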
 
API version management complexity increased significantly when dynamically loaded runtimes, particularly Java, arrived on the scene. Naming conventions (and language features) were still in use for aggregating APIs into "packages" and further into jars (fancy libraries plus metadata).
 
However, the notion of matching APIs by "version" was practically lost. The workaround, or rather the argument for accepting this loss, was that a given application should specify at execution time which libraries would be available via what is known as the "classpath". Furthermore, at runtime, another Java feature called a "classloader" could create a "context" within an application, within which a new (different) set of libraries would be accessible. Version matching, however, was still missing.
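A minimal sketch of that gap (the jar path and class name here are hypothetical): both the classpath and a classloader "context" identify libraries purely by file location, so nothing in the code below declares, or checks, which version of the API the jar provides.

    import java.net.URL;
    import java.net.URLClassLoader;

    public class ClasspathContextDemo {
        public static void main(String[] args) throws Exception {
            // A classloader "context": a different set of libraries,
            // identified only by file path -- no version metadata at all.
            URL[] jars = { new URL("file:lib/some-api.jar") };
            try (URLClassLoader context =
                    new URLClassLoader(jars, ClasspathContextDemo.class.getClassLoader())) {
                // Whatever happens to be inside some-api.jar wins, whether or
                // not it is the version this application was built against.
                Class<?> api = context.loadClass("com.example.SomeApi");
                System.out.println("Loaded " + api.getName());
            }
        }
    }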
 
What does it mean that version matching is missing from Java API management?
 
Without version matching, when APIs change as described earlier it's virtually impossible for developers to account for those changes in a declarative way. Developers must largely resort to trial and error to identify and resolve API compatibility problems. If they are lucky the changes are evident at build time as compile errors (stemming from calls which no longer match the API's definition); if not, they surface as runtime errors (those pesky, unpredictable errors which could end up costing you real money).
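To make the runtime failure mode concrete, here is a small self-contained sketch (the Greeter API is hypothetical) that simulates a consumer compiled against a newer shape of an API than the one the runtime actually supplies:

    // Simulates version skew: the consumer was compiled expecting an API
    // member that the class actually loaded at runtime does not have.
    public class VersionSkewDemo {

        // Stand-in for "v1" of a hypothetical API package.
        public static class Greeter {
            public String greet() { return "hello"; }
        }

        public static void main(String[] args) {
            try {
                // A consumer compiled against "v2" would expect greet(String).
                Greeter.class.getMethod("greet", String.class);
                System.out.println("API matches expectations");
            } catch (NoSuchMethodException e) {
                // In real deployments this surfaces as NoSuchMethodError --
                // nothing declarative ever said which version we would get.
                System.out.println("Runtime API mismatch: " + e.getMessage());
            }
        }
    }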
 
As API developers, we would ideally have enforceable mechanisms in place to warn about backward compatibility problems, and their severity, at build time. These warnings would prompt the developer either to properly declare the API version and the rationale behind it, or to re-engineer the change so the API is not affected.
 
The declarative portion of this type of change would allow APIs to be semantically rationalized between consumers and providers using version matching.
 
This notion of "semantic versioning" is not a new concept; it has been defined and put to use by various organizations and technologies over the years. One major proponent of semantic versioning is the OSGi Alliance, which has published its definition in a technical whitepaper of the same name [1].
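In brief, the scheme reads versions as major.minor.micro: a micro bump signals no API change, a minor bump is backward compatible for API consumers but not for API providers (implementers), and a major bump breaks everyone. A consumer and a provider of a hypothetical package com.example.api would therefore import it with different version ranges (one line from each of two different manifests):

    Import-Package: com.example.api;version="[1.2,2)"
    Import-Package: com.example.api;version="[1.2,1.3)"

The first (consumer) range survives any minor update; the second (provider) range is invalidated even by a minor bump, since new API the provider must implement may have appeared.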
 
But what about software products which are not following the OSGi model? How could they leverage "build time" analysis of semantic versioning as proposed above? 
 
The key is to rely on at least one aspect of the OSGi model: package versioning. By applying this very simple declarative practice, using some existing build tools with a few relatively simple modifications, it's possible to build a basic model for any build system to deliver semantic version warnings to developers.
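Concretely, with the conventions the bnd tooling already understands, versioning a package means placing a plain-text file named "packageinfo" alongside the package's sources. For a hypothetical package com.example.api, the file src/com/example/api/packageinfo contains a single line:

    version 1.0.0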
 

Following is an outline of the process and assumptions:

1) When a developer first obtains the sources of a project, all the source code is declared as some version. This version is either an aggregate version for the entire project or a version per package. Either way, from the point of view of the developer the version of the code is accurate (since it comes from the origin repository).
 
2) The build tool used to build the project can deliver semantic version build warnings.
 
3) One build must occur in order to bootstrap semantic version checking (libraries are built and initial versions are deposited into a repository, called the "baseline repository").
 
4) The project now stands in a state in which the build can check against the baseline repository and give semantic version warnings (a process called "baselining"). At this stage there should be no warnings.
 
5) The developer makes some change. The build should be able to detect whether the change requires a version update. 
 
6) The unit of change will be the Java package, and the change will occur within the "packageinfo" file; correctly altering the file will silence the build warning. How severe the warning is (MAJOR or MINOR), and which type of work the developer is doing (e.g. bug fixing vs. new development), should determine how the developer resolves the warning (see the sketch following this list).
 
7) Anytime a warning causes the developer to make a version change, it is almost guaranteed that documentation needs to be updated as well. Exactly what kind of documentation (javadoc, developer docs) depends on the type of change.
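As a sketch of step 6: suppose baselining reports a MINOR warning against a hypothetical package com.example.api because a method was added to one of its interfaces. Silencing the warning is a one-line edit to that package's packageinfo (shown here as a diff):

    - version 1.0.0
    + version 1.1.0

Had the change removed or incompatibly altered API, the warning would instead be MAJOR and the correct response would be "version 2.0.0", or better, re-engineering the change so the API is not broken at all.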
 

A prototype

The prototype is currently in development and can be found in the following GitHub repository [2], in the "semantic-versioning" branch. The code uses the bnd [3] library and extends the default Ant build task [4] into a self-initializing semantic version warning system.
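For the curious, the core of the check can be driven from bnd as a library. The following is a rough sketch from memory of bnd's differ API (aQute.bnd.differ.Baseline and friends); the exact class and field names may differ between bnd releases, and the jar paths are hypothetical:

    import java.io.File;
    import java.util.Set;

    import aQute.bnd.differ.Baseline;
    import aQute.bnd.differ.Baseline.Info;
    import aQute.bnd.differ.DiffPluginImpl;
    import aQute.bnd.osgi.Jar;
    import aQute.bnd.osgi.Processor;

    public class BaselineCheck {
        public static void main(String[] args) throws Exception {
            try (Jar newer = new Jar(new File("build/my-api.jar"));      // freshly built jar
                 Jar older = new Jar(new File("baseline/my-api.jar"))) { // copy from the baseline repository
                Baseline baseline = new Baseline(new Processor(), new DiffPluginImpl());
                // Compare every exported package in the new jar to the baseline.
                Set<Info> infos = baseline.baseline(newer, older, null);
                for (Info info : infos) {
                    if (info.mismatch) {
                        // The "semantic version warning": the declared package
                        // version no longer reflects the detected API delta.
                        System.out.printf("%s: baseline %s, current %s, suggested %s%n",
                            info.packageName, info.olderVersion,
                            info.newerVersion, info.suggestedVersion);
                    }
                }
            }
        }
    }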
 
I welcome anyone's thoughts on the subject!
 
[Updated Aug 8, 2013: 17:43]
 
I built Liferay 6.1.0-ga1 with the semantic version warning tools added. I then baselined liferay-portal:master@266cc47216 (plus semantic versioning) against it. You can see the outcome here [5]; lines 168-175, 236-380, 397-404, and 496-797 show where APIs changed.
 
Comments
"The build should be able to detect whether the change requires a version update." - how would that happen? Clearly, if a method signature changes, that's probably cause to rev the interface version, but what if the change is in some sense compatible - e.g. you added a new, but optional parameter, with a sensible default? Or, what if the return type doesn't change, but the logic inside a method changes so that output is now different? At what point is it up to a developer vs. an automated process? Also, what about the notion of interface stability? Is it possible in the OSGi world to declare an unstable interface, that will probably change, but developers are free to bind to it (with the understanding that it might change/disappear)?
Obviously, logical changes are hard to detect. But you usually make them while fixing a bug or some other type of flaw, which should still require the package to indicate a change with a micro version update.

Meanwhile, those are the least likely to break other people's code (although when they do, they are the hardest to resolve).

By far the largest "manageable" set are those changes which directly alter the API: additions, removals, and changes. These we can teach the developer how to deal with by using tooling, which is what I'm trying to accomplish.