This is my first question, so please tell me if it is too vague or unclear. It concerns high-level design. We have a system (specifically an ATCA chassis) configured in a star topology, with a Master Node (MN) and several subordinate nodes (SNs). All nodes are connected via Ethernet and run Linux along with proprietary applications.
I have to design a recovery framework so that any software entity, whether it is the Linux kernel, the ramdisk or an application, can be rolled back to a previous good version if something goes wrong.
My idea is to maintain a State Version Matrix on the MN, where each state (1, 2, …, n) records the known-good kernel, ramdisk and application versions for each SN. The version on one SN may depend on the version on another SN.
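To make the idea concrete, here is a minimal sketch of such a matrix in Python. The node names, version strings and the `rollback_target` helper are hypothetical and only illustrate the data layout; the real framework may look quite different.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class NodeRelease:
    """Known-good versions of the three software entities on one SN."""
    kernel: str
    ramdisk: str
    application: str


# Hypothetical matrix kept on the MN: state number -> SN name -> release.
matrix: Dict[int, Dict[str, NodeRelease]] = {
    1: {
        "SN1": NodeRelease("3.2.0", "rd-1.0", "app-1.0"),
        "SN2": NodeRelease("3.2.0", "rd-1.0", "app-1.0"),
    },
    2: {
        "SN1": NodeRelease("3.2.1", "rd-1.0", "app-1.1"),
        "SN2": NodeRelease("3.2.0", "rd-1.0", "app-1.0"),
    },
}


def rollback_target(
    matrix: Dict[int, Dict[str, NodeRelease]], current_state: int, node: str
) -> Optional[NodeRelease]:
    """Return the node's release in the latest state before current_state."""
    earlier = [s for s in matrix if s < current_state and node in matrix[s]]
    if not earlier:
        return None  # nothing older to roll back to
    return matrix[max(earlier)][node]
```

For example, `rollback_target(matrix, 2, "SN1")` yields the state 1 release of SN1, and asking for a rollback from state 1 yields `None`.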
Please see the following diagram:
My dilemma is whether to use the package management approach of Debian-based distributions (such as Ubuntu) or a Git repository approach to roll back to previous good versions, either on a single SN or on all dependent SNs. The chosen method should also make it easy to upgrade the SNs together with the MN.
Some of the features I am trying to achieve:
1) A single software entity can be upgraded without disturbing the others.
2) Dependency checks must be done before applying a rollback or upgrade on each SN.
3) The user should be prompted if a dependency check fails. If the user still goes ahead with the rollback, all the SNs should be notified to roll back their own releases (if required).
4) The binaries should be distributed to the SNs so that recovery is faster than fetching them from the MN every time.
5) Release patches from developers (bug fixes, feature enhancements) can be applied to a running system.
6) Each version can be easily tracked and distinguished.
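As a rough illustration of features 2 and 3, the set of SNs that need a rollback notification can be computed by walking the dependency graph. This is only a sketch under assumptions: the `depends_on` mapping and the `nodes_to_notify` function are hypothetical names, and dependencies are modelled per node rather than per version.

```python
from collections import deque
from typing import Dict, Set


def nodes_to_notify(rolled_back: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return every SN that directly or transitively depends on rolled_back."""
    # Invert the mapping: for each node, which nodes depend on it?
    dependents: Dict[str, Set[str]] = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)

    affected: Set[str] = set()
    queue = deque([rolled_back])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            # Skip nodes already seen and the node being rolled back itself.
            if dependent not in affected and dependent != rolled_back:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical example: SN2 depends on SN1, SN3 depends on SN2.
depends_on = {"SN2": {"SN1"}, "SN3": {"SN2"}, "SN4": set()}
affected = nodes_to_notify("SN1", depends_on)
```

If `affected` is non-empty, the MN would prompt the user and, on confirmation, notify exactly those SNs.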
Thanks
IMO, you want Puppet or a similar configuration management tool to satisfy requirements 1, 3 and 5. You can even keep the Puppet configuration in a git repository to satisfy requirement 6. It also sounds to me as though Debian packages would suit your case better than git repositories, for the following reasons:
- dependency checks are integrated into dpkg (req 2)
- apt-get caches downloaded packages locally, under /var/cache/apt/archives (req 4)
Of course, all of these can be reimplemented without reusing a configuration management tool and a package manager, but this sounds like more work. Finally, remember that no configuration tool is ideal, and the following issues may still arise:
- an upgrade of some software entity breaks the configuration tool itself
- a new version of a software entity changes its configuration or data in a backwards-incompatible way, e.g. by updating a database schema, so that a rollback becomes problematic
- common package managers such as APT may produce surprising behaviour (example)
Hence, it is still important to define your changes properly (scope, time window etc.) and roll them out in stages.
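A staged rollout can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: `rollout_in_stages`, `apply_change` and `is_healthy` are hypothetical names, and the two callbacks stand in for whatever your tooling uses to apply a change and verify a node.

```python
from typing import Callable, List, Sequence, Tuple


def rollout_in_stages(
    stages: Sequence[Sequence[str]],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> Tuple[List[str], bool]:
    """Apply a change stage by stage; stop after the first unhealthy stage.

    Returns the nodes that were changed and whether every stage passed.
    """
    changed: List[str] = []
    for stage in stages:
        for node in stage:
            apply_change(node)
            changed.append(node)
        # Verify the whole stage before touching the next one.
        if not all(is_healthy(node) for node in stage):
            return changed, False
    return changed, True


# Hypothetical usage: one canary SN first, then the remaining SNs.
applied: List[str] = []
changed, ok = rollout_in_stages(
    stages=[["SN1"], ["SN2", "SN3"]],
    apply_change=applied.append,
    is_healthy=lambda node: node != "SN2",  # pretend SN2 fails its check
)
```

When `ok` is false, `changed` lists the nodes that are candidates for a rollback.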