I am interested to know what other people’s experiences are with managing development infrastructure. I am talking about things like the build server, the central Git repo, etc. Any infrastructure which end users would probably not even know existed, but which is essential for the development team.
1. Do your dev/devops/ops teams consider them to be as important as ‘true’ production systems?
2. What happens when one goes down (both in terms of ramifications and recovery processes)?
3. Who should be responsible for managing them?
Note: to complicate things (especially question 3 above), our “teams” are small: about five or six devs and only one sysadmin. Hence, there is a lot of crossover of us devs into ops, especially with the advent of the “devops” movement and technologies.
Disclaimer – I am the author of this blog post about the subject.
That’s a good question. At the companies I’ve worked at, they are not quite considered a “true” production system; after all, they are not used by customers and they’re not externally facing. But if they go down, all IT development grinds to a halt, so the companies realise that these boxes are important: bug fixes are impossible, and developing new features is impossible. It would be the same as a data entry team losing access to their data import application. They are as important as any other internal system.
So to answer your questions:
1. No. They do not have failovers. If they stop working for 30 minutes or go down overnight, it’s not the end of the world. But they need to be working the majority of the time.
2. Everybody in the IT dev department walks over to the ops guys and pokes them in the back until the boxes are back up.
3. The best approach I’ve found is that the tech lead has full admin rights to the box, but the underlying hardware is still managed by ops. If the tech lead does something stupid and kills the box, it’s up to him/her to fix it 🙂
It sounds like what you need to do is educate your sysadmin a bit on why source control and CI tools are vital to development.
In a large organization that follows best practice in software development, the development infrastructure is the domain of IT, as it should be. These servers are usually locked down just like any other server, and developers do not have root access to them. (Would you give the office admin root access to the corporate email server?) IT will have an SLA with development that covers availability, backup, performance, etc. The IT guys provide support for the raw software development tools (Git, Jenkins, Redmine, ClearCase, etc.): security, license and user management, software updates and so on. There is usually a tools team that specializes in building and customizing the tools on the servers and ensuring the developers have the tools they need to get the job done. The tools team often reports to the development manager but works very closely with the IT team, and its members are usually software developers in their own right. By the nature of the beast, the two teams are very closely tied, but there are clear lines of responsibility. The tools team may be part of IT, but not every IT department has the skills.
Few companies work this way. IT does not ‘get’ software development, and software development doesn’t ‘get’ IT. IT likes stability, security and continuity of service. SD wants the latest; they want it all, and they want it now. As a result, in virtually every organization I have worked for, SD puts in some ‘backyard’ servers ‘off the radar’ of IT. IT tolerates them as long as they don’t cause problems, but washes its hands of them completely. Backups are often lacking, and as for an SLA, heck, if it breaks, “we built it, we’ll just fix it”. Tools are often a few hacked-up scripts that kinda work most of the time, and there is no real budget for the system. It doesn’t really matter when it breaks, because no one really knows what software developers do all day, so it gets fixed and the time is booked against “feature XYZ” of whichever product has a project budget big enough…
So to answer your question: development servers are usually not considered at all, so the business has no idea of their relative importance. The correct solution is for them to be considered in business planning.
As for what usually happens when one fails: as I have said, it gets fixed, managers run around asking why and promise to fix the problem, and then it all goes back to how it was once they see the cost of doing it properly…
Managing development infrastructure is a job for IT (Ops, I think you called them), the production people. Software developers should not be allowed near development infrastructure except as users.
In a small house, IT and SD are often the same people. In this case, it’s important for them to understand the difference between the roles and learn to put on and take off their “hats”.
In response to your blog post…
You are both right. The production server is more important, since its failure will create an instant outage, but that does not make the dev infrastructure unimportant. If dev systems go down unrecoverably, so does the business. Somewhere there is a cost-benefit ratio between the cost of protection and the cost of failure. The business has to decide how much it is willing to invest in disaster resilience. Who is best qualified to quantify that? Hint: it’s not a software developer.
I believe the development team has to look after the entire infrastructure. The development team should be experienced, and all software development in an IT company should be taken care of by the development team. However, it is also important that the marketing team cooperate fully with the development team.