How Many DBAs Do You Need To Support Databases?

I frequently get asked the question of how many databases a DBA typically manages. Over the past five years, I have interviewed hundreds of organizations on this topic, asking them about their ratios and how they improved them. Typically I find that the current industry average is 40 databases to a DBA for large enterprises ($1 billion+ in revenue), with the lowest ratio seen around eight and the highest at 275. So, why this huge variation? There are many factors that I see in customer deployments that contribute to this variation, such as the size of a database, database tools, version of databases, DBA expertise, formalization of database administration, and production versus nonproduction.

This ratio is usually limited by the total size of all databases that a DBA manages. A terabyte-sized database remains difficult to manage compared to a database that's 100 GB in size. Larger databases often require extra tuning, backup, recovery, and upgrade effort. The average database-to-DBA ratio is often constrained by the total size of the databases being managed, which tends to be around five terabytes per DBA. In other words, one DBA can effectively manage 25 databases of 200 GB each or five 1 terabyte databases. And these include production and nonproduction databases.

What are the factors that can help improve the ratio? Cloud, tools, latest DBMS version (automation), and DBMS product used – SQL Server, Oracle, DB2, MySQL, or Sybase. Although most DBMS vendors have improved on manageability over the years, based on customer feedback, Microsoft SQL Server tends to have the best ratios.

I believe that although you should try to achieve the 40:1 ratio and the 5 terabyte cap, consider establishing your own baseline based on the database inventory and DBAs and using that as the basis for improving the ratio over time.

I would love to hear about your database-to-DBA ratios and what's working and what's not in your organization when it comes to database administration.

Cheers,

Noel

Comments

Noel, A better question might

Noel,

A better question might be -- how many databases do you need? A plethora of databases means that the total overhead time starts to increase as a percentage of DBA activity. Using the 5TB/DBA metric as an example, there would be a world of difference between 40 databases at 125GB each vs. 5 databases at 1TB each.

I'd be curious to hear what drives the creation of many distinct databases and what barriers preclude running with fewer, larger databases. I'm guessing that a big part of it is DBA competence and that the drive toward lower cost (and therefore less competent) DBAs is resulting in many smaller, less "complex" databases...

Thoughts?

Very valid point

I see larger corporations attempting to consolidate their databases as much as possible, and with the new hardware that can do both OLTP ànd Data Warehousing this shouldn't be such a problem anymore.
Ease of management is a great argument for consolidation!

Noel, 1) Do you have a sense

Noel,

1) Do you have a sense for how many databases a DBA supports in development as opposed to production? Care and feeding of dev versus prod database is completely different.

2) Do the numbers you state treat all SDLC copies of an individual database as one (e.g. do dev, val, prod acceptance test, training, prod count as 1) or count each one separately? Depending how you count, the DBA that manages the XYZ database manages 1 (SDLC collection of databases) or 5 (aligned with the SDLC examples above SDLC instances of XYZ).

A data point with MarkLogic

One of our largest implementations scales out at 1TB of XML per MarkLogic Data Node (dual-CPU server). That DB is currently around 90TB, though that's flexible, as the customer can scale-out further at any time. Total contractual DBA requirement is one-half an FTE, to support a total of 93 servers, including some Evaluator Nodes. Actual hours billed for DBA activity has averaged well under 20 per week.
Across this and other large implementations, there just isn't that much that a MarkLogic DBA needs to do: backup is typically automatic; recovery is a rare activity; and even major DBMS software version upgrades can be scripted and deployed across the cluster in under ten minutes, with everything back up for read/write access right away. (After an upgrade, reindexing will continue in background, which may take hours or days, after which new features that depend on new indexes become available. But it's fully available the whole time.) Adding a node to scale-out for an additional terabyte of capacity is a five-minute operation with no cluster downtime.
Another factor in the “not much to do” is reliability. Although that system never had formal HA requirements or design, it typically goes months between outages, with time-to-restore around five minutes in most cases--typically hardware component failures. This is usually a failure of an (unmirrored) local disk or HBA, and the solution is to activate a warm-spare server, mounting the DB partitions of the failed node. That could have been automated with HA-style failover, but that wasn't a requirement for this particular system. With two years of Production experience, reliability is high enough and DBA hours low enough that it would not have justified the complexity and cost of implementing auto-failover and hardware redundancy (e.g. duplicate HBAs and FC per server). We did gradually add mirrored local OS disks, since that was the most common failure. The customer is now moving to a virtualized environment automates failover for server-level problems. If a hardware node fails, the system will bring its VMs up on other nodes. To any of our servers, it looks like a momentary power outage on those (virtual) machines.
Of course, all that applies to MarkLogic, which is a NoSQL DBMS. The examples you gave were RDBMS. My experience with RDBMS DBA requirements has been quite varied. I've seen systems that achieve a degree of stability so long as nothing changes. But I've also seen an extremely mission-critical system, with under 100GB total data, that required several DBAs on 24-hour rotation, just to maintain .95-ish uptime. It had been subject to wrenching, urgent changes that progressively degraded its relational data model. A successor MarkLogic system is proving to adapt much more readily, primarily because of the schema-agnostic nature of the DBMS and query language.

Noel - My team lead and I

Noel -

My team lead and I administer over 750 Informix databases between the two of us. I am the primary DBA and could easily handle the administration myself and pretty much do - with my co-worker focusing almost exclusively on managerial type duties. And while I can provide no concrete evidence I believe there are many Informix DBA's who handle an equal or greater number of databases based on feedback I get from other DBA's at conferences and in online forums. In addition to Informix we are beginning to manage SQL Server 2008 installations, though we have had to use additional outside help. To this point SQL Server, with all of its fancy 'point and click' features require about five times as many DBAs as Informix - but I am sure that number will go down to only twice as many once we all get fully up to speed with our mouse's features.... I understand you are using averages in your study but I did want you to know my experience.

Mike Magie

Art's response

Art makes a great point below re. Informix instance management. I cited the # of database I manage as that seemed to be the metric discussed in the author's blog entry. When I supported Informix I recall several customers (single DBA's) who managed multiple instances that supported SAP or BAAN installations - I can't recall which of those two use tons of databases but one of them would create a couple thousand databases per instance and there were usually multiple instances - I am sure the Informix community could go on and on about other examples like this.

MM

RE: Databases per DBA

Noel,

In the Informix world we don't talk about the number of databases we manage, but rather, the number of Informix instances since an Informix instance can contain dozens or thousands of databases and can manage petabytes of data. Currently I work for a consulting firm. We have clients who manage single instances and pairs of replicated instances containing over 100Tb of data with one or two DBAs and an occasional helping hand from us. We have clients that manage hundreds of instances with from two to 6 DBAs.

When I worked for Bloomberg LP, I managed over 46 Informix instances in production containing over 200 databases and many TB of data as well as six development instances with a single assistant for five years.

Your metrics definitely do not work for IBM's Informix Dynamic Server RDBMS, which I notice did not make your impromptu listing. You should note that Informix, contrary to some opinions, is alive and well within IBM and in the Informix user community. One of the main strengths of Informix is its very low cost of ownership due to the ludicrously low maintenance requirements. You and your followers should check it out.

Informix...a viable database

Talking about Informix.... I was responsible for running the world's fastest TPC-B benchmark on Informix RDBMS in 1994. Also, have administered many Informix databases for mission-critical databases for large enterprises in the past, and agree that Informix is an easy to use database. Infact I was recently speaking to a large enterprise that mentioned that they have not shutdown a critical Informix Database for the last four years. I am sure there must many such implementations.

We are just about wrapping up a DBMS survey which will provide some valuable information on DBA productivity and database-to-DBA ratios.

BTW, I recently met Phil White - ex-CEO of Informix back in the old days at a vendor event. It was nice to touch base with him after so many years.

Cheers,
Noel

PHIL WHITE

I ever run into Phil White he will be sorry we met. The clown should be in prison for what he did to Informix employees in the 90's. He took a wondeful company that had the potential to rule the database world and through his criminal management brought Informix to its knees. It has been a long recovery but fortunatly it looks like Informix is BACK and will again be the #1 RDBMS available.

MM

Production vs. non-production...

John,

The number of database-to-DBA ratio I mentioned includes production and non-production. Typically, I see that the number of non-production database managed by a DBA tends to be on an average two to three times that of production. Production databases often require more tuning, optimization, upgrade, backup, migration, and patching that can significantly add to the effort. However, in some organizations I have seen non-production DBAs also take on data architect and developer related activities, which can dramatically impact the ratios.

Also, the ratios I mentioned treat each database seperately. So prod, dev, test, QA, and training would could as five databases.

Cheers,
Noel

How many databases to you need?

Dominic,

You raise a great question on how many databases to you need. I don’t think it’s about DBA competence that drives the creation of distinct databases but it’s the way applications are built. Traditionally, we have built applications, where each application used to have its own database. A hundred applications used to have a hundred production databases, plus 3 to 4 times non-production databases. However, today more applications are being built where a database is being shared by many applications. In addition, creating a data service (data virtualization) layer allows decoupling the application from the database, offering more control over the number of databases.

BTW, I recently spoke to a financial services organization that has more than 46,000 production and non-production databases that struggle to keep their SLAs under control. They already see that they might double the number over the next seven years! They are currently embarking on a major initiative to reduce the number of databases through consolidation and creation of a data services layer.

Cheers,
Noel

Nature of database software

I worked on the third Informix license sold in Australia in the 1970s. It used SQL for queries and was programmed via a library of C functions. Over the years it developed into a large suite of products, but to my mind, the feature that made Informix so easy and a pleasure to use was that its development team always recognised that the Informix company (it changed its name to the product name sometime in the 1980s, I think) was a tools company and kept a measure of distance between the RDBMS and the various utility products which were developed using it. For nine years, myself and one other colleague ran and developed the database servicing the teleconferencing services department of Australia's largest telco. The job was offered to IGM-GSA (IBMs service arm) but they said it could not be done with a team of less than seven people ! I will always maintain that the ease with which we could do our work was due to the thoughtful development style of the Informix team. Once database developers start offering their own end-user products, the way is open to compromise the inherent database architecture to suit the needs of those products.

Informix

We use Informix since 1992 in production. Usual workload per day is changing the backup tapes, once every 5 to 6 years its upgrading the IDS Release together with a scheduled hardware replacement. That is what I call a low TCO.

For Informix the question is: How many databases without a DBA?

I'm working for IBM (I think I should point this because it may affect readers opinion about my comment), since 1998 (Informix at the time). I've been doing pre-sales, technical support and services over these years.
What I see most, is not large sites with teams of DBAs working on Informix. Maybe because I work on a very small country/market. What I really see most of the times are customers without any DBA. Typically it's the system admin or the developer team that does the ocasional work of DBA.

Although I admire this study, I believe this is a really complex issue. I personally help to manage hundreds of databases (tens of intances) and there the larger (>1TB) is completely left alone for months. Others (smaller, and sometimes development instances) give more work. Most of the work is not really DBA work. It's more stuff like helping the development team and clarify doubts...

So, regarding Informix I think we should ask how many, how big, how frequently changed (DDL) the databases are to require a DBA? So, if the question is TCO than a study without Informix will not show the better picture.
In general, I think that mileage will vary greatly with the complexity of the environment, the rate of changes it suffers, and the work/quality of development and pre-production environments.

Regards.

DBA vs database ratio

I beleive the quantity of the data has nothing to do with how many dba's a company should have? In my view it depends on how complex the total environment is and what kind of databases the organization supports? Usually for many companies, there are several systems, typically consisting of OLTP and DSS environments. The primary goal in terms of databases would be to make sure data is available 24*7 wiithout any disruptions. That poses lot of challenges for the dba's in terms of implemeting database change requests, upgrading, applying OS patches, etc.

For mostly OLTP systems, there's always some form of replication going on which would protect the company in case of any eventuality. Replication comes with lot of DBA's work hours and would significantly take up their time. Size plays a small role to support such complex environments.

One more area where dba's spend most of their time is to make sure their OLTP systems are as clean and thin as possible. This would require continous clean up, running jobs, supporting apps that does clean up the environment.

So in short, DBA's wears lot of hats in any company which requires lot of hours and thats why I think size of the data doesn't matter much. Its the complexity that matters more.

RE: DBA vs database ratio

Firoz,

I agree with much of what you have to say. I think my original point in bringing up Informix was to point out the is also has to do with which RDBMS system you are using. Your example about replication is an excellent one to elaborate on.

For Informix DBAs setting up High Availability Replication is a trivial matter of archiving the primary server, restoring the archive to the new replicant server and running one command line command on the secondary and one on the primary. Once that's done, there is literally nothing else that the DBA needs to do to maintain the replication except monitor for problems - which are rare. Setting up shared disk clusters is even simpler. Just set up a modified copy of the primary's configuration file and run command line commands on the shared disk secondaries and the primary.

The point is that just as the complexity of this single feature differs from RDBMS to RDBMS, so do most operations that a DBA has to perform. What may occupy one DBA for a whole day at one installation may take a few minutes at another either, as you point out, because the environment is more complex or because the RDBMS makes the procedure easier/faster or harder/slower.

Art, Thanks for sharing your

Art,
Thanks for sharing your thoughts. First of all I would like to say I am a great fan of yours since I am an informix guy too. Have always liked the product. However, equally disgusted at how IBM markets it. Informix rocks when it comes to OLTP system. With that all said, do you have any comparison of Informix vs any other RDBMS for oltp systems. I have dealth with DB2 UDB also and strongly feel that it is more appropriate for DSS kind of environment.
I would be interested to know more about oracle and its locking strategy.

How many databases/DBA

How many angels can dance on the head of a pin? The premise of this blog is that there is a finite and more or less derivable value for DBA / Database.

The company I work for embeds Informix under their applications, some 3 dozen such applications by last count. I haven't counted the customer base lately, it's something like 15,000 or 30,000, not something I focus on. The point is that none of these customers have a DBA, we don't have that many DBAs (one qualified one I think (not me, I'm a developer), I think 2 of us are certified) - the applications are all self-sufficient and do not need human intervention. I dealt with two customer database issues last year, both were pilot error on the part of the customer. Otherwise we have had no database issues to deal with from MY customer base (I only represent 1/12th of our companies database applications).

If the database is properly set up, there is no need for a DBA at all. I don't think this is specific to Informix, although certainly I have had an easier time embedding Informix over Oracle, SQL Server, DB2, MySQL or SyBase (and a few others). But then I'm new at this, I've only been doing databases for the past 24 years.

The best example

Wal-Mart has over 25,000 instances of Informix running across all their stores and distribution centers. I have dealt with Wal-Mart off and on for nearly 20 years, and I know for a fact that these instances are managed by ~6-8 DBAs in totel. That is about 3,000 to 4,000 instances per person.

# of DBAs needed

Consider:
Platforms supported (i.e. DB2 (z and/or LUW), SQL Server, Sybase, Informix, Oracle). We have them all except Oracle. In house vs off the shelf software. We have a mix. Developer knowledge and experience. Are developers knowledgeable of SQL and optimizing performance? Where are your DBAs in the organization? This is really important for shops that have a lot of in house developed apps. You probably need application DBAs that have knowledge of the apps and how they interact with the databases. Change Management? How often does the environment change. We have daily database changes driven by application changes. Availability requirements and SLAs? This usually drives DBAs to work after hours and on weekends to implement changes during maintenance windows. Oh yeah, and the daily stuff doesn't stop so you get to work days, nights, and weekends. What about vacation and sick time? What if one leaves the company for some reason? Depending on the envoronment, this is a job that can easily burn folks out. You must have at least two DBAs for each platform you support to cover when one is off. Demand management. How are new projects approved and prioritized? Is consideration given to back end database support or are things just thrown over the wall and expected to be implemented? I could go on and on, but I think one thing that gets overlooked is how critical having good DBAs is to busniess. DBAs usually keep databases operating 99.99% and this leads to the assumption that the work being done behind the scenes is trivial. Installing and just letting it run probably works very well in a tightly controlled environment. The more volitilaty you add, the more challenging the DBA's job becomes.

white paper

Hi Noel; We've used numbers close to these for benchmarking. Senior management is asking if there is a white paper including these concepts?
thanks
Al