I found an interesting article on the High Scalability website about eBay’s scalability, and one passage in particular struck me:
“(Ebay strategies)…Move work out of the database into the
applications because the database is the bottleneck. Ebay does this in
the extreme. We see it in other architecture using caching and the
file system, but eBay even does a lot of traditional database
operations in applications (like joins).”
And this is no slip, because the same article says it again:
“Move cpu-intensive work moved out of the database layer to
applications applications layer: referential integrity, joins, sorting
done in the application layer! Reasoning: app servers are cheap,
databases are the bottleneck.”
Is there an explanation for this? If the above were true, I should use the database only for storing and retrieving data and do every other operation in application code.
I was always told the opposite: “databases are optimized for operations on data and complex selects so use them”.
Any insight?
0
“Measure. Don’t Guess”.
We can’t assume that eBay’s bottlenecks are the same as our own. For the specific applications that I work on, when a bottleneck exists it is rarely the database (or if it is, it’s because of poorly optimized queries). I know this because we’ve reviewed the instances of poor performance and profiled our application during normal operation.
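As a rough illustration of what "measure" can look like, here is a minimal sketch that times the database stage and the application stage of a single request separately. Everything in it is an assumption made for the example: the in-memory SQLite database, the `users`/`orders` schema, and the helper names `build_demo_db`, `timed` and `handle_request` are all hypothetical, and a real profile would be taken against the production database under realistic load.

```python
import sqlite3
import time


def build_demo_db(n_users=200, orders_per_user=5):
    """Create a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(i, f"user{i}") for i in range(n_users)],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(u, k + 1) for u in range(n_users) for k in range(orders_per_user)],
    )
    conn.commit()
    return conn


def timed(label, fn, timings):
    """Run fn(), record its wall-clock duration under label, return its result."""
    start = time.perf_counter()
    result = fn()
    timings[label] = time.perf_counter() - start
    return result


def handle_request(conn, timings):
    """Stand-in request handler, split into the stages we want to compare."""
    rows = timed(
        "database",
        lambda: conn.execute(
            "SELECT u.name, SUM(o.amount) FROM users u "
            "JOIN orders o ON o.user_id = u.id GROUP BY u.id, u.name"
        ).fetchall(),
        timings,
    )
    # Application-side work: sort the aggregated rows by total, largest first.
    return timed(
        "application",
        lambda: sorted(rows, key=lambda r: r[1], reverse=True),
        timings,
    )


conn = build_demo_db()
timings = {}
report = handle_request(conn, timings)
for stage, seconds in timings.items():
    print(f"{stage}: {seconds * 1000:.2f} ms")
```

The numbers from a toy like this say nothing about your system; the point is only that the split between stages is something you can record rather than assume.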
1
Yes, relational databases become the bottleneck when you need horizontal scalability: to scale an RDBMS server you usually need a more powerful machine (vertical scaling), and that approach eventually hits a limit. This is one of the main reasons NoSQL databases appeared as an alternative to RDBMSs, typically by trading away ACID transactions. The bottleneck gets worse if you put application logic in the database in the form of stored procedures.
By shifting the processing to the application tier, your whole application can scale out more easily, since a well-written application server can be deployed on multiple machines. An alternative is to use the NoSQL data stores mentioned above to achieve scalability; they can also be deployed across multiple nodes easily.
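To make "doing the join in the application" concrete, here is a small sketch that fetches two tables with plain single-table queries and joins them in Python with a hash lookup, next to the equivalent SQL join for comparison. The schema, the sample rows and the function names are hypothetical, SQLite stands in for whatever database is actually used, and this is not a description of how eBay implements it.

```python
import sqlite3
from collections import defaultdict


def join_in_database(conn):
    """Let the database do the inner join and the sorting."""
    return conn.execute(
        "SELECT u.name, o.amount FROM users u "
        "JOIN orders o ON o.user_id = u.id "
        "ORDER BY u.name, o.amount"
    ).fetchall()


def join_in_application(conn):
    """Fetch both tables separately, then hash-join and sort in Python."""
    users = conn.execute("SELECT id, name FROM users").fetchall()
    orders = conn.execute("SELECT user_id, amount FROM orders").fetchall()

    # Build a hash table of orders keyed by the join column.
    orders_by_user = defaultdict(list)
    for user_id, amount in orders:
        orders_by_user[user_id].append(amount)

    # Probe the hash table once per user (inner-join semantics).
    joined = [
        (name, amount)
        for user_id, name in users
        for amount in orders_by_user.get(user_id, [])
    ]
    joined.sort()
    return joined


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (user_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
# No foreign key is declared, so the order for user 4 is an orphan row.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 30), (1, 10), (2, 20), (4, 99)],
)
conn.commit()

print(join_in_database(conn))
print(join_in_application(conn))
```

Both functions return the same rows here. The application-side version moves CPU work onto servers that can be added freely, but it also transfers both tables over the network in full and leaves problems such as the orphan order for the application to detect.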
Note that this only makes sense in a high-scalability context, as you pointed out in the question. In most cases, when used properly, databases are no real bottleneck and do their job very well, as they have for decades.
3
Memory is fast and non-durable, while drives are slow and durable. If you pull data out of a database and hold it in memory, access will be significantly quicker; the downside is that if the server fails, you’ll lose the data.
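A minimal sketch of that trade-off, assuming a hypothetical `users` table and a hypothetical `ReadThroughCache` class: reads are served from a Python dict after the first lookup, while the database stays the durable copy.

```python
import sqlite3


class ReadThroughCache:
    """Serve repeated reads from memory; fall back to the database on a miss."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}    # in-memory copy: fast, but gone if the process dies
        self.db_reads = 0  # how many lookups actually reached the database

    def get_name(self, user_id):
        if user_id in self.cache:
            return self.cache[user_id]
        self.db_reads += 1
        row = self.conn.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        name = row[0] if row else None
        self.cache[user_id] = name
        return name


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.commit()

users = ReadThroughCache(conn)
first = users.get_name(1)   # miss: goes to the database
second = users.get_name(1)  # hit: served from memory
print(first, second, users.db_reads)
```

Because only copies live in the dict, losing the process here costs warm-up time rather than data. Anything written to memory alone, without reaching the database, is what a crash would actually destroy.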
Databases are a great fit for most applications out there. Only a small percentage of applications operate at the scale of Google, Twitter, etc., and while it’s always fascinating to see how those companies have overcome their challenges, it would be naive to try to implement their solutions from the word go.
3