How can I distribute a single database that is already in production?

Let’s assume a successful Spring web application running on a MySQL or PostgreSQL database. Traffic has become so high, and the amount of data so large, that a distributed database solution is needed to address the scalability problem. Let’s also assume the application uses Hibernate and that the data access layer is cleanly separated into DAOs.

Ideally, one should be able to add or remove databases easily. A failover solution would be welcome too.

What would be the best strategy to scale this database? Is it possible to minimize sharding code (Shard) in the application?

4

Firstly, I’d look at whether there was write-intensive data that could be moved out of the relational database altogether. In a system I currently work on, we have identified quite a lot of pointless write load from writes to tables which record user activity; we use that data for analytics and user support, so it doesn’t really need to be in the main OLTPish database; it could be in a flat file, a NoSQL store, etc. I suspect that this is true of quite a lot of user-centric data in many applications: user preferences, profile, history, etc. If it’s a structured dataset which is only ever stored and retrieved by key, rather than being searched by value, then it could go in some simpler, cheaper, more write-scalable store, such as Riak.

With that done, I’d look at what fraction of the load was reads. If it’s at all significant (even 20%), move that to a read-only slave. MySQL and PostgreSQL both make it fairly straightforward to have one or more read-only slaves following a read-write master, and to use them as hot standbys for the master too.

However, splitting reads like this might be tricky; if you’re doing things properly, then you will wrap every unit of work in a transaction, even if it’s read-only, to guarantee a consistent view of data. Since you can’t (AFAIK!) have a single transaction efficiently use two different databases (XA is not efficient), that means you have to be able to identify read-only transactions when they start, so you can send their queries to the read-only database. That will very likely require logic in the application code that isn’t currently there, and might not be simple to retrofit. In our case, it was relatively simple: we send transactions from interactive user sessions to the read-write master, and transactions from our read-only REST API to the read-only slave. We could probably do a better job, by identifying read-only user requests, and sending those to the slave as well, but we haven’t had to do that yet.
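The key point above, deciding where a transaction goes before it starts, can be sketched roughly as follows. This is a minimal illustration in plain Java, not the code we actually run: `ReadWriteRouter`, `Target`, and `inTransaction` are hypothetical names, and a real Spring application would more likely plug a similar decision into its `DataSource` configuration.

```java
import java.util.function.Supplier;

// Hypothetical helper: names are illustrative, not part of Spring or Hibernate.
final class ReadWriteRouter {
    enum Target { READ_WRITE, READ_ONLY }

    // Routing decision for the unit of work running on the current thread.
    // Defaults to the read-write master when nothing has been declared.
    private static final ThreadLocal<Target> CURRENT =
            ThreadLocal.withInitial(() -> Target.READ_WRITE);

    // A data-access layer would consult this when picking a connection.
    static Target current() {
        return CURRENT.get();
    }

    // Fixes the target before the unit of work starts, runs the work,
    // and restores the previous target afterwards (even on exceptions).
    static <T> T inTransaction(boolean readOnly, Supplier<T> work) {
        Target previous = CURRENT.get();
        CURRENT.set(readOnly ? Target.READ_ONLY : Target.READ_WRITE);
        try {
            return work.get();
        } finally {
            CURRENT.set(previous);
        }
    }
}
```

In our case the `readOnly` flag would simply be `true` for everything coming from the REST API and `false` for interactive sessions; the hard part in most applications is knowing that flag up front, not the routing itself.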

If you’re not doing things properly, and are using no transactions, or implicit or explicit transactions around each query, then this is much simpler. A load-balancer between the app and the database can route the queries.

If segregating reads doesn’t do the job, then it’s time to look at having several machines handling writes. I have never done this, but we are talking about this a lot at my place of work, so I can give you my impressions. There are two ways to split writes: sharding, and multi-master replication of a shared database (I don’t really consider sharding to be multi-master, because any item of data belongs to a single master). Neither of these is easy, which is why this is a step of last resort.

Multi-master is simpler from the application’s point of view, because it doesn’t care which master it talks to, but is less likely to help performance, because any write to one master still has to be applied by the others. It’s only useful if the write traffic involves a lot of read work (e.g. inserts from selects, or you are using transactions and have units of work which do a lot of reading and a small amount of writing, which is not uncommon), and your multi-mastering operates at the row rather than the statement level.

Sharding is more complex. You will inevitably have to change your application code; you might even have to change it a lot. For example, if you split user profile data across multiple shards, you flat out lose the ability to do queries like “find me all the users with a Z in their name”, because that crosses multiple shards. You have to do either some sort of map-reduce across the shards, or have a separate index for that field off to one side. You should be able to encapsulate this in your DAOs; where you might not is if some kinds of queries become so slow that you just can’t afford to do them any more, and have to find different ways to solve the problems they solve. If your ORM has support for sharding (apparently Hibernate does), then that might help a lot, but it won’t fundamentally change the problem.
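To make the trade-off concrete, here is a minimal sketch of a sharded DAO. It is an illustration under loud assumptions, not how Hibernate’s sharding support works: `ShardedUserDao` and its methods are hypothetical, each shard is just an in-memory map standing in for a separate database, and the shard is chosen by a simple modulo on the user id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each "shard" is an in-memory map standing in for a database.
final class ShardedUserDao {
    private final List<Map<Long, String>> shards = new ArrayList<>();

    ShardedUserDao(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    // The shard key is the user id; every id belongs to exactly one shard.
    int shardFor(long userId) {
        return (int) Math.floorMod(userId, (long) shards.size());
    }

    void save(long userId, String name) {
        shards.get(shardFor(userId)).put(userId, name);
    }

    // Lookup by key touches a single shard.
    String findById(long userId) {
        return shards.get(shardFor(userId)).get(userId);
    }

    // Search by value cannot use the shard key, so it visits every shard
    // and merges the partial results.
    List<String> findByNameContaining(String fragment) {
        List<String> matches = new ArrayList<>();
        for (Map<Long, String> shard : shards) {
            for (String name : shard.values()) {
                if (name.contains(fragment)) {
                    matches.add(name);
                }
            }
        }
        return matches;
    }
}
```

Callers of the DAO see the same interface either way, but `findById` costs one shard while `findByNameContaining` costs all of them. Note also that modulo placement means changing the number of shards moves most keys, so a scheme that needs to add or remove databases easily would want a different placement strategy.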

Just how much of a problem this is will depend very sensitively on your access patterns. For a classic e-commerce system, for example, users make read-only access to catalogue data, which could therefore go in a single-master, many-slave database, and read-write access to their own order data, with no online access across multiple orders, which should shard very nicely. For a crowdsourced e-commerce system (like, famously, Etsy, or eBay), you have some users (shopkeepers) writing to the catalogue while others are reading from it. That makes sharding harder. For a social network application, where users are sharing content with others in a random meshwork, you will end up needing to do huge amounts of cross-shard work. Sadly, where I work, although our application is a financial system, it has many of the data characteristics of a social network; I’m not sure there is a productive way to shard it.

The good news about sharding is that if you can make it work, then it does offer a good shot at a linear speedup. To be honest, though, I suspect that the workloads that allow linear sharding are also the ones that would fit a NoSQL store well, because of that defining characteristic of not depending on cross-shard queries, so if you’re going to do the work to do sharding, you might as well go the whole hog and put everything in whatever the hot NoSQL store today is.

Plus, I suspect that for an application of any complexity, the cost of the programmer and sysadmin time to introduce sharding may add up to more than the cost of simply buying some bigger iron.

So, yeah, if you can, don’t shard. Stretch your existing model to see if you can get some more performance out of it first.

5

The best strategy might be to use an existing database sharding solution such as dbShards. This will allow you to move to a sharded solution without having to make any significant code changes to your application (dbShards will make the shards look like a single database as far as your application is concerned). While the software has a cost, it could well be cheaper than building your own sharding solution.
