Why is it so bad to read data from a database “owned” by a different microservice?

I have recently read this excellent article on the microservice architecture: http://www.infoq.com/articles/microservices-intro

It states that when you load a web page on Amazon, then 100+ microservices cooperate to serve that page.

The article states that all communication between microservices can only go through an API. My question is why it is so bad to say that all database writes can only go through an API, but you are free to read directly from the databases of the various microservices. One could, for example, say that only a few database views are accessible outside the microservice, so that the team maintaining the microservice knows that as long as they keep these views intact, they can change the database structure of their microservice as much as they want.

Am I missing something here? Is there some other reason why data should only be read via an API?

Needless to say, my company is significantly smaller than Amazon (and always will be) and the maximum number of users we can ever have is about 5 million.

Databases are not very good at information hiding, which is quite plausible, because their job is to actually expose information. But this makes them a lousy tool when it comes to encapsulation. Why do you want encapsulation?

Scenario: you tie a couple of components to an RDBMS directly, and you see one particular component becoming a performance bottleneck. You might want to denormalize the database to fix it, but you can’t, because all the other components would be affected. You may even realize that you’d be better off with a document store or a graph database than with an RDBMS. If the data is encapsulated by a small API, you have a realistic chance to reimplement said API any way you need. You can transparently insert cache layers and whatnot.
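To make the encapsulation point concrete, here is a minimal Python sketch. Every name in it (CustomerReader, SqlCustomerReader, CachingCustomerReader, greeting) is hypothetical and invented for illustration, and a dict stands in for the real database. It only shows that a caller written against a small read API cannot tell whether a cache was inserted behind it.

```python
from typing import Protocol


class CustomerReader(Protocol):
    """Hypothetical read API that other services depend on."""

    def get_name(self, customer_id: int) -> str: ...


class SqlCustomerReader:
    """Stand-in for an RDBMS-backed implementation (a dict plays the table)."""

    def __init__(self) -> None:
        self._rows = {1: "Ada", 2: "Grace"}
        self.lookups = 0  # counts trips to the "database"

    def get_name(self, customer_id: int) -> str:
        self.lookups += 1
        return self._rows[customer_id]


class CachingCustomerReader:
    """Cache layer exposing the same interface as the reader it wraps."""

    def __init__(self, inner: CustomerReader) -> None:
        self._inner = inner
        self._cache: dict[int, str] = {}

    def get_name(self, customer_id: int) -> str:
        if customer_id not in self._cache:
            self._cache[customer_id] = self._inner.get_name(customer_id)
        return self._cache[customer_id]


def greeting(reader: CustomerReader, customer_id: int) -> str:
    # Client code knows only the API, not the storage behind it.
    return f"Hello, {reader.get_name(customer_id)}"


backend = SqlCustomerReader()
cached = CachingCustomerReader(backend)
first = greeting(cached, 1)
second = greeting(cached, 1)  # served from the cache, no second lookup
```

The same swap would work for replacing the backing store entirely, as long as the replacement still implements get_name.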

Fumbling with the storage layer directly from the application layer is the exact opposite of what the dependency inversion principle recommends.

What is more important and significant about a microservice: its API or its database schema? The API, because that is its contract with the rest of the world. The database schema is simply a convenient way of storing the data managed by the service, hopefully organised in a way that optimises the microservice's performance. The development team should be free to reorganise that schema – or switch to an entirely different datastore solution – at any time. The rest of the world should not care. The rest of the world cares when the API changes, because the API is the contract.

Now, if you go peeking into their database:

You add an unwanted dependency on their schema. They cannot change it without having an impact on your service.
You add unwanted and unpredictable load to their internals.
The performance of your own service will be affected by the performance of their database (they will be trying to optimise their service to perform well for clients and their database to perform well only for their service)
You are tying your implementation to a schema which may well not accurately and distinctively represent the resources in their data store – it may have extra details which are only needed to track internal state or satisfy their particular implementation (which you should not care about).
You may unwittingly destroy or corrupt the state of their service (and they will not know you are doing this)
You may update/delete/remove resources from their database without them knowing this has happened.

The last two points may not happen if you are only granted read access, but the other points are more than a good enough reason. Shared databases are a bad thing.

It is common for less experienced developers (or those who do not learn) to see the database as more important than the service, to see the database as the real thing and the service just a way of getting to it. That is the wrong way round.

Microservice Architecture is hard to describe, but the best way to think about it is as a marriage between Component Oriented Architecture and Service Oriented Architecture. Software as a suite is composed of many small business components, each with a very specific business domain responsibility. Their interface to the outside world, whether for services they provide or services they require, is an API of clearly defined services.

Writing to and even reading from a database that is outside of your component's business domain is against this style of architecture.

The primary reason for this is that consumers of an API provided by another software component can reasonably expect that API to remain backwards compatible as new releases of the providing component become available. If I am the developer of a “providing” component, then I only have to worry about backwards compatibility of my API. If I know that three other development teams have written custom queries directly against my database, then my job has become much more complicated.

Even worse, maybe the other team that wrote these queries is mid-sprint on a critical project and can’t accept a schema change from your component right now. Now software development for your component, on a business domain that you own, is being driven by development on another business domain.

Full interaction through services reduces coupling between software components, so situations like this occur less frequently. When other components use a View in the database, you have more scope to keep the View backwards compatible if anybody else has written queries against it. I still feel, however, that this should be the exception, and should only be done for something like reporting or batch processing, where an application needs to read enormous amounts of data.

Clearly this works well in large distributed organisations like Amazon, where development teams are separated by business domain. If you are a small development shop you can still benefit from this model, especially if you need to ramp up quickly for a big project, but also if you have to deal with vendor software.

Over the last 20 years I’ve seen a few large modular database designs and I’ve seen the scenario suggested by David quite a few times now where applications have write access to their own schema/set of tables and read access to another schema/set of tables. Most often this data that an application/module gets read-only access to could be described as “master data”.

In that time I have not seen the problems that prior answers suggest I should have seen, so I think it is worth taking a closer look at the points they raise.

Scenario: you tie a couple of components to an RDBMS directly, and you see one particular component becoming a performance bottleneck

I agree with this comment, except that it is also an argument for having a copy of the data locally for the microservice to read. That is, most mature databases support replication, so without any developer effort the “master data” can be physically replicated to the microservice database if that is desired or needed.

Some might recognise this in an older guise as an “Enterprise database” replicating core tables to a “Departmental database”. The point here is that it is generally good if the database does this for us with built-in replication of changed data (deltas only, in binary form and at minimal cost to the source database).

Conversely, when our database choices do not allow this ‘off the shelf’ replication support then we can get into a situation where we want to push “master data” out to the microservice databases and this can result in a significant amount of developer effort and also be a substantially less efficient mechanism.

might want to denormalize the database, but you can’t because all other components would be affected

To me this statement is just not correct. Denormalisation is an “additive” change and not a “breaking change” and no application should break due to denormalisation.

The only way this can break an application is where application code uses something like “select * …” and does not handle an extra column. To me that would be a bug in the application.
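As a small illustration of that claim, here is a sketch using Python's built-in sqlite3 module. The table and column names (customer, order_count) are made up for the example, and SQLite merely stands in for whatever RDBMS is really in use. It shows a reader that names its columns surviving an additive change, while a reader that relies on “select *” and a fixed column count does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Ada')")


def read_named(db):
    # Names its columns, so extra columns are invisible to it.
    return db.execute("SELECT id, name FROM customer").fetchall()


def read_star(db):
    # Unpacks exactly two values per row, so it depends on the column count.
    return [(cid, name) for cid, name in db.execute("SELECT * FROM customer")]


before_named = read_named(conn)
before_star = read_star(conn)

# Additive, denormalising change: keep an order count on the customer row.
conn.execute("ALTER TABLE customer ADD COLUMN order_count INTEGER DEFAULT 0")

after_named = read_named(conn)
try:
    read_star(conn)
    star_broke = False
except ValueError:
    # Three columns can no longer be unpacked into two names.
    star_broke = True
```

This covers only the additive case discussed here; dropping or renaming a column would break the named query as well.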

How can denormalisation break an application? Sounds like FUD to me.

Schema dependency:

Yes, the application now has a dependency on the database schema, and the implication is that this ought to be a major problem. While adding any extra dependency is obviously not ideal, my experience is that a dependency on the database schema has not been a problem, so why might that be the case? Have I just been lucky?

Master data

The schema that we typically might want a microservice to have read-only access to is most commonly what I’d describe as “master data” for the enterprise. It has the core data that is essential to the enterprise.

Historically this means the schema we add the dependency on is both mature and stable (somewhat fundamental to the enterprise and unchanging).

Normalisation

If 3 database designers go and design a normalised db schema they’ll end up at the same design. Ok, there might be some 4NF/5NF variation but not much. What’s more there are a series of questions that the designer can ask to validate the model so the designer can be confident that they got to 4NF (Am I too optimistic? Are people struggling getting to 4NF?).

update: By 4NF here I mean all tables in the schema got to their highest normal form up to 4NF (all tables got normalised appropriately up to 4NF).

I believe the normalisation design process is why database designers are generally comfortable with the idea of depending on a normalised database schema.

The process of normalisation gets the DB design to a known “correct” design and the variations from there ought to be denormalisation for performance.

There can be variations based on DB types supported (JSON, ARRAY, Geo type support etc)
Some might argue for variation based on 4NF/5NF
We exclude physical variation (because that doesn’t matter)
We restrict this to OLTP design and not DW design because those are the schemas we want to grant read-only access to

If 3 programmers were given a design to implement (as code) the expectation would be for 3 different implementations (potentially very different).

To me there is potentially a question of “faith in normalisation”.

Breaking schema changes?

Denormalisation, adding columns, altering columns for bigger storage, extending the design with new tables etc are all non-breaking changes, and DB designers who got to 4th normal form will be confident of that.

Breaking changes are obviously possible by dropping columns/tables or making a breaking type change. Possible yes, but in practical terms I’ve not experienced any problems here at all. Perhaps because it is understood what breaking changes are and these have been well managed?

I’d be interested to hear cases of breaking schema changes in the context of shared read-only schemas.

What is more important and significant about a microservice: its API or its database schema? The API, because that is its contract with the rest of the world.

While I agree with this statement I think there is an important caveat that we might hear from an Enterprise Architect which is “Data lives forever”. That is, while the API might be the most important thing the data is also rather important to the enterprise as a whole and it will be important for a very long time.

For example, once there is a requirement to populate the Data Warehouse for Business intelligence then the schema and CDC support become important from the business reporting perspective irrespective of the API.

Issues with APIs?

Now if APIs were perfect and easy, all these points would be moot, as we’d always choose an API rather than have local read-only access. So the motivation for even considering local read-only access is that there might be some problems with using APIs that local access avoids.

What motivates people to desire local read-only access?

API optimisation:

LinkedIn have an interesting presentation (from 2009) on the issue of optimising their API and why it is important to them at their scale. http://www.slideshare.net/linkedin/building-consistent-restful-apis-in-a-highperformance-environment

In short, once an API has to support many different use cases it can easily get into the situation where it supports one use case optimally and the rest rather poorly from a network perspective and database perspective.

If the API does not have the same sophistication as LinkedIn’s then you can easily end up with scenarios where:

The API fetches much more data than you need (wasteful)
The API is chatty, so you have to call it many times
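A toy Python sketch of both scenarios follows. FakeOrderApi, its orders_for method and the sample data are all hypothetical, and an in-memory SQLite database stands in for a local read-only copy of the data. The call counter is only a rough proxy for network round trips.

```python
import sqlite3


class FakeOrderApi:
    """Hypothetical remote API: one call per customer, full records returned."""

    def __init__(self, orders):
        self._orders = orders
        self.calls = 0

    def orders_for(self, customer_id):
        self.calls += 1  # each call stands for a network round trip
        return [o for o in self._orders if o["customer_id"] == customer_id]


orders = [
    {"customer_id": 1, "total": 10, "notes": "x" * 1000},
    {"customer_id": 1, "total": 5, "notes": "y" * 1000},
    {"customer_id": 2, "total": 7, "notes": "z" * 1000},
]

# Via the API: one call per customer, each dragging along the unused "notes".
api = FakeOrderApi(orders)
totals_via_api = {
    cid: sum(o["total"] for o in api.orders_for(cid)) for cid in (1, 2)
}

# Via a local read-only copy: one query that returns only what is needed.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer_id INTEGER, total INTEGER, notes TEXT)")
db.executemany(
    "INSERT INTO orders VALUES (:customer_id, :total, :notes)", orders
)
totals_local = dict(
    db.execute("SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id")
)
```

A purpose-built API endpoint could of course return the same aggregate in one call; the sketch only shows what happens when the API was designed for a different use case.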

Yes, we can of course add caching to APIs, but ultimately the API call is a remote call, and there are a series of optimisations available to developers when the data is local.

I suspect there is a set of people out there who might add it up as:

Low cost replication of master data to microservice database (at no development cost and technically efficient)
Faith in Normalisation and the resilience of applications to schema changes
Ability to easily optimise every use case and potentially avoid chatty/wasteful/inefficient remote API calls
Plus some other benefits in terms of constraints and coherent design

This answer has got way too long.
Apologies!!

It’s not.

Reading data from a database “owned” by a different service is not bad. The microservices mantra says it’s bad; in fact, it seems to have a bit of a cult following.

why it is so bad to say that all database writes can only go through an API, [when] you are free to read directly from the databases of the various microservices.

It isn’t bad.

One could, for example, say that only a few database views are accessible outside the microservice, so that the team maintaining the microservice knows that as long as they keep these views intact, they can change the database structure of their microservice as much as they want.

That is perfectly valid. A VIEW is probably a better contract than an API; the RDBMS can enforce the security, instead of custom written code that might introduce a security vulnerability, and that only another programmer can analyse. A DBA can look at a VIEW and quickly assess whether the authorisations are configured correctly.

Am I missing something here? Is there some other reason why data should only be read via an API?

Nothing missing, just an industry that is overzealous for new stuff. There is no good reason for reading via an API gateway, that’s just a trend.

Needless to say, my company is significantly smaller than Amazon (and always will be) and the maximum number of users we can ever have is about 5 million.

Many microservices projects fail because they are not Amazon, but they try to emulate their overengineering.

I have done a lot of research about Microservices, and while it’s an incremental improvement on Service Oriented Architecture, it’s not perfect.

From the leading answer [back2dos]:

Databases are not very good at information hiding, which is quite
plausible, because their job is to actually expose information.

There is a simple and mature mechanism in relational databases for this: the VIEW. The VIEW is a stable contract about what data will be returned. The tables beneath the VIEW may change. The tables can be inaccessible while the VIEW is accessible, and this access can be granted by role. Perhaps VIEWs are not in fashion, but the capability is certainly there.
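Here is a rough sketch of the VIEW-as-contract idea, using Python's built-in sqlite3 module. The names customer, person and customer_v are invented for the example. SQLite has no roles, so the by-role access control is not shown; only the stable-contract part is.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, full_name TEXT);
    INSERT INTO customer VALUES (1, 'Ada Lovelace');
    -- The published contract: consumers query only this view.
    CREATE VIEW customer_v AS SELECT id, full_name AS name FROM customer;
""")


def consumer_read(conn):
    # Code owned by another team; it knows the view, not the tables.
    return conn.execute("SELECT id, name FROM customer_v ORDER BY id").fetchall()


before = consumer_read(db)

# The owning team restructures its tables and redefines the view to match.
db.executescript("""
    DROP VIEW customer_v;
    CREATE TABLE person (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
    INSERT INTO person VALUES (1, 'Ada', 'Lovelace');
    DROP TABLE customer;
    CREATE VIEW customer_v AS
        SELECT id, first || ' ' || last AS name FROM person;
""")

after = consumer_read(db)
```

The consumer's query is unchanged across the restructuring, which is the property the VIEW is being relied on for.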

You may even realize that you’d be better off with a document store or
a graph database than with an RDBMS. If the data is encapsulated by a
small API, you have a realistic chance to reimplement said API any way
you need.

I remember thinking that an ORM was great because if you needed to change the database, you could do that easily. That practically never happens. You end up spending extra time on what-ifs that never happen. This is one of those what-ifs.

If you really do need a special-purpose data solution, it’s highly likely that the database itself isn’t the only change required; other components will need to be changed anyway. No big deal. When you have a performance problem, do the work then. You’ll only need to change that 1% of the system, and the other 99% can succeed quite well in the relational database.

You can transparently insert cache layers and whatnot.

Any API can implement cache layers. But it’s better to just optimise your database in the first place. Whatever RAM you would allocate to Redis, give that to Postgres. You can probably hold your whole DB in RAM.

From the second-top answer from [itsbruce]:

What is more important and significant about a microservice: its API or its database schema? The API, because that is its contract with the rest of the world.

An API has a View-Model. What’s the difference between a View-Model and a Data-Model? Extra unnecessary steps. A database can define a VIEW that can be consumed as a View-Model.

The database schema is simply a convenient way of storing the data managed by the service, hopefully organised in a way that optimises the microservice's performance.

That’s probably the opposite of reality. Data is what the system is all about, not the microservice. The microservices are either mutating data, or gateways to data (read/write) or gateways to trigger mutations of data.

The development team should be free to reorganise that schema

Reorganising the schema can happen freely behind a VIEW facade/contract.

or switch to an entirely different datastore solution – at any time.

And give up a fundamental ability to easily JOIN data – at all times.

The rest of the world should not care. The rest of the world cares when the API changes, because the API is the contract.

If you have to access data via an API, then perhaps, but why are we using custom-coded web services as gateways to databases again?

State management (potentially a database) can be deployed in the Microservice’s container and exposed via an API. The Microservice’s database is not visible to other systems outside the container; only the API is. Alternatively, you could have another service (e.g. a cache) manage state via an API. Having all the Microservice’s dependencies (other than API calls to other services) within a single deployable container is a key distinction in the architecture. If one does not get that, go back and study the architecture.

Filed under: softwareengineering - @ 21:09

Tags: architecture, microservices, modularization, soa, web-services
