Are NoSQL databases the best choice for more efficiently querying large amounts of data?

We are planning to build a travel website that integrates multiple hotel APIs (e.g. DOTW, GTA, Expedia). I initially tried MySQL, but the hotel data is large and has numerous “one to many” relationships with Images, Amenities and Rooms, and search becomes very slow once we have data for around 200,000 hotels. Even fetching all details for just one hotel may result in a JOIN query across at least four tables and a scan over all hotel records. So we are planning to migrate our product schema to a NoSQL database to make our search as fast as possible.

We also sometimes need to run scheduled jobs against the database to eliminate duplicates and to add the new hotels sent by our providers.

Our tech stack is basically Java and J2EE, along with Spring and Hibernate.

I have read about MongoDB, Cassandra, Redis and ElasticSearch, but I am now confused about whether simply using these tools can optimize the website’s search performance. If so, which features differ between these tools that could help me make a decision?

I think your search performance can improve greatly through a number of techniques and database design approaches in your typical RDBMS. I suggest prototyping the following improvements and performance testing them first, before you commit to an entirely new database technology that will require a great deal of new learning and experience to master.

Essentially you want to avoid the “Magic Bullet” mindset. There is a misconception that NoSQL can somehow magically solve all of our problems and performance issues with the RDBMS. That might be true sometimes, but you really should try improving your database design first.

Identify your Non-Functional Requirements

Specifically identify your non-functional requirements for performance. Determine what the maximum acceptable average query wait time is and use that as your goal. If you can tweak your database design to achieve this, then you do not need to rearchitect your software around a NoSQL solution.
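One way to make that goal concrete is to time a representative query and compare the average against your target. This is only a rough sketch: it uses Python's built-in SQLite as a stand-in for your real database, and the `hotel` table, the query and the 0.2 second target are all made-up placeholders.

```python
import sqlite3
import statistics
import time

# Hypothetical target: the maximum acceptable average query time, in seconds.
TARGET_SECONDS = 0.2


def average_query_seconds(conn, sql, params=(), runs=20):
    """Run the query several times and return the mean wall-clock time."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


# Tiny in-memory table, used only so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hotel (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO hotel (city) VALUES (?)",
    [("Paris",), ("Rome",)] * 500,
)

avg = average_query_seconds(
    conn, "SELECT id FROM hotel WHERE city = ?", ("Paris",)
)
meets_target = avg <= TARGET_SECONDS
print(f"average: {avg:.6f}s, meets target: {meets_target}")
```

Run the same measurement before and after each schema change, against realistic data volumes, so you know whether a change actually moved you toward the goal.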

Avoid binary columns

It sounds like, with the Image table, you have binary media stored within your database tables. While this is highly dependent on how your chosen database implements binary columns, it is generally accepted that binary columns can hurt the performance of your queries. Binary columns can undermine the benefits that an index on a table column would otherwise provide. If you don’t believe me, join to the Image table, run an explain plan, and check whether the index is actually being used.

Use a Content Delivery Network

Instead of storing images and media in database records, store a URL that an application (a browser, for example) can use to retrieve the image. That URL can point to a unique image that is stored and managed in a Content Delivery Network. There are a number of cloud services that can provide this, or you can build your own with a number of tools. This should make all aspects of your application much more efficient.
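As a rough illustration, the image table then only needs a text column. In this sketch SQLite stands in for your database, and the `hotel_image` table, its columns and the `cdn.example.com` host are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The row stores only where the image lives, not the image bytes.
conn.execute("""
    CREATE TABLE hotel_image (
        id       INTEGER PRIMARY KEY,
        hotel_id INTEGER NOT NULL,
        url      TEXT NOT NULL
    )
""")
conn.execute(
    "INSERT INTO hotel_image (hotel_id, url) VALUES (?, ?)",
    (1, "https://cdn.example.com/hotels/1/lobby.jpg"),  # made-up CDN URL
)


def image_urls(conn, hotel_id):
    """Return the image URLs for one hotel; the client fetches the files itself."""
    rows = conn.execute(
        "SELECT url FROM hotel_image WHERE hotel_id = ? ORDER BY id",
        (hotel_id,),
    )
    return [row[0] for row in rows]


print(image_urls(conn, 1))
```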

Evaluate your table indexes

Make sure that, if you are not already using indexes, you build them for the columns that you typically filter on or join against. To be honest, joining 4 tables is not a great deal for a single query, so if you follow these guidelines you should see at least a modest improvement in performance.
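To check whether an index is actually picked up, compare the query plan before and after creating it. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` because it runs anywhere; MySQL has its own `EXPLAIN` with a different output format. The `hotel` and `room` tables and the index names are assumptions for illustration, not your real schema.

```python
import sqlite3


def plan_for(conn, sql, params=()):
    """Return the human-readable steps of SQLite's query plan for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the description


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hotel (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE room  (id INTEGER PRIMARY KEY, hotel_id INTEGER, kind TEXT);
""")

query = """
    SELECT h.name, r.kind
    FROM hotel h
    JOIN room r ON r.hotel_id = h.id
    WHERE h.city = ?
"""

# Without secondary indexes, at least one table has to be scanned in full.
before = plan_for(conn, query, ("Paris",))

# Index the column we filter on and the column we join against.
conn.executescript("""
    CREATE INDEX idx_hotel_city ON hotel(city);
    CREATE INDEX idx_room_hotel ON room(hotel_id);
""")

after = plan_for(conn, query, ("Paris",))

print("before:", before)
print("after: ", after)
```

The exact wording of the plan differs between SQLite versions, so read it rather than parse it: a full-table SCAN step is what you want to see replaced by an index SEARCH.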

If you follow these guidelines and you still cannot achieve your performance requirements then perhaps you can evaluate various NoSQL solutions and look for features that might help you.

Even fetching all details for just one hotel may result in a JOIN
query from at least four tables, and scanning over all hotels records.

A four-join query is absolutely trivial if you have the appropriate indexes for all joins.

The second part of this question is far more troubling. Why the scan over all records? Is it because of missing indexes? Or do you need some aggregate data, maybe to compare against an average, give a ranking index, something like that? If so, switching to NoSQL won’t help; what you need is to precalculate those aggregates so you can derive the data for each hotel quickly.
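Precalculating can be as simple as a summary table that a scheduled job refreshes. Here is a minimal sketch, with SQLite standing in for your database; the `room` and `hotel_stats` tables and their columns are invented for the example, and which aggregates you need depends on your actual queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE room (id INTEGER PRIMARY KEY, hotel_id INTEGER, price REAL);
    CREATE TABLE hotel_stats (
        hotel_id   INTEGER PRIMARY KEY,
        room_count INTEGER,
        min_price  REAL,
        avg_price  REAL
    );
""")
conn.executemany(
    "INSERT INTO room (hotel_id, price) VALUES (?, ?)",
    [(1, 80.0), (1, 120.0), (2, 200.0)],
)


def refresh_hotel_stats(conn):
    """Rebuild the summary table in one transaction (e.g. from a nightly job)."""
    with conn:
        conn.execute("DELETE FROM hotel_stats")
        conn.execute("""
            INSERT INTO hotel_stats (hotel_id, room_count, min_price, avg_price)
            SELECT hotel_id, COUNT(*), MIN(price), AVG(price)
            FROM room
            GROUP BY hotel_id
        """)


refresh_hotel_stats(conn)

# Reading one hotel's aggregates is now a primary-key lookup, not a scan.
stats = conn.execute(
    "SELECT room_count, min_price, avg_price FROM hotel_stats WHERE hotel_id = ?",
    (1,),
).fetchone()
print(stats)
```

The trade-off is staleness: the numbers are only as fresh as the last refresh, which is usually fine for things like averages and rankings.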

NoSQL is generally not very good with relational data. NoSQL is often great for non-relational but structured data like documents or time series.

Your “one to many” relationships may look quite like a document: e.g. a “hotel” document may carry all its images, room info, etc., stored together and fetched with one operation.
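To give a feel for it, such a document might look roughly like this. The field names and values are invented, and a plain dict stands in for a real document store, so treat it as a shape sketch rather than any particular database's API.

```python
import json

# One hotel with its "many" sides embedded instead of joined.
hotel_document = {
    "id": "hotel-123",
    "name": "Example Hotel",
    "images": [
        {"url": "https://cdn.example.com/hotels/123/lobby.jpg"},
    ],
    "amenities": ["wifi", "pool"],
    "rooms": [
        {"kind": "double", "price": 120.0},
        {"kind": "suite", "price": 250.0},
    ],
}

store = {}  # in-memory stand-in for a document database


def put_hotel(store, document):
    """Write the whole hotel as a single serialized document."""
    store[document["id"]] = json.dumps(document)


def get_hotel(store, hotel_id):
    """Fetch everything about one hotel with a single key lookup."""
    return json.loads(store[hotel_id])


put_hotel(store, hotel_document)
hotel = get_hotel(store, "hotel-123")
print(hotel["name"], len(hotel["rooms"]))
```

This shape suits "show me one hotel" very well. Questions that cut across hotels, such as "all hotels with a pool under a given price", depend on what indexing and query support the chosen store offers.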

On the other hand, if you see a need for an SQL join, do not hesitate and use an SQL database. These are damn efficient with joins, and are quite good at pulling large amounts of data in one query.

WRT “searches become very slow”, it’s hard to tell what happened without seeing your DB structure first. Usually adding a relevant (or dropping an irrelevant) index can speed things up. Reworking the schema to make the few most important queries efficient is also known to work.

I don’t think that NoSQL will help you much with search, at least not until I can see the types of searches you’re going to run.

Why either/or?

I’ve worked very successfully with a hybrid approach, using a relational db (SQL Server, but pick your favourite) to hold data that needs a relational structure – most of this is IDs linking all the various domain objects, very little textual data and certainly no blobs – and a NoSQL db (Dynamo) to hold large, relatively unstructured data, typically JSON documents picked up from third party sources. Obviously the coding is more complex, but it enables you to get the best of both worlds.
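A very rough sketch of how the two halves fit together is below. SQLite stands in for the relational db and a plain dict stands in for the NoSQL store; the `hotel` table, the `doc_key` column and the key format are my own illustrative choices, not something either product prescribes.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Relational side: only IDs and the key of the document, no bulky payload.
conn.execute("""
    CREATE TABLE hotel (
        id           INTEGER PRIMARY KEY,
        provider     TEXT NOT NULL,
        provider_ref TEXT NOT NULL,
        doc_key      TEXT NOT NULL,
        UNIQUE (provider, provider_ref)
    )
""")

document_store = {}  # stand-in for the NoSQL side holding raw provider JSON


def save_hotel(provider, provider_ref, payload):
    """Store the provider's JSON as a document and link it from a relational row."""
    doc_key = f"{provider}:{provider_ref}"
    document_store[doc_key] = json.dumps(payload)
    with conn:
        cursor = conn.execute(
            "INSERT INTO hotel (provider, provider_ref, doc_key) VALUES (?, ?, ?)",
            (provider, provider_ref, doc_key),
        )
    return cursor.lastrowid


def load_hotel(hotel_id):
    """Resolve the relational ID to its document, or None if the ID is unknown."""
    row = conn.execute(
        "SELECT doc_key FROM hotel WHERE id = ?", (hotel_id,)
    ).fetchone()
    return json.loads(document_store[row[0]]) if row else None


hotel_id = save_hotel("expedia", "42", {"name": "Example Hotel"})
print(load_hotel(hotel_id))
```

The unique constraint on provider and provider reference is also a natural place to catch duplicate hotels coming in from the same feed.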

Of course it may be that a pure approach is ultimately best for you, but the hybrid may also help out in step-by-step refactoring.


Filed under: softwareengineering - @ 20:08

Tags: cassandra, database-design, elasticsearch, mongodb, redis
