How much better is using quad trees than simple relational database for storing location data?

I want to store and find back driver location data in the standard design uber/lyft problem.

I was researching about possible system design. Several videos and tutorials usually describe storing location data in sql as not efficient and recommend to use quad trees as a better alternative.

What about using an sql design with 2 indexes for both latitude and longitude coordinates and use a square that defines the range of latitude/longitude to get the data ?

I understand quad trees can be better but why, and how much better?

4

For querying spatial data, even when you have two separate indexes on latitude and longitude, databases will only be able to utilize them one after another, but not in a truely combined fashion. After applying the first index on, lets say, the latitude, the remaining set of data points has to be processed in full somehow. The index on the longitude works only on the full data set and cannot be used to reduce the subset of the data returned from the first query part efficiently without reindexing. Obviously, trying things the other way round by starting with the longitude won’t change this.

For smaller data sets and certain query requirements reducing the data in one dimension might be fast enough. But if you are going to query all points in a 10m x 10m square, within in a total 10km x 10km raster, and you assume a median point density of ~1 point per square meter then

  • there are 100 million points in that raster

  • there are still 100,000 points which have to be processed one-by-ony after reducing the points to a stripe of 10m by utilizing the first index.

But what about index intersection?

Index intersection is a feature for making use of two indexes for queries. It is explained here for PostgresQL. For a query like WHERE x BETWEEN (x1,x1+10) AND y BETWEEN (y1,y1+10) this works as follows: it first applies the index on x and the index on y individually, yielding the ~100.000 points from each 10m stripe in form of two memory bitmaps. Afterwards it intersects the two bitmaps. This might be fast enough in reality for many applications, but obviously cannot reduce the order of growth of the running time.

A quad tree, however, is the two-dimensional equivalent of an otherwise single dimensional index. It will allow you to find the ~100 points with a running time which is proportional to the logarithm of the total number of points. If each of the tree levels reduces the number of points by, lets say a factor of 4, this means this will require a number of roughly log4(100 millions) steps through the tree, which means ~14 steps.

8

Many databases has support for spatial indices. I think most of these uses R-Trees for indexing, as R trees should handle uneven point distributions better than quadtrees.

Several videos and tutorials usually describe storing location data in sql as not efficient and recommend to use quad trees as a better alternative.

My guess is that a database should be sufficient for your use case. You may also consider specialized databases or plugins that are specifically designed for geographical data, see PostGis as an example.

But for some use cases you may need to provide a specialized implementation that is optimized for your specific use case. You would probably not build a raytracer around a sql database as an example.

What about using an sql design with 2 indexes for both latitude and longitude coordinates and use a square that defines the range of latitude/longitude to get the data ?

Most database indices is essentially an ordering of rows. So it can only use a single index for lookup. Creating a composite index will also not really help much since there will probably be only a few points with the same exact latitude or longitude, so it would only eliminate points in on dimension in an efficient way.

I understand quad trees can be better but why, and how much better?

Quad trees, (or kd-trees, R-trees etc) allow you you quickly eliminate far away points in both dimensions. The improvement compared to a 1D index should be O(log n) compared to O(n log n)

You may consider the simplest spatial index, a simple grid. Take your lat/long coordinate and truncate it to the nearest kilometer in each dimension. You can then combine this into a single number by just concatenating the bits and use it as an index. To find all points within one km you would just fetch the cell containing your search point, as well as all neighboring cells, and apply a second, more exact filtering step.

The downside with this simple method is if you want to find points at some other radius than 1km, or if you want to find the closest point, regardless of how far away it is. This is where spatial trees have their advantage. A sparse quadtree can subdivide cells until each contains about the same amount of points, removing the need to specify a cell size.

There are SQL DMBS with a spatial index support. Those may use quad trees or other structures and will most likely outperform a custom solution.

  • MySQL
  • Oracle DB
  • MS SQL
  • PostGIS

The database is not necessarily a bad idea

Searching for items in a square in an SQL database, requires to search for longitude and latitude each in a range.

If this query is executed sequentially, first looking for points in the latitude range using the first index, then filtering them on the longitude range (or vice-versa) could require to go through a large number of items. If you have however only a few thousands of vehicles, it would not be an issue at all.

If you would have a very large number of points, the sequential approach would be very inefficient. Fortunately, modern DBMS implement advanced optimisations techniques. One of it is called “index intersection“: it uses the fact that two columns are used each having an index. It then looks at the intersection of the row-ids relevant for the two columns.

I am not (or no longer since a while) a DBMS optimisation specialist. My takeaway was that the DB optimising engine is most of the time better than what we think. I do not know however if the popular DBMS that offer this technique are able to always spot its best use. I’ve read for example that older versions of the engine would rarely use it.

So a very pragmatic approach would be to start with your idea, and analyse the performance with a representative data-load (remember, that if there re only a few demo data rows, the dbms engine may use another execution plan based on the known volumes). If you find a bottleneck, you may then consider to refactor your already working but slow software, to use positional data types that leading DBs support.

I was wondering if the fact that DB vendors propose positional data types could be linked to inefficiencies in the query on attitude and longitude. But this is not necessarily a given. Because computing distance with GPS coordinates is not just 2D computations, as latitude and longitude are on a sphere and the square distorts the reality (again, it depends on the scale- within a single city, the distortion should be minimal). Moreover GPS has also an altitude, which may or may not matter in your case.

Quad-tree alternative

Implementing your own quad-tree solution rather than using positional types of a DBMS could be fun. But it may take time.

The trick is to have the leaves positioned compared to their parents. It’s as if the parent would be in a center of the square, and the leaves each correspond to a subdivision of the parent’s square in 4. The more points, the smaller the subdivision of the squares they are in.

Finding the potential nearby candidates consist then of a tree traversal starting at the point and looking for the neighbours in the same parent, then in other adjacent boxes, using the knowledge on the relative positioning (top-left, top-right, bottom-left, bottom-right) of the nodes. It is faster to search in the quad tree, because the subdivision of the space in smaller squares depends on the topology of points: the more points in one region, the smaller the boxes that subdivide it. This allows to discard a lot of false positives. On the other side, you’ll need to update the tree as the position of the points moves, this is also challenging and could also require some workload.

See also this article for additional explanations, and there are a number of videos on youtube that explains it quite well and visually, for example this one on building the tree, and this one on range search.

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị
Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa
Thiết kế website Thiết kế website Thiết kế website Cách kháng tài khoản quảng cáo Mua bán Fanpage Facebook Dịch vụ SEO Tổ chức sinh nhật