Why does git use hashes instead of revision numbers?

I have always wondered why Git prefers hashes over revision numbers. Revision numbers are much clearer and easier to refer to (in my opinion): there is a difference between telling someone to take a look at revision 1200 and telling them to look at commit 92ba93e, to give just one example.

So, is there any reason for this design?

A single, monotonically increasing revision number only really makes sense for a centralized version control system, where all revisions flow to a single place that can track and assign numbers. Once you get into the DVCS world, where numerous copies of the repository exist and changes are being pulled from and pushed to them in arbitrary workflows, the concept just doesn’t apply. (For example, there’s no one place to assign revision numbers – if I fork your repository and you decide a year later to pull my changes, how could a system ensure that our revision numbers don’t conflict?)

You need hashes in a distributed system. Let’s say you and a colleague are both working on the same repository and you both commit a change locally and then push it. Who gets to be revision number 1200 and who is revision number 1201, given that neither party has any knowledge of the other? The only realistic technical solution is to create a hash of the changes using a known method and link things up based on that.
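To make the "known method" concrete, here is a minimal Python sketch of how an identifier can be derived from content alone, with no central counter. It assumes Git's SHA-1 blob format (a `blob <size>` header, a NUL byte, then the file content); the function name `git_blob_id` is made up for this illustration. Note that Git hashes the stored objects (file contents, trees, commits) rather than diffs.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 ID of a blob: "blob <size>\\0" followed by the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two people who have never communicated hash the same file content...
alice_id = git_blob_id(b"hello\n")
bob_id = git_blob_id(b"hello\n")

# ...and someone else hashes slightly different content.
carol_id = git_blob_id(b"hello!\n")

print(alice_id == bob_id)    # same content, same ID, no coordination needed
print(alice_id == carol_id)  # different content, different ID
```

Because the ID depends only on the bytes, nobody has to ask "who gets number 1200?": both repositories arrive at the same name for the same content on their own.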

Interestingly, Mercurial (hg) does support revision numbers, but they are explicitly a local-only feature: your repository has one set, and your co-worker’s repository will have a different set depending on how they pushed and pulled. It does make command-line usage a bit friendlier than Git’s, though.

Data integrity.

I respectfully disagree with the current answers. Hashes are not necessary for a DVCS; see how Bazaar does it. You could do just as well with any other kind of globally unique identifier. The hashes are a measure to guarantee data integrity: they represent a digest of the information contained in the object (commit, trees, …) referred to by the hash. Altering the contents without altering the hash (i.e., a second-preimage attack or collision attack) is believed to be difficult, although not impossible. (If you’re really into it, take a look at the 2011 paper by Marc Stevens.)

Hence, referring to objects by their SHA hash makes it possible to check whether the contents have been tampered with. And since the hashes are (almost) guaranteed to be unique, they can conveniently be used as revision identifiers, too.
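The tamper-detection property can be sketched with a toy model in Python. This is not Git's actual commit format: the helper names `object_id` and `build_chain` and the "parent ID plus message" layout are assumptions made for illustration. The point it shares with Git is that each identifier covers the parent's identifier, so a change anywhere in history changes every later ID.

```python
import hashlib


def object_id(parent_id: str, content: str) -> str:
    """Toy commit ID: hash of the parent's ID together with this commit's content."""
    return hashlib.sha1(f"{parent_id}\n{content}".encode("utf-8")).hexdigest()


def build_chain(contents):
    """Return the list of IDs for a linear history, oldest first."""
    ids, parent = [], ""
    for content in contents:
        parent = object_id(parent, content)
        ids.append(parent)
    return ids


original = build_chain(["add README", "fix typo", "release 1.0"])
# Someone silently alters the second commit.
tampered = build_chain(["add README", "fix typ0", "release 1.0"])

for i, (a, b) in enumerate(zip(original, tampered)):
    print(i, "unchanged" if a == b else "ID changed")
```

The first commit keeps its ID, while the altered commit and everything built on top of it get new IDs, so anyone holding the original tip hash can detect the change.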

See Chapter 9 of the Git book for more details.

In layman’s words:

Hashes are intended to be nearly universally unique. It is NOT guaranteed, but it is extremely unlikely that the same SHA is generated for different content. In practical terms, within a given project you can treat it as unique.
With revision numbers you would have to use a namespace in order to refer specifically to revision 1200.
Git can work distributed and/or centralized. So how do you keep revision numbers correct and unique?
Also, using revision numbers would create the false expectation that newer revisions should have higher numbers, and that would not be true because of branching, merging, rebasing, etc.
You always have the option of tagging commits.

In mathematical terms:

A total order over Git’s commits would be required for monotonically increasing version numbers.
Git’s commits form a directed, acyclic graph (DAG) that can only be ordered partially / topologically.
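A small Python sketch of what "only partially ordered" means, using a hypothetical four-commit history (the commit names and the `parents` table are invented for this example): two branches B and C fork from A and are merged in D.

```python
# Hypothetical history: A is the root, B and C branch from A, D merges B and C.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
}


def ancestors(commit):
    """Return every commit reachable from `commit` by following parent links."""
    seen = set()
    stack = list(parents[commit])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def comparable(x, y):
    """True if one commit is an ancestor of the other (or they are the same)."""
    return x == y or x in ancestors(y) or y in ancestors(x)


print(comparable("A", "D"))  # A comes before D
print(comparable("B", "C"))  # neither B nor C comes before the other
```

A is an ancestor of D, so those two are ordered. B and C are not related by ancestry at all, so any numbering that puts one "before" the other is an arbitrary choice that different repositories could make differently.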

Hashes are not the only solution for a distributed VCS. But when dealing with a distributed system, only a partial ordering of events can be recorded. (For a VCS, an event can be a commit.) That is why maintaining a monotonically increasing revision number is impossible. Usually we adopt something like a vector clock (or vector timestamp) to record such a partially ordered relation. This is the solution used in Bazaar.
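For readers unfamiliar with vector clocks, here is a generic textbook-style sketch in Python. It is not taken from Bazaar or any other VCS; the function name `compare` and the dictionary representation (one counter per repository) are assumptions for illustration.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks given as {node_name: counter} dictionaries."""
    nodes = set(a) | set(b)
    # Missing entries count as zero: that node has not been seen yet.
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


base = {"alice": 1}
bob_commit = {"alice": 1, "bob": 1}  # Bob committed on top of the base
alice_commit = {"alice": 2}          # Alice committed on top of the base

print(compare(base, bob_commit))          # before
print(compare(alice_commit, bob_commit))  # concurrent
```

The "concurrent" result is the vector-clock way of saying that two commits were made independently, which is exactly the case a single increasing revision number cannot express.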

But why does Git use hashes rather than vector clocks? I think the root cause is cherry-pick. When we cherry-pick in a repository, the partial ordering of commits changes. Some commits’ vector clocks must be reassigned to represent the new partial ordering. However, such reassignment in a distributed system would lead to inconsistent vector clocks. That is the real problem that hashes deal with.

Filed under: softwareengineering - @ 21:50

Tags: git, version-control
