A Fast Intro to Git Internals [Translation]

August 21, 2023

Original: sites.google.com/a/chromium.…

Many git tutorials focus on a set of commands and instructions to “get you up to speed” in git, without addressing the underlying concept of “how git works”. While the commands are important, I feel it’s more important for you to understand what’s going on behind the scenes. (Full reference material is included in the appendix at the end of this document. Most diagrams are taken from Pro Git.)

At a high level, git can be thought of as a database/filesystem/backing store that remembers a truckload of details about your code base. This information is called the git repository, and contains three types of content: blobs, trees and commits.

object-blob.png

Blobs are essentially “files” in git. Each blob is indexed by its SHA-1 hash (a checksum of its content), so if the same content appears twice in your directory, it will resolve to the same blob in git. And for the record, git assumes that two different files will never accidentally hash to the same value. Relax, because in practice that holds. Seriously. If you don’t believe that, please read up on it (en.wikipedia.org/wiki/birthday_paradox) until you do. Blobs are normally referenced by their hash, although you will rarely need to type these hashes in.

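To make the "same content, same blob" point concrete, here is a minimal sketch of how a blob's hash can be reproduced by hand. It relies on git's object format (a short `blob <size>` header, a NUL byte, then the raw content); the function name `git_blob_hash` is made up for this example and is not part of git.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 that git assigns to a blob holding `content`."""
    # Git hashes a small header ("blob <size>" plus a NUL byte) followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# Two files with identical content resolve to the same blob, whatever they are called.
first_copy = b"test content\n"
second_copy = b"test content\n"
print(git_blob_hash(first_copy))
print(git_blob_hash(first_copy) == git_blob_hash(second_copy))
```

You can compare the printed value with what `git hash-object` reports for a file containing the same bytes.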

object-tree.png

Git uses trees to store directories. Each tree contains a list of entries, which are either blobs (files) or other trees (sub-directories). Like blobs, trees are stored by their hashes. This is a significant detail, because a single change to a single file (say in root/sub/myfile.txt) will cause myfile.txt’s hash to change, which will cause sub’s content list (and therefore sub’s hash) to change, which will cause root’s content list to change, which will cause root’s hash to change. Thus, the hash of a tree represents the entire state of every single file in that tree.

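The ripple effect described above can be sketched in a few lines. This is a simplified illustration, not git's actual code: it builds tree objects in git's on-disk layout (each entry is a mode, a name, a NUL byte, and the raw 20-byte hash), but it sorts entries by plain name, which is only an approximation of git's ordering rules. The helper names and the extra file `other.txt` are assumptions made for the example.

```python
import hashlib


def blob_hash(content: bytes) -> bytes:
    """Raw 20-byte SHA-1 of a blob object."""
    return hashlib.sha1(b"blob %d\x00" % len(content) + content).digest()


def tree_hash(entries: list[tuple[str, str, bytes]]) -> bytes:
    """Raw SHA-1 of a tree whose entries are (mode, name, raw hash) tuples."""
    # Simplification: entries are sorted by name only.
    body = b"".join(
        mode.encode() + b" " + name.encode() + b"\x00" + raw
        for mode, name, raw in sorted(entries, key=lambda entry: entry[1])
    )
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).digest()


def root_hash(myfile_content: bytes) -> str:
    """Hash a root directory holding other.txt and sub/myfile.txt."""
    sub = tree_hash([("100644", "myfile.txt", blob_hash(myfile_content))])
    root = tree_hash([
        ("100644", "other.txt", blob_hash(b"unchanged\n")),  # untouched file
        ("40000", "sub", sub),                               # sub-directory
    ])
    return root.hex()


# Editing only root/sub/myfile.txt still changes the hash of root.
print(root_hash(b"version 1\n"))
print(root_hash(b"version 2\n"))
```

The two printed hashes differ even though `other.txt` never changed, which is why a single tree hash can stand in for the state of every file beneath it.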

The next object is the commit, which is a snapshot of a tree along with some additional metadata. As discussed above, the tree reference represents the entire state of every single file in the tree. The metadata provides more context for the commit, including the author, comments, and one or more parents of the commit. Like everything else we’ve seen, commits are referenced by their hashes.

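As a rough sketch of what "a tree plus metadata" looks like, the snippet below assembles a commit body in git's text layout (a `tree` line, zero or more `parent` lines, `author` and `committer` lines, a blank line, then the message) and hashes it. The author string, timestamps, and messages are invented for illustration, and the sketch reuses one identity for both author and committer.

```python
import hashlib

# Hash of the empty tree; used here only so the example has a valid tree id.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
# Hypothetical identity: name, email, Unix timestamp, timezone offset.
AUTHOR = "Example Dev <dev@example.com> 1692576000 +0000"


def commit_hash(tree: str, parents: list[str], author: str, message: str) -> str:
    """Return the SHA-1 of a commit object with the given tree, parents and metadata."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]
    lines += [f"author {author}", f"committer {author}", "", message]
    body = ("\n".join(lines) + "\n").encode()
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


first = commit_hash(EMPTY_TREE, [], AUTHOR, "Initial commit")
second = commit_hash(EMPTY_TREE, [first], AUTHOR, "Second commit")
print(first)
print(second)
```

Because the parent hash is part of the hashed body, a commit's hash depends on its entire history, not just on its own tree.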

object-commit.png

Git maintains a set of refs, which are human readable names that resolve to specific commits. For example, the HEAD ref points to the most recent commit in your currently checked out branch. When you check in another commit, the HEAD ref gets “auto-promoted” to point to your new commit. Most git operations use the HEAD as their default target. Refs are also used to identify your own branches, and the current branch is updated to the latest commit each time you commit, too.

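Here is a toy model of the behaviour just described, assuming refs are nothing more than a mapping from names to commit ids. The class `ToyRefs` and the commit ids `c1` and `c2` are hypothetical; real git keeps refs on disk and does considerably more bookkeeping.

```python
from typing import Optional


class ToyRefs:
    """Toy model of refs: human readable names that resolve to commit ids."""

    def __init__(self, branch: str = "main"):
        self.branches = {branch: None}  # branch name -> commit id
        self.head = branch              # HEAD names the checked out branch

    def resolve(self, name: str = "HEAD") -> Optional[str]:
        """Resolve a ref name to a commit id; HEAD is the default target."""
        return self.branches[self.head if name == "HEAD" else name]

    def commit(self, commit_id: str) -> None:
        """Record a new commit: the current branch (and so HEAD) now points at it."""
        self.branches[self.head] = commit_id

    def branch(self, name: str) -> None:
        """Create a branch pointing at whatever HEAD currently resolves to."""
        self.branches[name] = self.resolve()

    def checkout(self, name: str) -> None:
        """Switch HEAD to another branch."""
        self.head = name


refs = ToyRefs()
refs.commit("c1")      # main -> c1
refs.branch("mine")    # mine -> c1
refs.checkout("mine")
refs.commit("c2")      # only the checked out branch moves
print(refs.resolve("HEAD"), refs.resolve("mine"), refs.resolve("main"))
```

Note that committing on "mine" leaves "main" where it was; only the checked out branch is advanced.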

18333fig0106-tn.png

As you work, git maintains three different “views” of your filesystem. Simultaneously. For some, this is a source of confusion 😉 As you work, a single file (readme.txt) might be in all three locations, and might be different in each location.

The git repository is the commit (and corresponding tree) that you last checked out. Any changes you make to files on your disk are reflected in the “working directory”. You can promote these local changes into the staging area (using “git add”) as often as you like. Then you commit all of your staged files to the repository.

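The three views can be modelled as three dictionaries, which makes it easier to see how readme.txt can differ in each one. This is only a sketch: `ToyWorkspace` and its method names are invented for the example, with `add` and `commit` standing in for “git add” and “git commit”.

```python
class ToyWorkspace:
    """Toy model of git's three views of a set of files."""

    def __init__(self, committed: dict[str, str]):
        self.repository = dict(committed)  # the commit you last checked out
        self.staging = dict(committed)     # the staging area
        self.working = dict(committed)     # files on your disk

    def edit(self, path: str, text: str) -> None:
        """Change a file on disk; only the working directory sees it."""
        self.working[path] = text

    def add(self, path: str) -> None:
        """Promote the on-disk version of a file into the staging area."""
        self.staging[path] = self.working[path]

    def commit(self) -> None:
        """Commit everything that is staged to the repository."""
        self.repository = dict(self.staging)

    def versions(self, path: str) -> tuple:
        """Return (working, staged, committed) contents of one file."""
        return (self.working.get(path), self.staging.get(path),
                self.repository.get(path))


ws = ToyWorkspace({"readme.txt": "v1"})
ws.edit("readme.txt", "v2")
ws.add("readme.txt")
ws.edit("readme.txt", "v3")       # edited again after staging
print(ws.versions("readme.txt"))  # three different versions at once
ws.commit()
print(ws.versions("readme.txt"))  # the staged version is now committed
```

The second edit never reaches the commit, because only what was staged at commit time is recorded.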

The staging area is also called the index or the cache. The repository is sometimes called the tree or the database. Sigh.

When you’re using git, you don’t normally think of blobs or trees. Instead, user-facing commands deal with commits and refs, and most of the work you do in git involves traversing the DAG of commits in your repository, or adding new nodes into that DAG. The DAG always starts with the “empty” tree, so if you want to think of this in terms of pointers, that would be the null pointer.

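"Traversing the DAG" and "adding new nodes" can be sketched with a plain dictionary that maps each commit to its parents. The commit names A through E are hypothetical, and the functions below are an illustration of the idea rather than anything git exposes.

```python
def add_commit(parents: dict[str, list[str]], name: str,
               commit_parents: list[str]) -> None:
    """Add a new node to the DAG; its edges point back at its parents."""
    parents[name] = list(commit_parents)


def ancestors(parents: dict[str, list[str]], start: str) -> list[str]:
    """Return `start` plus every commit reachable by following parent links."""
    seen, order, stack = set(), [], [start]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue
        seen.add(commit)
        order.append(commit)
        stack.extend(parents[commit])
    return order


dag: dict[str, list[str]] = {}
add_commit(dag, "A", [])          # first commit, no parents
add_commit(dag, "B", ["A"])
add_commit(dag, "C", ["B"])       # one line of work
add_commit(dag, "D", ["B"])       # another line of work from the same parent
add_commit(dag, "E", ["C", "D"])  # a commit with two parents
print(ancestors(dag, "E"))
print(ancestors(dag, "C"))
```

Edges only ever point from a commit to its parents, so adding a node never changes the nodes that already exist.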

Branching, Rebasing, Merging

image.png

To the left is a diagram of a typical git scenario, where a dev has created a branch called “mine”, currently pointing at the same commit as “main”. (“main” was called “master” in earlier versions of git, and some of the diagrams in this document use the older term “master”).

image.png

The dev does some work and commits C6, C7, and C8. In the meantime, other users have updated “main” with C3, C4, and C5, and those commits have been pulled down into the local main branch. In order to resolve this situation, the dev can either perform a merge or a rebase.

image.png

If the dev uses “git merge main” from the “mine” branch, the result will look like this. Note that “C9” is a commit that contains all of the merged source, which may include “new” code that was introduced to resolve merge conflicts.

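In terms of the commit graph, the merge adds a single node with two parents. The sketch below models only the graph, not the file contents or conflict resolution, and it assumes that C1 and C2 are the history shared by both branches before they diverged (the diagrams are not reproduced here, so that part is a guess).

```python
def reachable(parents: dict[str, list[str]], tip: str) -> set[str]:
    """Every commit reachable from `tip` by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge(parents: dict[str, list[str]], refs: dict[str, str],
          current: str, other: str, new_commit: str) -> None:
    """Create a merge commit on `current` whose parents are both branch tips."""
    parents[new_commit] = [refs[current], refs[other]]
    refs[current] = new_commit  # only the current branch moves


# Assumed shared history: C1 <- C2; then main and mine diverge.
parents = {
    "C1": [], "C2": ["C1"],
    "C3": ["C2"], "C4": ["C3"], "C5": ["C4"],  # main
    "C6": ["C2"], "C7": ["C6"], "C8": ["C7"],  # mine
}
refs = {"main": "C5", "mine": "C8"}

merge(parents, refs, current="mine", other="main", new_commit="C9")
print(refs)
print(sorted(reachable(parents, refs["mine"])))
```

After the merge, every commit from both branches is reachable from "mine", while "main" still points at C5.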

image.png

The alternate approach is to use a rebase, which creates a new commit for each rebased commit. Thus, C6’ will contain (mostly) the same changes that were in C6, C7’ will match C7, and so on.

For most operations, a rebase is preferred to a merge:

  • It remembers each of your commits.
  • Your commits will always show up as the last in the list (“the cream rises to the top”).

Note that the old commits (C6..C8) are no longer referenced by any refs, so they are now available for “garbage collection”.

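The same toy graph shows what a rebase does and why the old commits become orphans. As before, this models only the graph: the shared commits C1 and C2 are an assumption, the branch being rebased is assumed to be a straight line of commits, and the `rebase` function is a hypothetical illustration rather than git's implementation.

```python
def reachable(parents: dict[str, list[str]], tip: str) -> set[str]:
    """Every commit reachable from `tip` by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def rebase(parents: dict[str, list[str]], refs: dict[str, str],
           branch: str, onto: str) -> list[str]:
    """Replay the commits unique to `branch` on top of `onto`, oldest first."""
    base = reachable(parents, refs[onto])
    to_replay, commit = [], refs[branch]
    while commit not in base:
        to_replay.append(commit)
        commit = parents[commit][0]  # assumes a linear branch
    to_replay.reverse()

    new_tip = refs[onto]
    for old in to_replay:
        new = old + "'"            # C6 becomes C6', and so on
        parents[new] = [new_tip]   # a brand new commit with a new parent
        new_tip = new
    refs[branch] = new_tip
    return to_replay


parents = {
    "C1": [], "C2": ["C1"],
    "C3": ["C2"], "C4": ["C3"], "C5": ["C4"],  # main
    "C6": ["C2"], "C7": ["C6"], "C8": ["C7"],  # mine
}
refs = {"main": "C5", "mine": "C8"}

replayed = rebase(parents, refs, branch="mine", onto="main")
live = reachable(parents, refs["main"]) | reachable(parents, refs["mine"])
orphans = sorted(set(parents) - live)
print(refs)
print(orphans)  # commits no ref can reach any more
```

The old commits still exist in the graph, but since no ref leads to them they are the ones left for garbage collection.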

The notable exception is that you should not do a rebase if other repositories have seen your commits. This might happen if others were basing their repositories on yours, or if you had pushed your own commits “upstream”.

Final Notes...

Here are some extra commands to get you in trouble... er, help you explore git in all its glory...

  • “git rev-list --objects --all” will display all of the objects in your repository.

  • “git show <object>” will let you see one of those objects in detail.

  • “git fsck --unreachable” will show you all the “orphans” that are waiting for garbage collection.

  • “git reflog” will show you a list of everything you’ve done. Ever. It’s a cool tool that can help you “undo” your recent activity, and find that code you thought you had lost.

External Git documentation

  • This thread contains a good overview of the “fourth” object type in git: the ref. (Note that refs are intentionally “glossed over” in this discussion.)
  • A tour of git: the basics: General, easy-to-read tutorial for getting started with git.
    • This one is good for basic "commands to get up and running", but none of that content is in the doc we're writing, so it's good non-overlapping information.
  • git ready: General introduction and "cookbook" reference for git.
    • gitready.com/beginner/20… comes pretty close to the content I want but really doesn't get deep enough into it...
  • Git for Computer Scientists: Great explanation of the architecture on which Git is based.
    • This. I believe this doc is the one I used to finally understand the key concepts in git.
  • Git Magic: Yet another Git tutorial.
    • I like this one but need to spend more time reading it.
  • Pro Git: Freely available book on Git.
    • This was looking really good in the 'what is a branch' section but then it goes on to encourage the user to merge without explaining why rebase is better. If the user bailed early on this doc they'd have some of the foundation they need but then do the wrong thing (merge) repeatedly.
  • Git Reference: Quick reference that links to the Pro Git book.
    • Very command-line oriented (fewer pictures of the tree, more command-line examples)
  • Visual Git Reference: A visual Git Reference, explaining quite a few commands visually.