Comments
-
Greg Hurrell
I think I probably want to go a fair bit further than just listing commits. As I make more and more of the code open source I'll want to move towards having a full repository browser integrated into the site. Don't want to reinvent the wheel, but it would probably be nice to have at least a skeletal browser in place; in order of interest are probably:
- commit messages
- commit diffs
- full blobs
- trees
- branches
-
Greg Hurrell
Summary changed:
- From: Add "commit" model
- To: Add repository browsing features
-
Greg Hurrell
I've been thinking about this and adding the "git log" of old doesn't make much sense at all. It really has to be a full-blown repo browser.
At the moment, the Git repos happen to be on the same machine as the webserver, but that is going to change with the move to AWS (see ticket #1440).
Just about every repository browser out there needs to run on the same machine as the repo (think GitWeb itself, for example), so when we're on AWS I think I'll be adding a post-receive hook that will mirror the repos to the same machine as the webserver (basically like the existing hooks which push backups of public repos to GitHub and Gitorious).
The question then is: how much to do in "real time" by shelling out to the git command line tool and how much to store in the database?
At one extreme you shell out for everything, and I don't think I'd do that, even with caching (because it would end up being a parallel caching system separate from the caching for the rest of the site).
At the other extreme you store absolutely everything in the database, but I don't think I'd do that either because Git itself is a very efficient "database" of commits and blobs and such and replicating it in a relational database like MySQL would be a horrible, inefficient duplication.
So what I'm thinking is that we could cache some stuff in the database at boot time (when the app boots), or when the admin forces a refresh: stuff like scanning the disk to see which repositories are present and what their "metadata" is (things like name, description, clone URL etc).
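A minimal sketch of the disk scan, in Python purely for illustration (the site itself may well use something else). It assumes bare repositories named *.git sitting directly under one root directory, and reads the description file that GitWeb also uses; the function name and the returned keys are made up for this example.

```python
from pathlib import Path


def scan_repos(root):
    """Return metadata for each bare repository found directly under root."""
    repos = []
    for path in sorted(Path(root).glob("*.git")):
        # A bare repository keeps its HEAD file at the top level;
        # skip anything that merely happens to be named *.git.
        if not (path / "HEAD").is_file():
            continue
        description_file = path / "description"
        description = ""
        if description_file.is_file():
            description = description_file.read_text(encoding="utf-8").strip()
        repos.append({
            "name": path.stem,  # "example.git" -> "example"
            "path": str(path),
            "description": description,
        })
    return repos
```

The result is what would be written to the database at boot or on a forced refresh; the clone URL is left out because it depends on how the repos end up being served.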
For actually generating things like logs you would probably shell out to git.
Once it's actually up and running we can look at performance and consider caching specific things in the database, like commit messages for specific commits and diffs and such. But I really don't know what is going to be useful there until I've tried it.
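To make the "shell out for logs" idea concrete, here is a rough sketch (Python, for illustration only). It asks git log for one line per commit using the %H (hash), %x09 (tab) and %s (subject) format placeholders, and keeps the parsing in a separate function so it can be exercised without a repository. The function names, the default branch and the default limit are assumptions.

```python
import subprocess

LOG_FORMAT = "%H%x09%s"  # full hash, a tab, then the subject line


def parse_short_log(output):
    """Turn `git log --format=%H%x09%s` output into (hash, subject) pairs."""
    entries = []
    for line in output.split("\n"):
        if not line:
            continue
        # Split on the first tab only, so tabs inside a subject survive.
        commit_hash, _, subject = line.partition("\t")
        entries.append((commit_hash, subject))
    return entries


def short_log(repo_path, branch="master", limit=20):
    """Shell out to git for the most recent commits on a branch."""
    result = subprocess.run(
        ["git", "log", f"--format={LOG_FORMAT}", "-n", str(limit), branch, "--"],
        cwd=repo_path,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_short_log(result.stdout)
```

Passing the arguments as a list avoids going through a shell, but a real version would still need to validate the branch name before handing it to git.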
Will also have to look at what GitWeb does (it has good performance and, as far as I know, doesn't do much, if any, caching in the vanilla setup), as it is a good yardstick for what sort of performance you can expect, at least out of Perl, when shelling out to git.
In terms of the models and URL design, the most interesting models/URLs will be:
- repos:
  - /repos/: index of public repos, and for admin users private repos as well
  - /repos/example.git: overview page for specific repo, showing:
    - "metadata" for that repo: name, description etc
    - a (short) log of recent changes
    - branches (perhaps, or a link to a page showing them)
    - tags (perhaps, or a link to a page showing them)
- branches: always nested within the context of parent repo
  - /repos/example.git/master or /repos/example.git/maint
  - possible alternative to avoid namespace clashes: /repos/example.git/branches/master
  - show list of commits on that branch (most likely short log view)
- commits: again, nested within context of parent view
  - /repos/example.git/commits/{hash}
  - would show full log message and (configurable) diff as well
  - for example, it might be interesting to allow users to see log messages of closed-source repos, but not show the actual diffs
- tags: again, nested:
  - /repos/example.git/tags/{tag}
  - would show tag annotation, along with the same stuff shown by the "commit" view
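The URL scheme above can be sketched as a small route table (Python, illustration only; the real app would use its framework's router). This assumes the /branches/ variant for branches, since that is the one that avoids namespace clashes, and the route names and the "id" key are invented for the example.

```python
import re

REPO = r"/repos/(?P<repo>[^/]+\.git)"

# Checked in order; each pattern must match the whole path.
ROUTES = [
    ("repo_index", re.compile(r"/repos/?")),
    ("repo", re.compile(REPO + r"/?")),
    ("branch", re.compile(REPO + r"/branches/(?P<id>.+)")),
    ("commit", re.compile(REPO + r"/commits/(?P<id>[0-9a-f]{4,40})")),
    ("tag", re.compile(REPO + r"/tags/(?P<id>.+)")),
]


def resolve(path):
    """Map a request path to (route name, parameters), or None if unknown."""
    for name, pattern in ROUTES:
        match = pattern.fullmatch(path)
        if match:
            return name, match.groupdict()
    return None
```

Branch and tag names may contain slashes, which is why those two patterns accept anything after the prefix.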
For me that's the fundamentally interesting stuff. Viewing trees and blobs would be some icing on the cake:
- trees: /repos/example.git/trees/{hash}
- blobs: /repos/example.git/blobs/{hash}
With all of these things (commits, trees, blobs) there would need to be a basic security restriction: the object in question must be reachable from one of the existing branch tips.
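One possible way to enforce the reachability restriction, sketched in Python for illustration: git rev-list --objects --branches lists every commit, tree and blob reachable from the refs under refs/heads, so the check becomes a set lookup. Requiring a full 40-character hash and the function names are assumptions of this sketch, and listing every object on each request would be far too slow for a large repository without some caching.

```python
import re
import subprocess

FULL_HASH = re.compile(r"[0-9a-f]{40}")


def parse_reachable(output):
    """Collect object ids from `git rev-list --objects` output.

    Commits appear as a bare hash; trees and blobs as "<hash> <path>".
    """
    return {line.split(" ", 1)[0] for line in output.split("\n") if line}


def is_reachable(repo_path, object_id):
    """True if object_id is reachable from any branch tip in the repo."""
    # Reject anything that is not a full hash before it gets near git.
    if not FULL_HASH.fullmatch(object_id):
        return False
    result = subprocess.run(
        ["git", "rev-list", "--objects", "--branches"],
        cwd=repo_path,
        check=True,
        capture_output=True,
        text=True,
    )
    return object_id in parse_reachable(result.stdout)
```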
-
Greg Hurrell
Obviously, if I want to make commits commentable, they will need to be cached in the database. (Probably lazily; if a particular commit is never viewed there is no need for it to be in the database.)
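A minimal sketch of the lazy approach (Python with an in-memory SQLite database standing in for the real one, purely for illustration): a commit row is only created the first time somebody views it. The table layout and the load_from_git callback are hypothetical.

```python
import sqlite3


def open_cache():
    """Create an in-memory stand-in for the application's database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE commits (hash TEXT PRIMARY KEY, message TEXT NOT NULL)"
    )
    return db


def fetch_commit(db, commit_hash, load_from_git):
    """Return the cached message, asking git only on the first view."""
    row = db.execute(
        "SELECT message FROM commits WHERE hash = ?", (commit_hash,)
    ).fetchone()
    if row is not None:
        return row[0]
    # Not cached yet: load it once and remember it.
    message = load_from_git(commit_hash)
    db.execute(
        "INSERT INTO commits (hash, message) VALUES (?, ?)",
        (commit_hash, message),
    )
    db.commit()
    return message
```

Once the row exists, comments can reference it like any other record.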
-
Greg Hurrell
For comparison, GitHub URLs:
- http://github.com/wincent/wikitext/commit/9f3c2e891a7321e6bf08d6c83c626aae3f3b2585 (show full log message, diff and comments form)
- http://github.com/wincent/wikitext/tree/master/bin/ (tree referenced by HEAD of master)
- http://github.com/wincent/wikitext/tree/b61035fd6fd10691fa8ce2b52f2c9ee4b4225ed0/bin (tree pointed to by other commit)
- http://github.com/wincent/wikitext/commits/ (short log)
- http://github.com/wincent/wikitext/commits/1.10 (viewing a tag, shows a short log corresponding to that tag)
- http://github.com/wincent/wikitext/commits/maint (viewing a branch, shows a short log corresponding to that branch)
- http://github.com/wincent/wikitext/blob/9f3c2e891a7321e6bf08d6c83c626aae3f3b2585/LICENSE.txt (showing a blob)
- http://github.com/wincent/wikitext/blob/master/LICENSE.txt (showing a blob at HEAD of master branch)
-
Greg Hurrell
Posted a blog post about this just now.
As mentioned in the post, we want this to replace not only the old "Git Log" functionality but also the "Weekly progress reports" that I used to put up on the blog. So that means Atom feeds of commits. Probably:
- /repos.atom: all commits in all repos
- /repos/example.atom: all commits in a specific repo (could just be the commits reachable from HEAD, but would probably be more interesting if it were all commits reachable from all branches)
- /repos/example/master.atom: all commits in a specific branch of a repo
Two things to note:
Firstly, URLs will look nicer if I exclude the .git from the repo component (ie. like GitHub and unlike GitWeb).
Secondly, this kind of complicated feed, especially the "all repos" feed, may require some sophisticated caching. Will probably have to fire off our cache sweepers from the repo post-receive hooks. Actually merging the commits from all repos into a single feed may prove to be quite complicated; luckily the Atom feed doesn't need to extend very far back into the history of each repository.
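The merge itself may be less painful than feared if each repo can supply its recent commits already sorted newest-first. A sketch in Python (illustration only), using heapq.merge from the standard library; the dictionary shape and the "timestamp" key are assumptions, and it relies on each per-repo list really being in descending timestamp order.

```python
import heapq
from itertools import islice


def merge_feeds(per_repo_commits, limit=20):
    """Merge newest-first commit lists from several repos into one feed.

    per_repo_commits maps a repo name to a list of commit dicts, each with
    a numeric "timestamp", already sorted from newest to oldest.
    """
    merged = heapq.merge(
        *per_repo_commits.values(),
        key=lambda commit: commit["timestamp"],
        reverse=True,
    )
    # The feed only needs the most recent entries, so stop early.
    return list(islice(merged, limit))
```

Because the merge is lazy, only as many commits as the feed needs are ever pulled from each list.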
-
Greg Hurrell
In GitWeb access control is all file-system based. That is, GitWeb can be configured with GITWEB_LIST and GITWEB_STRICT_EXPORT to look at a certain path on the filesystem for repositories, and will only allow access to those. (The presence of a git-daemon-export-ok file in this case is irrelevant.)
I think I want more control than that, at the application level. So I am thinking that security-wise, repos will only be shown when they are explicitly added, rather than pulling in all repos that happen to exist within a certain directory. One benefit of this is that I can reference repos in disparate places like /a/b/repo1.git and /b/c/repo2.git.
In addition to the above access controls we will want application-level constraints about what people can see. Admins will be able to see all configured repositories. Other users will only be able to see open source ones (ones that have been designated as such at the application level and which have a git-daemon-export-ok file, perhaps).
By setting access control at the application level we can have finer grained levels of control, such as:
- Allowed to see all logs and content (commits, branches, trees, blobs etc)
- Allowed to see logs but not content
- Allowed to see oneline version of logs but not full version
- Allowed to see existence of repository but nothing else about it (or perhaps just branch names and tag lists, for example)
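The levels above could be modelled as an ordered scale, sketched here in Python for illustration. Treating them as strictly ordered (each level includes everything below it) is an assumption of this sketch, as are the level names and the resource names in the lookup table.

```python
from enum import IntEnum


class Access(IntEnum):
    """Per-repo visibility granted to non-admin users, least to most."""
    EXISTENCE = 1    # repo is listed, nothing else
    ONELINE_LOG = 2  # one-line log only
    FULL_LOG = 3     # full log messages, no content
    CONTENT = 4      # diffs, trees and blobs too


# Minimum level needed to view each kind of resource (hypothetical names).
REQUIRED = {
    "repo": Access.EXISTENCE,
    "short_log": Access.ONELINE_LOG,
    "commit_message": Access.FULL_LOG,
    "diff": Access.CONTENT,
    "tree": Access.CONTENT,
    "blob": Access.CONTENT,
}


def can_view(granted, resource, is_admin=False):
    """Admins see everything; others need at least the required level."""
    if is_admin:
        return True
    return granted >= REQUIRED[resource]
```

A closed-source repo that exposes log messages but not diffs would simply be stored with Access.FULL_LOG.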
-
Greg Hurrell
Will have to show merge commits using git diff --cc.
-
Greg Hurrell
Ok, this has been sitting around for long enough now. I think it's time to get started on implementing this:
- get transport/mirroring working between Git server and web server; still need to decide on whether to use push or pull model, but leaning towards periodic pull (from cron job), although it would involve some lag
- start with "Repo" model resource because all the others are nested inside it, and it can actually be initially implemented at the application level with no actual Git access to the filesystem (because it's metadata only, at least at first)
- move on to "Commit" model, "Branch" model and "Tag" model (although we already have a model with that name, so will need to pick another); finally tackle the "Tree" and the "Blob" models
-
Greg Hurrell
Ok, item "1" now done.
-
Greg Hurrell
Sweet, just discovered git log -p --word-diff=porcelain.
-
Greg Hurrell
Useful: git log --format=raw (plus the -p --word-diff=porcelain switches mentioned above, and -n 10 or -n 20 to limit the number of commits shown at a time).
-
Greg Hurrell
Looks like I can set up a post-receive hook in the mirrored repositories to do cache invalidation. This will be particularly useful for things like Atom feeds which could get hit fairly often, and may be expensive to generate.
-
Greg Hurrell
Ok, the basic functionality is now implemented. I'm going to mark this ticket as closed and open smaller, more focused tickets for the remaining details.
-
Greg Hurrell
Status changed:
- From: open
- To: closed