Comments
-
Greg Hurrell
Thanks for the report. As you've probably guessed, the symlink back to
.
creates a circular reference, which is effectively an "infinite loop", causing Command-T to hit the file limit (10,000 by default) and then stop scanning, meaning that it never gets to yourproject
andpublic
directories.In other words, it will wind up scanning:
-
bank/*
,bin/*
andlib/*
; then: bank/private/*
; then:bank/private/private/*
; then:-
bank/private/private/private/*
; and so on, until it hits the limit.
I can see two or three workarounds here:
One is to teach Command-T about symlinks (currently, it only knows about paths) and have it not traverse into them at all. This is pretty much a no-starter idea as there are legitimate uses for symlinks in project organization.
Another: again teach it about symlinks, and have it explicitly check for them and see if it can detect circular references. Not sure if this will slow down the directory scanning much, but I do fear it will only work properly on UNIX-y platforms, seeing as Windows is symlink-challenged, as far as I know.
The other is to to replace what is effectively a depth-first traversal of the directory tree with a breadth-first one. It's written as a depth-first traversal because that was the most natural way to do so (see the implementation in
ruby/command-t/scanner/file_scanner.rb
, where we use Ruby'sDir.foreach
method, which naturally lends itself to depth-first traversal).In order to switch to breadth-first traversal we'd need a FIFO queue. Note that this would not make the circular reference go away, so the traversal would still eventually hit the file limit, but at least it would increase the likelihood of exploring some of the other directories before hitting the limit. (Still, depending on the number of files in each directory, it is possible that the circular reference could prevent full scans of the non-cyclic parts of the graph from completing.) As this is not necessarily a full mitigation of the problem, I don't think it is worth exploring. Additionally, you'd still wind up with effectively useless "garbage" paths in your search space (like
bank/private/private/private/private/private/foo
), which are only going to slow down the matching algorithm as it will have to try matching against 10,000 paths all the time, even if your project has well under 10,000 files in it.So, I don't think this is actually a "bug" as such in Command-T, and I actually think having a circular-reference in your directory hierarchy is more of a "bug" in your project layout, but I am not averse to the idea of implementing the second workaround to be more robust in the face of this kind of "pathological" input. (This is now the second time I've seen somebody puzzled by the way Command-T behaves because they had such a "bug" in their project layout. The fix in that case was for them to remove the circular reference which they hadn't even intentionally added in the first place, but I'd rather not have people puzzled by the behavior of the tool, even if it is their own "fault").
My only concern is making sure it works across all platforms (quite a few people use Command-T on Windows), and making sure that it doesn't have too much of an adverse performance impact. It may well do so, as for each path we traverse, we'll have to expand it to get its true location on the file system, record that path, and then check against that set of traversed paths for every single other path we look at during traversal. This immediately doubles our space requirements (because we have store not only the apparent path, but the actual path as well), and while checking for set membership should be relatively quick, I am not sure whether getting the real path will be an expensive operation that will turn into a noticeable delay when you have to do it for large projects.
-
-
Greg Hurrell
Summary changed:
- From: symlink prevents Command-T from scanning for files
- To: Self-referential symlink causes Command-T to hit file limit before scanning all files in a directory
-
anonymous
I admire your attention to detail! I agree that the directory structure is pathological input, but I think Command-T should be able to not fail in the face of it (even if it doesn't have to work well).
To detect circular references, wouldn't it be sufficient to only expand suspicious directories, i.e. with the same name as another directory? e.g. for foo/bar/foo/, you would check that it is not a symlink to foo/ (or vice versa). Maybe you could also raise the bar of suspiciousness to 2 repeated directory names, e.g. foo/bar/x/y/z/foo/bar/.
Another suggestion: is it possible to notify the user that the 10,000 file limit has been reached?
-
Greg Hurrell
I don't think that will work, as symlinks need not be named like their targets. ie. in your example,
private
links to.
, so the scanning algorithm would happily enter./private/bank
without any clues that it was enteringbank
for the second time, and neither.
norbank
are symlinks.As for notifying the user, I split that off into a separate ticket, issue #1839.
-
anonymous
You are right, a symlink could be named differently to its target, but you could be suspicious about any subdirectories, e.g. you could be suspicious that private/bank/ contains a circular reference somewhere, and is actually identical to bank/.
Or you could raise suspicion on only sub-subdirectories, e.g. you could be suspicious that private/bank/foo/ is identical to bank/foo/.
I don't mean to argue that I really think this is the right thing to do, I am just throwing an idea into your hat.
-
anonymous
Or why not just be supicious about any repeating name in the path, eg.
foo/foo/ (foo looks like symlink to .)
foo/bar/foo/ (bar looks like symlink to .)
-
Greg Hurrell
This pull request may handle some (or most?) of the simple cases, without incurring much cost or complexity. Not sure yet if there are any edge cases which would trip it up. I'm going to try it out on the "next" branch for a while.
-
Greg Hurrell
Status changed:
- From: new
- To: closed
Add a comment
Comments are now closed for this issue.