You are viewing an historical archive of past issues. Please report new issues to the appropriate project issue tracker on GitHub.

Feature request #1726: Allow links with query strings in internal link syntax (ie. [/hello?there=true link])

Kind	feature request
Product	wikitext
When	Created 2010-11-05T20:15:43Z, updated 2010-11-06T00:25:44Z
Status	open
Reporter	anonymous
Tags	no tags

Description

If the text contains a relative URL with query-string params, the question mark appears to be confusing the parser. Here's an example:

 text = '[/bar?baz=bat link]'
 expected = '<p><a href="/bar?baz=bat">link</a></p>\n'

 assert_equal expected, Wikitext::Parser.new.parse( text )

Results in the following error:

 "<p><a href=\"/bar?baz=bat\">link</a></p>\\n" expected but was "<p>[/barbaz=bat link]</p>\n"

Comments

Greg Hurrell 2010-11-05T20:45:26Z
Two things here:

First issue
The relative-link syntax is very restrictive by design; if you look at the parser you'll see this:
```
// external links look like this:
//      [http://google.com/ the link text]
//      [/other/page/on/site see this page]
// strings in square brackets which don't match this syntax get passed through literally; eg:
//      he was very angery [sic] about the turn of events
```
ie. it expects either a URI or a path, and everything else is let through as is. This is because there are many places in normal text (like the "[sic]" example) where square brackets are legitimate and natural, and we don't want such places getting turned into links.
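To make the pass-through behaviour concrete, here is a minimal Ruby sketch. It is not the wikitext parser (which is a hand-written lexer and parser); `URI_RE`, `PATH_RE`, `LINK_RE` and `linkify` are hypothetical names, the URI pattern is deliberately simplified, and no HTML escaping is done.

```ruby
# Hypothetical, simplified stand-ins for the real lexer rules.
URI_RE  = %r{(?:https?|ftp|svn)://[^\s\]]+}i
PATH_RE = %r{/(?:[a-zA-Z0-9_\-.]+/?)*}

# "[target text]" only counts as a link when target is a URI or a
# restricted path followed by a single space and some link text.
LINK_RE = /\[(#{URI_RE}|#{PATH_RE}) ([^\]]+)\]/

# Turn matching bracketed spans into anchors; leave everything else alone.
def linkify(text)
  text.gsub(LINK_RE) do
    %(<a href="#{Regexp.last_match(1)}">#{Regexp.last_match(2)}</a>)
  end
end

puts linkify('[/other/page/on/site see this page]')
puts linkify('he was very angery [sic] about the turn of events')
```

Under these assumptions the first call produces a link and the second is returned unchanged, because `sic` is neither a URI nor a path.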

So that's one issue: internal links have always been considered an exception, a special case, in order to keep the lexer and parser simple, and avoid false positives. The idea was that if you wanted to do anything more complex than that then you'd just have to use a fully qualified URL.

It is probably within the realm of possibility to teach the lexer a slightly more nuanced notion of what constitutes a "path", but it would have to be done carefully. Here are the relevant lexer rules as they are now:
```
   uri_chars           = (alnum | [@$&'(\*\+=%_~/#] | '-')+ ;
   special_uri_chars   = ([:!,;\.\?])+ ;
   uri                 = ('mailto:'i mail) |
                         (('http'i [sS]? '://' | 'ftp://'i | 'svn://'i) uri_chars (special_uri_chars uri_chars)*) ;
   path                = '/' ([a-zA-Z0-9_\-.]+ '/'?)* ;
```
Note that the "path" segment of a uri is actually very loosely defined in terms of "uri chars" (what constitutes a "uri char" is based on the relevant RFCs, although I can't quote the exact RFC numbers from memory right now).

We can't just re-use that same definition to define the actual path rule, because that would allow absurd paths that would never exist in any sane application (and remember, as these are internal links, we pretty much expect them to be sane by definition); paths like "@@@@@@$$$$$$$$&&&&&" would be allowed, which obviously is not a good thing.
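As a rough illustration of the difference, the two rules can be approximated with Ruby regular expressions. These are assumed translations of the Ragel rules quoted above, written only for this example; `URI_CHARS`, `PATH`, `path?` and `uri_chars?` are hypothetical names.

```ruby
# Approximate Ruby equivalents of the "uri_chars" and "path" lexer rules.
# Assumed translations for illustration, not code from the wikitext source.
URI_CHARS = %r{\A[a-zA-Z0-9@$&'(*+=%_~/\#\-]+\z}
PATH      = %r{\A/(?:[a-zA-Z0-9_\-.]+/?)*\z}

# True when the whole string satisfies the restrictive "path" rule.
def path?(str)
  PATH.match?(str)
end

# True when the whole string consists only of "uri_chars".
def uri_chars?(str)
  URI_CHARS.match?(str)
end

p path?('/other/page/on/site')
p path?('/bar?baz=bat')
p uri_chars?('@@@@@@$$$$$$$$&&&&&')
```

With these patterns the ordinary path is accepted, the path with a query string is rejected (the question mark is not in the path character set), and the nonsense string passes the looser "uri chars" test, which is why that definition is not a good fit for internal links.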

So, if you're interested in making this happen, you'd have to make the path rule quite a bit more sophisticated, and the question then becomes, where do you draw the line? Is being able to parse /foo?bar=1 good enough? Or do you want to be able to parse /foo?bar[1]=baz&bar[2]=bing? (And in that second example, note how the nested square brackets might throw a spanner in the works.)
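One possible place to draw the line, sketched in Ruby: allow an optional query string made of simple `key=value` pairs and nothing else. This is purely a hypothetical rule for discussion, not something the lexer implements; `SEGMENTS`, `PARAM`, `PATH_WITH_QUERY` and `internal_path?` are made-up names, and the character sets are assumptions.

```ruby
# Hypothetical extension of the "path" rule: the existing restrictive
# segments, optionally followed by "?key=value" pairs joined with "&".
SEGMENTS = %r{/(?:[a-zA-Z0-9_\-.]+/?)*}
PARAM    = /[a-zA-Z0-9_\-.]+=[a-zA-Z0-9_\-.%+]*/
PATH_WITH_QUERY = /\A#{SEGMENTS}(?:\?#{PARAM}(?:&#{PARAM})*)?\z/

# True when str is a restricted path with an optional simple query string.
def internal_path?(str)
  PATH_WITH_QUERY.match?(str)
end

p internal_path?('/foo?bar=1')
p internal_path?('/foo?bar[1]=baz&bar[2]=bing')
```

Under this rule the first path is accepted and the second is rejected, since square brackets are not allowed in a parameter name. That sidesteps the nested-bracket problem by simply not supporting it.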

Basically, I didn't want things to become that complex when I added this feature, and I still don't really want to now. So if you want to have a crack at it, feel free and I'd be happy to look at your patch, but I can't promise you that I'll include it.

Second issue
The other issue, then, is that the question mark is getting eaten in the test assertion that you cite there. That's a bug. Regardless of what you want to do about the first issue, I'll see if I can address this one.

Mike Stangel 2010-11-05T21:49:17Z

I agree with the need to balance simplicity against a "support every possible scenario" approach. Since relative URLs must begin with a /, however, I wonder how great the need is to write [/{something}] and NOT want that to become a link. If wikitext defines [] as the link syntax, then it seems to me we ought to default to treating this syntax as a link, unless it's wrapped in <nowiki>. Were that the case, you wouldn't need complex character matching to convince yourself it's a URL: if it starts with [/ then you can assume everything up to the next space is the URI, and everything from that point to the closing ] is the link text. I agree that supporting square brackets in the query string is tricky and probably not worth supporting in this scenario. (I consider it critical to posting forms in Rails applications but can't think of any real-world examples of where it's necessary on a GET request.)

All of that being said, it's not a show-stopper for me if you were to decide not to support this at all.
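For comparison, the looser rule suggested in this comment could be sketched in Ruby roughly as follows. This is an illustration of the proposal only, not wikitext behaviour; `PERMISSIVE_LINK` and `permissive_linkify` are hypothetical names, and no HTML escaping is done.

```ruby
# Hypothetical permissive rule: "[/" starts a link, the target runs up to
# the first space, and the link text runs up to the next "]".
PERMISSIVE_LINK = %r{\[(/\S*) ([^\]]*)\]}

# Replace every span that fits the permissive rule with an anchor.
def permissive_linkify(text)
  text.gsub(PERMISSIVE_LINK) do
    %(<a href="#{Regexp.last_match(1)}">#{Regexp.last_match(2)}</a>)
  end
end

puts permissive_linkify('[/bar?baz=bat link]')
puts permissive_linkify('[/ is a path separator]')
```

The first call yields the link from the original report. The second shows the trade-off: any bracketed text that happens to start with a slash becomes a link as well, unless it is wrapped in <nowiki>.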

Greg Hurrell 2010-11-05T23:21:55Z
And voila, you've discovered another bug.

In your post where you typed:
```
if it starts with `[/` then you can assume
```
The parser is effectively ignoring the second `. Funnily enough, I wouldn't be surprised if the root cause is the same as (or very close to) the "Second issue" that I mentioned in my first comment (ie. the "?" getting swallowed on a failed link).

Without looking at the code, I'd say what's happening here is:
1. see `, start <code> span
2. see [, and assume we're about to see an external link
3. scan a URI or path; in this case / counts as a path
4. expect to see a space, but fail, so roll back the failed link (ie. print the left bracket and the path)
5. continue processing... at this point the next character we see should be another ` and we should close the <code> span

Looking at the code, it looks like our rollback is backfiring on us: instead of just printing the left bracket and the path, we are also printing the backtick without assigning it any special meaning. Rather than emitting the backtick ourselves, we should just restart the processing loop and handle it in the case TT rule.
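
The intended rollback behaviour can be sketched with a toy scanner in Ruby. This is a deliberately tiny model built on `StringScanner`, not the real parser; `render` and `PATH` are hypothetical names, and it only knows about backtick code spans and `[path text]` links.

```ruby
require 'strscan'

# Simplified stand-in for the lexer's "path" rule.
PATH = %r{/(?:[a-zA-Z0-9_\-.]+/?)*}

# Toy renderer: handles `code` spans and [/path text] links only.
def render(text)
  s = StringScanner.new(text)
  out = +''
  in_code = false
  until s.eos?
    if s.scan(/`/)
      # Backtick toggles the <code> span.
      out << (in_code ? '</code>' : '<code>')
      in_code = !in_code
    elsif s.scan(/\[/)
      path = s.scan(PATH)
      if path && s.scan(/ ([^\]]+)\]/)
        out << %(<a href="#{path}">#{s[1]}</a>)
      else
        # Rollback: emit only "[" and the consumed path, then fall back into
        # the main loop so the next character gets its normal treatment.
        out << '[' << path.to_s
      end
    else
      out << s.getch
    end
  end
  out
end

puts render('if it starts with `[/` then you can assume')
```

Because the failed link consumes nothing beyond the bracket and the path, the closing backtick is seen by the main loop and ends the code span. In this sketch the same property keeps the "?" in `[/bar?baz=bat link]` from being dropped.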

As you can see, this kind of hand-crafted parser, already several thousand lines long, is an incredibly intricate and delicate piece of machinery, which is why I am hesitant about making sweeping changes. I will see if I can fix these little bugs though...

In the meantime I am going to edit your comment and mark it up like this:
```
if it starts with `<nowiki>[/</nowiki>` then you can assume
```
That should fix the display issue, at least.

Greg Hurrell 2010-11-05T23:53:31Z

Fixed that little bug.

And yes, it also explains/fixes the disappearing "?" in your initial example.

Greg Hurrell 2010-11-06T00:24:16Z
Kind changed:
- From: bug
- To: feature request

Greg Hurrell 2010-11-06T00:25:34Z
Summary changed:
- From: Question mark in relative URLs confuses parser
- To: Allow links with query strings in internal link syntax (ie. [/hello?there=true link])

Greg Hurrell 2010-11-06T00:25:44Z
Status changed:
- From: new
- To: open

Comments are now closed for this issue.
