Empty-string URLs in HTML – A followup

Late last year, after spending 10 days tracking down a horrific bug, I posted, Empty image src can destroy your site. The post laid out a problem present in almost all modern browsers regarding empty string URLs in HTML. Empty-string URLs look like this:

<img src="">
<script src="">
<link rel="stylesheet" href="">

Depending on the browser, one or more of these elements will actually cause another request to the server. Not just any request, though: a request for the containing page. That means the entire markup of your page is regenerated and served even though no one is actually viewing it. The linked post explains in detail why this is problematic, but suffice it to say, these types of unexpected requests to your server can bring down high-traffic sites by unexpectedly increasing traffic, or can corrupt user state information.

Current state of browsers

As a quick summary of where we are today, here’s how the various browsers stack up:

  • Internet Explorer through version 8 makes a request for <img src=""> only.
  • Firefox 3 and earlier make a request for all three of the patterns.
  • Firefox 3.5 fixed the <img src=""> case but not the others.
  • Safari 4 makes a request for all three patterns.
  • Chrome 4 makes a request for all three patterns.
  • Opera doesn’t make a request in any of these instances.

It’s not a pretty picture out there. But there has been movement since my last post.

Forward progress

Believing that this was the wrong behavior, I began contacting various browser vendors to ask if this behavior could be addressed. The most frequent response I received was that the browser was “following the standards” and shouldn’t be changed. I thought this was too dismissive and did some digging. The inconsistent treatment of empty-string URLs even within a single browser led me to believe that there was no specification governing this behavior. As it turned out, the specification to which everyone was referring was the URL specification (RFC 3986 – Uniform Resource Identifiers) and not HTML. While the URL specification does indicate that resolution for an empty string should result in the containing page, my argument was that this made no sense in the context of HTML.
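
To make the resolution rule concrete, here is a minimal sketch using the standard URL constructor available in browsers and Node.js. It follows the WHATWG URL parser rather than RFC 3986 to the letter, but the two agree on this case. The page address and the resolve helper are made up for illustration.

```javascript
// Hypothetical address of the page that contains the markup.
const pageUrl = "http://www.example.com/dir/page.html?id=5";

// Resolve a src/href value against the page URL, the way a browser
// turns a relative reference into an absolute one.
function resolve(reference, base) {
  return new URL(reference, base).href;
}

// An empty reference resolves to the containing page itself.
console.log(resolve("", pageUrl));
// → http://www.example.com/dir/page.html?id=5

// A normal relative reference resolves to a sibling resource.
console.log(resolve("photo.png", pageUrl));
// → http://www.example.com/dir/photo.png
```

This is why an empty src asks the server for the page again: as far as URL resolution is concerned, the empty string is just a relative reference that happens to point at the current document.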

HTML5

So in December, I posted a message to the WHAT-WG mailing list to see if I could get some consensus on this issue. After a lengthy discussion and a bunch of research, everyone ended up agreeing that this behavior was unexpected and should be changed. This month, changes were made to HTML5 specifically stating that empty-string URLs should not cause server requests for the following (complete diff):

<img src="">
<input type="image" src="">
<script src="">
<link rel="stylesheet" href="">
<embed src="">
<object data="">
<iframe src="">
<video src="">
<video poster="">
<audio src="">
<command icon="">
<html manifest="">
<source src="">

Essentially, any tag that would result in the automatic download of an external resource will not make such a request if an empty-string URL is specified.
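
Until browsers catch up, you can scan for these patterns yourself. The following is a rough sketch, not the check YSlow performs: it uses a regular expression rather than a real HTML parser, the function and constant names are my own, and it flags every <input> and <link> with an empty URL rather than only type="image" and rel="stylesheet".

```javascript
// Tag/attribute pairs taken from the HTML5 list above.
const URL_ATTRIBUTES = new Map([
  ["img", ["src"]],
  ["input", ["src"]],
  ["script", ["src"]],
  ["link", ["href"]],
  ["embed", ["src"]],
  ["object", ["data"]],
  ["iframe", ["src"]],
  ["video", ["src", "poster"]],
  ["audio", ["src"]],
  ["command", ["icon"]],
  ["html", ["manifest"]],
  ["source", ["src"]],
]);

// Returns entries such as "img[src]" for each empty-string URL found.
// Assumption: attribute values are quoted; unquoted or dynamically
// generated values are not handled by this simple scan.
function findEmptyUrlAttributes(html) {
  const found = [];
  const tagPattern = /<([a-z][a-z0-9]*)\b([^>]*)>/gi;
  for (const match of html.matchAll(tagPattern)) {
    const tag = match[1].toLowerCase();
    const attrs = URL_ATTRIBUTES.get(tag) || [];
    for (const attr of attrs) {
      // Match attr="" or attr='' preceded by whitespace, so that
      // data-src="" is not mistaken for src="".
      const emptyAttr = new RegExp(`(?:^|\\s)${attr}\\s*=\\s*(?:""|'')`, "i");
      if (emptyAttr.test(match[2])) {
        found.push(`${tag}[${attr}]`);
      }
    }
  }
  return found;
}

const sample = '<img src=""><script src="a.js"></script><link rel="stylesheet" href="">';
console.log(findEmptyUrlAttributes(sample));
// → [ 'img[src]', 'link[href]' ]
```

Running something like this over generated markup in a test suite would catch the most common source of the problem, which is a template that emits the attribute even when the value is missing.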

Thus far, the Firefox team has agreed to make this update (see bug 531327). I filed a bug with the Chromium team (issue 38144) and also commented on a bug that was already filed at WebKit (bug 30303) and am waiting for updates (if a WebKit contributor would like to take this on, please do). Perhaps not surprisingly, I’ve had a little trouble pleading my case to Microsoft. I’ve not given up, but if you are, or have, a Microsoft contact who could help resolve this issue, please let me know.

YSlow

Since the empty-string URL issue really affects server performance, I asked the YSlow team if they could add in empty-string URL detection. Even though the issue with empty image URLs has been resolved in Firefox, it’s still present in other browsers, so YSlow’s flagging of this serious issue can help you avoid problems in other browsers.

In this initial release with the feature, there is a new non-default rule that you can turn on in a custom ruleset. To do so, click the Edit button next to the list of rulesets and check the box next to “Avoid empty src or href”. Click “Save ruleset as…” and type in a new name. Then, select your new ruleset from the dropdown box and click the “Run Test” button.

New YSlow rule for empty href or src

This new rule is under the “Server” group of rules. YSlow will correctly detect <img src=""> and <link rel="stylesheet" href=""> and give you an F if there are any instances of either of these patterns.

New YSlow rule for empty href or src

Note that due to a bug in YSlow, you’ll sometimes also get an A for this rule even if you do have one of the offending patterns present. This will be addressed soon.

Thanks

Things sometimes seem to move slowly on the Internet, but ultimately I believe things tend to get done correctly. We’re still likely at least a year away from never needing to worry about this issue again, and for that I need to thank a bunch of people:

  • Jonas Sicking of Mozilla for suggesting that this issue be brought up on the WHAT-WG mailing list and for participating in the discussion.
  • Simon Pieters of Opera and Maciej Stachowiak of WebKit for chiming in and agreeing that this behavior seemed broken.
  • Ian Hickson for making the adjustments in HTML5 so that this issue can be put to rest.
  • Stoyan Stefanov and the YSlow team for adding empty-string URL detection to YSlow.

Comments

  1. Livingston Samuel

    Hi nczakas, you've done a great job making the internet a better place by taking this issue to the browser vendors and getting it resolved in Firefox, and adding a test for it in YSlow.

    Also, have you tried reporting this issue at https://connect.microsoft.c...? I think reporting there would work out, as I see the IE team actively involved in responding to the reported bugs/suggestions.

  2. cancel bubble

    Nice that this has been added to YSlow, have you emailed Steve Souders to see about getting it added to Page Speed? Any reason why the addition to YSlow is not checked by default?

  3. Kirk Cerny

    Thank you! I had this issue last year, and it took me days to track it down. It's nice to see you attempting to solve it for good.

  4. Greg Griffiths

    Had this issue recently on an HTTPS site where we were using hidden iframes as part of the solution. Because the SRC attribute was being set by the JS and not on load, we encountered the "issue" you mention, as well as a security alert from IE because the external resource it went to get was an HTTP one and not an HTTPS one. We ended up populating the SRC attribute to point to a small GIF on the HTTPS server to get around it, so thanks for getting it fixed properly.

  5. Nicholas C. Zakas

    @Livingston - I actually contacted a couple of people directly on the IE team but didn't receive a favorable response. I'm trying to find some other contacts.

    @CancelBubble - I've asked them to make it on by default. Steve doesn't manage PageSpeed, but I can certainly drop him a line and help to get the ball rolling.

  6. Marcel Duran

    Hi Nicholas, great job! I also had the same problem last year which took me some time to detect but it was related to IE6 PNG-24 alpha transparency hack, by using style="filter: progid:DXImageTransform.Microsoft.AlphaImageLoader(src='', sizingMethod='scale');" on a tag we noticed twice the normal traffic for IE6 users only which gave us some hint where to look for.

    Question: Does the browser request the page again once for each empty-string URL found on it? I mean, if a page has 3 such tags, will it be requested 1 (page) + 3 (each img src) times?

  8. Gary Davis

    I had this issue with a template of html that had something like src="{3}" and I intended to replace the {3} with jQuery when it executed after the pageload but the browser requested the {3} before it got replaced. I then changed it to src="" with jQuery replacing the src attribute's value but experienced the issue you found (needlessly pulling in the page).

    So my final solution was to omit the src="" completely so that jQuery could then add the attribute and the value. That wound up doing what I wanted without the additional bogus request to the server.

    Gary Davis
    Webguild

  9. Markus

    Excellent post. I just noticed this same issue during testing - Chrome was requesting a page from the server 3-4 times for one request, duplicating all the API calls, database queries, etc each time, of course.

    I finally tracked it down to be an img tag with an empty src. I'm very pleased that I found this before the site went into production, this would have been terrible.

    This browser behavior definitely needs to change.

    * I posted this with a bunk code tag the first time, so I resubmitted my comment

  10. Andrew Mattie

    Any chance you know an engineer at Google who could get this exact issue fixed in their AJAX Search API ASAP? I just posted _NASTY BUG_ in Google AJAX Search API v1 in the Google Group of that API, but I'm not sure how long it takes them to look at it.

    Without remembering this post and the one you made back in November, I suspect it would have taken me far longer to figure out what was causing this issue. For that, I'm incredibly grateful to you!

  11. 裕波

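    First of all, thank you for reporting this problem. Thanks to your efforts, YSlow has added this feature.
    Firefox is also willing to make the change, and as for IE, you have already made the effort.
    As front-end engineers in China, we should also step up and work together on this!

    Thank you very much!
    Today I am leaving this comment here in Chinese!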

  12. andrew cates

    funny i came across this bug too last year. this is the first post i've ever seen mention it.

  13. GafroNinja

    Thanks for shouting this out AND karate chopping the vendors AND adding it to yslow AND getting it added to the HTML5.

    Do you wear a cape to work?

  14. Anton Yatsenko

    It seems we should track not only the empty attribute, but a bunch of other cases:

    [img src="/"/]
    [img src="#"/]
    [img src="index.html"/]
    [img src="?"/]
    [img src="index.html#"/]
    [img src="index.html?"/]

    all above make request to server in chrome (linux)

    The list of cases is huge, but the main rule would be "url == location.href ? don't do request". On the other hand, there may be people who use exactly the root URL for images/scripts/etc. because of "something", so it's a holy-war moment.

    Anyway good move by you in showing this issue.

    Good luck in all your things.

  15. Nicholas C. Zakas

    @Anton - Those are less problematic because they are intentionally trying to get this behavior. An empty-string URL is most likely an error and results in this behavior. If the value is non-empty, chances are the behavior is expected and should be allowed.

  16. Arturo Guzman

    LOL, I will fix the JS gallery I just built right now! Thx.

  17. Bertilo Wennergren

    Maybe src="about:blank" could be a good choice if you plan to set the actual value later with JS.
