The Lazy Man's URL Parsing in JavaScript

2012/05/07

Have you ever needed to parse a URL using regular expressions? Regular expressions aren’t easy to write (for a lot of people, including myself), and it’s even tougher to test whether one is reliable in every situation. You could, of course, just copy and paste a regular expression (or function or library) that someone else developed, but I propose that there is a simpler and more concise way of parsing URLs that doesn’t require any regular expressions.

This method, originally posted on GitHub by John Long (though probably not originally discovered by him), uses the URL parsing built into the DOM: set the href of an anchor element, then read the parts of the URL off that element’s properties. Check it out:

var parser = document.createElement('a');
parser.href = "http://example.com:3000/pathname/?search=test#hash";

parser.protocol; // => "http:"
parser.hostname; // => "example.com"
parser.port;     // => "3000"
parser.pathname; // => "/pathname/"
parser.search;   // => "?search=test"
parser.hash;     // => "#hash"
parser.host;     // => "example.com:3000"

This code is pulled directly from the Gist that John Long posted at the above link. I haven’t seen any statements about which browsers this works with, but I assume that, at a minimum, it works with all modern browsers. If you don’t trust it you can either test it yourself, or use a library such as URI.js.
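If you need this in more than one place, the snippet above can be wrapped in a small function. This is only a sketch: the name parseUrl and the optional doc argument are my own additions for illustration and are not part of John Long’s Gist. The doc argument is there so you can hand in a different document (an iframe’s, for example); if you leave it out, the global document is used.

```javascript
// Hypothetical helper (not from the original Gist): copies the URL parts
// exposed by an anchor element into a plain object.
function parseUrl(url, doc) {
    // Assumption: fall back to the global document when none is passed in.
    doc = doc || document;

    var parser = doc.createElement('a');
    parser.href = url;

    // The same properties that are read in the listing above.
    var names = ['protocol', 'hostname', 'port', 'pathname', 'search', 'hash', 'host'];
    var parts = {};
    for (var i = 0; i < names.length; i++) {
        parts[names[i]] = parser[names[i]];
    }
    return parts;
}
```

In a browser console, parseUrl("http://example.com:3000/pathname/?search=test#hash").port should give you the same "3000" that the listing above shows. The result is a snapshot, so changing it later has no effect on any element.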

One of the coolest things about this method is that you can enter a partial/relative URL into the href property and the browser will make it a full URL, just like it translates partial URLs on real HTML links into full URLs. For example, try this using your browser’s console on this page:

var parser = document.createElement('a');
parser.href = "/";

parser.href; // => the full URL of the site root, e.g. "http://example.com/" when run on a page at example.com

You could also just use an empty string for the href and it would give you your current URL (not including the hash, though), but this is a waste because window.location has the exact same properties, so you don’t even need to create an anchor element for that.

In all of these examples, you still need to parse the query string, but at least you’ve got it pulled out of the URL.
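For that last step, here is a minimal sketch of a query string parser that sticks to plain string methods. The name parseQuery is hypothetical, and the behavior is an assumption on my part about what you would usually want: it treats "+" as a space, gives keys without an "=" an empty string as their value, and lets a later duplicate key overwrite an earlier one. Note that decodeURIComponent throws on malformed escapes such as "%zz", so wrap the call in a try/catch if you can’t trust the input.

```javascript
// Hypothetical helper: turns "?search=test&page=2" into { search: "test", page: "2" }.
function parseQuery(search) {
    var params = {};

    // Drop the leading "?" if there is one.
    var query = search.charAt(0) === '?' ? search.substring(1) : search;
    if (query === '') {
        return params;
    }

    var pairs = query.split('&');
    for (var i = 0; i < pairs.length; i++) {
        if (pairs[i] === '') {
            continue; // skip empty segments such as the one in "a=1&&b=2"
        }

        // Split on the first "=" only, so values may contain "=" themselves.
        var index = pairs[i].indexOf('=');
        var rawKey = index === -1 ? pairs[i] : pairs[i].substring(0, index);
        var rawValue = index === -1 ? '' : pairs[i].substring(index + 1);

        // Form encoding uses "+" for spaces; decodeURIComponent handles %XX escapes.
        var key = decodeURIComponent(rawKey.split('+').join(' '));
        var value = decodeURIComponent(rawValue.split('+').join(' '));

        params[key] = value; // a later duplicate overwrites an earlier one
    }
    return params;
}
```

With the example URL from earlier, parseQuery(parser.search) returns an object whose search property is "test".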

UrlParsing.com/Conclusion#Paragraph

I know this is shorter than my usual posts, but I think you still learned something pretty valuable, assuming you didn’t already hear about this somewhere else. I definitely wish I had known about this a while back, when I was doing a project where I needed to parse a URL. Make sure to spread the parsing technique around to all of your JavaScript programming friends and leave your comments below. Happy Coding!

EDIT:

I found a post stating that this does not work in IE6 because the href property isn’t parsed into a full URL unless it is parsed by the HTML parser. There is a simple workaround that forces the HTML parser to go over it though:

function canonicalize(url) {
    var div = document.createElement('div');
    div.innerHTML = "<a></a>";
    div.firstChild.href = url; // Ensures that the href is properly escaped
    div.innerHTML = div.innerHTML; // Run the current innerHTML back through the parser
    return div.firstChild.href;
}

Author: Joe Zimmerman

Joe Zimmerman has been doing web development ever since he found an HTML book on his dad's shelf when he was 12. Since then, JavaScript has grown in popularity and he has become passionate about it. He also loves to teach others through his blog and other popular blogs. When he's not writing code, he's spending time with his wife and children and leading them in God's Word.