Parsing URLs in Twisted Python

2 PM April 2, 2004

While writing my blog replacement software with Twisted, I had some dynamic URLs that were difficult to handle the normal Twisted way, so I wrote some code that looked at each segment of the URL in turn. That code was ugly, so I threw it away and wrote another version that uses regular expressions to pull the URL apart.

The URLs have a lot of structure in common, but some parts are optional, and URLs that mean different things can look similar:

  • /blog/2003.html - the 2003 archive.
  • /blog/python/2003.html - the 2003 Python archive.
  • /blog/2004_02_18/blog_replacement_object_model - an article.

The code looks like this, minus the original comments and logging:


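    import re

    from twisted.web.resource import Resource

    # 'common' and the Blog*Page classes belong to the rest of the blog
    # software and are not shown here.

    class BlogResource(Resource):
        # Leaf resource: Twisted stops traversing and calls render() directly.
        isLeaf = True

        def render(self, request):
            # Everything after '/blog', rejoined into a single string.
            restOfPath = '/'.join(request.postpath)
            for (pat, pageClass) in self.MATCH_LIST:
                # '$' makes the pattern cover the whole path, not just a prefix.
                m = re.match(pat + '$', restOfPath)
                if m:
                    return pageClass(request, *m.groups()).render()

            return common.NotFoundPage(request).render()

        # Patterns are tried in order; the first one that matches wins.
        MATCH_LIST = [
            (r'(?:index\.html)?', BlogIndexPage),
            (r'(?:(\w+)/)?(\d\d\d\d)(?:\.html)?',
                    BlogYearArchivePage),
            (r'(?:(\w+)/)?(\d\d\d\d)_(\d\d)(?:\.html)?',
                    BlogMonthArchivePage),
            (r'(?:(\w+)/)?(\d\d\d\d)_(\d\d)_(\d\d)(?:\.html)?',
                    BlogDayArchivePage),
            (r'(\d\d\d\d)_(\d\d)_(\d\d)/(\w+)(?:\.html)?',
                    BlogEntryPage),
        ]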
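
To see what the optional category group captures, here is a small standalone sketch that runs the year-archive pattern from MATCH_LIST against the example URLs (with the leading '/blog/' already removed). The helper name year_archive_groups is made up for illustration; it is not part of the blog software.

```python
import re

# Same pattern as the year-archive entry in MATCH_LIST, anchored with '$'
# the same way render() anchors it.
YEAR_ARCHIVE = r'(?:(\w+)/)?(\d\d\d\d)(?:\.html)?' + '$'


def year_archive_groups(rest_of_path):
    """Return the captured (category, year) tuple, or None if no match.

    Hypothetical helper, for illustration only.
    """
    m = re.match(YEAR_ARCHIVE, rest_of_path)
    return m.groups() if m else None


print(year_archive_groups('2003.html'))         # (None, '2003')
print(year_archive_groups('python/2003.html'))  # ('python', '2003')
# An article URL is not mistaken for a year archive:
print(year_archive_groups('2004_02_18/blog_replacement_object_model'))  # None
```

When the category is left out, the group still occupies a slot in the result and comes back as None, so the page class always sees the same number of arguments.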

Step-by-step:

  1. isLeaf = True causes Twisted to always call our render() method, even if there are more URL segments.
  2. restOfPath is set to the remainder of the URL path, not counting the '/blog' bit that got us to this class.
  3. The for statement loops through MATCH_LIST in order, looking for the first pattern that matches all of restOfPath.
  4. When a match is found, a new instance of the matching pageClass is created, passing in the request and all the groups captured by the regular expression as constructor parameters. An optional group that took no part in the match is passed as None.
  5. render() is called on the resulting page object to render the HTML.
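
The steps above can be tried out without Twisted. The following is only a sketch of the same dispatch loop: Page, YearArchivePage, EntryPage, NotFoundPage and the dispatch function are hypothetical stand-ins, not the real page classes, and the pattern list is cut down to two entries.

```python
import re


class Page:
    """Hypothetical stand-in for the blog's page classes."""

    def __init__(self, request, *args):
        self.request = request
        self.args = args

    def render(self):
        # Report which class was chosen and with which captured groups.
        return '%s%r' % (type(self).__name__, self.args)


class YearArchivePage(Page):
    pass


class EntryPage(Page):
    pass


class NotFoundPage(Page):
    pass


# A cut-down version of MATCH_LIST; patterns are tried in order.
MATCH_LIST = [
    (r'(?:(\w+)/)?(\d\d\d\d)(?:\.html)?', YearArchivePage),
    (r'(\d\d\d\d)_(\d\d)_(\d\d)/(\w+)(?:\.html)?', EntryPage),
]


def dispatch(request, postpath):
    """Mirror of render(): join the segments, then take the first match."""
    rest_of_path = '/'.join(postpath)
    for pat, page_class in MATCH_LIST:
        m = re.match(pat + '$', rest_of_path)
        if m:
            return page_class(request, *m.groups()).render()
    return NotFoundPage(request).render()


print(dispatch(None, ['python', '2003.html']))
# YearArchivePage('python', '2003')
print(dispatch(None, ['2004_02_18', 'blog_replacement_object_model']))
# EntryPage('2004', '02', '18', 'blog_replacement_object_model')
print(dispatch(None, ['no', 'such', 'page']))
# NotFoundPage()
```

On Python 3.4 and later, re.fullmatch(pat, rest_of_path) is another way to require that the whole path matches. Also note that under Python 3, Twisted supplies request.postpath as bytes and expects render() to return bytes, so a port of the original class would need to decode and encode at those edges; the sketch sidesteps that by working with plain strings.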

Now that's the kind of trick that is easy to pull off in Python, but would be hard in Java.

By alang
(Posted to Python, Software Development and javablogs)
© 2003-2006 Alan Green