Website Thumbnail Generator Web App - thummer

Posted on Sat 10 January 2009 in thummer • Tagged with cutycapt, django, thummer, ubuntu, xvfb

BBC News Website Thumbnail

Over Christmas I began learning django and python. I always find it better to learn new languages when you are applying what you read to a project that you are actually interested in, and think might come in useful. Well, after finding this great article on website thumbnail generation using cutycapt and django I decided to make a small web service / web app which would generate and serve website thumbnail screenshots of varying sizes.

I've always had difficulty with the "free" web thumbnail services out there - they usually limit the number of requests and/or thumbnail sizes, as well as using watermarks or requiring link backs. And this is understandable - running this kind of service en masse is resource hungry, and those servers aren't going to pay for themselves.

The idea of thummer is to allow people to set up their own thumbnail service on their own server, allowing them to generate snapshots of websites at any size with no restrictions.

Unfortunately, because of some dependencies (CutyCapt requires Qt 4 and an X server), setting it up isn't quite as easy as I'd hoped. It isn't difficult, though: there are lots of steps, but none of them are tricky.

I'll post some instructions in the near future for Ubuntu 8.04 LTS Server.

I have now posted the (rather lengthy) installation instructions for Ubuntu 8.04 LTS.

I have set up a project page on Launchpad for thummer, so any bugs and questions can be tracked. The current release can be downloaded from Launchpad here.

A demo I set up is available here (but may disappear in the future if it starts getting hammered!).

This is my first django/python project, so any feedback is more than welcome!


Serving the Correct MIME Type for XHTML

Posted on Wed 01 November 2006 in XHTML

"Work In Progress"

This article is a work in progress. It will be expanded and made more reader friendly - at the moment it is just a brief list of my ideas.

This is a topic of much debate, but I have decided on my strategy.

  • XHTML has much stricter markup rules than HTML 4. This is one of its most appealing aspects. If you decide to use it, you must abide by the rules.
  • I wish to use XHTML, but also want it to be as compatible as possible with older browsers (if compatibility with really old browsers is of the utmost concern, you should probably stick to HTML 4.01 Strict).

Serving up XHTML:

  • Use XHTML 1.0 (Strict / Transitional). XHTML 1.0 should be served as application/xhtml+xml, however it may be served as text/html. Any version of XHTML later than 1.0 should not be served as text/html.
  • Serve as application/xhtml+xml to user agents which support it (e.g. Mozilla Firefox, Opera).
  • Serve as text/html to user agents which do not specify acceptance of application/xhtml+xml, but do accept text/html (e.g. Internet Explorer).
  • Serve as application/xhtml+xml (i.e. as it should be) to user agents which do not specify acceptance of text/html (e.g. the W3C Validator).
  • Using ASP.NET 2.0? See: Serving the Correct MIME Type for XHTML using ASP.NET 2.0.
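
The negotiation rules above can be sketched in a few lines of Python. This is only an illustration, not a definitive implementation: the function names are made up for this example, q-values in the Accept header are ignored, and the list is followed literally, so a user agent that sends only */* ends up in the last branch.

```python
XHTML = "application/xhtml+xml"
HTML = "text/html"


def accepted_types(accept_header):
    """Return the media types named in an Accept header.

    Parameters such as q-values are stripped and ignored (a simplification).
    """
    if not accept_header:
        return set()
    return {
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
        if part.strip()
    }


def choose_content_type(accept_header):
    """Pick the MIME type to serve an XHTML 1.0 page with."""
    types = accepted_types(accept_header)
    if XHTML in types:
        # The user agent explicitly supports XHTML served as XML.
        return XHTML
    if HTML in types:
        # No XHTML support announced, but text/html is accepted.
        return HTML
    # Neither type is named (or there is no Accept header at all):
    # serve the page as it should be served.
    return XHTML


if __name__ == "__main__":
    print(choose_content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
    print(choose_content_type("text/html, image/gif"))
    print(choose_content_type(None))
```

Whatever the outcome, the response depends on the request's Accept header, so it is worth sending a Vary: Accept header alongside it.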

Implications:

  • Use UTF-8 character encoding.
  • Only add the <?xml version="1.0" encoding="UTF-8" ?> declaration if serving as application/xhtml+xml. It should normally be included, but is not required when using UTF-8. Because IE6 switches to "quirks" mode when this is included, omit when serving as text/html.
  • Only add the <meta http-equiv="content-type" content="text/html; charset=UTF-8" /> element if serving as text/html.
  • Mark all <style> and <script> contents as CDATA (using the backwards-compatible method). See: Marking Script and Style as CDATA.
  • Follow the XHTML 1.0 Appendix C HTML Compatibility Guidelines.
  • Stick to the 4 XHTML Safe Named Entities: &lt; for <, &gt; for >, &amp; for &, &quot; for ". &apos; for ' is not supported in IE6. For this and all others use numbered entities (e.g. &#39; for ').
  • Check that all your JavaScript works. You can't use document.write() or innerHTML (this is from my reading and is unchecked/tested).
  • Make sure you style the <html> element, as the <body> element doesn't cover the entire viewport.
  • Validate, validate, validate! Make sure your XHTML is perfect, and that your pages work in a multitude of browsers.
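
As a rough illustration of the entity rule in the list above, the following Python sketch escapes text using only the four safe named entities and falls back to numbered entities for everything else. The helper names are hypothetical; the lookup table is the standard library's html.entities.name2codepoint.

```python
from html.entities import name2codepoint

# The four named entities that are safe in XHTML, including in IE6.
SAFE_ENTITIES = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;"}


def escape_xhtml(text):
    """Escape text for XHTML without relying on &apos;."""
    out = []
    for ch in text:
        if ch in SAFE_ENTITIES:
            out.append(SAFE_ENTITIES[ch])
        elif ch == "'":
            # &apos; is not supported in IE6, so use the numbered form.
            out.append("&#39;")
        else:
            out.append(ch)
    return "".join(out)


def numeric_entity(name):
    """Turn an HTML named entity (e.g. 'copy') into its numbered form."""
    return "&#%d;" % name2codepoint[name]


if __name__ == "__main__":
    print(escape_xhtml("Tom & Jerry's <b>"))
    print(numeric_entity("copy"))
```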

Serving the Correct MIME Type for XHTML using ASP.NET 2.0

Posted on Wed 01 November 2006 in ASP.NET

This example code goes some way to serving XHTML properly, as outlined in the article: Serving the Correct MIME Type for XHTML.

Ideally, text/html should only be served when the user agent specifies acceptance of text/html, but not application/xhtml+xml. However, Internet Explorer 6 does not send anything useful in its Accept request header (HTTP_ACCEPT).

Place the following code in the Page_Init section of your master page:

String strContentType = null;

// Determine if the user agent sent an Accept header (AcceptTypes / HTTP_ACCEPT)
if (Request.AcceptTypes != null)
{
  // Determine if the user agent can handle XHTML served as "application/xhtml+xml" (e.g. Firefox, Opera)
  if (Array.IndexOf(Request.AcceptTypes, "application/xhtml+xml") > -1)
  {
    strContentType = "application/xhtml+xml";
  }
  else
  {
    // If the user agent does not specify acceptance of "application/xhtml+xml", then serve as "text/html" (e.g. Internet Explorer)
    strContentType = "text/html";
  }
}
else
{
  // If AcceptTypes is null, then serve as "application/xhtml+xml" (e.g. W3C Validator)
  strContentType = "application/xhtml+xml";
}

// Set the Response Content Type
Response.ContentType = strContentType.ToString();

if (strContentType == "text/html")
{
  // Generate a "content-type" meta tag in the page head
  HtmlMeta hmtContentType = new HtmlMeta();
  hmtContentType.HttpEquiv = "content-type";
  hmtContentType.Content = strContentType.ToString() + "; charset=UTF-8";
  Page.Header.Controls.Add(hmtContentType);
}
else
{
  // Generate the XML declaration
  Response.Write("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n");
}

// Set the page encoding
Response.ContentEncoding = System.Text.Encoding.UTF8;

// Send a Vary header to inform proxy servers that content negotiation is taking place
Response.AddHeader("Vary", "Accept");

Note

When served as application/xhtml+xml, all <style> and <script> in the XHTML document must be marked as CDATA. Because ASP.NET automatically generates JavaScript, this can be a problem. See Marking ASP.NET 2.0 Generated JavaScript as CDATA for a workaround.


Marking ASP.NET 2.0 Generated JavaScript as CDATA

Posted on Wed 01 November 2006 in ASP.NET

The following code example is a workaround for marking ASP.NET generated JavaScript as CDATA, which is required when serving XHTML ASP.NET 2.0 pages as application/xhtml+xml (see Serving the Correct MIME Type for XHTML using ASP.NET 2.0).

Place the following code in your master page:

protected override void Render(HtmlTextWriter writer)
{
  StringWriter stwHtml = new StringWriter();
  base.Render(new HtmlTextWriter(stwHtml));
  String strHtml = stwHtml.ToString();

  // Enclose ASP.NET generated client ECMAScript / JavaScript in CDATA Wrapper
  Regex regScriptCDATA = new Regex("(<script\\stype=\"text\\/....script\">(?:\\s)*?)" + "(?:<!--\\s)" + "((?:.|\\n)*?)" + "(?:// -->)" + "((?:\\s)*?<\\/script>)");
  String strScriptCDATA = "$1" + "<!--//--><![CDATA[//><!--\n" + "$2" + "\n//--><!]]>" + "$3";

  strHtml = regScriptCDATA.Replace(strHtml, strScriptCDATA);


  writer.Write(strHtml);
}

You may need to add the following at the beginning of your master page code:

using System.IO;
using System.Text.RegularExpressions;
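
To see what the regular expression does, here is a rough Python equivalent run against a made-up script block. The sample markup and the names SCRIPT_PATTERN, wrap_in_cdata and mark_scripts_as_cdata are assumptions for illustration only; the sample is simply shaped to match what the pattern expects, with a script body wrapped in an HTML comment.

```python
import re

# Same pattern as the C# version, written as Python raw strings.
SCRIPT_PATTERN = re.compile(
    r'(<script\stype="text/....script">\s*?)'  # group 1: opening tag
    r'(?:<!--\s)'                               # old comment opener (dropped)
    r'((?:.|\n)*?)'                             # group 2: the script body
    r'(?:// -->)'                               # old comment closer (dropped)
    r'(\s*?</script>)'                          # group 3: closing tag
)


def wrap_in_cdata(match):
    """Rebuild one script block with the backwards-compatible CDATA wrapper."""
    return (
        match.group(1)
        + "<!--//--><![CDATA[//><!--\n"
        + match.group(2)
        + "\n//--><!]]>"
        + match.group(3)
    )


def mark_scripts_as_cdata(html):
    """Apply the wrapper to every matching script block in the page."""
    return SCRIPT_PATTERN.sub(wrap_in_cdata, html)


if __name__ == "__main__":
    # Hypothetical input, not captured from a real ASP.NET page.
    sample = (
        '<script type="text/javascript">\n'
        "<!--\n"
        "var ok = 1 < 2;\n"
        "// -->\n"
        "</script>"
    )
    print(mark_scripts_as_cdata(sample))
```

Script blocks that do not use the comment wrapper are left untouched, because the pattern does not match them.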

Marking Script and Style as CDATA

Posted on Wed 01 November 2006 in XHTML

When using XHTML, the contents of <script> and <style> elements must be marked as CDATA. This is essential when serving XHTML correctly (as application/xhtml+xml).

When possible, use external script and style files and reference these from your XHTML document (behavioural and presentational separation). However, the examples below show how to place <script> and <style> inside your XHTML document correctly. This method should be compatible with older browsers, and therefore degrades gracefully.

Marking <script> as CDATA:

<script type="text/javascript">
  <!--//--><![CDATA[//><!--
    ......
  //--><!]]>
</script>

Marking <style> as CDATA:

<style type="text/css">
  <!--/*--><![CDATA[/*><!-- */
    ......
  /*]]>*/-->
</style>
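
One way to convince yourself that the wrappers matter is to feed the markup to an XML parser, which is roughly what a browser does with application/xhtml+xml. The Python sketch below uses the standard library's xml.etree.ElementTree; the sample script and style contents are made up for this example, and is_well_formed is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# A script containing "<" and "&&", wrapped as shown above.
WRAPPED_SCRIPT = """<script type="text/javascript">
  <!--//--><![CDATA[//><!--
    if (1 < 2 && 3 > 2) { var ok = true; }
  //--><!]]>
</script>"""

# The same script with no wrapper at all.
UNWRAPPED_SCRIPT = """<script type="text/javascript">
    if (1 < 2 && 3 > 2) { var ok = true; }
</script>"""

# A stylesheet wrapped as shown above.
WRAPPED_STYLE = """<style type="text/css">
  <!--/*--><![CDATA[/*><!-- */
    body > p { color: black; }
  /*]]>*/-->
</style>"""


def is_well_formed(markup):
    """Return True if the markup parses as XML."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed(WRAPPED_SCRIPT))    # wrapped script
    print(is_well_formed(UNWRAPPED_SCRIPT))  # bare "<" and "&&"
    print(is_well_formed(WRAPPED_STYLE))     # wrapped stylesheet
```

The unwrapped version fails because a bare < or & is not allowed in XML character data, whereas anything goes inside a CDATA section.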

ASP.NET 2.0

If you are serving ASP.NET pages as application/xhtml+xml (see Serving the Correct MIME Type for XHTML using ASP.NET 2.0), then the automatically generated JavaScript will not be marked as CDATA, and will therefore not work. For a workaround see: Marking ASP.NET 2.0 Generated JavaScript as CDATA.