2009-12-31

How to fix the Ctrl-Y Ctrl-Z inconsistency in GNOME Terminal with international keyboard layouts

This blog post describes and fixes an inconsistency between some keys with or without Ctrl in GNOME Terminal with international keyboard layouts.

The quick fix:

$ wget -O- -q http://pts-mini-gpl.googlecode.com/svn/trunk/pts-vtefix/vtefix.sh |
sudo bash

In GNOME Terminal, with the Hungarian and German keyboard layouts the keys Y and Z are switched with respect to the US English layout. So when the national layout is active, one of the keys yields the string "z", but when the same key is pressed with Ctrl, it yields "^Y", instead of the expected "^Z". All other applications (e.g. GIMP, Firefox, xterm) work as expected, except for those using VTE (e.g. GNOME Terminal).

Here is a script which fixes this on Ubuntu Hardy, Ubuntu Karmic and Ubuntu Lucid:

perl -we '
  use integer;
  use strict;
  my $fn = $ARGV[0];
  print STDERR "info: vtefix library filename: $fn\n";
  die "error: file not found: $fn\n" if !open F, "<", $fn;
  $_ = join "", ;
  my @L;
  while (/\xf6[\x80-\x8f].[\x09-\x0a]\0\0[\x00\x04]
         |\x80[\x78-\x7f].\0\x74.\xf6\x85.\xff\xff\xff[\x00\x04]
         |\x80\x7d\x32\x00\x74\x11\x41\xf6\xc5[\x00\x04]
          (?=(?:\x0f\x1f\x80\0\0\0\0)?\x0f\x85)
         /sgx) {
    push @L, pos($_) - 1;
  }
  # We patch only the first occurrence, the 2nd one is different.
  pop @L if @L == 2 and vec($_, $L[0] - 6, 8) == 0xf6 and
                        vec($_, $L[1] - 6, 8) == 0xf6;
  if (@L == 1) {
    if (vec($_, $L[0], 8) == 4) {  # need to patch
      print "info: patching at offset $L[0]\n";
      die "error: cannot open file for writing: $fn\n" if !open F, "+<", $fn;
      die if !sysseek F, $L[0], 0;
      die if !syswrite F, "\0";
      print "info: patching OK, no need to restart gnome-terminal\n";
    } else {
      print "info: already patched at offset $L[0]\n";
      exit 2;
    }
  } else {
    die "error: instruction to patch not found (@L)\n";
  }
' -- "${1:-/usr/lib/libvte.so.9}"

Download the script from: http://pts-mini-gpl.googlecode.com/svn/trunk/pts-vtefix/vtefix.sh

The script has to be run each time after the libvte9 package is upgraded.

The script modifies the i386 and amd64 compiled libvte binary:

# amd64 Ubuntu Hardy:
# $ objdump -d /usr/lib/libvte.so.9.2.17 | grep testb | grep '[$]0x4,'
# 2a786:       f6 80 88 0a 00 00 04    testb  $0x4,0xa88(%rax)
# 2a7ba:       f6 87 88 0a 00 00 04    testb  $0x4,0xa88(%rdi)
# 
# i386 Ubuntu Hardy:
# $ objdump -d /usr/lib/libvte.so.9.2.17 | grep testb | grep '[$]0x4,'
# 25cd8:       f6 80 4c 09 00 00 04    testb  $0x0,0x94c(%eax)
# 25d0d:       f6 81 4c 09 00 00 04    testb  $0x4,0x94c(%ecx)

This corresponds to the following snipped in vte-0.16.3/src/vte.c:

if (handled == FALSE && normal == NULL && special == NULL) {
  if (event->group &&
      (terminal->pvt->modifiers & GDK_CONTROL_MASK))
    keyval = vte_translate_national_ctrlkeys(event);
  keychar = gdk_keyval_to_unicode(keyval);

The script replaces GDK_CONTROL_MASK (== 4) by 0, so the condition is always false, and vte_translate_national_ctrlkeys(...) is never called, which fixes the problem. Due to memory mapping, it is not necessary to restart GNOME Terminal instances already running.

To find the spot to patch in the binary, grep for gdk_keyval_to_unicode in the output of objdump -d /usr/lib/libvte.so.9.

More information about the bug (problem):

2009-12-19

Experimental HTTP server using Stackless Python

This blog post documents my experiment to write a non-blocking HTTP server based on coroutines (tasklets) of Stackless Python. My goal was to write a minimalistic web server server which can handle cuncurrent requests by using non-blocking system calls, multiplexing with select(2) or epoll(2), returning a simple Hello, World page for each request, using the coroutines of Stackless Python. I've done this, and measured its speed using ApacheBench, and compared it to the Hello, World server of Node.js.

The code is here: http://code.google.com/p/pts-mini-gpl/source/browse/#svn/trunk/pts-stackless-httpd http://syncless.googlecode.com/svn/trunk/benchmark.old/

Relevant ApacheBench spee results (for ab -n 100000 -c 50 http://127.0.0.1:.../):

Notes about the speed measurements:
  • I was using a recently compiled Stackless Python 2.6 and a recently compiled psyco for JITting.
  • I was surpriesed that my experimental code using select(2) and Stackless Python is faster than Node.js (by a factor of 1.925 on average, and the worst-case times are faster as well).
  • The speed comparison is not fair since Node.js has a real HTTP server protocol implementation, with its overhead, and my code just skips the HTTP header without parsing it.
  • Setting the TCP socket listen queue size to 100 (using listen(2)) made a huge difference on the worst case connection time. Compared to the setting of 5, it reduced the worst-case connection time from 9200 ms to 23 ms (!) in the measurement.
  • The source code of both servers can be found in the repository above.
  • My conclusion about the speed measurements is that a HTTP server based on Stackless Python and epoll(2) can be a viable alternative of Node.js. It would be worthwhile implementing one, and then doing proper benchmarks.

The advantage of using Stackless Python over callback-based solutions (such as Node.js in JavaScript, Twisted and Tornado) is that one can implement a non-blocking TCP server without being forced to use callbacks.

The unique advantage of Node.js over other solutions is that in Node.js not only socket communication is non-blocking, but DNS lookups, local filesystem access and other system calls as well – Node.js is non-blocking by design, but with other frameworks the programmer has to be careful not to accidentally call a blocking function. Avoiding a blocking function is especially cumbersome if a library used only provides a blocking interface.

Update: I've created a HTTP server capable of running WSGI applications. I've also integrated dnspython as an asynchronous DNS resolver. See it as project Syncless.

Update: Added (web.py) and CherryPy support.

Update: I've realized that the functionality of Syncless has already been implemented many times in Python. Examples: Concurrence, eventlet, gevent. See the comparison.

Minimalistic client for Amazon S3 (AWS) in Python

I've written py-mini-s3, a minimalistic client for Amazon S3 (AWS) in pure Python. It supports the GET and PUT operations. See its source code.

The implementation is based on the documentation http://docs.amazonwebservices.com/AmazonS3/latest/index.html?RESTAuthentication.html

2009-12-15

Sex, Unix style

My instruction sequence compilation of the traditional Unix sex theme:

date; touch; kiss; grep; strip; unzip; finger; mount; fsck; more; yes; uptime; gasp; umount; sleep

2009-12-06

Polynomials in Python with operator overloading

This blog post is a fun example for implementing univariate polynomials with operator overloading in Python. It is almost a domain-specific language (DSL). Here is the code:

#! /usr/bin/python2.4

class Polynomial(object):

  __attrs__ = ['co']

  def __init__(self, co=None):
    if co:
      co = list(co)
      while len(co) > 1 and not co[-1]:
        co.pop()
      self.co = co
    else:
      self.co = [0]

  def __repr__(self):
    items = []
    for i in xrange(len(self.co)):
      if self.co[i]:
        if i == 0:
          items.append('%s' % self.co[i])
        else:
          if self.co[i] == 1:
            pre = ''
          else:
            pre = '%s * ' % self.co[i]
          if i == 1:
            post = ''
          else:
            post = ' ** %s' % i
          items.append('%sx%s' % (pre, post))
    return ' + '.join(items) or '0'

  __str__ = __repr__

  def __call__(self, x):
    i = len(self.co)
    v = 0
    while i > 0:
      i -= 1
      v = self.co[i] + x * v
    return v

  def __add__(self, p):
    if isinstance(p, Polynomial):
      q = self
      if len(p.co) > len(q.co):
        p, q = q, p
      co = list(q.co)
      for i in xrange(len(p.co)):
        co[i] += p.co[i]
    else:
      co = list(self.co)
      co[0] += p
    return type(self)(co)

  __radd__ = __add__

  def __mul__(self, p):
    if isinstance(p, Polynomial):
      co = []
      for i in xrange(len(self.co) + len(p.co) - 1):
        s = 0
        for j in xrange(max(0, i - len(p.co) + 1), min(i + 1, len(self.co))):
          s += self.co[j] * p.co[i - j]
        co.append(s)
      return type(self)(co)
    else:
      return type(self)([x * p for x in self.co])

  __rmul__ = __mul__

  def __pow__(self, n):
    assert isinstance(n, int)
    assert n >= 0
    if n == 0:
      return type(self)([1])
    co = type(self)(self.co)
    while n > 1:
      co = co * self
      n -= 1
    return co

  #__rpow__ = __pow__

  def __sub__(self, p):
    return self + (-1) * p

  def __rsub__(self, p):
    return (-1) * self + p

x = Polynomial([0, 1])

# --- Test

for p in (x * (x + 1) * (2 * x + 1),
          (x + 1) * (1 + x),
          (x + 2) * (2 + x),
          (x + 3) ** 5,
          0 * x,
          x ** 2 - 2 * x + 1,
          (x + 3) + (3 * x ** 2 + 6) + x ** 4):
  print 'p = ', p
  print 'p(0) = ', p(0)
  print 'p(7) = ', p(7)
  print

Similar code for symbolic calculation with polynomials in Perl.

2009-11-17

How fast does 8g in Google Go compile?

To get an idea how fast Google Go code can be compiled, I've compiled a big chunk of the Google Go library code with the default 8g compiler. Here are the results:
  • 90 8g compiler invocations
  • no linking
  • 21MB .8 object data generated
  • 89540 lines of .go source from goroot/src/pkg
  • 345305 words of .go source
  • 2.22MB of .go source
  • 17908 lines per second compilation speed
  • 3676757 .8 bytes per second compilation speed
  • 389473 source bytes per second compilation speed
  • average 5.7s real time
  • average 4.8s user time
  • average 0.9s system time
  • 3G RAM and Intel Core2 Duo CPU T5250 @ 1.50GHz

As a quick comparison, I've compiled Ruby 1.8.6 with gcc:

  • 38 invocations of gcc -s -O2 (GCC 4.2.4)
  • 38 .c files of Ruby 1.8.6 source code (without extensions)
  • 15 .h files
  • 1044075 bytes generated in .o files
  • 92543 source lines
  • 283554 source words
  • 2.15MB of source code
  • 62190 source bytes per second compilation speed
  • 27916 .o bytes per second compilation speed
  • 37.4s real time
  • 35.9s user time
  • 1.0s system time
  • 0.4626 .o bytes generated per source byte

What I see here is that 8g is 131 times faster than GCC if we count object bytes per second, and it is 6.26 times faster if we count source bytes per second. But we cannot learn much from these ratios, because not the same code was compiled. What is obvious is that Go object files are much larger than C object files (relative to their source files) in this experiment.

How to write a program with multiple packages in Google Go

This blog post explains how to write a Google Go program which contains multiple packages. Let's suppose your software contains multiple packages: a and b, a consisting of two source files: a1.go and a2.go, and b contains only 1 source file importing package a. Here is an example:

$ cat a1.go
package a
type Rectangle struct {
  width int;
  Height int;
}

$ cat a2.go
package a
func (r *Rectangle) GetArea() int { return r.width * r.Height }
func (r *Rectangle) Populate() {
  r.width = 2;
  r.Height = 3;
}

$ cat b.go
package main
import (
  "a";
)
func main() {
  r := new(a.Rectangle);
  r.Populate();
  // Fields (etc.) starting with a small letter are undefined.
  // (private): b.go:9: r.width undefined (type a.Rectangle has no field width)
  // print(r.width, "\n");
  print(r.GetArea(), "\n");
  print(r.Height, "\n");
  print(r, "\n");
  print("done\n")
}

$ cat abc.sh
#! /bin/bash --
set -ex
rm -f *.6 6.out
6g -o a.6 a1.go a2.go
6g -I. b.go
6l b.6  # links a.6 as well

$ ./abc.sh
...
$ ./6.out
6
3
0x7f18309691a8
done

Notes:

  • Symbols and structure fields starting with lower case are not visible in other packages.
  • You have to specify -I. for the compiler so it will find a.6 in the current directory.
  • The linker figures out that it has to load prerequisite a.6 when linking b.6.

2009-11-15

FUSE protocol tutorial for Linux 2.6

Introduction and copyright

This is a tutorial on writing FUSE (Filesystem in UserSpacE for Linux and other systems) servers speaking the FUSE protocol and the wire format, as used on the /dev/fuse device for communication between the Linux 2.6 kernel (FUSE client) and the userspace filesystem implementation (FUSE server). This tutorial is useful for writing a FUSE server witout using libfuse. This tutorial is not a reference: it doesn't contain everything, but you should be able to figure out the rest yourself.

This document has been written by Péter Szabó on 2009-11-15. It can be used and redistributed under a Creative Commons Attribution-Share Alike 2.5 Switzerland License.

This tutorial was based on versions 7.5 and 7.8 of the FUSE protocol (FUSE_KERNEL_VERSION . FUSE_KERNEL_MINOR_VERSION), but it should work with newer versions in the 7 major series.

This tutorial doesn't give all the details. For that see the sample Python source code for a FUSE server.

Further reading

There is no complete and up-to-date reference manual for the FUSE protocol. The best documents and sources are:

Requirements

  • Linux 2.6 (even though FUSE has been ported to many operating systems, this tutorial focuses on the default Linux implementatation);
  • the fuse kernel module 7.5 or later compiled and loaded;
  • you being familiar with writing a FUSE server (with libfuse or one of the Perl, Python, Ruby or other scripting language bindings);
  • support for running external programs (like with system(3)) in the programming language;
  • support for creating socketpairs (socketpair(2)) in the programming language;
  • support for receiving filehandles (with recvmsg(2)) in the programming language (this is tricky, see below).

Overview of the operation of a FUSE server

This overview assumes that the FUSE server is single-threaded.
  1. Fetch the mount point and the mount options from the command-line.
  2. Optionally, create the directory $MOUNT_POINT (libfuse doesn't do this).
  3. Optionally, do a fusermount -u $MOUNT_POINT to clean up an existing or stale FUSE filesystem in the mount point directory. (libfuse doesn't do this).
  4. Run fusermount(1) to mount a filesystem.
  5. Receive the /dev/fuse filehandle from fusermount(1).
  6. Receive, process and rely o the FUSE_INIT message.
  7. In an infinite loop:
    1. Receive a message (with a buffer or ≤ 8192 bytes, recommended 65536 + 100 bytes) from the FUSE client on the file descriptor.
    2. If ENODEV is the result, or FUSE_DESTROY is received, break from the loop.
    3. Process the message.
    4. Send the reply to on the file descriptor, except for FUSE_FORGET.
  8. Clean up so your backend filesystem remains in consistent state.
  9. Exit from the FUSE server process.

fusermount(1), when called above does the following:

  1. Uses its setuid bit to run as root.
  2. Opens the character device /dev/fuse.
  3. Mounts the filesystem with the mount(2) system call, passing the file descripto of /dev/fuse to it.

Steps of running fusermount(1) and obtaining the /dev/fuse file descriptor:

  1. Create a socketpair (AF_UNIX, SOCK_STREAM) with fd0 and fd1 as file descriptors.
  2. Run (with system(3)): export _FUSE_COMMFD=$FD0; fusermount -o $OPTS $MOUNT_POINT . Example: export _FUSE_COMMFD=3; fusermount -o ro /tmp/foo .
  3. Receive the /dev/fuse file descriptor from fd1. This is tricky. See receive_fd function in the sample receive_fd.c for this. The sample fuse0.py contains a Python implementation (using the ctypes or the dl module to call C code).
  4. Close fd0 and fd1.

Wire format and communication

Once you have received the /dev/fuse file descriptor, do a read(dev_fuse_fd, bug, 8192) on it to read the FUSE_INIT message, and you have to send your reply. After that, you should be reading more messages, and reply to all of them in sequence (except for FUSE_DESTROY and FUSE_FORGET messages, which don't require a reply). All input (FUSE client → server) message types share the same, fixed-length header format, but the message may contain optional, possible variable-length parts as well, depending the message type (opcode). Nevertheless, the whole message must be read in a single read(3), so you have to preallocate a buffer for that (at least 8192 bytes, may be larger based on FUSE_INIT negotiation, preallocate 65536 + 100 bytes to be safe). All integers in messages are unsigned (except for the negative of errno).

The input message header is:

  • uint32 size; size of the message in bytes, including the header;
  • uint32 opcode; one of the FUSE_* constants describing the message type and the interpretation of the rest of the header;
  • uint64 unique; unique identifier of the message, must be repeated in the reply;
  • uint64 nodeid; nodeid (describing a file or directory) this message applies to (can be FUSE_ROOT_ID == 1, or a larger number, what you have returned in a previous FUSE_LOOKUP repy);
  • uint32 uid; the fsuid (user ID) of the process initiating the operation (use this for access control checks if needed);
  • uint32 gid; the fsgid (group ID) of the process initiating the operation (use this for access control checks if needed);
  • uint32 gid; the PID (process ID) of the process initiating the operation;
  • uint32 padding; zeroes to pad up to 64-bits.
The interpretation of the rest of the input message depends on the opcode. The most common input message types are:
  • FUSE_LOOKUP = 1: input is a '\0'-terminated filename without slashes (relative to nodeid), output is struct fuse_entry_out;
  • FUSE_FORGET = 2: input is a struct fuse_forget_in, there is no output message;
  • FUSE_GETATTR = 3: input is empty, output is struct fuse_attr_out;
  • FUSE_OPEN = 14: input is struct fuse_open_in, output is struct fuse_open_out;
  • FUSE_READ = 15: input is struct fuse_read_in, output is the byte sequence read;
  • FUSE_RELEASE = 18: input is struct fuse_release_in, output is empty;
  • FUSE_INIT = 26: input is struct fuse_init_in, output is struct fuse_init_out;
  • FUSE_OPENDIR = 27: input is struct fuse_open_in, output is struct fuse_open_out;
  • FUSE_READDIR = 28: input is struct fuse_read_in, output is the byte sequence read (serialized as FUSE-specific dirents);
  • FUSE_RELEASEDIR = 29: input is struct fuse_release_in, output is empty;
  • FUSE_DESTROY = 38: input is empty; there is no output message.
For a read-only filesystem with some files and directories, it is enough to implement only the opcodes above. See more opcodes and their coressponding C structs in the table. The linked document contains more details about some of the message fields. The complete up-to-date opcodes and message structs can be found in fuse_kernel.h. Each reply output message (FUSE server → client) starts with this header:
  • uint32 size; size of the message in bytes, including the header;
  • int32 error; zero for successful completion, a negative errno value (such as -EIO or -ENOENT) on failure; upon failure, only the reply header is sent;
  • uint64 unique; unique identifier copied from the input message;

Please note that you have to write the whole reply at once (one write(2) call). Using any kind of buffered IO (such as stdio.h or C++ streams) can lead to problems, so don't do that.

Feel free to experiment: whatever junk you write as a reply, it won't make the kernel crash, but you'll get an EINVAL errno for the write(2) call.

Your FUSE server doesn't have to implement all possible operations (opcodes). By default, you can just return ENOSYS as errno for any operation (except for FUSE_INIT, FUSE_DESTROY and FUSE_FORGET) you don't want to implement.

Common errno values the FUSE server can return:

  • ENOSYS: The operation (opcode) is not implemented.
  • EIO: Generic I/O error, if other errno values are not appropriate.
  • EACCES: Permission denied.
  • EPERM: Operation not permitted. Most of the time you need EACCES instead.
  • ENOENT: No such file or directory.
  • ENOTDIR: Not a directoy. Return it if a directory operation was attempted on a nodeid which is not a directory.

The format of struct fuse_init_in used in FUSE_INIT:

  • uint32 init_major; the FUSE_KERNEL_VERSION in the kernel; must be exactly the same your code supports;
  • uint32 init_minor; the FUSE_KERNEL_MINOR_VERSION in the kernel; must be at least what your code supports;
  • uint32 init_readahead; ??;
  • uint32 init_flags; ??;

The format of struct fuse_init_out reply used in FUSE_INIT:

  • uint32 major; the same as FUSE_KERNEL_VERSION in the input;
  • uint32 minor; at most FUSE_KERNEL_MINOR_VERSION (init_minor) in the input, feel free to set it to less if you don't support the newest version;
  • uint32 max_readahead; ?? set it to 65536;
  • uint32 flags; ?? set it to 0;
  • uint32 unused; set it to 0;
  • uint32 max_write; ?? set it to 65536;
You have to implement FUSE_GETATTR to make the user able to do an ls -l (or stat(2)) on the mount point. It will be caled with nodeid FUSE_ROOT_ID (== 1) for the mount point.

The format of struct fuse_attr_out reply used in FUSE_GETATTR:

  • uint64 attr_value; number of seconds the kernel is allowed to cache the attributes returned, without issuing a FUSE_GETATTR call again; a zero value is OK; for non-networking filesystems you can set a very high value, since nobody else would change the attributes anyway;
  • uint32 attr_value_ns; number of nanoseconds to add to attr_value;
  • uint32 padding; to 64 bits;
  • struct fuse_attr attr; node attributes (permissions, owners etc.).

The format of struct fuse_attr reply used in FUSE_GETATTR and FUSE_LOOKUP:

  • uint64 ino; inode number copied to st_ino; can be any positive integer, the kernel doesn't depend on its uniqueness; it has no releation to nodeids used in FUSE (except for the name);
  • uint64 size; file size in bytes (or 0 for devices); make sure you set it correctly, because the kernel would truncate rads at this size even if your FUSE_READ returns more; be aware of the size being cached (using attr_value);
  • uint64 blocks; number of 512-byte blocks occupied on disk; you can safely set it to zero or any arbitrary value;
  • uint64 atime; the last access (read) time, in seconds since the Unix epoch;
  • uint64 mtime; the last content modification (write) time, in seconds since the Unix epoch;
  • uint64 ctime; the last attribute (inode) change time, in seconds since the Unix epoch;
  • uint32 atime_ns; nanoseconds part of atime;
  • uint32 mtime_ns; nanoseconds part of mtime;
  • uint32 ctime_ns; nanoseconds part of ctime;
  • uint32 more; file type and permissions; example file: S_IFREG | 0644; example directory: S_IFDIR | 0755;
  • uint32 nlink; total number of hard links; set it to 1 for both files and directories by default; for directories, you can speed up some listing operations (such as find(1)) by setting it to 2 + the number of subdirectories;
  • uint32 uid; user ID of the owner
  • uint32 gid; group ID of the owner
  • uint32 rdev; device major and minor number for device for character devices (mode & S_IFCHR) and block devices (mode & S_IFBLK).

Nodeid and generation number rules

In FUSE_LOOKUP you should return entry_nodeid and generation numbers. If I undestand correctly, the following rules hold:
  1. When a (nodeid, name) pair selected which you have never returned before, you can return any entry nodeid and generation number (except for those which are in use, see below). These two numbers uniquely identify the node for the kernel.
  2. When called again for the same (nodeid, name) pair, you must return the same entry_nodeid and generation numbers. (So you must remember what numbers you have returned previously).
  3. You should count the number of FUSE_LOOKUP requests on the same (nodeid, name). When you receive a FUSE_FORGET request for the specified entry nodeid, you must decrement the counter by the nlookups field of the FUSE_FORGET request. Once the counter is 0, you may safely forget about the entry nodeid (so it no longer considered to be in use), and next time you may return the same or a different nodeid at your choice for the same (nodeid, name) -- but with an increased generation number.
  4. You must never return the same nodeid with the same generation number again for a different inode, even after FUSE_FORGET dropped the reference counter to 0. That is: nodeids that have been released by the kernel may be recycled with a different generation number (but not with the same one!).

How to list the entries in a directory

TODO

How to read the contents of a file

TODO

Other TODO

This tutorial is work in progress. In the meantime, please see the sample Python source code for a FUSE server.

Open questions

  • How does nodeid and generation allocation and deallocation work?
  • How to run a multithreaded FUSE server?

2009-11-12

Buffered IO speed test of Google Go

This blog post presents the buffered IO speed test I've done with the standard implementation of the programming language Google Go, as compared to C on Linux amd64. The program I've run just copies everything from standard input to standard output, character by character using the default buffered IO. The conclusion: C (with the stdio in glibc) is about 2.68 times faster than Google Go (with its bufio).

$ cat cat.c
/* by pts@fazekas.hu at Thu Nov 12 12:04:49 CET 2009 */
#include 

int main(int argc, char **argv) {
  int c;
  while ((c = getchar()) >= 0 && putchar(c) >= 0) {}
  fflush(stdout);
  if (ferror(stdin)) {
    perror("error reading stdin");
    return 1;
  }
  if (ferror(stdout)) {
    perror("error reading stdin");
    return 1;
  }
  return 0;
}
$ cat cat.go
// by pts@fazekas.hu at Thu Nov 12 11:56:06 CET 2009
package main

import (
  "bufio";
  "os";
  "syscall";
  "unsafe";
)

func Isatty(f *os.File) bool {
  const TCGETS = 0x5401;  // Linux-specific
  var b [256]byte;
  _, _, e1 := syscall.Syscall(
      syscall.SYS_IOCTL, uintptr(f.Fd()), uintptr(TCGETS),
      uintptr(unsafe.Pointer(&b)));
  return e1 == 0;
}

func main() {
  r := bufio.NewReader(os.Stdin);
  w := bufio.NewWriter(os.Stdout);
  defer w.Flush();
  wisatty := Isatty(os.Stdout);
  for {
    c, err := r.ReadByte();
    if err == os.EOF { break }
    if err != nil { panic("error reading stdin: " + err.String()) }
    err = w.WriteByte(c);
    if err != nil { panic("error writing stdout: " + err.String()) }
    if (wisatty && c == '\n') {
      err = w.Flush();
      if err != nil { panic("error writing stdout: " + err.String()) }
    }
  }
}

Compilation and preparation:

$ gcc -s -O2 cat.c  # create a.out
$ ls -l a.out
-rwxr-xr-x 1 pts pts 4968 Nov 12 11:59 a.out
$ 6g cat.go && 6l cat.6  # create 6.out
$ ls -l 6.out
-rwxr-xr-x 1 pts pts 325257 Nov 12 11:55 6.out
$ dd if=/dev/urandom of=/tmp/data bs=1M count=256
$ time ./6.out /dev/null  # median of multiple runs
./6.out < /tmp/data > /dev/null  15.31s user 0.16s system 99% cpu 15.497 total
$ time ./6.out /dev/null  # median of multiple runs
./a.out < /tmp/data > /dev/null  5.92s user 0.20s system 99% cpu 6.153 total

Update: My naive implementation of zcat (gzip decompressor) in Google Go is only about 2.42 times slower than the C implementation (gcc -O3), and the C implementation is about 5.2 times slower than zcat(1).

2009-11-10

How to read a whole file to String in Java

Reading a whole file to a String in Java is tricky, one has to pay attention to many aspects:

  • Read with the proper character set (encoding).
  • Don't ignore the newline at the end of the file.
  • Don't waste CPU and memory by adding String objects in a loop (use a StringBuffer or an ArrayList<String> instead).
  • Don't waste memory (by line-buffering or double-buffering).

See my solution at http://stackoverflow.com/questions/1656797/how-to-read-a-file-into-string-in-java/1708115#1708115

For your convenience, here it is my code:

// charsetName can be null to use the default charset.    
public static String readFileAsString(String fileName, String charsetName)    
    throws java.io.IOException {    
  java.io.InputStream is = new java.io.FileInputStream(fileName);    
  try {    
    final int bufsize = 4096;    
    int available = is.available();    
    byte data[] = new byte[available < bufsize ? bufsize : available];    
    int used = 0;    
    while (true) {    
      if (data.length - used < bufsize) {    
        byte newData[] = new byte[data.length << 1];    
        System.arraycopy(data, 0, newData, 0, used);    
        data = newData;    
      }    
      int got = is.read(data, used, data.length - used);    
      if (got <= 0) break;    
      used += got;    
    }    
    return charsetName != null ? new String(data, 0, used, charsetName)    
                               : new String(data, 0, used);    
  } finally {
    is.close();  
  }
}

2009-11-08

How to convert a Unix timestamp to a civil date

Here is how to convert a Unix timestamp to a Gregorian (western) civil date and back in Ruby:

# Convert a Unix timestamp to a civil date ([year, month, day, hour, minute,
# second]) in GMT.
def timestamp_to_gmt_civil(ts)
  t = Time.at(ts).utc
  [t.year, t.month, t.mday, t.hour, t.min, t.sec]
end

# Convert a civil date in GMT to a Unix timestamp.
def gmt_civil_to_timestamp(y, m, d, h, mi, s)
  Time.utc(y, m, d, h, mi, s).to_i
end

Here is a pure algorithmic solution, which doesn't use the Time built-in class:

# Convert a Unix timestamp to a civil date ([year, month, day, hour, minute,
# second]) in GMT.
def timestamp_to_gmt_civil(ts)
  s = ts%86400
  ts /= 86400
  h = s/3600
  m = s/60%60
  s = s%60
  x = (ts*4+102032)/146097+15
  b = ts+2442113+x-(x/4)
  c = (b*20-2442)/7305
  d = b-365*c-c/4
  e = d*1000/30601
  f = d-e*30-e*601/1000
  (e < 14 ? [c-4716,e-1,f,h,m,s] : [c-4715,e-13,f,h,m,s])
end

# Convert a civil date in GMT to a Unix timestamp.
def gmt_civil_to_timestamp(y, m, d, h, mi, s)
  if m <= 2; y -= 1; m += 12; end
  (365*y + y/4 - y/100 + y/400 + 3*(m+1)/5 + 30*m + d - 719561) *
      86400 + 3600 * h + 60 * mi + s
end
t = Time.now.to_i
fail if t != gmt_civil_to_timestamp(*timestamp_to_gmt_civil(t))

Please note that most programming languages have a built-in function named gmtime for doing timestamp_to_gmt_civil's work. Please note that Python has calendar.timegm for doing gmt_civil_to_timestamp's work. Please note that most programming languages have a built-in function named mktime for converting a local time to a timestamp (compare with gmt_civil_to_timestamp, which accepts GMT time as input).

Please be aware of the rounding direction of division for negative numbers when porting the Ruby code above to other programming languages. For example, Ruby has 7 / (-4) == -2, but many other programming languages have 7 / (-4) == -1.

The algorithmic solution above is part of the programming folklore. See more examples at http://www.google.com/codesearch?q=2442+%2F\+*7305

2009-11-07

AES encpytion in Python using C extensions

This blog post gives example code how to do AES encryption and decryption in Python. The crypto implementations shown are written in C, but they have a Python interface. For pure Python implementations, have a look at Python_AES in tlslite or SlowAES.

Example Python code using the AES implementation in alo-aes:

#! /usr/bin/python2.4
# by pts@fazekas.hu at Sat Nov  7 11:45:38 CET 2009

# Download installer from http://www.louko.com/alo-aes/alo-aes-0.3.tar.gz
import aes

# Must be 16, 24 or 32 bytes.
key = '(Just an example for testing :-)'

# Must be a multiple of 16 bytes.
plaintext = 'x' * 16 * 3

o = aes.Keysetup(key)
# This uses ECB (simplest). alo-aes supports CBC as well (with o.cbcencrypt).
ciphertext = o.encrypt(plaintext)

assert (ciphertext.encode('hex') == 'fe4877546196cf4d9b14c6835fdeab1a' * 3)
assert len(ciphertext) == len(plaintext)
assert ciphertext != plaintext  # almost always true
decrypted = o.decrypt(ciphertext)
assert plaintext == decrypted

print 'benchmarking'
plaintext = ''.join(map(chr, xrange(237)) + map(chr, xrange(256))) * 9 * 16
assert len(plaintext) == 70992
for i in xrange(1000):
  assert plaintext == o.decrypt(o.encrypt(plaintext))

Example Python code using the AES implementation in PyCrypto:

#! /usr/bin/python2.4
# by pts@fazekas.hu at Sat Nov  7 12:04:44 CET 2009
# based on
# http://www.codekoala.com/blog/2009/aes-encryption-python-using-pycrypto/

# Download installer from
# http://ftp.dlitz.net/pub/dlitz/crypto/pycrypto/pycrypto-2.1.0b1.tar.gz   
from Crypto.Cipher import AES

# Must be 16, 24 or 32 bytes.
key = '(Just an example for testing :-)'

# Must be a multiple of 16 bytes.
plaintext = 'x' * 16 * 3

o = AES.new(key)
# This uses ECB (simplest). PyCrypto aes supports CBC (with AES.new(key,
# aes.MODE_ECB)) as well.
ciphertext = o.encrypt(plaintext)

assert (ciphertext.encode('hex') == 'fe4877546196cf4d9b14c6835fdeab1a' * 3)
assert len(ciphertext) == len(plaintext)
assert ciphertext != plaintext  # almost always true
decrypted = o.decrypt(ciphertext)
assert plaintext == decrypted

print 'benchmarking'
plaintext = ''.join(map(chr, xrange(237)) + map(chr, xrange(256))) * 9 * 16
assert len(plaintext) == 70992
for i in xrange(1000):
  assert plaintext == o.decrypt(o.encrypt(plaintext))

Please note how similar the two example codes are. (Only the import and the object creation are different.)

On an Intel Centrino T2300 1.66GHz system, PyCrypto seems to be about 8% faster than alo-aes. The alo-aes implementation is smaller, because it contains only AES.

2009-11-03

How to use \showhyphens with beamer

When using \showhyphens with the beamer LaTeX class, the output of \showhypens doesn't appear on LaTeX's terminal output or in the .log file. The solution is to specify \rightskip=0pt, e.g.
\showhyphens{\rightskip0pt pseudopseudohypoparathyroidism}
results
Underfull \hbox (badness 10000) in paragraph at lines 9--9
[] \OT1/cmss/m/n/10.95 pseu-dopseu-do-hy-poparathy-roidism

2009-11-01

Tiny, self-contained C compiler using TCC + uClibc

This post presents how I played with a small C compiler and a small libc on Linux. I've combined TCC (the Tiny C Compiler) 0.9.25 and uClibc 0.9.26, compressed with UPX 3.03 to a tiny, self-contained C compiler for Linux i386. You don't need any external files (apart from the compiler executable) to compile (or interpret) C code, and the compiled executable will be self-contained as well.

Here is how to download and use it for compilation on a Linux x86 or x86_64 system:

$ uname
Linux
$ wget -O pts-tcc https://raw.githubusercontent.com/pts/pts-tcc/master/pts-tcc-0.9.26-uclibc-0.9.30.1
$ chmod +x pts-tcc
$ ls -l pts-tcc
-rwxrwxr-- 1 pts pts 349640 Nov  1 13:07 pts-tcc
$ wget -O example1.c https://raw.githubusercontent.com/pts/pts-tcc/master/pts-tcc/example1.c
$ cat example1.c
#! ./pts-tcc -run
int printf(char const*fmt, ...);
double sqrt(double x);
int main() {
  printf("Hello, World!\n");
  return sqrt(36) * 7;
}
$ ./pts-tcc example1.c
$ file a.out
a.out: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, not stripped
$ ls -l a.out
-rwxrwxr-x 1 pts pts 17124 Nov  1 13:17 a.out
$ ./a.out; echo "$?"
Hello, World!
42
$ strace -e open ./pts-tcc example1.c
open("/proc/self/mem", O_RDONLY)        = 3
open("/proc/self/mem", O_RDONLY)        = 3
open("example1.c", O_RDONLY)            = 3
open("/proc/self/mem", O_RDONLY)        = 3
open("/proc/self/mem", O_RDONLY)        = 3
open("a.out", O_WRONLY|O_CREAT|O_TRUNC, 0777) = 3
$ strace ./a.out
execve("./a.out", ["./a.out"], [/* 47 vars */]) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
write(1, "Hello, World!\n", 14Hello, World!
)         = 14
_exit(42)                               = ?

As you can see above, my version the compiler is less than 250k in size, and it doesn't need any library files (such as libc.so) for compilation (as indicated by the strace output above), and the compiled binary doesn't need library files either.

TCC can run the C program without creating any binaries:

$ rm -f a.out
$ ./pts-tcc -run example1.c; echo "$?"
Hello, World!
42
$ head -1 example1.c
#! ./pts-tcc -run
$ chmod +x example1.c
$ ./example1.c; echo "$?"
Hello, World!

TCC is a very fast compiler (see the speed comparisons on it web site), it is versatile (it can compile the Linux kernel), but it doesn't optimize much: the code it produces is usually larger and slower than the output of a traditional, optimizing C compiler such as gcc -O2.

The shell script pts-tcc-0.9.26-compile.sh builds TCC from source, adding uClibc, and compressing with UPX.

Update: TCC 0.9.25 and uClibc 0.9.29 (or older)

Please note that TCC 0.9.25 is buggy, don't use it!

$ wget -O pts-tcc https://raw.githubusercontent.com/pts/pts-tcc/master/pts-tcc/pts-tcc-0.9.25
$ chmod +x pts-tcc
$ ls -l pts-tcc
-rwxr-xr-x 1 pts eng 241728 Apr  8 16:53 pts-tcc

Update: TCC 0.9.25 and uClibc 0.9.30.1

Please note that TCC 0.9.25 is buggy, don't use it!

$ wget -O pts-tcc https://raw.githubusercontent.com/pts/pts-tcc/master/pts-tcc-0.9.25-uclibc-0.9.30.1
$ chmod +x pts-tcc
$ ls -l pts-tcc
-rwxr-xr-x 1 pts eng 336756 Apr  8 16:53 pts-tcc

Update: TCC 0.9.26 and uClibc 0.9.30.1

$ wget -O pts-tcc https://raw.githubusercontent.com/pts/pts-tcc/master/pts-tcc-0.9.26-uclibc-0.9.30.1
$ chmod +x pts-tcc
$ ls -l pts-tcc
-rwxr-xr-x 1 pts eng 349640 Apr  8 16:53 pts-tcc

2009-10-25

How to enable autologin in Ubuntu Jaunty with GDM

To enable automatic login in Ubuntu Jaunty running GDM (default) without clicking, add something like this to the [daemon] section of /etc/gdm/gdm.conf-custom:
TimedLoginEnable=true
TimedLoginDelay=10
TimedLogin=myusername
Once done, restart gdm (this will force an immediate logout):
$ sudo /etc/init.d/gdm restart

2009-10-23

Using the Hama MCE remote on Linux with MPlayer

This blog post describes how to control your MPlayer media playback using the Hama MCE remote control on Linux. The instructions and software were tested on Ubuntu Jaunty, but they should work on any Linux 2.6 system which has sysfs mounted and which has USB HID support compiled to the kernel (can be checked with ls -l /sys/bus/usb/drivers/usbhid).

The instructions are for the model Hama MCE Remote 52451, also known as VRC-1100, but similar instructions might work with other remotes as well.

If you want to use the Hama MCE with XBMC, see these instructions.

The Hama MCE dongle, when connected to the USB port of a Linux machine, registers itself as two USB HID devices: a keyboard and a mouse. Most of its buttons generate regular keyboard events, for example, the numeric keys correspond to numpad keys, and its Enter key corresponds to the Enter key. You can use its mouse controls to move the mouse or click (the left and right buttons). On Ubuntu Jaunty, even the volume buttons work (they adjust the master volume).

However, if you don't want to use your Hama MCE remote as a regular keyboard or mouse, but you'd like to control MPlayer (and possibly some few other applications you specify) with it, then you need special software. LIRC, the de facto standard remote control driver and server does support USB HID devices in general, but the Hama MCE sends some quite unusual events which LIRC seems to be impossible to make recognize. (For example, the hashmark button sends Shift, 3 and 5, and some other buttons send Shift or Control too, and LIRC doesn't seem to be able to track the Shift and Control state, which would be needed to distinguish some buttons from each other.)

So I've implemented my own lircd, hama_mce_lircd.py, which can read button presses and other events from the Hama MCE remote, and it can broadcast them to applications via the socket /dev/lircd, using the traditional lircd protocol. The link above contains installation, usage and configuration instructions for controlling MPlayer with Hama MCE, using hama_mce_lircd.py.

Each button works and can be bound to any MPlayer input.conf command, with the following limitations (of the dongle hardware):

  • the ok and enter buttons are the same;
  • the play and pause buttons are the same;
  • the right click and the info buttons are the same;
  • most buttons don't repeat when held down for a long time.

An alternative and more generic Python Linux event reading and mangling library with Hama MCE support is available at http://github.com/rmt/pyinputevent/.

Here is how to use hama_mce_lircd.py with MPlayer:

$ wget -O /tmp/lircrc.hama \
  http://pts-mini-gpl.googlecode.com/svn/trunk/hama-mce-linux/lircrc
$ wget -O /tmp/hama_mce_lircd.py \
  http://pts-mini-gpl.googlecode.com/svn/trunk/hama-mce-linux/hama_mce_lircd.py
$ sudo killall lircd
$ sudo mkdir -p /var/run/lirc
$ sudo python /tmp/hama_mce_lircd.py /var/run/lirc/lircd
(connect the USB dongle, wait 3 seconds, and watch the script detect it)
(keep it running, open another terminal window)
$ mplayer -lircconf /tmp/lircrc.hama ${VIDEO_FILENAME}

The most important Hama MCE remote keys supported by lircrc.hama: Pause, Stop, Volume Up, Volume Down, Channel Plus (to speed up playback), Channel Minus (to slow down playback), Info (for showing/hiding OSD with the time indicator) the big round button (for seeking), the up and down arrows (for adjusting the subtitle delay).

2009-10-22

Design and print your own geek clock with LaTeX and TikZ

Would you like to have your custom clock for geeks (geek clock, math clock)? You can
design and print the front plate of your own analog geek clock using LaTeX and TikZ. Here is how I did it (download):

% geek_clock.tex
% by pts@fazekas.hu at Thu Oct 22 14:24:35 CEST 2009
\documentclass[a4paper]{article}
\usepackage{tikz}
\usepackage[latin2]{inputenc}
\usepackage{t1enc}
\usepackage{lmodern}

\pdfpagewidth=\paperwidth
\pdfpageheight=\paperheight

\newenvironment{fullpage}{%
\shipout\vbox\bgroup\kern-1in\moveleft1in\vbox to\paperheight\bgroup
\parindent0pt\hsize\paperwidth
}{%
\vfil\egroup\egroup
}

\begin{document}

% Define a few constants for easy configuration
\def\framehalf{9.2cm}
\def\radius{8.8cm}
\def\onedegrad{8.6cm}
\def\fivedegrad{8.5cm}
\def\tendegrad{8.2cm}
\def\labelrad{8.3cm}

\begin{fullpage}
\vfil\noindent\hfil
\begin{tikzpicture}[scale=1]
\draw (-\framehalf,-\framehalf) --
(\framehalf,-\framehalf) --
(\framehalf,\framehalf) --
(-\framehalf,\framehalf) -- cycle;
\draw (0,0) circle (\framehalf);
\draw (0,0) circle (\radius);
\draw[fill=black] (0,0) circle (2.5mm);
\node[draw, circle, inner sep=.2mm] (a) at (0,0) {};
% main lines
\foreach \x in {0, 6,...,360}
\draw[line width=1.5pt] (\x:\onedegrad) -- (\x:\radius);
% labels and longer lines at every 6 degrees
\foreach \x in {30, 60, ..., 360} {
\draw[line width=4pt] (\x:\tendegrad) -- (\x:\radius);
}
\node[scale=3,anchor=north east] at (450-1*30:\labelrad)
{$B'_L$\kern-1ex};
\node[scale=2.5,anchor=east] at (450-2*30:\labelrad)
{\lower8ex\hbox{$\displaystyle \sum_{i=0}^{\infty}1/2^i$\kern-2ex}};
\node[scale=3,anchor=east] at (450-3*30:\labelrad)
{$7\odot-5$};
\node[scale=3,anchor=south east] at (450-4*30:\labelrad)
{$\lceil\pi\rceil$};
\node[scale=3,anchor=south east] at (450-5*30:\labelrad)
{$(2\varphi-1)^2$\kern-4ex};
\node[scale=3,anchor=south] at (450-6*30:\labelrad)
{$3!$};
\node[scale=3,anchor=south west] at (450-7*30:\labelrad)
{$\!6.\overline{9}$};
\node[scale=4,anchor=west] at (450-8*30:\labelrad)
{\raise1ex\hbox{${\bullet}{\circ}{\circ}{\circ}$}};
\node[scale=2.5,anchor=west] at (450-9*30:\labelrad)
{$\lfloor(7\cdot.1\!-\!.7)^{\textrm{-6e-2}}\rfloor$};
\node[scale=2.5,anchor=west] at (450-10*30:\labelrad)
{\lower6ex\hbox{$\displaystyle{5\choose2}$}};
\node[scale=2.5,anchor=north west] at (450-11*30:\labelrad)
{\lower3ex\hbox{\kern-3ex$7^5\mathop{\mathrm{mod}} 13$}};
\node[scale=3,anchor=north] at (450-12*30:\labelrad)
{$\sqrt[3]{1728}$};
\end{tikzpicture}
\end{fullpage}

\end{document}
I used pdflatex and TikZ in TeX Live 2008 to compile the file above to a PDF file, which I printed, cut and glued to a stock clock.

The PDF version of the clock face is available here. It looks like this:

References:

2009-10-21

Screen blanking, DPMS, screen saver control and timeout settings on X11

This blog post explains how to blank and unblank the screen manually, how to activate and deactivate display power saving (DPMS), and how to specify inactivity timeouts for screen blanking on X11. The examples were tried with Ubuntu Jaunty, X.Org 7.4 and icewm. This blog post doesn't explain how to make your changes permanent after logout, or how to change your GNOME settings.

The results below may be distorted if gnome-screensaver is running. To get rid of it, do this:
$ gconftool-2 --type boolean -s /apps/gnome_settings_daemon/screensaver/start_screensaver false
$ killall gnome-screensaver
Without setting start_screensaver to false above, gnome-settings-daemon would restart gnome-screensaver whenever a GNOME application gets started. (Firefox a GNOME Terminal is GNOME applications in Ubuntu Jaunty, but icewm and xterm aren't.)

X11 supports two distinct features for power saving when the user is away from the computer: screen blanking and power saving. Confusingly enough, the man page for xset uses the term screen saver instead of screen blanking. In our terminology, screen blanking is a feature to display a blank (solid black) screen after long enough user inactivity. However, screen saver is a program which gets started upon long enough user inactivity, which draws some simple animation on the screen, and asks the user's password before letting him back to his screen and applications. Power saving (DPMS turns the display off or puts it to some power-saving mode after long enough user inactivity. In power saving mode the display doesn't show anything, but it consumes much less power, see the DPMS link above for typical watt values for the power-saving modes standby, suspend and off (and also in the mode on, when power saving is inactive). User inactivity means that the user doesn't use the keyboard or the mouse for a configurable number of seconds. The typical default inactivity timeouts are around 2 minutes. By default, the X server disables screen blanking and power saving as soon as the user presses a key or moves the mouse. From now on we assume that there is no screen saver running, or its timeout is set to be high enough so it won't ever trigger. Screen blanking and power saving are independent features, they have to be configured independently.

How to set up the timeouts

  • To disable screen blanking, run xset s off
  • To make the screen blank after 600 seconds of user inactivity, run xset s 600
  • To put the display to standby after 100 seconds, to suspend after 200 seconds, and turn it off after 300 seconds, run xset dpms 100 200 300
  • To disable power saving, run xset dpms 0 0 0
  • To make the screen blanking display thee X11 logo on a blue background instead of displaying a solid black screen, run xset s noblank
  • To make the screen blanking display a solid black screen (default), run xset s blank

How to change the screen state immediately

  • To blank the screen, run xset s activate
  • To unblank the screen, run xset s reset
  • To turn the screen off, run xset dmps force off
  • To activate the suspend power-saving mode (good savings, good resume time), run xset dpms force suspend
  • To activate the standby power-saving mode (minimal savings, very quick resume time), run xset dpms force standby
  • To turn the display on, run xset dpms force on; xset s reset (the latter is necessary because the screen was blanked automatically when power saving was activated).

How to prevent screen blanking and power saving in MPlayer

Most media player applications disable blanking and power saving while they are playing a video. To enable this functionality in MPlayer, start it with mplayer -stop-xscreensaver or add the line stop-xscreensaver=1 to your ~/.mplayer/config . By doing so, MPlayer will simulate user activity (just as if the user has moved the mouse) upon startup and then each 30 seconds. So this will prevent the screen from blanking or going to power saving mode if all your timeouts are larger than 30 seconds. Please note that the name of the flag is confusing, because its effect isn't related to screen savers, it just simulates user activity. If a screen saver is running, it may or may not pick up the simulated activity generated by MPlayer. To prevent screen saver activation in this case, you may have to run mplayer -heartbeat-cmd "xscreensaver-command -deactivate" or mplayer -heartbeat-cmd "gnome-screensaver-command -p" .

2009-10-04

JavaScript and ActionScript performance for big integer multiplication

This blog post documents the speed measurement of various JavaScript and ActionScript (and other) implementations multiplying 2048-bit big integers. I needed big integer multiplication for the Diffie-Hellman key exchange in one of my projects.

Since neither JavaScript nor ActionScript has big integer support built in, I had to implement it. I represent a big integer as a JavaScript array of digits in base 32768, in the base To multiply two 2048-bit integers use Karatsuba multiplication, which does 3 multiplications with 1024-bit operand size. To multiply two 1024-bit integers, I use Karatsuba multiplication (again), which does 3 multiplications with 512-bit operand size. To multiply two 512-bit integers, I use traditional O(n^2) ``schoolbook'' multiplication. (The Karatsuba multiplication would have been slower for this size.) All JavaScript and ActionScript code is heavily optimized, constants are inlined, and functions are instantiated for different operand sizes.

I did at least 500 multiplications of 2048-bit operand size. Here is the average number of milliseconds needed for 1 multiplication in the various implementations on an Intel Core 2 Duo 1.5 GHz machine running 32-bit Unbuntu Hardy:
  • 00.0156 ms: Python (C) 2.4 gmpy (mpz of libgmp)
  • 00.0396 ms: Java 1.6.0_16 BigInteger
  • 00.0516 ms: Ruby (C) 1.9.1 Bignum
  • 00.0650 ms: Python (C) 2.4 long
  • 01.3030 ms: JavaScript Google Chrome V8 1.1.1.4
  • 02.1750 ms: ActionScript 3 for Flash Player 10
  • 02.7120 ms: JavaScript Google Chrome V8 1.1.1.4 using BigInt.js' mult()
  • 04.5150 ms: JavaScript Firefox 3.5 TraceMonkey from Mercurial on 2009-10-04
  • 05.0400 ms: JavaScript Firefox 3.5.2 TraceMonkey
  • 05.2860 ms: JavaScript Firefox 3.0.14 TraceMonkey (?) JavaScript
  • 06.9050 ms: ActionScript 3 for Flash Player 9 on Flash Player 9
  • 08.5320 ms: ActionScript 3 for Flash Player 9 on Flash Player 10
  • 10.2900 ms: JavaScript SpiderMonkey command-line interpreter 1.7.0 2007-10-03
The speed tests were run on multiple 32 and 64-bit Debian and Ubuntu installations, and the ratios came out almost the same as above.

The JavaScript and ActionScript (and some other) sources are avaialble at http://code.google.com/p/pts-mini-gpl/source/browse/#svn/trunk/javascript-multiplication.

Conclusion: If you need big integer arithmetic in the browser, detect Java and use its BigInteger from JavaScript. Otherwise, if the browser is Google Chrome, then do it in JavaScript. Otherwise, if there is Flash Player 10, do it in ActionScript 3. Otherwise, use JavaScript (and suggest installation of Java or Flash to the user).

How to create an ActionScript 3 (AS3) flash movie (SWF) without a GUI

This blog post explains how to create a Flash movie completely automatically, without using a GUI, just by writing a program in the language ActionScript 3 (AS3). The post gives instructions for Linux, but they should work with minor modifications on other operating systems as well since the compilers to be used are either written in Java or they are available for multiple systems.

ActionScript is a language similar to JavaScript, with some differences and incompatibilities. The most important difference is that you can declare types of variables (strongly recommended in ActionScript 3), and it has class-based inheritance, so classes are declared and use differently from JavaScript, more similarly to Java. To get a quick introduction to ActionScript, read http://en.wikipedia.org/wiki/ActionScript. The reference manual is hosted by Adobe, at http://livedocs.adobe.com/flash/9.0/ActionScriptLangRefV3/. The filename extension is .as.

ActionScript has two major versions in used today: ActionScript 2 (AS2) is supported by Flash Player 7, 8, 9 and 10. ActionScript 3 (AS3) is supported by Flash Player 9 and 10. ActionScript 2 is about 3.5 times (or even more) slower than the JavaScript interpreter in Firefox 3.0 (SpiderMonkey). ActionScript 3, as implemented in Flash Player 9, has just about the same speed as Firefox 3.0 JavaScript. ActionScript 3, as implemented in Flash Player 10, can be up to 4 times faster than FireFox 3.0 JavaScript if int is used instead of Number for integer computations, and Vector.<int> is used instead of Vector. Both Flash Players 9 and 10 have a JIT engine that compiles ActionScript to native code, but this doesn't seem to become faster than Firefox 3.0 JavaScript in some number-crunching applications.

It is possible to call a JavaScript function from ActionScript and vice versa, using the ExternalInterface ActionScript class (flash.external.ExternalInterface). Please note that ExternalInterface doesn't work for the file:/// protocol due to security reasons (as documented by Adobe here) – you have to upload both your .swf and .html files to the same web server to make use of ExternalInterface.

To compile ActionScript 2 to an SWF file, the free MTASC compiler is recommended, which is very fast, works at the command line, and is said to be safer than the compilers written by Adobe. An example ActionScript2 script:
// Tuto2.as, an example ActionScript 2 script
import flash.external.ExternalInterface;
class Tuto2 {
static var app : Tuto2;
function Tuto2() {
_root.createTextField("tf",0,0,0,800,200);
_root.tf.mutliline = true;
_root.tf.text = "first";
var myformat = new TextFormat();
myformat.color = 0x00000;
myformat.underline = true;
myformat.size = 30;
// Set this, because the default font, "Times New Roman" may not be available on Linux.
myformat.font = "serif";
_root.tf.setTextFormat(myformat);
_root.tf.text = ExternalInterface.call + "\nHello";
// Call this again after changing the text, otherwise the formatting is lost.
_root.tf.setTextFormat(myformat);
}
static function main(mc) { // Entry point.
app = new Tuto2();
}
}
To compile it, download MTASM first:
$ wget http://www.mtasc.org/zip/mtasc-1.12-linux.tgz
$ mkdir mtasc
$ cd mtasc
$ tar xzvf ../mtasc-1.12-linux.tgz
$ cp .../Tuto2.as .
Then compile with:
$ ./mtasc -version 8 -swf tuto2.swf -main -header 800:200:20 Tuto2.as
The corresponding HTML would look like this for Firefox:
<embed src="tuto2.swf"
quality="high" bgcolor="#eeeeee" fgcolor="#000000" width="800" height="2
name="tuto" id="tuto" align="middle" allowScriptAccess="always"
allowFullScreen="true" type="application/x-shockwave-flash"
pluginspage="http://www.macromedia.com/go/getflashplayer" />
To make it cross-browser, use SWFObject to embed it. Once created, load the HTML in your browser to view the Flash animation you've just created.

From now on we focus on ActionScript 3 and Flash Player 10. Adobe provides the Flex SDK, which contains a compiler (called mxmlc, or mxmlc.exe on Windows) which can compie ActionScript 3 .as source files to .swf. The Flex SDK is free to use and is partially open source, and it is implemented in Java. Download version 3.4 (which is the most recent stable version at the time of writing this blog post, and is about 120 MB compressed) from http://opensource.adobe.com/wiki/display/flexsdk/Downloads. Also make sure you have Java (at least a JRE) installed. Here are the corresponding download command-lines:
$ wget http://fpdownload.adobe.com/pub/flex/sdk/builds/flex3/flex_sdk_3.4.0.9271.zip
$ mkdir flex
$ cd flex
$ unzip ../flex_sdk_3.4.0.9271.zip
$ bin/mxmlc -help
Here is a sample ActionScript 3 program:
// Tuto3.as, a sample ActionScript3 program
package {
import flash.display.Sprite;
import flash.events.Event;
import flash.text.TextField;
import flash.text.TextFormat;
import flash.utils.getTimer;
import flash.external.ExternalInterface;
public class Tuto3 extends Sprite {
function Tuto3() : void {
var circle : Sprite = new Sprite();
circle.graphics.beginFill(0xFF7744);
circle.graphics.drawCircle(0, 0, 30);
circle.graphics.endFill();
addChild(circle);
//myInit();
//addEventListener(Event.ADDED_TO_STAGE, myInit);
ExternalInterface.addCallback("myFunction", myInit);
}
public function myInit(e : Event = null) : String {
var startTime : Number = getTimer();
var tField : TextField = new TextField();
var tFormat : TextFormat = new TextFormat();
tFormat.size = 20;
// Set this, because the default font, "Times New Roman" may not be available on Linux.
tFormat.font = "serif";
tField.autoSize = "left";
tField.background = true;
tField.border = true;
tField.multiline = true;
tField.x = 20;
tField.y = 40;
tField.text = "hello1";
tField.setTextFormat(tFormat);
var endTime : Number = getTimer();
trace('hello2 ' + startTime + ' ' + (endTime - startTime));
tField.text = 'ms=' + (endTime - startTime);
// Call this again after changing the text, otherwise the formatting is lost.
tField.setTextFormat(tFormat);
addChild(tField);
}
}
}
Here is how to compile it (creates Tuto3.swf):
$ ./bin/mxmlc -target-player 10 -default-size 800 200 -optimize Tuto3.as
Here is the corresponding HTML:
<html><head><script type="text/javascript">
function myonload() {
// document.tuto for Firefox, window.tuto for Internet Explorer
var tuto = document.tuto || window.tuto || document.getElementById("tuto");
if (tuto && tuto.myFunction)
document.getElementById("result").innerHTML = tuto.myFunction();
}
</script></head><body onload="myonload()">
<embed src="Tuto3.swf"
quality="high" bgcolor="#eeeeee" fgcolor="#000000" width="800" height="200"
name="tuto" id="tuto" align="middle" allowScriptAccess="always"
allowFullScreen="true" type="application/x-shockwave-flash"
pluginspage="http://www.macromedia.com/go/getflashplayer" />
<div id="result">no result yet</div>
</body></html>
The HTML above also demonstrates calling an ActionScript method form JavaScript. To call a JavaScript function from ActionScript, do
ExternalInterface.call("JavaScriptFunctionName", arg1, ...)
Don't forget to copy the .html and the .swf to a http:// location, otherwise ExternalInterface won't work because the security limitations imposed by the Flash Player.

Please note that according to Adobe, AVM2, the virtual machine in Flash Player 9 and 10 for ActionScript 3 execution does a JIT for all functions and methods except for constructors. So don't put performance-critical code to the constructor. I couldn't measure any speed difference in the constructor and other methods. The last statement that the constructor is not JITted I could find is from 2007 (http://blog.pixelbreaker.com/flash/as30-jit-vs-interpreted/).