2015-12-05

Announcing flickrurlget: Flickr photo downloader from command-line in batch

This blog post is an announcement of flickrurlget, a command-line tool for Unix, written in Python 2.x that can be used to download photos from Flickr in batch. flickrurlget itself doesn't download photos, but it generates a list of raw photo URLs which can be downloaded with a download manager (even with `wget -i').

Download and install from source.

There are many photo download tools for Flickr (including command-line tools, GUI tools, Firefox extensions and Chrome extensions), none of which I tried having the feature set of flickrurlget:

  • Can get highest-resolution (if possible, original) image URLs in:
    • a Flickr photo
    • a photostream of a Flickr user (i.e. all photos of the user)
    • the favorite photos of a Flickr user
    • an album (photoset) of a Flickr user
    • a Flickr group
    • a gallery of a Flickr user
  • Batch operation: started from the command-line with a bunch of Flickr page URLs, and it discovers all the photos in there without any interaction.
  • Can get anybody's photos (not only your own).
  • Can get non-public and adult (non-safe) photos as well (after loggig in).
  • Doesn't need a GUI or a web browser for most of the operations.

There is the command-line tool flickr_download (source code and PyPi projet page), which is also written in python. Here is how flickrurlget is better than flickr_download:

  • Takes Flickr page URLs and figures out all parameters automatically.
  • Can download groups and galleries (in addition to user photostreams and photosets) as well.
  • Has user ID, group ID and photoset ID discovery: the user doesn't have to do manual ID lookups before running the tool.
  • Uses only standard Python modules, no other dependencies.
  • Better compatibility with old Python: works with 2.7, 2.6, 2.5 and 2.4; not only 2.7.

Flickr has a nice, full-featured REST API (works with both XML and JSON), its documentation is available here. flickrurlget uses this API. It can also optionally OAuth to log the user in.

2015-11-27

How to compute the intersection of two sorted lists (in Python)

This blog post explains how to compute the sorted intersection of two sorted lists, and it shows a fast Python implementation. The time complexity is O(min(n + m, n · log(m)), where n is the minumum of the list lengths, and m is the maximum of the list lengths.

The first idea is to take the first element of both lists, and, if they are different, discard the smaller one of the two. (The smaller one can't be in the intersection because it's smaller than all elements of the other list.) If the first elements were equal, then emit the value as part of the intersection, and discard both first elements. Continue this until one of the lists run out. If discarding is implemented by incrementing index variables, this method finishes in O(n + m) time, because each iteration discards at least one element.

We can improve on the first idea by noticing that it's possible to discard multiple elements in the beginning (i.e. when the two first elements are different): it's possible to discard all elements which are smaller than the larger one of the two first elements. Depending on the input, this can be a lot of elements, for example if all elements of the first list are smaller than all elements of the second list, then it will discard the entire first list in one step. In the general case, binary search can be used figure out how many elements to discard. However, binary search takes logarithmic time, so the total execution time is O(m · log(m)), which can be faster or slower than the O(n + m) of the previous solution. In fact, by more careful analysis of the number of runs (blocks which are discarded), it's possible to show that the execution time is just O(n · log(m)), but that can still be slower than the previous solution.

It's possible to combine the two solutions: estimate the execution time of the 2nd solution, and if the estimate is smaller than n + m, execute the 2nd solution, otherwise execute the first solution. Please note that the estimate also takes into account the constants (not only big-O). The resulting Python (≥2.4 and 3.x) code looks like:

def intersect_sorted(a1, a2):
  """Yields the intersection of sorted lists a1 and a2, without deduplication.

  Execution time is O(min(lo + hi, lo * log(hi))), where lo == min(len(a1),
  len(a2)) and hi == max(len(a1), len(a2)). It can be faster depending on
  the data.
  """
  import bisect, math
  s1, s2 = len(a1), len(a2)
  i1 = i2 = 0
  if s1 and s1 + s2 > min(s1, s2) * math.log(max(s1, s2)) * 1.4426950408889634:
    bi = bisect.bisect_left
    while i1 < s1 and i2 < s2:
      v1, v2 = a1[i1], a2[i2]
      if v1 == v2:
        yield v1
        i1 += 1
        i2 += 1
      elif v1 < v2:
        i1 = bi(a1, v2, i1)
      else:
        i2 = bi(a2, v1, i2)
  else:  # The linear solution is faster.
    while i1 < s1 and i2 < s2:
      v1, v2 = a1[i1], a2[i2]
      if v1 == v2:
        yield v1
        i1 += 1
        i2 += 1
      elif v1 < v2:
        i1 += 1
      else:
        i2 += 1

The numeric constant 1.4426950408889634 in the code above is 1/math.log(2).

The code with some tests and with support for merging multiple sequences is available on GitHub here.

Related questions on StackOverflow: this and this.

2015-03-20

How to use rsync over USB on Android with adb

This blog post explains how to copy files using rsync between the computer running Unix (typically Linux or Mac OS X) and the mobile device running Android, using an USB cable. It's not necessary to root the device. It's not necessary to install any app.

If you want to copy over wifi rather than USB, then please use the app rsync backup for Android (rsync4android) instead. The rest of this tutorial describes a method which needs the computer and the mobile device connected with a USB data cable.

Enable USB debugging on the device. If you don't know how to do it in Settings, then find a tutorial online.

Install adb (Android Debug Bridge) to the Unix system. For example, on Ubuntu: sudo apt-get install android-tools-adb

Connect the mobile device to the computer. Run: adb shell id on the computer. (All commands should be run on the computer unless asked otherwise.) It should display something like:

$ adb shell id
uid=2000(shell) gid=2000(shell) groups=1004(input),... context=u:r:shell:s0

If it doesn't work, you may have to enable Settings / Developer options / USB debugging on the device, then reconnect, then click OK on the dialog box in the device, then rerun adb shell id on the computer.

Please note that on Cyanogenmod rsync is installed by default and it is on the $PATH, so you can skip some of the steps below. (If you don't know what to skip, just do everything anyway.)

Download the rsync binary: wget -O rsync.bin http://github.com/pts/rsyncbin/raw/master/rsync.rsync4android

Copy the rsync binary to the device: adb push rsync.bin /data/local/tmp/rsync

Make the rsync binary on the device executable: adb shell chmod 755 /data/local/tmp/rsync

Make sure you have a backup copy of the binary in a more permanent directory: adb shell cp /data/local/tmp/rsync /sdcard/rsync.bin

Get the rsync version by running adb shell /data/local/tmp/rsync --version . Typical output:

$ adb shell /data/local/tmp/rsync --version
rsync  version 3.1.1  protocol version 31
Copyright (C) 1996-2014 by Andrew Tridgell, Wayne Davison, and others.
Web site: http://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 32-bit timestamps, 64-bit long ints,
    no socketpairs, hardlinks, symlinks, no IPv6, batchfiles, inplace,
    append, no ACLs, no xattrs, no iconv, no symtimes, no prealloc

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.

Run rsync --version . If you get something like

rsync: command not found

, then rsync isn't installed to the computer. On Ubuntu you can install it with sudo apt-get install rsync

Create an rsyncd.conf config file and install it to the device by running: adb shell 'exec >/sdcard/rsyncd.conf && echo address = 127.0.0.1 && echo port = 1873 && echo "[root]" && echo path = / && echo use chroot = false && echo read only = false' . This must finish without displaying any (error) message.

Start the rsync daemon in the device by running: adb shell /data/local/tmp/rsync --daemon --no-detach --config=/sdcard/rsyncd.conf --log-file=/proc/self/fd/2 . It must start up with just a single message:

2015/03/19 15:58:14 [5814] rsyncd version 3.1.1 starting, listening on port 1873

Keep it running for the duration of the copies (below), and continue working in another terminal window. Or press Ctrl-C to exit right now, and restart it (so it will start running in the background on the device) like this: adb shell '/data/local/tmp/rsync --daemon --config=/sdcard/rsyncd.conf --log-file=/data/local/tmp/foo &'

Start port forwarding by running: adb forward tcp:6010 tcp:1873

Now you can start copying files with rsync (back and forth). An example command: rsync -av --progress --stats rsync://localhost:6010/root/sdcard/Ringtones .

You may find the --size-only flag useful if rsync is copying the same files over and over again.

You may want to copy from or to /storage/sdcard1 instead of /sdcard on the device.

Some newer storage devices have the exfat filesystem (older ones typically have fat or some emulation of it, and that's just fine). Writing to exfat drives rsync -av crazy: it reports steady progress with lots of Operation not permitted errors, but it actually doesn't create any files. This applies both rsync running on the Linux computer and rsync running on the device. A solution is to replace rsync -av with rsync -vrtlD , and restart the copy.

2015-03-03

How to avoid data copies with move semantics in C++11

This blog post explains how to avoid data copies in assignment from temporary values in C++. The move assignment operator (a feature introduced in C++11) will be defined for the class, and it will get called instead of the copy assignment operator, and the copy will be avoided.

Let's consider std::string, a type whose values are expensive to copy (assuming that the implementation copies the entire string data, not just a pointer to t buffer). Both the copy constructor and the copy assignment operator (operator=) copy the old data from the new data, like this for the copy assignment operator:

#include <string.h>
namespace std {
string &operator=(const string &other) {
  resize(other.size());
  memcpy(&(*this)[0], other.data(), other.size() + 1);
  return *this;
}
}

Let's assume that we have a function which returns a string: std::string GetUserName();. We can call this function and save the result to a variable: std::string user_name = GetUserName();. (It also works the same way with const in the beginning.) How many times does the value have to be copied until it lands in the variable user_name? Most modern compilers do the return value optimization to avoid all copies (so no copy constructor and no copy assignment operator is run). But if we already have the variable std::string user_name; and we want the assignment user_name = GetUserName(); avoid copies, then we need to define a move assignment operator (taking an rvalue reference (&&) argument instead of a const reference (const&) argument), and the assignment above will use the move assignment operator, which is faster than the copy assignment operator, because it can steal the resources from the source. An example implementation:

#include <string.h>
namespace std {
string &operator=(string &&other) {
  capacity_ = other.capacity_;
  size_ = other.size_;
  data_ = other.data_;  // Copies just the pointer.
  other.capacity_ = other.size_ = 0;
  other.data_ = nullptr;
  return *this;
}
}

There is also a corresponding move constructor which can be called instead of the copy constructor to avoid the copy. It works even if the return value optimization cannot be applied (e.g. when the function body has both return a; and return b;).

Let's see a more detailed example which has all these:

  • copy constructor (*C): C(const C&)
  • move constructor (&C): C(&&) (only for C++11)
  • copy assignment operator (=C): C &operator(const C&)
  • move assignment operator (#C): C &operator(C &&) (only for C++11)
#include <stdio.h>

class C {
 public:
  explicit C(int v) { printf("+C %d\n", v); }
  ~C() { puts("~C"); }
  C(const C&) { puts("*C"); }
  C &operator=(const C&) { puts("=C"); return *this; }
#if __GXX_EXPERIMENTAL_CXX0X__ || __cplusplus >= 201100
  C(C&&) { puts("&C"); }
  C &operator=(C &&) { puts("#C"); return *this; }
#endif
};

static inline C C10(int v) {
  return C(v * 10);
}

int main(int argc, char **argv) {
  (void)argc; (void)argv;
  C ca = C10(11);
  puts("---R1");
  C cb(22);
  puts("---R2");
  cb = C10(33);
  puts("---R3");
  return 0;
}

We can compile it for C++98 (older C++ standard than C++11) and run it:

$ g++ -W -Wall -Wextra -Werror -s -O2 -ansi -pedantic test_assign_with_move_semantics.cc && ./a.out
+C 110
---R1
+C 22
---R2
+C 330
=C
~C
---R3
~C
~C

And for C++11:

$ g++ -W -Wall -Wextra -Werror -s -O2 -std=c++0x -pedantic test_assign_with_move_semantics.cc && ./a.out
+C 110
---R1
+C 22
---R2
+C 330
#C
~C
---R3
~C
~C

The only difference is =C has changed to #C when C++11 features were enabled. That's because the body of the #if in the code gets compiled only for C++11 and above, and this body contains the move assignment operator. If there is a move assignment operator (e.g. in our C++11 version), then the line cb = C10(33); will use it, otherwise (e.g. in our C++98 version) that line will use the copy assignment operator.

Where do the actual data copies occur? In the copy assignment operator (#C) and in the copy constructor (*C, not called at all in the example). By defining a move assignment operator in C++11, we can prevent the copy assignment operator from getting called, thus we can avoid a copy when assigning from a temporary (rvalue).

Please note that the return value optimization avoids the copy in the C ca = C10(11); statement. This works even in C++98, without the move constructor.

In C++98, a copy can be avoided by using swap at the call site, for example if the caller replaces user_name = GetUserName(); with { std::string tmp = GetUserName(); user_name.swap(tmp); }, then the copy will be avoided: the definition of tmp takes advantage of the return value optimization, and swap swaps only the pointers and the sizes, not the actual data.

2015-02-12

How to generate all subsets of size k in Java

This blog post shows a Java class to generate all subsets of size k of the set {0, 1, ..., n - 1}, in lexicographic order. The code uses O(k) memory, and it doesn't store multiple subsets at the same time in memory. This is achieved by implementing the Iterable interface.

import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Generate all subsets of {0, ..., n-1} of size k, in lexicographically
 * increasing order.
 */
public class Fixsub implements Iterator<int[]>, Iterable<int[]> {
  private int n;
  private int k;
  private int[] a;
  public Fixsub(int n, int k) {
    assert(k >= 0);
    assert(n >= k);
    this.n = n;
    this.k = k;
  }
  private int findIdxToIncrease() {
    int i;
    for (i = k - 1; i >= 0 && a[i] == n - k + i; --i) {}
    return i;
  }
  @Override public Iterator<int[]> iterator() {
    return this;
  }
  /**
   * Always returns the same array reference, the caller is responsible for
   * making a copy. The caller shouldn't modify the array elements.
   */
  @Override public int[] next() {
    if (a == null) {
      a = new int[k];
      for (int i = 0; i < k; ++i) {
        a[i] = i;
      }
    } else {
      int i = findIdxToIncrease();
      if (i < 0) throw new NoSuchElementException();
      for (++a[i++]; i < k; ++i) {
        a[i] = a[i - 1] + 1;
      }
    }
    return a;
  }
  @Override public boolean hasNext() {
    return a == null || findIdxToIncrease() >= 0;
  }
  @Override public void remove() {
    throw new UnsupportedOperationException();
  }

  public static void main(String[] args) {
    for (int[] p : new Fixsub(7, 4)) {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < p.length; ++i) {
        if (i != 0) sb.append(',');
        sb.append(p[i]);
      }
      sb.append('\n');
      System.out.print(sb);
    }
  }
}