Trying Out Various Hashes with TokyoCabinet

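I tried the test from the mixi engineer blog on a Mac, so here is the log.
To install TokyoCabinet, I followed "TokyoCabinet/TokyoTyrant を Rails で使う - なんとなく日記": I fetched the latest version, ran ./configure && make && sudo make install, and installed the Ruby binding as well.
The test code does not work unmodified on a Mac: there is no /proc directory, so the memory size cannot be read. I therefore rewrote the memory_usage method to take the "RSS" column from the output of ps -l. The result is below.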

# tchash.rb
require 'tokyocabinet'
include TokyoCabinet

def memory_usage
  return `ps -l #{$$} | awk '{print $9}'`.gsub(/.+\n(\d+).*/, '\1').chomp.to_i / 1024.0
end

rnum = ARGV.length > 0 ? ARGV[0].to_i : 1000000

time = Time.now
size = memory_usage
if ARGV.length > 1
  db = ADB::new
  db.open(ARGV[1] + "#bnum=" + rnum.to_s + "#mode=wct#xmsiz=0") || raise("open failed")
  (0...rnum).each do |i|
    buf = sprintf("%08d", i)
    db.put(buf, buf)
  end
  time = Time.now - time
  GC.start
  size = memory_usage - size
  db.close
else
  hash = {}
  (0...rnum).each do |i|
    buf = sprintf("%08d", i)
    hash[buf] = buf
  end
  time = Time.now - time
  GC.start
  size = memory_usage - size
end

printf("Time: %.3f sec.\n", time)
printf("Usage: %.3f MB\n", size)
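
The memory_usage method above relies on RSS being the 9th column of ps -l on this particular Mac. As a rough sketch of a less position-dependent variant, the following looks up the RSS column by its header name instead. The helper names rss_kb_from_ps and memory_usage_mb are made up for illustration, and the sketch assumes that ps accepts -o rss and -p and reports RSS in kilobytes.

```ruby
# Pick the RSS value (kilobytes) out of `ps` output by header name,
# so the code does not depend on RSS being the 9th column.
# Hypothetical helper, not part of the original test code.
def rss_kb_from_ps(ps_output)
  header, row = ps_output.lines.first(2).map(&:split)
  return nil if header.nil? || row.nil?
  index = header.index("RSS")
  index && row[index].to_i
end

# Resident set size of a process in MB, or nil when it cannot be read.
# Assumption: `ps -o rss -p PID` prints an "RSS" header and a value in KB.
def memory_usage_mb(pid = Process.pid)
  kb = rss_kb_from_ps(`ps -o rss -p #{pid}`)
  kb && kb / 1024.0
rescue Errno::ENOENT
  nil # ps is not installed on this machine
end

usage = memory_usage_mb
puts(usage ? format("RSS: %.3f MB", usage) : "RSS not available")
```

Since the parser only looks at the header line, it should also accept the wider ps -l output used in the script, as long as that output contains an RSS column.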

ruby hash:
% ruby -v tchash.rb
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 8.384 sec.
Usage: 240.781 MB

on-memory hash:
% ruby -v tchash.rb 1000000 '*'
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 2.953 sec.
Usage: 71.375 MB

on-memory tree:
% ruby -v tchash.rb 1000000 '+'
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 2.705 sec.
Usage: 46.914 MB

hash database:
% ruby -v tchash.rb 1000000 'foo.tch'
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 17.326 sec.
Usage: 4.578 MB

B+ tree database:
% ruby -v tchash.rb 1000000 'foo.tcb'
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 4.253 sec.
Usage: 5.875 MB

fixed-length database:
% ruby -v tchash.rb 1000000 'foo.tcf'
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 4.220 sec.
Usage: 244.688 MB

table database:
% ruby -v tchash.rb 1000000 'foo.tct'
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 21.121 sec.
Usage: 4.629 MB
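
To make the seven runs easier to compare, here is a small sketch that expresses each result relative to the plain Ruby Hash run. The numbers are copied from the output above. The method names benchmark_results and relative_to_baseline are made up for this illustration.

```ruby
# Results copied from the runs above: [label, seconds, megabytes].
def benchmark_results
  [
    ["ruby hash",              8.384, 240.781],
    ["on-memory hash",         2.953,  71.375],
    ["on-memory tree",         2.705,  46.914],
    ["hash database",         17.326,   4.578],
    ["B+ tree database",       4.253,   5.875],
    ["fixed-length database",  4.220, 244.688],
    ["table database",        21.121,   4.629],
  ]
end

# Express every row as a ratio of the first row (the Ruby Hash baseline).
def relative_to_baseline(results)
  _label, base_time, base_mem = results.first
  results.map do |label, time, mem|
    [label, time / base_time, mem / base_mem]
  end
end

# Print the rows from fastest to slowest.
relative_to_baseline(benchmark_results).sort_by { |row| row[1] }.each do |label, time, mem|
  printf("%-22s time x%.2f  memory x%.2f\n", label, time, mem)
end
```

In these numbers the on-memory tree is the fastest and the hash database has the smallest memory increase. Each configuration was run only once, so the ratios are a rough guide rather than a careful measurement.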

For the argument to ADB#open(name), I followed the linked documentation ("moved"), quoted here:

Open a database.
`name' specifies the name of the database.
If it is "*", the database will be an on-memory hash database.
If it is "+", the database will be an on-memory tree database.
If its suffix is ".tch", the database will be a hash database.
If its suffix is ".tcb", the database will be a B+ tree database.
If its suffix is ".tcf", the database will be a fixed-length database.
If its suffix is ".tct", the database will be a table database.
Otherwise, this function fails. Tuning parameters can trail the name, separated by "#".
Each parameter is composed of the name and the value, separated by "=".
On-memory hash database supports "bnum", "capnum", and "capsiz".
On-memory tree database supports "capnum" and "capsiz".
Hash database supports "mode", "bnum", "apow", "fpow", "opts", "rcnum", and "xmsiz".
B+ tree database supports "mode", "lmemb", "nmemb", "bnum", "apow", "fpow", "opts", "lcnum", "ncnum", and "xmsiz".
Fixed-length database supports "mode", "width", and "limsiz".
Table database supports "mode", "bnum", "apow", "fpow", "opts", "rcnum", "lcnum", "ncnum", "xmsiz", and "idx".
If successful, the return value is true, else, it is false.
The tuning parameter "capnum" specifies the capacity number of records.
"capsiz" specifies the capacity size of using memory.
Records spilled the capacity are removed by the storing order.
"mode" can contain "w" of writer, "r" of reader, "c" of creating, "t" of truncating, "e" of no locking, and "f" of non-blocking lock.
The default mode is relevant to "wc".
"opts" can contains "l" of large option, "d" of Deflate option, "b" of BZIP2 option, and "t" of TCBS option.
"idx" specifies the column name of an index and its type separated by ":".
For example, "casket.tch#bnum=1000000#opts=ld" means that the name of the database file is "casket.tch", and the bucket number is 1000000, and the options are large and Deflate.
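
The naming rules quoted above can be sketched in plain Ruby without touching the library. The two helpers below, adb_name and adb_kind, are hypothetical names for this illustration and are not part of the TokyoCabinet binding. They only build and classify the string that would be passed to ADB#open, following the "#" and "=" separators and the suffix rules from the quoted text.

```ruby
# Build an ADB database name: the base name ("*", "+", or a file name)
# followed by tuning parameters, each written as "#key=value".
# Hypothetical helper for illustration only.
def adb_name(base, params = {})
  [base, *params.map { |key, value| "#{key}=#{value}" }].join("#")
end

# Classify a name according to the quoted rules.
# Returns nil for names that ADB#open would reject.
def adb_kind(name)
  base = name.split("#", 2).first.to_s # drop any trailing tuning parameters
  case base
  when "*"       then "on-memory hash database"
  when "+"       then "on-memory tree database"
  when /\.tch\z/ then "hash database"
  when /\.tcb\z/ then "B+ tree database"
  when /\.tcf\z/ then "fixed-length database"
  when /\.tct\z/ then "table database"
  end
end

name = adb_name("casket.tch", bnum: 1_000_000, opts: "ld")
puts name
puts adb_kind(name)
```

The string built this way is the same shape as the one tchash.rb assembles by concatenation in its db.open call.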

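Carried over:
This time I only tested one million writes to a Hash, but there are various other things I would like to try. I would also like to take a look at the implementation.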