Trying out various hashes with TokyoCabinet
I tried the test from the mixi engineer blog on my Mac, so here is the log.
To install TokyoCabinet, I followed "TokyoCabinet/TokyoTyrant を Rails で使う - なんとなく日記" (Using TokyoCabinet/TokyoTyrant with Rails): I fetched the latest version, ran ./configure && make && sudo make install, and also installed the Ruby bindings.
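With the test code as-is, the memory size cannot be obtained on a Mac, because there is no /proc directory. So I rewrote the memory_usage method to take the "RSS" column from the output of ps -l. The result is below.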
```ruby
# tchash.rb
require 'tokyocabinet'
include TokyoCabinet

# Resident set size (RSS) of this process in MB.
# Mac OS X has no /proc, so print the 9th column of `ps -l` (RSS, in KB);
# the regexp drops awk's first output line (the "RSS" header).
def memory_usage
  return `ps -l #{$$} | awk '{print $9}'`.gsub(/.+\n(\d+).*/, '\1').chomp.to_i / 1024.0
end

rnum = ARGV.length > 0 ? ARGV[0].to_i : 1000000
time = Time.now
size = memory_usage
if ARGV.length > 1
  # Tokyo Cabinet abstract database; ARGV[1] selects the database type.
  # mode=wct: writer, creating, truncating.
  db = ADB::new
  db.open(ARGV[1] + "#bnum=" + rnum.to_s + "#mode=wct#xmsiz=0") || raise("open failed")
  (0...rnum).each do |i|
    buf = sprintf("%08d", i)
    db.put(buf, buf)
  end
  time = Time.now - time
  GC.start
  size = memory_usage - size
  db.close
else
  # Plain Ruby Hash for comparison.
  hash = {}
  (0...rnum).each do |i|
    buf = sprintf("%08d", i)
    hash[buf] = buf
  end
  time = Time.now - time
  GC.start
  size = memory_usage - size
end
printf("Time: %.3f sec.\n", time)
printf("Usage: %.3f MB\n", size)
```
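The awk '{print $9}' approach depends on RSS being the 9th column of ps -l, which holds for the Mac OS X layout but not everywhere; the column order differs between systems, and some ps -l layouts have no RSS column at all. Below is a minimal sketch, assuming the Mac OS X ps -l format, that looks the column up by its header name instead. rss_mb_from_ps_l is a hypothetical name, and the parsing is split out from the ps call so it can be checked against captured output.

```ruby
# Hypothetical helper: pick the RSS column out of `ps -l` output by its
# header name instead of hard-coding awk's $9.
# Assumes every column up to RSS is a single whitespace-free token, as in
# the Mac OS X layout: UID PID PPID F CPU PRI NI SZ RSS ...
def rss_mb_from_ps_l(output)
  lines = output.split("\n").reject { |l| l.strip.empty? }
  raise ArgumentError, "expected a header and one process line" if lines.size < 2
  header = lines[0].split
  values = lines[1].split
  idx = header.index("RSS")
  raise ArgumentError, "ps output has no RSS column" if idx.nil?
  values[idx].to_i / 1024.0 # ps reports RSS in kilobytes
end

# Drop-in replacement for memory_usage in tchash.rb (Mac OS X ps -l layout).
def memory_usage
  rss_mb_from_ps_l(`ps -l -p #{Process.pid}`)
end
```

Failing loudly when there is no RSS column is deliberate: the original gsub would silently return 0 and make every Usage figure meaningless.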
ruby hash
```
% ruby -v tchash.rb
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 8.384 sec.
Usage: 240.781 MB
```
on-memory hash
```
% ruby -v tchash.rb 1000000 '*'
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 2.953 sec.
Usage: 71.375 MB
```
on-memory tree
```
% ruby -v tchash.rb 1000000 '+'
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 2.705 sec.
Usage: 46.914 MB
```
hash database
```
% ruby -v tchash.rb 1000000 'foo.tch'
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 17.326 sec.
Usage: 4.578 MB
```
B+ tree database
```
% ruby -v tchash.rb 1000000 'foo.tcb'
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 4.253 sec.
Usage: 5.875 MB
```
fixed-length database
```
% ruby -v tchash.rb 1000000 'foo.tcf'
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 4.220 sec.
Usage: 244.688 MB
```
table database
```
% ruby -v tchash.rb 1000000 'foo.tct'
ruby 1.8.7 (2009-04-08 patchlevel 160) [i686-darwin9.6.0]
Time: 21.121 sec.
Usage: 4.629 MB
```
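To compare the runs at a glance, here is a small sketch that takes the Time and Usage figures from the logs above and expresses them as records per second and as multiples of the plain Ruby Hash run. The numbers are copied by hand, and relative_to_baseline is only an illustrative helper. Note that Usage is the growth of the process's RSS as measured by memory_usage, not the size of the database file.

```ruby
# Figures copied by hand from the runs above: [label, seconds, MB].
RESULTS = [
  ["ruby hash",             8.384, 240.781],
  ["on-memory hash",        2.953,  71.375],
  ["on-memory tree",        2.705,  46.914],
  ["hash database",        17.326,   4.578],
  ["B+ tree database",      4.253,   5.875],
  ["fixed-length database", 4.220, 244.688],
  ["table database",       21.121,   4.629],
]
RECORDS = 1_000_000 # every run wrote 1,000,000 records

# Hypothetical helper: records/sec, plus time and memory as multiples of
# the first row (the plain Ruby Hash run).
def relative_to_baseline(results, records)
  _, base_sec, base_mb = results.first
  results.map do |label, sec, mb|
    [label, (records / sec).round, sec / base_sec, mb / base_mb]
  end
end

relative_to_baseline(RESULTS, RECORDS).each do |label, rps, t, m|
  printf("%-22s %7d rec/s  time x%.2f  memory x%.3f\n", label, rps, t, m)
end
```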
For the argument to ADB#open(name), I followed moved:
Open a database.
`name' specifies the name of the database.
If it is "*", the database will be an on-memory hash database.
If it is "+", the database will be an on-memory tree database.
If its suffix is ".tch", the database will be a hash database.
If its suffix is ".tcb", the database will be a B+ tree database.
If its suffix is ".tcf", the database will be a fixed-length database.
If its suffix is ".tct", the database will be a table database.
Otherwise, this function fails. Tuning parameters can trail the name, separated by "#".
Each parameter is composed of the name and the value, separated by "=".
On-memory hash database supports "bnum", "capnum", and "capsiz".
On-memory tree database supports "capnum" and "capsiz".
Hash database supports "mode", "bnum", "apow", "fpow", "opts", "rcnum", and "xmsiz".
B+ tree database supports "mode", "lmemb", "nmemb", "bnum", "apow", "fpow", "opts", "lcnum", "ncnum", and "xmsiz".
Fixed-length database supports "mode", "width", and "limsiz".
Table database supports "mode", "bnum", "apow", "fpow", "opts", "rcnum", "lcnum", "ncnum", "xmsiz", and "idx".
If successful, the return value is true; otherwise, it is false.
The tuning parameter "capnum" specifies the capacity number of records.
"capsiz" specifies the capacity size of memory used.
Records that overflow the capacity are removed in storing order.
"mode" can contain "w" of writer, "r" of reader, "c" of creating, "t" of truncating, "e" of no locking, and "f" of non-blocking lock.
The default mode is equivalent to "wc".
"opts" can contain "l" of large option, "d" of Deflate option, "b" of BZIP2 option, and "t" of TCBS option.
"idx" specifies the column name of an index and its type separated by ":".
For example, "casket.tch#bnum=1000000#opts=ld" means that the name of the database file is "casket.tch", and the bucket number is 1000000, and the options are large and Deflate.
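The naming rules quoted above (a suffix or "*"/"+" picks the type, tuning parameters trail the name as "#key=value", and each type accepts only certain keys) can be encoded in a few lines. This is a sketch with hypothetical helper names (tc_db_kind, tc_db_name, unsupported_params, describe_mode); none of them are part of the TokyoCabinet binding, and the parameter lists are copied from the documentation above rather than from the library itself.

```ruby
# Tuning parameters each database type accepts, per the quoted documentation.
SUPPORTED_PARAMS = {
  :on_memory_hash => %w[bnum capnum capsiz],
  :on_memory_tree => %w[capnum capsiz],
  :hash           => %w[mode bnum apow fpow opts rcnum xmsiz],
  :btree          => %w[mode lmemb nmemb bnum apow fpow opts lcnum ncnum xmsiz],
  :fixed          => %w[mode width limsiz],
  :table          => %w[mode bnum apow fpow opts rcnum lcnum ncnum xmsiz idx],
}

# Map a database name to its type: "*", "+", or a file suffix.
def tc_db_kind(name)
  return :on_memory_hash if name == "*"
  return :on_memory_tree if name == "+"
  case File.extname(name)
  when ".tch" then :hash
  when ".tcb" then :btree
  when ".tcf" then :fixed
  when ".tct" then :table
  else raise ArgumentError, "unknown database type: #{name}"
  end
end

# Build "name#key=value#key=value" from [key, value] pairs (or a Hash).
def tc_db_name(name, params)
  name + params.map { |key, value| "##{key}=#{value}" }.join
end

# Keys that the quoted documentation does not list for this database type.
def unsupported_params(name, keys)
  allowed = SUPPORTED_PARAMS[tc_db_kind(name)]
  keys.map { |k| k.to_s }.reject { |k| allowed.include?(k) }
end

# The letters of "mode", as described in the documentation.
MODE_FLAGS = {
  "w" => "writer", "r" => "reader", "c" => "creating",
  "t" => "truncating", "e" => "no locking", "f" => "non-blocking lock",
}

def describe_mode(mode)
  mode.chars.map { |c| MODE_FLAGS.fetch(c) { raise ArgumentError, "unknown mode flag: #{c}" } }
end

p tc_db_name("casket.tch", [["bnum", 1000000], ["opts", "ld"]])
# => "casket.tch#bnum=1000000#opts=ld"
p unsupported_params("foo.tcf", %w[bnum mode xmsiz]) # => ["bnum", "xmsiz"]
p describe_mode("wct") # => ["writer", "creating", "truncating"]
```

Pairs are used instead of a Hash in the example because Ruby 1.8.7 does not preserve Hash insertion order. Applied to the benchmark's "#bnum=...#mode=wct#xmsiz=0", unsupported_params shows which of those keys the documentation does not list for each of the types tested; it says nothing about how the library handles them.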
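Left for next time:
This time it was just 1,000,000 writes into a hash, but I'd like to try various other things too. I'd also like to take a look at the implementation.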