Sunday, May 27, 2007

Cubulus and bitmap indexes opinion

S. Lott has written a short note regarding Cubulus. Quote: "It may be that this technique helps a lot with an RDBMS that doesn't support the star schema via bit-mapped indexes. It may be that this technique only helps a little with a more modern RDBMS."

Somewhat true; I haven't measured GROUP BY versus range queries with bitmap indexes (say, in Oracle), nor in MonetDB. Time will tell if simpler and "embarrassingly parallel" SQL is better.
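
If I ever get around to measuring it, it would be something along these lines (MySQLdb here just because it is at hand; the interesting targets are Oracle and MonetDB, and the table/column names are made up):

import time
import MySQLdb

db = MySQLdb.connect(host='localhost', db='db', passwd='****')
c = db.cursor()

def timed(sql):
    # run one query, discard the rows, return elapsed seconds
    t0 = time.time()
    c.execute(sql)
    c.fetchall()
    return time.time() - t0

# aggregation done by the database engine
print "GROUP BY    : %.3f s" % timed("select dim_1, sum(measure) from fact group by dim_1")
# the "embarrassingly parallel" alternative: independent range queries, one per slice
print "range query : %.3f s" % timed("select sum(measure) from fact where dim_1 between 100 and 199")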

Daily Python URL

Cubulus was mentioned in Pythonware's Daily Python URL on 2007-05-25.

Cubulus mentioned by pyparsing

Pyparsing is an easy-to-use parser for Python. The notice link is here.

Wednesday, May 16, 2007

compiling MonetDB python libraries on the Mac

MonetDB is a "think different" kind of database. Link: http://monetdb.cwi.nl

Here are some notes for the built-in python2.3 on Mac OS X 10.4.9. It builds correctly, and it works after manually moving some libs.
-Headers: /usr/include/python2.3/
-Libs: /usr/lib (/usr/lib/libpython.dylib, /usr/lib/libpython2.dylib, /usr/lib/libpython2.3.dylib)
-Site-packages: /usr/lib/python2.3/site-packages/
-Dynamic libs: /usr/lib/python2.3/lib-dynload/

It is easy to get readline from www.darwinports.org (sudo port install readline):
-Headers: /opt/local/include/readline
-Libs: /opt/local/lib/libreadline.5.0.dylib /opt/local/lib/libreadline.a /opt/local/lib/libreadline.5.dylib /opt/local/lib/libreadline.dylib

SWIG is also from DarwinPorts, installed as /opt/local/bin/swig. (Get it with: sudo port install swig)


./configure --with-python=python2.3 --with-python-library=/usr/lib --with-python-incdir=/usr/include/python2.3/ --with-swig=/opt/local/bin/swig
make
sudo make install


This still DOES NOT copy MonetSQLdb under the python2.3 site-packages, and it places the dynamic libs under /usr/local/lib/python2.3/site-packages/lib/ instead of /usr/lib/python2.3/lib-dynload/

Fixed with:
cp /usr/local/lib/python2.3/site-packages/lib/_* ~/Desktop/clients/src/python/Cimpl/MonetSQLdb
cp /usr/local/lib/python2.3/site-packages/lib/_* ~/Desktop/clients/src/python/Cimpl
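
A quick sanity check from the interpreter that Python picks up the copied package (this assumes nothing beyond a clean import):

import MonetSQLdb
print MonetSQLdb.__file__   # should point at the copied MonetSQLdb, not the build tree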

Saturday, April 28, 2007

waiting for Sun's Try & Buy

Cubulus can scale horizontally and vertically (tested on 1- and 2-core computers, so we can call this "theory" with regard to real servers). I just can't wait for the Sun Ultra T1000 server from the Try & Buy programme.

24 threads (or 32, depending on the config), now that's scaling!

Monday, April 23, 2007

Cubulus OLAP mentioned by Gobán Saor

Submitting Cubulus to Freshmeat was very successful: people started noticing it and writing about it. For example:
Gobán Saor

Sunday, April 22, 2007

Cubulus OLAP mentioned in Chris Webb's blog

The post is at:
http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!1132.entry

Hurray!!

Friday, April 20, 2007

Python and multiple processes

Hi,

Somebody was wondering about locking in Python (and CherryPy), link: http://groups.google.com/group/cherrypy-users/browse_thread/thread/bbc2800aa0f329c4?hl=en

Why not run multiple Python processes for CherryPy? That would ease the GIL problem. You can "load balance" requests with Apache's mod_proxy.
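
A minimal sketch of what I mean (the port comes from the command line, so the same script can be started several times; the app itself is just a placeholder):

import sys
import cherrypy

PORT = int(sys.argv[1])   # e.g. 8080, 8081, 8082 -- one port per process

class Root:
    @cherrypy.expose
    def index(self):
        return "hello from the CherryPy process on port %d" % PORT

# each process listens on its own port; Apache mod_proxy (or Pound) balances between them
cherrypy.config.update({'server.socket_port': PORT})
cherrypy.quickstart(Root())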

If you need sessions, try a reverse web proxy + stateful balancer called Pound: http://www.apsis.ch/pound/

From the docs: "a load balancer: it will distribute the requests from the client browsers among several back-end servers, while keeping session information." ... "Five possible ways of detecting a session have been implemented in Pound (hopefully the most useful ones): by client address, by Basic authentication, by URL parameter, by cookie and by header value"

Cheers, Alex

Wednesday, April 4, 2007

digest authentication with CherryPy

For authorization, the Trac project elegantly delegates authentication to Apache; see here

Unaware of the Trac solution, I was fighting the problem on my own:

CherryPy has built-in support for authorization. "Digest" and "ldap" are the ones that make the most sense in practice.

I've been wondering if CherryPy is the right place for doing authorization: instead of _coding_ authorization, it is easier to _declare_ it in the front web server. Rather than re-inventing the wheel, let the right tool handle the job.

IMHO it is the Right Thing (TM) to do, since it also makes sense to use a reverse proxy (apache/lighttpd/squid) in front of CherryPy to deal with buffer overflows, malformed URLs and other nasty things.

This allows CherryPy to run in session-less mode, which makes fail-over and load balancing easier. Another benefit for Windows users: one could use "Integrated Windows Authentication" with IIS.

I tried to deploy this with Apache mod_proxy and mod_digest. Unfortunately, mod_proxy "eats" all extra headers (see http://allafrica.com/tools/apache/mod_proxy ). On the Webfaction forum, Remi Delon suggested trying mod_rewrite. Well, Apache mod_rewrite is "voodoo" (quote from http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html)... If someone manages to get it to work, please let me know.

Instead, here is a quick-and-dirty solution with... lighttpd. The relevant lines from lighttpd.conf are:
server.modules = (
"mod_access",
"mod_auth",
"mod_proxy",
"mod_accesslog" )
server.port = 8090
proxy.server = ( "" => ( ( "host" => "127.0.0.1", "port" => 8080 ) ) )
auth.backend = "plain"
auth.backend.plain.userfile = "/opt/local/etc/lighttpd/lighttpd.user"
auth.require = ( "/" => (
"method" => "digest",
"realm" => "cubulus",
"require" => "valid-user"
) )

CherryPy receives the 'Authorization' header with content 'Digest username="a", realm="cubulus", nonce=.., uri=.., qop="auth", ..... '

The easy part is that if authorization fails, CherryPy receives... nothing, so it is enough to look for Digest username="XXX".
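
In CherryPy that boils down to a few lines (a rough sketch; the helper name and the regexp are mine, not part of CherryPy):

import re
import cherrypy

def digest_user():
    """Return the username lighttpd authenticated, or None if the header is missing."""
    auth = cherrypy.request.headers.get('Authorization', '')
    m = re.search(r'username="([^"]*)"', auth)
    if m:
        return m.group(1)
    return None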

Cheers, Alex
Links: CherryPy wiki, Webfaction forum

Monday, April 2, 2007

digest authentication in CherryPy

Apache already has mod_digest (and mod_ldap), and IMHO it is better to delegate security to Apache (rather than re-implement digest in CherryPy). Also, it might be safer to _configure_ security declaratively than to code it.

Has anyone deployed CherryPy behind mod_digest (or mod_ldap) and retrieved the "Authorization-Info"?

Or is it the wrong question? I am drawing a parallel with MS IIS, where it is possible to set Directory Security to "Windows Integrated". The CGI or ASP pages then retrieve the authenticated user from a server variable, or from the headers.

When deploying CherryPy (3.0.1) behind Apache (Apache/1.3.33-Darwin with mod_digest and mod_proxy), CherryPy only receives these headers, and there is no username...

Request Headers:
X-FORWARDED-SERVER: local
REFERER: http://local/index
ACCEPT-LANGUAGE: en
HOST: localhost:8080
ACCEPT: */*
USER-AGENT: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/419 (KHTML, like Gecko) Safari/419.3
CONNECTION: close
Remote-Addr: 127.0.0.1
X-FORWARDED-HOST: local
X-FORWARDED-FOR: 192.168.255.3
ACCEPT-ENCODING: gzip, deflate
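
For reference, a dump like the one above can be produced with a tiny CherryPy handler along these lines (the handler name is just for illustration):

import cherrypy

class Debug:
    @cherrypy.expose
    def headers(self):
        # echo back whatever headers CherryPy actually received
        lines = ['%s: %s' % (k, v) for k, v in cherrypy.request.headers.items()]
        return '<pre>' + '\n'.join(lines) + '</pre>'

if __name__ == '__main__':
    cherrypy.quickstart(Debug())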

Saturday, March 31, 2007

mySQL slow on MacOS, most of CPU time is %nice


Most of the time mySQL appears "blue" in Activity Monitor, meaning that CPU time is mostly spent as "nice".

[screenshot: Activity Monitor showing mysqld with most of its CPU time spent as %nice]

This happens with mysql-standard-4.1.13-apple-darwin8.1.0-powerpc (the one that comes with OS X 10.4.9), and the same with mysql-5.1.16-beta-osx10.4-powerpc.

First I thought that I was running "wrong queries" (generated by Cubulus, work in progress), but today it happened during standard operations, e.g. "alter table fact drop column dim_4;" (a MyISAM table with 2,500,000 rows).

The computer is an iMac G5 at 1.8 GHz with 1.5 GB of memory (the table fits in memory, so there is no waiting on disk). At the time of the screen capture there was no other task doing anything relevant, except Activity Monitor. There was no other query running in mySQL.

The configuration file /etc/my.cnf is:
[mysqld]
port = 3306
socket = /tmp/mysql.sock
skip-locking
key_buffer = 256M
max_allowed_packet = 1M
table_cache = 1024
sort_buffer_size = 1M
read_buffer_size = 4M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
thread_cache = 8
query_cache_size= 0M
#lowered in order to minimize OS X context switching overhead
thread_concurrency = 2

#OS X recommended setting
skip-thread-priority

skip-networking

log-bin

server-id = 1

[mysqldump]
quick
max_allowed_packet = 16M

[mysql]
no-auto-rehash

[isamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M

[myisamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M

[mysqlhotcopy]
interactive-timeout

Thursday, March 29, 2007

Cubulus OLAP post on Partha's blog

The post about Cubulus OLAP appeared on Partha's blog about partitioning in MySQL 5.

Hurray!

live demo at Webfaction

There is a live demo of Cubulus. Webfaction offers free hosting for Open Source projects.

Monday, March 26, 2007

if you must use Win**, use ipfw firewall

Windows firewalls usually all suck. They have a complicated interface where you can see the alerts, and they flash when there is traffic...

Anyway, use wipfw, which is a port of the FreeBSD ipfw firewall (also used in Darwin and Mac OS).

Sunday, March 18, 2007

Cubulus Olap

I have an Open Source project at http://sourceforge.net/projects/cubulus . Who knows, maybe someone will read this blog and click there.

OS X firewall

There is a nice article about improving OS X security: http://www.tuaw.com/2007/03/12/make-os-x-more-secure-now/
I am interested in closing some ports I don't need (no DNS server on my machine, no Bonjour, no printing...) in such a way that the firewall rules stay in place after a reboot.

I have a script to close the un-needed ports. Since I can't figure out a better way to modify the default rules of ipfw2, I will follow Sebastiaan de With's advice to use Lingon.

Here is the script; the Lingon job is easy: create a new "User Daemon", put in the path (no command-line parameters needed), set "Run at load", and reboot to see if it works. It does!

-----------------------------------------------
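# dump the current ipfw rules, then build and run "ipfw del" commands for the
# inbound rules I don't need: 137 = NetBIOS, 427 = SLP, 631 = IPP/CUPS printing,
# 67 = DHCP/BOOTP, 5353 = mDNS/Bonjour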
/sbin/ipfw show > /tmp/origrules

awk '/dst-port 137 in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules > /tmp/newrules
awk '/dst-port 427 in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
awk '/dst-port 631 in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
awk '/dst-port 5353 in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
awk '/67 to me in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
awk '/5353 to me in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules

chmod +x /tmp/newrules
sh /tmp/newrules

rm /tmp/origrules
rm /tmp/newrules

Saturday, February 17, 2007

memcached ssslllooowww on Mac OS X

The purpose of memcached is... to be fast. Unfortunately the default port (DarwinPorts) is really slow (slow as in 3 seconds of doing nothing on a 1.8 GHz PowerPC G5), and somebody figured out a fix.

Now here are few notes on how to recompile it (quick and dirty):

  • find the source at /opt/local/var/db/dports/distfiles/memcached/memcached-1.1.11.tar.gz and unpack it (e.g. on the Desktop)
  • as suggested in the fix, edit memcached.c, adding #undef TCP_NOPUSH on line 105
  • find the location of libevent (which came as part of the port install)
  • in the folder with the patched memcached, run
    • ./configure --with-libevent=/opt/local/var/db/dports/software/libevent/1.1a_0/opt/local/
    • make
    • take a look at the original permissions
    • copy the new file over its original location: sudo mv memcached /opt/local/bin/memcached
    • restore the original permissions: sudo chown root:admin /opt/local/bin/memcached
Now start memcached from the command line (memcached -vv for verbose logging), and test as it says in the README (from Python):

import memcache
mc = memcache.Client(['127.0.0.1:11211'], debug=0)

mc.set("some_key", "Some value")
value = mc.get("some_key")
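
A rough way to confirm the fix is to time the round-trip (nothing scientific):

import time
t0 = time.time()
mc.set("some_key", "Some value")
value = mc.get("some_key")
print "set + get took %.3f seconds" % (time.time() - t0)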

Now it will not wait 3 seconds.

threadsafe CherryPy

Using CherryPy is so easy, like everything in Python. But when running some pages with heavy SQL processing time, there are a lot of funny errors (malloc... double free). Oops, it looks as if something is not threadsafe. MySQLdb is not necessarily threadsafe, but cherrypy.engine.on_start_thread_list should handle this, as suggested. Instead of digging deep into debugging, here is an alternative fix, using DBUtils.

Yippee, my code got posted to CherryPy:

import cherrypy
import MySQLdb

from DBUtils.PersistentDB import PersistentDB

class HomePage:
    @cherrypy.expose
    def index(self):
        sql = "select max(whatever) from bigtable;"  # long-running query

        c = persDb.connection().cursor()
        c.execute(sql)

        res = c.fetchone()
        c.close()
        return ' '.join(['<html>', repr(res), '</html>'])

cherrypy.tree.mount(HomePage())

if __name__ == '__main__':
    persDb = PersistentDB(MySQLdb, 10000, host='localhost', db='db', passwd='****')

    cherrypy.config.update({'server.thread_pool': 10})
    cherrypy.server.quickstart()

    cherrypy.engine.start()