S. Lott has written a short note regarding Cubulus. Quote: "It may be that this technique helps a lot with an RDBMS that doesn't support the star schema via bit-mapped indexes. It may be that this technique only helps a little with a more modern RDBMS."
Somewhat true, haven't measured GROUP BY versus range-queries with bitmap indexes (say, in Oracle), and neither in MonetDB. Time will tell if simpler and "embarrasingly parallel" SQL is better..
Sunday, May 27, 2007
Wednesday, May 16, 2007
compiling MonetDB python libraries on the Mac
MonetDB is a "think different" kind of database. Link: http://monetdb.cwi.nl
Here are some notes for the builtin python2.3 for Mac OS X 10.4.9 . It builds correctly, and it works after manually moving some libs .
-headers: /usr/include/python2.3/
-libs: /usr/lib (/usr/lib/libpython.dylib , /usr/lib/libpython2.dylib , /usr/lib/libpython2.3.dylib )
-site-packages: /usr/lib/python2.3/site-packages/
-dynamic libs: /usr/lib/python2.3/lib-dynload/
It is easy to get readline from www.darwinports.org (sudo port install readline):
-Headers: /opt/local/include/readline
-Libs: /opt/local/lib/libreadline.5.0.dylib /opt/local/lib/libreadline.a /opt/local/lib/libreadline.5.dylib /opt/local/lib/libreadline.dylib
Swig also from darwinports , installed as: /opt/local/bin/swig . (Get it with: sudo port install swig)
./configure --with-python=python2.3 --with-python-library=/usr/lib --with-python-incdir=/usr/include/python2.3/ --with-swig=/opt/local/bin/swig
make
sudo make install
This still DOES NOT copy MonetSQLdb under python2.3 site-packages, and places dynamic libs under /usr/local/lib/python2.3/site-packages/lib/ instead of usr/lib/python2.3/lib-dynload/
Fixed with:
cp /usr/local/lib/python2.3/site-packages/lib/_* ~/Desktop/clients/src/python/Cimpl/MonetSQLdb
cp /usr/local/lib/python2.3/site-packages/lib/_* ~/Desktop/clients/src/python/Cimpl
Here are some notes for the builtin python2.3 for Mac OS X 10.4.9 . It builds correctly, and it works after manually moving some libs .
-headers: /usr/include/python2.3/
-libs: /usr/lib (/usr/lib/libpython.dylib , /usr/lib/libpython2.dylib , /usr/lib/libpython2.3.dylib )
-site-packages: /usr/lib/python2.3/site-packages/
-dynamic libs: /usr/lib/python2.3/lib-dynload/
It is easy to get readline from www.darwinports.org (sudo port install readline):
-Headers: /opt/local/include/readline
-Libs: /opt/local/lib/libreadline.5.0.dylib /opt/local/lib/libreadline.a /opt/local/lib/libreadline.5.dylib /opt/local/lib/libreadline.dylib
Swig also from darwinports , installed as: /opt/local/bin/swig . (Get it with: sudo port install swig)
./configure --with-python=python2.3 --with-python-library=/usr/lib --with-python-incdir=/usr/include/python2.3/ --with-swig=/opt/local/bin/swig
make
sudo make install
This still DOES NOT copy MonetSQLdb under python2.3 site-packages, and places dynamic libs under /usr/local/lib/python2.3/site-packages/lib/ instead of usr/lib/python2.3/lib-dynload/
Fixed with:
cp /usr/local/lib/python2.3/site-packages/lib/_* ~/Desktop/clients/src/python/Cimpl/MonetSQLdb
cp /usr/local/lib/python2.3/site-packages/lib/_* ~/Desktop/clients/src/python/Cimpl
Saturday, April 28, 2007
waiting for Sun's Try & Buy
Cubulus can scale (tested on 1 and 2 cores computers = we can call this "theory" with regards to real servers) horizontally and vertically. I just can't wait for the Sun Ultra T1000 server from Try & Buy programme.
24 threads (or 32, depending on config), now that's scaling !
24 threads (or 32, depending on config), now that's scaling !
Friday, April 27, 2007
Tuesday, April 24, 2007
Cubulus OLAP mentioned on Teleinteractive and Technorati
One more post here: http://press.teleinteractive.net/oss/2007/04/23/cubulus_presentation , and tagged on Technorati
Monday, April 23, 2007
Cubulus OLAP mentioned by Gobán Saor
Submitting Cubulus on Freshmeat was very succesfull: people started noticing it, and writing about it. For example
Goban Saor
Goban Saor
Sunday, April 22, 2007
Cubulus OLAP mentioned in Chris Webb's blog
Cubulus OLAP mentioned in Chris Webb's blog:
http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!1132.entry
Hurray!!
http://cwebbbi.spaces.live.com/blog/cns!7B84B0F2C239489A!1132.entry
Hurray!!
Friday, April 20, 2007
Python and multiple processes
Hi,
Somebody was wondering about locking in Python (and CherryPy), link: http://groups.google.com/group/cherrypy-users/browse_thread/thread/bbc2800aa0f329c4?hl=en
Why not running running multiple python processes for CherryPy ? That would ease the GIL . You can "load balance" requests with Apache's mod_proxy .
If you need sessions, try web reverse proxy + state-full balancer called Pound : http://www.apsis.ch/pound/
From the docs: "a load balancer: it will distribute the requests from the client browsers among several back-end servers, while keeping session information." ... "Five possible ways of detecting a session have been implemented in Pound (hopefully the most useful ones): by client address, by Basic authentication, by URL parameter, by cookie and by header value"
Cheers, Alex
Somebody was wondering about locking in Python (and CherryPy), link: http://groups.google.com/group/cherrypy-users/browse_thread/thread/bbc2800aa0f329c4?hl=en
Why not running running multiple python processes for CherryPy ? That would ease the GIL . You can "load balance" requests with Apache's mod_proxy .
If you need sessions, try web reverse proxy + state-full balancer called Pound : http://www.apsis.ch/pound/
From the docs: "a load balancer: it will distribute the requests from the client browsers among several back-end servers, while keeping session information." ... "Five possible ways of detecting a session have been implemented in Pound (hopefully the most useful ones): by client address, by Basic authentication, by URL parameter, by cookie and by header value"
Cheers, Alex
Wednesday, April 4, 2007
digest authentication with CherryPy
For the authorizations, Trac project elegantly delegates authentication to Apache , see here
Unaware of Trac solution, I was fighting problem on my own:
CherryPy has built-in support for authorization. "Digest" and "ldap" are the ones that make most sense in practice.
I've been wondering if CherryPy is the right place for doing authorization. Because instead of _coding_ authorization it is easier to _declare_ it in the front web servers. Instead of re-inventing the wheel, let the right tool handle the job.
IMHO it is the Right Thing (TM) to do, since it also makes sense to use a reverse proxy (apache/lighttpd/squid) in front of CherryPy to deal with buffer overflows, malformed URLs and other nasty things.
This allows CherryPy to run in session-less mode, which allows for easier fail-over or load balancing. Another benefit Windows users: one could use "Integrated Windows Authorization" with IIS.
I tried to deploy this with Apache mod_proxy and mod_digest. Unfortunately, mod_proxy "eats" all extra headers (see http://allafrica.com/tools/apache/mod_proxy ). On Webfaction forum, Remi Delon suggested to try mod_rewrite. Well, Apache mod_rewrite is "voodoo" (quote from http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html).. If someone manages to get it work, please let me know.
Instead, here is quick-and-dirty solution with .. lighttpd. Relevant lines from lighttpd.conf are :
server.modules = (
"mod_access",
"mod_auth",
"mod_proxy",
"mod_accesslog" )
server.port = 8090
proxy.server = ( "" => ( ( "host" => "127.0.0.1", "port" => 8080 ) ) )
auth.backend = "plain"
auth.backend.plain.userfile = "/opt/local/etc/lighttpd/lighttpd.user"
auth.require = ( "/" => (
"method" => "digest",
"realm" => "cubulus",
"require" => "valid-user"
) )
CherryPy receives header 'Authorization' with content 'Digest username="a", realm="cubulus", nonce=.., uri=.. qop="auth" ..... '
Easy thing is that if authorization fails, CherryPy receives.. nothing, so it's enough to look for Digest username="XXX"
Cheers, Alex
links: CherryPy wiki , Webfaction forum
Unaware of Trac solution, I was fighting problem on my own:
CherryPy has built-in support for authorization. "Digest" and "ldap" are the ones that make most sense in practice.
I've been wondering if CherryPy is the right place for doing authorization. Because instead of _coding_ authorization it is easier to _declare_ it in the front web servers. Instead of re-inventing the wheel, let the right tool handle the job.
IMHO it is the Right Thing (TM) to do, since it also makes sense to use a reverse proxy (apache/lighttpd/squid) in front of CherryPy to deal with buffer overflows, malformed URLs and other nasty things.
This allows CherryPy to run in session-less mode, which allows for easier fail-over or load balancing. Another benefit Windows users: one could use "Integrated Windows Authorization" with IIS.
I tried to deploy this with Apache mod_proxy and mod_digest. Unfortunately, mod_proxy "eats" all extra headers (see http://allafrica.com/tools/apache/mod_proxy ). On Webfaction forum, Remi Delon suggested to try mod_rewrite. Well, Apache mod_rewrite is "voodoo" (quote from http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html).. If someone manages to get it work, please let me know.
Instead, here is quick-and-dirty solution with .. lighttpd. Relevant lines from lighttpd.conf are :
server.modules = (
"mod_access",
"mod_auth",
"mod_proxy",
"mod_accesslog" )
server.port = 8090
proxy.server = ( "" => ( ( "host" => "127.0.0.1", "port" => 8080 ) ) )
auth.backend = "plain"
auth.backend.plain.userfile = "/opt/local/etc/lighttpd/lighttpd.user"
auth.require = ( "/" => (
"method" => "digest",
"realm" => "cubulus",
"require" => "valid-user"
) )
CherryPy receives header 'Authorization' with content 'Digest username="a", realm="cubulus", nonce=.., uri=.. qop="auth" ..... '
Easy thing is that if authorization fails, CherryPy receives.. nothing, so it's enough to look for Digest username="XXX"
Cheers, Alex
links: CherryPy wiki , Webfaction forum
Monday, April 2, 2007
digest authentication in CherryPy
Apache already has mod_digest (and mod_ldap), and IMHO it is better to delegate security to Apache (rather than re-implement digest in CherryPy). Also, it might be safer to _configure_ security declaratively than code it.
Has anyone deployed CherrPy behind mod_digest (or mod_ldap), and retrieved the "Authorization-Info" ?
Or, is it a wrong question? I am trying a parallel with MS IIS , where is possible to set Directory Security to "Windows Integrated". The CGI or ASP pages retrieve the "authenticated user" server variable, or headers.
When deploying CherryPy (3.0.1) behind Apache (Apache/1.3.33-Darwin with mod_digest and mod_proxy), CherryPy only receives these headers, and there is no username...
Request Headers:
X-FORWARDED-SERVER: local
REFERER: http://local/index
ACCEPT-LANGUAGE: en
HOST: localhost:8080
ACCEPT: */*
USER-AGENT: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/419 (KHTML, like Gecko) Safari/419.3
CONNECTION: close
Remote-Addr: 127.0.0.1
X-FORWARDED-HOST: local
X-FORWARDED-FOR: 192.168.255.3
ACCEPT-ENCODING: gzip, deflate
Has anyone deployed CherrPy behind mod_digest (or mod_ldap), and retrieved the "Authorization-Info" ?
Or, is it a wrong question? I am trying a parallel with MS IIS , where is possible to set Directory Security to "Windows Integrated". The CGI or ASP pages retrieve the "authenticated user" server variable, or headers.
When deploying CherryPy (3.0.1) behind Apache (Apache/1.3.33-Darwin with mod_digest and mod_proxy), CherryPy only receives these headers, and there is no username...
Request Headers:
X-FORWARDED-SERVER: local
REFERER: http://local/index
ACCEPT-LANGUAGE: en
HOST: localhost:8080
ACCEPT: */*
USER-AGENT: Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/419 (KHTML, like Gecko) Safari/419.3
CONNECTION: close
Remote-Addr: 127.0.0.1
X-FORWARDED-HOST: local
X-FORWARDED-FOR: 192.168.255.3
ACCEPT-ENCODING: gzip, deflate
Saturday, March 31, 2007
mySQL slow on MacOS, most of CPU time is %nice
Most of the time mySQL appears "blue" in Activity Monitor, meaning that CPU time is mostly spent as "nice".
This happens with mysql-standard-4.1.13-apple-darwin8.1.0-powerpc (the one that comes with OS X 10.4.9), and same with mysql-5.1.16-beta-osx10.4-powerpc
First I thought that I'm running "wrong queries" (generated by Cubulus, work in progress), but today it happened during standard operations ex: "alter table fact drop column dim_4;" (MyISAM table with 2 500 000 rows)
Computer is an iMac G5 1.8 Ghz with 1.5 Gb memory (table fits in memory, there is no waiting due to disk). At the time of screen capture there was no other task doing anything relevant, except for Activity Monitor. There was no other query for mySQL.
Configuration /etc/my.cnf file is:
[mysqld]
port = 3306
socket = /tmp/mysql.sock
skip-locking
key_buffer = 256M
max_allowed_packet = 1M
table_cache = 1024
sort_buffer_size = 1M
read_buffer_size = 4M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
thread_cache = 8
query_cache_size= 0M
#lowered in order to minimize OS X context switching overhead
thread_concurrency = 2
#OS X recommended setting
skip-thread-priority
skip-networking
log-bin
server-id = 1
[mysqldump]
quick
max_allowed_packet = 16M
[mysql]
no-auto-rehash
[isamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M
[myisamchk]
key_buffer = 128M
sort_buffer_size = 128M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout
Thursday, March 29, 2007
live demo at Webfaction
There is a live demo of Cubulus. Webfaction offers free hosting for Open Source projects.
Monday, March 26, 2007
if you must use Win**, use ipfw firewall
Windows firewalls usually all suck. They have a complicated interface where you can see the alerts, flash when there i traffic..
Anyway, use wipfw, which is a port of the FreeBSD ipfw firewall (also used in Darwin and Mac OS).
Anyway, use wipfw, which is a port of the FreeBSD ipfw firewall (also used in Darwin and Mac OS).
Sunday, March 18, 2007
Cubulus Olap
I have an Open Source project at http://sourceforge.net/projects/cubulus . Who knows, maybe someone will read this blog, and click there
OS X firewall
There is a nice article about improving OS X security http://www.tuaw.com/2007/03/12/make-os-x-more-secure-now/
. I am interested in closing some ports I don't need (no DNS server in my machine, no Bonjour, no printing..) in such way that fw rules will stay after reboot.
I have a script to delete un-needed ports. Since I can't figure a better way to modify the default rules of ipfw2, I will follow Sebastiaan de With's advice to use Lingon.
Here is the script, the Lingon job is easy: create new "User Daemon", put the path (no command line params needed) , set "Run at load", and reboot to see if it works. It Does !
-----------------------------------------------
/sbin/ipfw show > /tmp/origrules
awk '/dst-port 137 in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules > /tmp/newrules
awk '/dst-port 427 in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
awk '/dst-port 631 in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
awk '/dst-port 5353 in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
awk '/67 to me in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
awk '/5353 to me in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
chmod +x /tmp/newrules
sh /tmp/newrules
rm /tmp/origrules
rm /tmp/newrules
. I am interested in closing some ports I don't need (no DNS server in my machine, no Bonjour, no printing..) in such way that fw rules will stay after reboot.
I have a script to delete un-needed ports. Since I can't figure a better way to modify the default rules of ipfw2, I will follow Sebastiaan de With's advice to use Lingon.
Here is the script, the Lingon job is easy: create new "User Daemon", put the path (no command line params needed) , set "Run at load", and reboot to see if it works. It Does !
-----------------------------------------------
/sbin/ipfw show > /tmp/origrules
awk '/dst-port 137 in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules > /tmp/newrules
awk '/dst-port 427 in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
awk '/dst-port 631 in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
awk '/dst-port 5353 in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
awk '/67 to me in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
awk '/5353 to me in/ {print "/sbin/ipfw del", substr($1,0,5)}' /tmp/origrules >> /tmp/newrules
chmod +x /tmp/newrules
sh /tmp/newrules
rm /tmp/origrules
rm /tmp/newrules
Saturday, February 17, 2007
memcached ssslllooowww on Mac OS X
The purpose of memcached is .. to be fast .. . Unfortunately the default port (Darwinports) is really slow (slow as in 3 seconds of doing nothing on a 1.8GHz PowerPC G5), and somebody figured out a fix.
Now here are few notes on how to recompile it (quick and dirty):
import memcache
mc = memcache.Client(['127.0.0.1:11211'], debug=0)
mc.set("some_key", "Some value")
value = mc.get("some_key")
Now it will not wait 3 seconds
Now here are few notes on how to recompile it (quick and dirty):
- find the source /opt/local/var/db/dports/distfiles/memcached/memcached-1.1.11.tar.gz . Unpack it (ex: on Desktop)
- as suggested in fix., edit the memcached.c file, adding #undef TCP_NOPSH on line 105
- find the location of libevent (which came as part of port install)
- in the folder with patched memcached run
- ./configure --with-libevent=/opt/local/var/db/dports/software/libevent/1.1a_0/opt/local/
- make
- take a look at the original permissions
- copy new file into it's original location by: sudo mv memcached /opt/local/bin/memcached
- restore original permissions : sudo chown root:admin /opt/local/bin/memcached
import memcache
mc = memcache.Client(['127.0.0.1:11211'], debug=0)
mc.set("some_key", "Some value")
value = mc.get("some_key")
Now it will not wait 3 seconds
threadsafe CheryPy
Using CherryPy is so easy, like everything in Python. But when running some pages with heavy SQL processing time, there are a lot of funny errors (malloc.. double free). Ooops, looks as something is not theadsafe. mySQLdb is not necessarily threadsafe, but cherrypy.engine.on_start_thread_list should handle this as suggested . Instead of digging deep into debugging, here is an alternative fix, using DBUtils.
Yipee, my code got posted into CherryPy:
Yipee, my code got posted into CherryPy:
import cherrypy
import MySQLdb
from DBUtils.PersistentDB import PersistentDB
class HomePage:
@cherrypy.expose
def index(self):
sql = "select max(whatever) from bigtable;" #long running query
c = persDb.connection().cursor()
c.execute(sql)
res = c.fetchone()
c.close()
return ' '.join( ['<html>', repr(res), '</html>'] )
cherrypy.tree.mount(HomePage())
if __name__ == '__main__':
persDb=PersistentDB(MySQLdb, 10000, host='localhost', db='db', passwd='****')
cherrypy.config.update({'server.thread_pool': 10})
cherrypy.server.quickstart()
cherrypy.engine.start()
Subscribe to:
Posts (Atom)