SQLAlchemy

SQLAlchemy Value Types

This is an exposition on storing value types in a MySQL database. By value types we mean complex objects, possibly spread across many tables, that are owned by a single parent rather than shared.

One to Many Relationships

Here is an SQLA setup that implements a OneToMany scenario where the parent table owns the child list. The list behaves like a value type: the children have exactly one parent, and when the parent dies the children are collected too. This should also work for a OneToOne relation.

We assume the following imports and session setup:
from sqlalchemy import *
from sqlalchemy import orm
engine=create_engine('mysql://user:password@localhost/test',echo=False)
meta=MetaData(bind=engine)
Session=orm.scoped_session(orm.sessionmaker(bind=engine))
session=Session() # the snippets below use this session
Define the tables. Our model is a database of Photos, each of which can be tagged with multiple tags. Note that the photo table has no reference to any tag object; the foreign key lives on the tag table. We also put an index on the foreign key column (as Elixir does).
# note no reference to tags 
photo = Table('photo',meta,
    Column('id',Integer,primary_key=True),
    Column('name',String(80),nullable=False),
    mysql_engine='InnoDB' # only this table type enforces foreign keys; MyISAM does not
    )

tag  = Table('tag',meta,
    Column('id',Integer,primary_key=True),
    Column('photo_id',None,ForeignKey(photo.columns.id,
        ondelete='CASCADE', # if a photo row is deleted then its owned tags are deleted too
        onupdate='RESTRICT' # don't allow the photo table to change its id column
        ),index=True),
    Column('tag',String(80),nullable=False),
    mysql_engine='InnoDB' # only this table type enforces foreign keys; MyISAM does not
    )

# create the database tables [if not exists]
# will not alter tables!
meta.create_all()
Now create the object-relational mapping and the "fake" tags attribute. Note that because we are using InnoDB tables the child deletes will be performed by the database itself. See OrmObject:
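OrmObject is linked to rather than defined here. For a self-contained read, a minimal sketch of such a base class (an assumption; the linked version may do more) is:
class OrmObject(object):
    # assign constructor keyword arguments as attributes, so mapped
    # classes can be built as Photo(name='x', tags=[...])
    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            setattr(self, key, value)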

class Photo(OrmObject):pass
class Tag(OrmObject):pass

orm.mapper(Photo,photo,properties=dict(
    tags=orm.relation(Tag,cascade='all, delete-orphan',
    	#backref=orm.backref('photo',lazy=True) # see garbage collection notes below
        passive_deletes=True,passive_updates=True # because we have on delete 
                                                  # cascade and InnoDB Tables
        ), 
))
orm.mapper(Tag,tag)
Note that the cascade option is not really required when we have active foreign keys in the database. The following simpler relation emits the same SQL as above when photos are deleted:
    tags=orm.relation(Tag)
This is because SQLA doesn't cascade anything (so doesn't emit any unnecessary SQL) but the rows are removed just the same by the database. (This is different from cascade=None, where the Tag children are not even added to the database; remember cascade=False, i.e. cascade='save-update, merge', is the default.)

We can add objects but remember that a Tag object can only have one parent (one2many relationship)

tag1=Tag(tag='tag1')
p1=Photo(tags=[tag1,Tag(tag='tag2')])# tag1 will not stay attached to p1...
p2=Photo(tags=[tag1,Tag(tag='tag3')])# ...because p2 takes ownership of tag1
session.add_all((p1,p2))# NOT! session.add(p1,p2)
session.commit()
p=session.query(Photo).get(1)
p.tags=[Tag(tag='t2'),Tag(tag='t3')]# should delete old tags and add new
session.commit()
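As a quick hedged sanity check (using the data built up above), the replaced Tag rows should be deleted outright rather than orphaned:
# 'tag2' belonged to photo 1 and was replaced above, so its row is gone
assert session.query(Tag).filter(Tag.tag == 'tag2').count() == 0
# the replacements are present
assert session.query(Tag).filter(Tag.tag == 't2').count() == 1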
Remember that tags is a list: you can't set it to None to remove all elements. Use an empty list.
for p in session.query(Photo):
	p.tags=[]# OK!
	p.tags = None # no!
Note what happens without cascade='all, delete-orphan': when we update the tags on a photo we get the following diff, i.e. without the cascade set to 'all, delete-orphan' the photo_id columns are set to NULL, while with the cascade SA emits DELETE statements.
88,93d93
<  UPDATE tag SET photo_id=%s WHERE tag.id = %s
<  [None, 4L]
<  UPDATE tag SET photo_id=%s WHERE tag.id = %s
<  [None, 5L]
<  UPDATE tag SET photo_id=%s WHERE tag.id = %s
<  [None, 6L]
97a98,99
>  DELETE FROM tag WHERE tag.id = %s
>  [[4L], [5L], [6L]]
Note also that BULK deletes work (see below), viz:
constraint=Photo.tags.any(Tag.tag.like('tag-p0-%'))
todelete=session.query(Photo).filter(constraint)
#for p in todelete: # don't want to fetch things we no longer need....
#    session.delete(p)
todelete.delete(synchronize_session=False)# this all works
session.commit()
If we use only MyISAM tables then the updates work (since SA knows that the old Tags are orphaned), but deletes of a Photo don't cascade to delete the owned Tags.

If we only want to use MyISAM

It might be instructive to use only MyISAM tables and pick up the slack programmatically, at least for databases that are rarely updated. First we remove the InnoDB reference and the passive settings (the onupdate='RESTRICT' etc. are now harmless):
#    mysql_engine='InnoDB'
#    passive_deletes=True,passive_updates=True 
Now we have to be careful when we delete a Photo to ensure that the SQLA cascade='all, delete-orphan' is executed. Viz, this works:
constraint=Photo.tags.any(Tag.tag.like('tag-p0-%'))
todelete=session.query(Photo).filter(constraint)
for p in todelete: session.delete(p)
session.commit()
deleting the owned tags as well and emitting the following SQL:
SELECT photo.id AS photo_id, photo.name AS photo_name 
FROM photo 
WHERE EXISTS (SELECT 1 FROM tag WHERE photo.id = tag.photo_id AND tag.tag 
	LIKE %s) ['tag-p0-%']
SELECT tag.id AS tag_id, tag.tag AS tag_tag, tag.photo_id AS tag_photo_id 
FROM tag 
WHERE %s = tag.photo_id [1L]
DELETE FROM tag WHERE tag.id = %s [[1L], [2L], [3L]]
DELETE FROM photo WHERE photo.id = %s [1L]
COMMIT
However, BULK DELETES DO NOT WORK, even with synchronize_session='fetch' (see here):
session.query(Photo).filter(constraint).delete(synchronize_session='fetch')
emitting only:
BEGIN
SELECT photo.id AS photo_id 
FROM photo WHERE EXISTS (SELECT 1 FROM tag WHERE photo.id = tag.photo_id AND tag.tag 
	LIKE %s) ['tag-p0-%']

DELETE FROM photo WHERE EXISTS (SELECT 1 
FROM tag  WHERE photo.id = tag.photo_id AND tag.tag LIKE %s) ['tag-p0-%']
COMMIT
If we want to use a bulk delete then we must do some garbage collection (see the Many2Many relationship below)
# delete required photos en-masse
session.query(Photo).filter(constraint).delete(synchronize_session=False)
# now remove any Tags that now have no parent
session.query(Tag).filter(~exists().where(Tag.photo_id == Photo.id))\
	.delete(synchronize_session=False)
# NOTE: the following doesn't work since Tag.photo_id isn't in fact NULL after a bulk delete
session.query(Tag).filter(Tag.photo == None).delete()
# this doesn't even parse!
session.query(Tag).filter(~Tag.photo.in_(Photo)).delete()
The equivalent Elixir code is the following. First ensure all tables are InnoDB with using_table_options(mysql_engine='InnoDB') in each Entity, or use a global remedy:
import sqlalchemy
InnoDB=dict(
        mysql_engine='InnoDB',
        mysql_charset='utf8')
def inno(kwargs):
    for key in InnoDB:
        if key not in kwargs:
            kwargs[key]=InnoDB[key]
class Table(sqlalchemy.Table):
    def __init__(self,*args,**kwargs):
        inno(kwargs) # insert engine options
        super(Table,self).__init__(*args,**kwargs)
OR, maybe better, don't subclass: just make it a function:
SQLATable = sqlalchemy.Table # avoid recursion!
def Table(*args,**kwargs):
	inno(kwargs)
	return SQLATable(*args,**kwargs)
# ensure all tables are InnoDB
sqlalchemy.Table = Table
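A quick hedged check that the patch took effect; Table.kwargs holds the dialect options that inno() inserted:
t = sqlalchemy.Table('example', meta, Column('id', Integer, primary_key=True))
assert t.kwargs['mysql_engine'] == 'InnoDB'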
Now use Elixir DDL
from elixir import options
options.options_defaults['shortnames']=True # keep it simple
from elixir import *
metadata.bind='mysql://user:password@localhost/test'
metadata.bind.echo = True
# handy functions (note Elixir's ManyToOne vs our Many2One)
def Many2One(*args,**kwargs):
    kwargs.update(dict(ondelete='CASCADE',onupdate='RESTRICT'))
    return ManyToOne(*args,**kwargs)
def One2Many(*args,**kwargs):
    kwargs.update(dict(cascade='all, delete-orphan',
        passive_deletes=True,passive_updates=True))
    return OneToMany(*args,**kwargs)

class Tag(Entity):
    tag=Field(String(80),nullable=False)
    photo=Many2One('Photo')

class Photo(Entity):
    name=Field(String(80),nullable=False)
    tags=One2Many('Tag')

setup_all()
create_all()
The only difference in the emitted DDL is that Elixir creates an index on photo_id, CREATE INDEX ix_tag_photo_id ON tag (photo_id), equivalent to index=True on the photo_id Column. (InnoDB in any case creates an index on a foreign key column automatically if one does not already exist.)

Because Elixir uses a (deprecated) context-aware Session, we have to clear the relation before resetting it. Viz:

p=Photo.get(1)
p.tags=[]# need this to avoid orphaning Tags half way through the update
p.tags=[Tag(tag='t2'),Tag(tag='t3')]

The problem is with autoflush and delete-orphan; see delete-orphan and here.
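A hedged workaround (assuming Elixir's context-aware session is the scoped session created above, and that it exposes the standard autoflush flag) is to switch autoflush off around the update instead:
session = Session()
session.autoflush = False  # stop the half-updated list being flushed early
p = Photo.get(1)
p.tags = [Tag(tag='t2'), Tag(tag='t3')]
session.autoflush = True
session.commit()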

A closer look at the SQL emitted with different cascades on the relation

Set passive_deletes=False and passive_updates=False, and let's look at the emitted deletion SQL for the following code:
constraint=Photo.tags.any(Tag.tag.like('tag-p0-%'))
todelete=session.query(Photo).filter(constraint)
for p in todelete:
    session.delete(p)
constraint=Photo.tags.any(Tag.tag.like('tag-p1-%'))
todelete=session.query(Photo).filter(constraint)
for p in todelete:
    del p.tags[0]
constraint=Photo.tags.any(Tag.tag.like('tag-p2-%'))
todelete=session.query(Photo).filter(constraint)
for p in todelete: 
    tag=p.tags[0]
    p.tags.remove(tag)
session.commit()

Without a cascade set on the tags relation, SQLAlchemy just sets the photo_id foreign key to None in the tags table.

Adding cascade='delete' DOES NOT WORK: the Tags are not even added to the photos, and no photo is deleted!

Setting cascade='all' or cascade='all, delete' results in DELETE FROM tag WHERE tag.id = %s (1,2,3) being emitted for the session.delete(photo), but UPDATE tag SET photo_id=%s WHERE tag.id = %s (None,4) for the two other Tag removals.

With cascade='all, delete-orphan' or cascade='all, delete, delete-orphan' the other two Tags are also deleted, i.e. there doesn't seem to be a difference between del p.tags[0] and p.tags.remove(p.tags[0]).

Summary

  • Best to use InnoDB then ordinary and bulk deletes work.
  • Without InnoDB you should keep the cascade='all, delete-orphan' key and use just ordinary (session.delete(obj)) deletes.
  • If you do want to do a bulk delete (on a Parent) then you have to garbage collect the Children with another bulk delete, as sketched below (viz: session.query(Child).filter(~exists().where(Child.parent_id==Parent.id)).delete()).
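That last garbage-collection step can be wrapped in a small helper; a sketch (collect_orphans is our own name, not an SQLAlchemy API):
def collect_orphans(session, Child, child_fk, Parent):
    # bulk-delete Child rows whose foreign key no longer matches any Parent row
    session.query(Child).filter(
        ~exists().where(child_fk == Parent.id)
    ).delete(synchronize_session=False)

# e.g. after bulk-deleting Photos:
# collect_orphans(session, Tag, Tag.photo_id, Photo)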

One To One relationship

We can take a One To One relationship to be effectively a list of length 1. If we use uselist=False in the relation then we get None for a non-existent object. Everything above applies. But can we go one better...
  • Can we effectively embed another row within our row? Yes using an association_proxy
  • Can we ensure deletion of the embedded row if the outer row is deleted? Yes using a foreign key and delete cascade
  • Can we prevent the deletion of the embedded row while the outer row still exists? No. We can't seem to have two foreign keys pointing back to one another with the necessary restrictions. This is probably related to the problem that foreign key integrity constraints in InnoDB are checked row-by-row instead of deferred until the end of the SQL statement (as the standard requires).
# note no reference to value 
photo = Table('photo',meta,
        Column('id',Integer,primary_key=True),
        Column('name',String(80),nullable=False),
        mysql_engine='InnoDB',
        )
# our "value" object containing 2 attributes.
value  = Table('value',meta,
        Column('id',Integer,primary_key=True),
        Column('attr1',Integer,nullable=False,default=0),
        Column('attr2',Integer,nullable=False,default=0),
        # can be a value that doesn't belong to any photo
        Column('photo_id',None,ForeignKey(photo.c.id,
        		onupdate='RESTRICT',ondelete='CASCADE'),
        		index=True,nullable=False),
        mysql_engine='InnoDB', # only this table type works MyISAM does not
        )
meta.create_all()
from sqlalchemy.ext.associationproxy import association_proxy
class Photo(OrmObject):
    attr1 = association_proxy('value','attr1') # associate attr1 directly on Photo
    attr2 = association_proxy('value','attr2')

class Value(OrmObject):pass

orm.mapper(Photo,photo,properties=dict(
    value=orm.relation(Value,uselist=False,
    	cascade='all, delete-orphan',# need for objects in memory
        passive_deletes=True,passive_updates=True
        ), # don't need to emit SQL because we have on delete cascade
))
orm.mapper(Value,value)
Now we can have the following:
for photo in session.query(Photo):
        print photo.value.attr1
        # OR
        print photo.attr1
        photo.attr2=505 # can set them too!

Bi Directional Links

In pursuit of the objective of making the child value object undeletable unless its parent is being deleted, let's add a ForeignKey on the parent (Photo) object:

photo = Table('photo',meta,
        Column('id',Integer,primary_key=True),
        Column('name',String(80),nullable=False),
        Column('value_id',Integer,ForeignKey('value.id', # a string because value is not defined yet
        	onupdate='RESTRICT',ondelete='CASCADE', # really want ondelete='RESTRICT'
       		use_alter=True,name='parent' # avoid CircularDependencyError; use_alter requires a name
       	 	),index=True,nullable=False),
        mysql_engine='InnoDB',
        )
value  = Table('value',meta,
        Column('id',Integer,primary_key=True),
        Column('attr1',String(80),nullable=False),
        Column('attr2',Integer,nullable=False,default=0),
        # can be a value that doesn't belong to any photo
        Column('photo_id',None,ForeignKey(photo.c.id,
        onupdate='RESTRICT',ondelete='CASCADE'),index=True, 
            nullable=True# value rows created before photo row created so no id yet
        ),       
        mysql_engine='InnoDB', # only this table type works MyISAM does not
        )
We now need to specify the primary joins etc. for each relationship:
orm.mapper(Photo,photo,properties=dict(
    value=orm.relation(Value,uselist=False
        ,cascade='all, delete-orphan',# need all
        passive_deletes=True,passive_updates=True
         ,primaryjoin=photo.c.id == value.c.photo_id# need to specify the join
        ,post_update=True# avoid CircularDependencyError
        ), 
))

orm.mapper(Value,value,properties=dict(
    photo=orm.relation(Photo,uselist=False
        ,primaryjoin=value.c.id == photo.c.value_id # need to specify the join
        ,cascade='all, delete-orphan',
        passive_deletes=True,passive_updates=True
)))
One more thing needs to be done: we need to link the bi-directional relation at object creation time, viz:
class Photo(object):
    def __init__(self,name,value):
        self.name=name
        self.value=value
        value.photo=self # link value object to this photo
    attr1 = association_proxy('value','attr1')
    attr2 = association_proxy('value','attr2')
With these relationships, deleting either a Photo or a Value will delete the other. So just don't ever delete a Value!
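For example, a short sketch of the safe direction (deleting the Photo):
p = session.query(Photo).get(1)
session.delete(p)   # the owned Value row goes too, via the
session.commit()    # ON DELETE CASCADE / 'all, delete-orphan' pair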

Many To Many Relationships

This is much more difficult because each child may have multiple parents or owners, so we can't synchronize deletion/alteration of a child with the parent's deletion/alteration. In fact even with InnoDB tables we will have to instigate some form of garbage collection, and since we have to implement garbage collection anyway, InnoDB becomes a little less useful.

Here our model is a set of jobs that deal with a set of files. These files might be shared by many jobs, so we don't want to remove a file until the last job referencing it is done. First we create some helpers:

# create a Table based on the InnoDB engine.
def InnoTable(*args,**kwargs):
    if 'mysql_engine' not in kwargs:
        kwargs['mysql_engine']='InnoDB'
    return Table(*args,**kwargs)
# create an InnoDB table with an Integer id surrogate key
def IdTable(name,meta,*args,**kwargs):
    args = (Column('id',Integer,primary_key=True),) + args
    return InnoTable(name,meta,*args,**kwargs)
# create a ForeignKey that cascades deletes and prevents fiddling with ids
def InnoFK(*args,**kwargs):
    kwargs.update(dict(ondelete='CASCADE',onupdate='RESTRICT'))
    return ForeignKey(*args,**kwargs)
# create a relation that cascades deletes etc. but doesn't generate
# any excessive updates.
def passiverelation(*args,**kwargs):
    kwargs.update(dict(passive_deletes=True,passive_updates=True,
        cascade='all, delete-orphan'))
    return orm.relation(*args,**kwargs)
# create an owned child table
def ChildTable(name,parent,meta,*args,**kwargs):
    args = (Column(parent.name+'_id',None,InnoFK(parent.c.id)),) + args
    return IdTable(name,meta,*args,**kwargs)
# create a join table from two IdTables
def JoinTable(left,right,meta,*args,**kwargs):
    return InnoTable(left.name+'2'+right.name,meta,
        *([Column(left.name+'_id',None,InnoFK(left.c.id),primary_key=True),
           Column(right.name+'_id',None,InnoFK(right.c.id),primary_key=True)]
          + list(args)),**kwargs
    )

Now we are getting closer to Elixir DDL :).
jobs = IdTable('jobs',meta,
    Column('jobname',String(255),nullable=False),
    )
files = IdTable('files',meta,
    Column('filename',String(255),nullable=False),
    )
# create an association table
jobs2files = JoinTable(jobs,files,meta);
meta.create_all()
Now let's map these tables:
class Job(OrmObject):pass
class File(OrmObject):pass
class Job2File(OrmObject):pass

orm.mapper(Job,jobs,properties=dict(
    files=passiverelation(File,secondary=jobs2files,
        backref=orm.backref('jobs',lazy='dynamic')# put a jobs list on each file
                                                  # for garbage collection
)))
orm.mapper(File,files)
#orm.mapper(Job2File,jobs2files) # only needed for the MyISAM garbage collection below
Now populate the database with three jobs, each with three files, one of which ('global') belongs to all of the jobs:
def populate():
    globalfile=File(filename='global')  
    for jobid in (0,1,2):
        files=[File(filename='job%d-file%d' % (jobid,fileid)) for fileid in (0,1)]
        job=Job(jobname='job%d' % jobid,files=files)
        job.files.append(globalfile)# add the global file
        session.add(job)
    session.commit()
populate()
Now let's delete 'job1' the normal (non-bulk) way:
for job in session.query(Job).filter(Job.jobname == 'job1'):
        session.delete(job)
session.commit()
In the database the Job 'job1' has gone, and because of the ondelete='CASCADE' the appropriate rows in the jobs2files table are gone also. But the files 'job1-file0' and 'job1-file1' are still in the files table. Note that if we make all the tables MyISAM then orphan rows exist in the jobs2files table also.

Let's look at the various possibilities when we delete a job in two different ways.

Normal delete:
cons=Job.jobname == 'job1'
todel=session.query(Job).filter(cons)
for job in todel: session.delete(job)

Bulk delete:
cons=Job.jobname == 'job1'
todel=session.query(Job).filter(cons)
todel.delete(synchronize_session=False)

Outcomes of various settings

Each case gives: table engine / relation style / cascade / delete style, then the effect on the files table and the jobs2files table, with comments.

  • InnoDB, passive, False (default), normal and bulk: files table untouched, jobs2files clean. Orphaned files hanging around.
  • MyISAM, passive, False (default), normal: files table untouched, jobs2files untouched. Orphaned files hanging around and the jobs2files table untouched also. Adding cascade='all' has no effect.
  • MyISAM, active, False (default), normal: files table untouched, jobs2files clean(!). An extra delete is done on the jobs2files table but the orphaned files are hanging around still:
DELETE FROM jobs2files WHERE jobs2files.job_id = %s AND jobs2files.file_id = %s 
	[[2L, 3L], [2L, 4L], [2L, 5L]]
DELETE FROM jobs WHERE jobs.id = %s [2L]
  • MyISAM, active, None, bulk: files table untouched, jobs2files untouched! Orphaned files and jobs2files rows remain.
  • MyISAM, active, 'all', bulk: files table untouched, jobs2files untouched! Orphaned files and jobs2files rows remain.
  • MyISAM, active, 'delete', normal and bulk: we get an IntegrityError when trying to populate our tables! NULL values are being inserted into the jobs2files table:
INSERT INTO jobs (jobname) VALUES (%s) 'job0'
INSERT INTO jobs (jobname) VALUES (%s) 'job1'
INSERT INTO jobs (jobname) VALUES (%s) 'job2'
INSERT INTO jobs2files (job_id, file_id) VALUES (%s, %s)
	[[1L, None], [1L, None], [1L, None], [2L, None], [2L, None], 
	[2L, None], [3L, None], [3L, None], [3L, None]]
  • MyISAM, active, 'all', normal: global file deleted(!), jobs2files clean(!). Extra deletes are done on the jobs2files and files tables, so the global file has been deleted (in error!):
DELETE FROM jobs2files WHERE jobs2files.job_id = %s AND jobs2files.file_id = %s
[[1L, 3L], [3L, 3L]]
DELETE FROM jobs WHERE jobs.id = %s [2L]
DELETE FROM files WHERE files.id = %s [[3L], [4L], [5L]]

Garbage Collection

Clearly, using InnoDB with foreign keys is best, followed by using MyISAM and only doing normal deletes. Either way we still need to garbage collect with:
# this is where the backref comes in handy. Note that since the jobs2files
# relationship table is cleaned, File.jobs == None matches the orphans
session.query(File).filter(File.jobs == None).delete(synchronize_session=False)
If we want to use a MyISAM database with bulk deletes then also use:
session.query(Job2File).filter(~exists().where(Job2File.job_id == Job.id))\
	.delete(synchronize_session=False)
This can be automated at the session level using a SessionExtension:
from sqlalchemy.orm.interfaces import SessionExtension
class GarbageCollection(SessionExtension):
    def after_bulk_delete(self,session,query,ctxt,connection):
        if jobs in ctxt.froms:
            session.query(File).filter(File.jobs == None)\
                .delete(synchronize_session=False)
engine=create_engine('mysql://user:password@localhost/test')
meta=MetaData(bind=engine)
Session=orm.scoped_session(orm.sessionmaker(bind=engine,extension=GarbageCollection()))
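With the extension installed, a bulk delete on Job should trigger the file clean-up by itself; a hedged sketch:
session = Session()
# after_bulk_delete fires here and removes the now-orphaned Files
session.query(Job).filter(Job.jobname == 'job1').delete(synchronize_session=False)
session.commit()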

What have we learned?

So how do we instrument MySQL and SQLAlchemy to make it bomb proof? Use InnoDB tables; then One2Many (we take One2Many situations, where it is only incidental that there is one parent, to be a subset of the Many2Many relationship) will keep referential integrity perfectly for deletes. Use some helpers:
parent = IdTable('parent',meta,
        Column('mycolumn',...)
        #...
        )
child = ChildTable('child',parent,meta,
        Column('mychildcol',....)
        #....
        )
class Parent(OrmObject):pass
class  Child(OrmObject):pass
orm.mapper(Parent,parent,properties=dict(
    children=passiverelation(Child,backref=orm.backref('parent')),
    # OR, for a one-to-one child:
    # onlychild=passiverelation(Child,uselist=False,backref=orm.backref('parent'))
));		
orm.mapper(Child,child);
This should be good against passive and bulk deletes (see here again for caveats about bulk deletes).
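A short hedged usage sketch of the pattern (Parent.mycolumn stands in for whatever real column the table defines), exercising both delete styles:
session = Session()
# ordinary delete: SQLA cascades in memory, InnoDB cascades in the database
for p in session.query(Parent).filter(Parent.mycolumn == 'stale'):
    session.delete(p)
# bulk delete: the database's ON DELETE CASCADE alone removes the children
session.query(Parent).filter(Parent.mycolumn == 'stale').delete(synchronize_session=False)
session.commit()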

Pickling/Saving SQLAlchemy queries in a database

If you attach an SQLAlchemy query object directly to a (BLOB type) attribute, the SQLAlchemy API gets confused when generating SQL. Here is a workaround.

# a table of SQLA queries
query = Table('query',meta,
        Column('id',Integer,primary_key=True),
        Column('name',String(255)),
        Column('rawsql',Text),
        # this must be first!
        # equal to: "last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP"
        Column('last_updated',TIMESTAMP),
        # must try to set to null when initializing SQLA does this automatically
        # equal to: "created TIMESTAMP DEFAULT '0000-00-00 00:00:00'"
        Column('created',TIMESTAMP),
        )
        
query.create()

# we create a simple descriptor
from sqlalchemy.ext import serializer
class SQLAlchemy(object):
    def __init__(self,picklecol):
        self._picklecol = picklecol
    def __get__(self,instance,klass):
        # how to get the metadata object? currently a global:
        # meta=query.metadata
        prop=getattr(instance,self._picklecol)
        return serializer.loads(prop,meta,lambda:orm.object_session(instance))
    def __set__(self,instance,value):
        setattr(instance,self._picklecol,serializer.dumps(value))

# map sql to rawsql text
class Query(OrmObject):
    sql=SQLAlchemy('rawsql') # map rawsql to sql 

orm.mapper(Query,query)

def load():
    for i in xrange(10):
        o=Query(name='name%d' % i)
        so=Session().query(Query).filter_by(name=o.name)
        # set the query object
        o.sql=so
        session.add(o)
    session.commit()

def show():
    for o in Session().query(Query):
        row=o.sql.one()
        assert(o == row)

load()
show()

Links

SQLAlchemy Home
Coders Blog
Armin Ronacher