Tuesday, June 29, 2010

Hooking transactions

Prerequisites

This article assumes you are already familiar with the basics of the transaction module. If "commit", "abort", "savepoint" or "rollback" don't mean much to you, please make yourself familiar with the basics of transactions first.

You might also want to download the example code for this article from: http://pastebin.com/77H9ns28. It shows most of the things discussed here in action.

What are transaction hooks?

Transaction hooks are user-defined functions that are called at certain points during the processing of a transaction. For example, you can register hooks that are called after a commit.

Transaction hooks can be categorized into three methods:

  1. transaction synchronizers
  2. before/afterCommit hooks
  3. data managers

Each method has different characteristics and suits different use cases. Sometimes it might also be necessary to register a combination of these hooks to achieve the desired effect.

Use cases

When do you want to hook transactions? Example use cases include:

  • sending an e-mail notification every time a commit fails
  • synchronizing transactions with an external data store such as an RDBMS
  • delaying object indexing until a transaction commits, rather than indexing every time an object is changed
  • checking invariants after a set of operations
  • instrumenting transactions
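The delayed-indexing case can be sketched roughly as follows. DeferredIndexer and its method names are made up for this illustration; only addBeforeCommitHook(), mentioned in the comment, belongs to the transaction module. The flush is called by hand here so the snippet runs on its own.

```python
class DeferredIndexer(object):
    """Hypothetical helper that batches index updates until commit."""

    def __init__(self, index_func):
        self.index_func = index_func  # called once per changed object
        self.pending = []             # objects changed in this transaction

    def object_changed(self, obj):
        # Remember the object instead of indexing it right away.
        if obj not in self.pending:
            self.pending.append(obj)

    def flush(self):
        # Intended to run as a before-commit hook, e.g. registered with
        #     transaction.get().addBeforeCommitHook(indexer.flush)
        for obj in self.pending:
            self.index_func(obj)
        del self.pending[:]


indexed = []
indexer = DeferredIndexer(indexed.append)
indexer.object_changed("doc-1")
indexer.object_changed("doc-1")  # changed twice, indexed once
indexer.object_changed("doc-2")
indexer.flush()  # called by hand here instead of by commit()
```

Because before-commit hooks are dropped after each transaction, the hook would have to be registered again in every transaction that changes an object.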

Choosing the right hook method for your use case

Transaction synchronizers

Registration:

Transaction synchronizers are registered and unregistered per transaction manager. Once registered they will be called for all subsequent transactions involving the transaction manager.

Points of invocation:

  • beforeCompletion: In commit(), before the transaction status changes to "committing" and after the "beforeCommit" hooks have been called.
  • afterCompletion: In commit(), after the commit has finished, whether successfully or unsuccessfully (e.g. due to a conflict error).
  • beforeCompletion: In abort(), before the transaction is aborted.
  • afterCompletion: In abort(), after the transaction has been aborted.
  • newTransaction: When transaction.begin() is called.
  • Synchronizers are not invoked for savepoint/rollback actions.

When to use:

Transaction synchronizers are useful when you want to execute code before and after commit()/abort() for every transaction. For example, a synchronizer could send an e-mail notification for any commit that fails or is aborted. To find out whether a commit failed, check the .status attribute of the transaction object that is passed to the afterCompletion hook.

For example, the ZODB.Connection class, which you are likely using in your code, uses the beforeCompletion/afterCompletion/newTransaction hooks to synchronize with the underlying storage.
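A minimal sketch of the notification idea might look like this. FailureRecorder and FakeTransaction are hypothetical names, and the "Commit failed" status string is an assumption worth checking against the Status class of your installed transaction version. FakeTransaction only stands in for a real transaction object so that the snippet runs without a transaction manager.

```python
class FailureRecorder(object):
    """Hypothetical synchronizer that records failed commits.

    A real implementation would send an e-mail instead of appending
    to a list.
    """

    def __init__(self):
        self.failures = []

    def newTransaction(self, transaction):
        pass  # nothing to do when a transaction begins

    def beforeCompletion(self, transaction):
        pass  # called before commit() and abort() do their work

    def afterCompletion(self, transaction):
        # Assumption: a failed commit leaves this status string behind.
        if transaction.status == "Commit failed":
            self.failures.append(transaction)


class FakeTransaction(object):
    """Stand-in for a real transaction; it only carries a status."""

    def __init__(self, status):
        self.status = status


recorder = FailureRecorder()
# With the real package you would register the object once, e.g.:
#     transaction.manager.registerSynch(recorder)
# Here the hook is driven by hand instead.
failed = FakeTransaction("Commit failed")
recorder.afterCompletion(FakeTransaction("Committed"))
recorder.afterCompletion(failed)
```

Only the second call records anything, because only that transaction carries the failure status.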

Interfaces:


class ISynchronizer(zope.interface.Interface):
    """Objects that participate in the transaction-boundary notification API.
    """

    def beforeCompletion(transaction):
        """Hook that is called by the transaction at the start of a commit.
        """

    def afterCompletion(transaction):
        """Hook that is called by the transaction after completing a commit.
        """

    def newTransaction(transaction):
        """Hook that is called at the start of a transaction.

        This hook is called when, and only when, a transaction manager's
        begin() method is called explicitly.
        """


class ITransactionManager(zope.interface.Interface):

    def registerSynch(synch):
        """Register an ISynchronizer.

        Synchronizers are notified about some major events in a transaction's
        life. See ISynchronizer for details.
        """

    def unregisterSynch(synch):
        """Unregister an ISynchronizer.

        Synchronizers are notified about some major events in a transaction's
        life. See ISynchronizer for details.
        """


Before/after commit hooks

Registration:

Before/after commit hooks are registered per transaction. When the transaction has finished, the hooks are cleared and will not be called for the next transaction.

Points of invocation:

  • before commit hook: In commit(), before the transaction status changes to "committing" and before the synchronizer "beforeCompletion" hook is called.
  • after commit hook: In commit(), after the commit has finished, whether successfully or unsuccessfully (e.g. due to a conflict error). Receives a "status" flag that tells you whether the commit succeeded.
  • The before/after commit hooks are not invoked for savepoint/rollback actions.
  • The before/after commit hooks are not invoked when a transaction is aborted.

When to use:

Before/after commit hooks are useful when you want to execute code before and after the commit() of a specific transaction. For example, you could send an e-mail when the commit() of a highly critical transaction fails.
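As a rough sketch, an after-commit hook for that e-mail case could look like the following. The function name, the recipient address and the `sent` list are invented for the example. The calling convention hook(success, *args, **kws) follows the addAfterCommitHook() docstring, and the hook is called by hand here rather than by a real commit.

```python
sent = []  # stands in for an outgoing mail queue


def notify_on_failure(success, recipient, subject="commit failed"):
    """After-commit hook: the first argument is the success flag."""
    if not success:
        sent.append((recipient, subject))


# Registration with the real package would look roughly like:
#     transaction.get().addAfterCommitHook(
#         notify_on_failure,
#         args=("ops@example.com",),
#         kws={"subject": "payment commit failed"})
# Below, the hook is called the way the docstring describes it.
args = ("ops@example.com",)
kws = {"subject": "payment commit failed"}
notify_on_failure(True, *args, **kws)   # successful commit: nothing queued
notify_on_failure(False, *args, **kws)  # failed commit: one message queued
```

Since the registration is consumed by the commit, this only affects the one transaction it was registered on.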

Interfaces:




class ITransaction(zope.interface.Interface):

    def addBeforeCommitHook(hook, args=(), kws=None):
        """Register a hook to call before the transaction is committed.

        The specified hook function will be called after the transaction's
        commit method has been called, but before the commit process has been
        started. The hook will be passed the specified positional (`args`)
        and keyword (`kws`) arguments. `args` is a sequence of positional
        arguments to be passed, defaulting to an empty tuple (no positional
        arguments are passed). `kws` is a dictionary of keyword argument
        names and values to be passed, or the default None (no keyword
        arguments are passed).

        Multiple hooks can be registered and will be called in the order they
        were registered (first registered, first called). This method can
        also be called from a hook: an executing hook can register more
        hooks. Applications should take care to avoid creating infinite loops
        by recursively registering hooks.

        Hooks are called only for a top-level commit. A
        savepoint creation does not call any hooks. If the
        transaction is aborted, hooks are not called, and are discarded.
        Calling a hook "consumes" its registration too: hook registrations
        do not persist across transactions. If it's desired to call the same
        hook on every transaction commit, then addBeforeCommitHook() must be
        called with that hook during every transaction; in such a case
        consider registering a synchronizer object via a TransactionManager's
        registerSynch() method instead.
        """

    def getBeforeCommitHooks():
        """Return iterable producing the registered addBeforeCommit hooks.

        A triple (hook, args, kws) is produced for each registered hook.
        The hooks are produced in the order in which they would be invoked
        by a top-level transaction commit.
        """

    def addAfterCommitHook(hook, args=(), kws=None):
        """Register a hook to call after a transaction commit attempt.

        The specified hook function will be called after the transaction
        commit succeeds or aborts. The first argument passed to the hook
        is a Boolean value, true if the commit succeeded, or false if the
        commit aborted. `args` specifies additional positional, and `kws`
        keyword, arguments to pass to the hook. `args` is a sequence of
        positional arguments to be passed, defaulting to an empty tuple
        (only the true/false success argument is passed). `kws` is a
        dictionary of keyword argument names and values to be passed, or
        the default None (no keyword arguments are passed).

        Multiple hooks can be registered and will be called in the order they
        were registered (first registered, first called). This method can
        also be called from a hook: an executing hook can register more
        hooks. Applications should take care to avoid creating infinite loops
        by recursively registering hooks.

        Hooks are called only for a top-level commit. A
        savepoint creation does not call any hooks. Calling a
        hook "consumes" its registration: hook registrations do not
        persist across transactions. If it's desired to call the same
        hook on every transaction commit, then addAfterCommitHook() must be
        called with that hook during every transaction; in such a case
        consider registering a synchronizer object via a TransactionManager's
        registerSynch() method instead.
        """

    def getAfterCommitHooks():
        """Return iterable producing the registered addAfterCommit hooks.

        A triple (hook, args, kws) is produced for each registered hook.
        The hooks are produced in the order in which they would be invoked
        by a top-level transaction commit.
        """


Data managers

Registration:

Data managers are registered for each transaction. This is called "joining". When the transaction has finished, the data manager will not be automatically called for the next transaction.

Points of invocation:

  • commit
  • abort
  • savepoint
  • rollback

When to use:

Data managers are the most flexible kind of hook. They are the workhorses of the transaction module as they provide the implementations for the two phase commit. Putting the label "hook" on them does not describe their use cases well.

The ZODB.Connection class which you are likely using in your code is an implementation of a data manager.

Since data managers are so flexible, hook deeply into the two-phase commit, and allow you to do many useful things, they deserve an article of their own.

Fortunately, this has already been done: http://repoze.org/tmdemo.html. While there are some repoze-specific things in there, the article explains the gist of data managers very well. Please read the article if you want to know how to use data managers.
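To give a flavour of what such an object looks like, here is a small sketch of a data manager that buffers writes and applies them to a plain dict. DictDataManager and run_two_phase_commit are hypothetical names. The driver is a toy that replays the documented call sequence (tpc_begin, commit, tpc_vote, then tpc_finish or tpc_abort) for a single data manager; it is not how the transaction module itself is implemented.

```python
class DictDataManager(object):
    """Hypothetical data manager that applies buffered writes to a dict."""

    transaction_manager = None  # a real one would store its manager here

    def __init__(self, target):
        self.target = target  # the "storage" being protected
        self.pending = {}     # writes buffered during the transaction
        self.calls = []       # trace of protocol calls, for illustration

    def set(self, key, value):
        # A real data manager would join the current transaction the
        # first time it is modified, e.g. transaction.get().join(self).
        self.pending[key] = value

    def abort(self, transaction):
        self.calls.append("abort")
        self.pending.clear()

    def tpc_begin(self, transaction):
        self.calls.append("tpc_begin")

    def commit(self, transaction):
        self.calls.append("commit")

    def tpc_vote(self, transaction):
        self.calls.append("tpc_vote")
        # Vote 'no' by raising an exception.
        if any(value is None for value in self.pending.values()):
            raise ValueError("None values are not allowed")

    def tpc_finish(self, transaction):
        self.calls.append("tpc_finish")
        self.target.update(self.pending)
        self.pending.clear()

    def tpc_abort(self, transaction):
        self.calls.append("tpc_abort")
        self.pending.clear()

    def sortKey(self):
        return "dictdm:%d" % id(self)


def run_two_phase_commit(dm, txn=None):
    """Toy driver: tpc_begin commit tpc_vote (tpc_finish | tpc_abort)."""
    try:
        dm.tpc_begin(txn)
        dm.commit(txn)
        dm.tpc_vote(txn)
    except Exception:
        dm.tpc_abort(txn)
        return False
    dm.tpc_finish(txn)
    return True


store = {}
dm = DictDataManager(store)
dm.set("answer", 42)
ok = run_two_phase_commit(dm)

bad = DictDataManager(store)
bad.set("broken", None)
rejected = run_two_phase_commit(bad)
```

The first run ends in tpc_finish and writes to the dict; the second votes 'no' in tpc_vote, so the driver calls tpc_abort and the dict stays untouched.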

One thing that I'd like to see improved, or at least made clear, with regard to data managers is the order in which they are invoked. The sortKey() is currently only effective during commit, although a sorted data manager execution order would also be useful for savepoints and rollbacks.
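The ordering idea itself is simple: if participating data managers are always processed in sortKey() order, every transaction touches shared resources in the same sequence, which is what the sortKey() docstring relies on to prevent deadlocks. A sketch with a hypothetical NamedManager class and made-up key strings:

```python
class NamedManager(object):
    """Hypothetical data manager stub that only provides sortKey()."""

    def __init__(self, name):
        self.name = name

    def sortKey(self):
        return self.name


# Join order within a transaction is arbitrary ...
managers = [
    NamedManager("zeo:2"),
    NamedManager("rdbms:main"),
    NamedManager("zeo:1"),
]

# ... but processing them in sortKey() order is deterministic.
ordered = sorted(managers, key=lambda dm: dm.sortKey())
names = [dm.name for dm in ordered]
```

Applying the same sorted order to savepoints and rollbacks is the improvement suggested above.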

Interfaces:




class ITransaction(zope.interface.Interface):

    def join(datamanager):
        """Add a data manager to the transaction.

        `datamanager` must provide the transaction.interfaces.IDataManager
        interface.
        """

class IDataManager(zope.interface.Interface):
    """Objects that manage transactional storage.

    These objects may manage data for other objects, or they may manage
    non-object storages, such as relational databases. For example,
    a ZODB.Connection.

    Note that when some data is modified, that data's data manager should
    join a transaction so that data can be committed when the user commits
    the transaction.
    """

    transaction_manager = zope.interface.Attribute(
        """The transaction manager (TM) used by this data manager.

        This is a public attribute, intended for read-only use. The value
        is an instance of ITransactionManager, typically set by the data
        manager's constructor.
        """)

    def abort(transaction):
        """Abort a transaction and forget all changes.

        Abort must be called outside of a two-phase commit.

        Abort is called by the transaction manager to abort
        transactions that are not yet in a two-phase commit. It may
        also be called when rolling back a savepoint made before the
        data manager joined the transaction.

        In any case, after abort is called, the data manager is no
        longer participating in the transaction. If there are new
        changes, the data manager must rejoin the transaction.
        """

    # Two-phase commit protocol. These methods are called by the ITransaction
    # object associated with the transaction being committed. The sequence
    # of calls normally follows this regular expression:
    # tpc_begin commit tpc_vote (tpc_finish | tpc_abort)

    def tpc_begin(transaction):
        """Begin commit of a transaction, starting the two-phase commit.

        transaction is the ITransaction instance associated with the
        transaction being committed.
        """

    def commit(transaction):
        """Commit modifications to registered objects.

        Save changes to be made persistent if the transaction commits (if
        tpc_finish is called later). If tpc_abort is called later, changes
        must not persist.

        This includes conflict detection and handling. If no conflicts or
        errors occur, the data manager should be prepared to make the
        changes persist when tpc_finish is called.
        """

    def tpc_vote(transaction):
        """Verify that a data manager can commit the transaction.

        This is the last chance for a data manager to vote 'no'. A
        data manager votes 'no' by raising an exception.

        transaction is the ITransaction instance associated with the
        transaction being committed.
        """

    def tpc_finish(transaction):
        """Indicate confirmation that the transaction is done.

        Make all changes to objects modified by this transaction persist.

        transaction is the ITransaction instance associated with the
        transaction being committed.

        This should never fail. If this raises an exception, the
        database is not expected to maintain consistency; it's a
        serious error.
        """

    def tpc_abort(transaction):
        """Abort a transaction.

        This is called by a transaction manager to end a two-phase commit on
        the data manager. Abandon all changes to objects modified by this
        transaction.

        transaction is the ITransaction instance associated with the
        transaction being committed.

        This should never fail.
        """

    def sortKey():
        """Return a key to use for ordering registered DataManagers.

        ZODB uses a global sort order to prevent deadlock when it commits
        transactions involving multiple resource managers. The resource
        manager must define a sortKey() method that provides a global ordering
        for resource managers.
        """
        # Alternate version:
        # """Return a consistent sort key for this connection.
        #
        # This allows ordering multiple connections that use the same storage in
        # a consistent manner. This is unique for the lifetime of a connection,
        # which is good enough to avoid ZEO deadlocks.
        # """
