Tuesday, June 29, 2010

Contributions

We have received an encouraging amount of participation from the community. There are two ways to participate: writing blog entries or making financial contributions.

I have already been in contact with Adam Groszer and Matthias (Nitro) and have added them to the blog. Hopefully we will see some articles from these two in the near term. Anyone is welcome to post a mini-howto, a recipe or just their experience with the ZODB. All material is welcome.

Financial contributions will allow Carlos to focus on developing the book to completion. We currently have the first round of money to pay Carlos an advance. We need significantly more money to finish the book. All donors will be listed on the blog. I have put the first batch of donors on the widget, and it will be updated occasionally as more contributions come in.

If you have either money or experience with the ZODB, you can participate. Contact me.

To all those who have contributed: thank you.

Hooking transactions

Prerequisites

This article assumes you are already familiar with the basics of the transaction module. If "commit", "abort", "savepoint" or "rollback" don't mean much to you, please familiarize yourself with the basics of transactions first.

You might also want to download the example code for this article from http://pastebin.com/77H9ns28. It shows most of the things discussed here in action.

What are transaction hooks?

Transaction hooks are user-defined functions that are called at certain points during the processing of a transaction. For example, you can register hooks that are called after a commit.

Transaction hooks can be categorized into three kinds:

  1. transaction synchronizers
  2. before/afterCommit hooks
  3. data managers

Each method has different characteristics and suits different use cases. Sometimes it might also be necessary to register a combination of these hooks to achieve the desired effect.

Use cases

When do you want to hook transactions? Example use cases include:

  • sending an e-mail notification every time a commit fails
  • synchronizing transactions with an external data store like an RDBMS
  • delaying object indexing until a transaction commits, rather than indexing every time an object is changed
  • checking invariants after a set of operations
  • instrumenting transactions

Choosing the right hook method for your use case

Transaction synchronizers

Registration:

Transaction synchronizers are registered and unregistered per transaction manager. Once registered they will be called for all subsequent transactions involving the transaction manager.

Points of invocation:

  • beforeCompletion: In commit(), before the transaction status changes to "committing" and after the "beforeCommit" hooks have been called.
  • afterCompletion: In commit(), after the commit has finished (either successfully or unsuccessfully, e.g. due to a conflict error).
  • beforeCompletion: In abort(), before the transaction is aborted.
  • afterCompletion: In abort(), after the transaction has been aborted.
  • newTransaction: When transaction.begin() is called.
  • The synchronizers are not invoked for savepoint/rollback actions.

When to use:

Transaction synchronizers are useful when you want to execute code before and after commit()/abort() for every transaction. For example, a synchronizer could send an e-mail notification for any commit that fails or is aborted. To check whether a commit failed, inspect the .status attribute of the transaction object passed to the afterCompletion hook.

For example, the ZODB.Connection class, which you are likely using in your code, uses the beforeCompletion/afterCompletion/newTransaction hooks to synchronize with the underlying storage.

Interfaces:

import zope.interface


class ISynchronizer(zope.interface.Interface):
    """Objects that participate in the transaction-boundary notification API.
    """

    def beforeCompletion(transaction):
        """Hook that is called by the transaction at the start of a commit.
        """

    def afterCompletion(transaction):
        """Hook that is called by the transaction after completing a commit.
        """

    def newTransaction(transaction):
        """Hook that is called at the start of a transaction.

        This hook is called when, and only when, a transaction manager's
        begin() method is called explicitly.
        """


class ITransactionManager(zope.interface.Interface):

    def registerSynch(synch):
        """Register an ISynchronizer.

        Synchronizers are notified about some major events in a transaction's
        life. See ISynchronizer for details.
        """

    def unregisterSynch(synch):
        """Unregister an ISynchronizer.

        Synchronizers are notified about some major events in a transaction's
        life. See ISynchronizer for details.
        """


Before/after commit hooks

Registration:

Before/after commit hooks are registered per transaction. When the transaction has finished, the hooks are cleared and will not be called for the next transaction.

Points of invocation:

  • before commit hook: In commit(), before the transaction status changes to "committing" and before the synchronizers' "beforeCompletion" hook is called.
  • after commit hook: In commit(), after the commit has finished (either successfully or unsuccessfully, e.g. due to a conflict error). The hook receives a boolean status flag that tells you whether the commit succeeded.
  • The before/after commit hooks are not invoked for savepoint/rollback actions.
  • The before/after commit hooks are not invoked when aborting a transaction.

When to use:

Before/after commit hooks are useful when you want to execute code before and after the commit() of one specific transaction. For example, you might want to send an e-mail if the commit() of a highly critical transaction fails.

Interfaces:

import zope.interface


class ITransaction(zope.interface.Interface):

    def addBeforeCommitHook(hook, args=(), kws=None):
        """Register a hook to call before the transaction is committed.

        The specified hook function will be called after the transaction's
        commit method has been called, but before the commit process has been
        started. The hook will be passed the specified positional (`args`)
        and keyword (`kws`) arguments. `args` is a sequence of positional
        arguments to be passed, defaulting to an empty tuple (no positional
        arguments are passed). `kws` is a dictionary of keyword argument
        names and values to be passed, or the default None (no keyword
        arguments are passed).

        Multiple hooks can be registered and will be called in the order they
        were registered (first registered, first called). This method can
        also be called from a hook: an executing hook can register more
        hooks. Applications should take care to avoid creating infinite loops
        by recursively registering hooks.

        Hooks are called only for a top-level commit. A
        savepoint creation does not call any hooks. If the
        transaction is aborted, hooks are not called, and are discarded.
        Calling a hook "consumes" its registration too: hook registrations
        do not persist across transactions. If it's desired to call the same
        hook on every transaction commit, then addBeforeCommitHook() must be
        called with that hook during every transaction; in such a case
        consider registering a synchronizer object via a TransactionManager's
        registerSynch() method instead.
        """

    def getBeforeCommitHooks():
        """Return iterable producing the registered addBeforeCommit hooks.

        A triple (hook, args, kws) is produced for each registered hook.
        The hooks are produced in the order in which they would be invoked
        by a top-level transaction commit.
        """

    def addAfterCommitHook(hook, args=(), kws=None):
        """Register a hook to call after a transaction commit attempt.

        The specified hook function will be called after the transaction
        commit succeeds or aborts. The first argument passed to the hook
        is a Boolean value, true if the commit succeeded, or false if the
        commit aborted. `args` specifies additional positional, and `kws`
        keyword, arguments to pass to the hook. `args` is a sequence of
        positional arguments to be passed, defaulting to an empty tuple
        (only the true/false success argument is passed). `kws` is a
        dictionary of keyword argument names and values to be passed, or
        the default None (no keyword arguments are passed).

        Multiple hooks can be registered and will be called in the order they
        were registered (first registered, first called). This method can
        also be called from a hook: an executing hook can register more
        hooks. Applications should take care to avoid creating infinite loops
        by recursively registering hooks.

        Hooks are called only for a top-level commit. A
        savepoint creation does not call any hooks. Calling a
        hook "consumes" its registration: hook registrations do not
        persist across transactions. If it's desired to call the same
        hook on every transaction commit, then addAfterCommitHook() must be
        called with that hook during every transaction; in such a case
        consider registering a synchronizer object via a TransactionManager's
        registerSynch() method instead.
        """

    def getAfterCommitHooks():
        """Return iterable producing the registered addAfterCommit hooks.

        A triple (hook, args, kws) is produced for each registered hook.
        The hooks are produced in the order in which they would be invoked
        by a top-level transaction commit.
        """


Data managers

Registration:

Data managers are registered per transaction; this is called "joining". Once the transaction has finished, the data manager is not automatically called for the next transaction.

Points of invocation:

  • commit
  • abort
  • savepoint
  • rollback

When to use:

Data managers are the most flexible kind of hook. They are the workhorses of the transaction module, as they provide the implementations for the two-phase commit. Putting the label "hook" on them does not describe their use cases well.

The ZODB.Connection class which you are likely using in your code is an implementation of a data manager.

Because data managers are so flexible, hook deep into the two-phase commit, and let you do many useful things, they deserve an article of their own.

Fortunately, this has already been done: http://repoze.org/tmdemo.html. While there are some repoze-specific things in there, the article explains the gist of data managers very well. Please read it if you want to know how to use data managers.

One thing that I'd like to see improved, or at least made clear, with regard to data managers is the order in which they are invoked. sortKey() currently only takes effect during commit, whereas a sorted execution order for data managers would also be useful for savepoints and rollbacks.

Interfaces:

import zope.interface


class ITransaction(zope.interface.Interface):

    def join(datamanager):
        """Add a data manager to the transaction.

        `datamanager` must provide the transaction.interfaces.IDataManager
        interface.
        """


class IDataManager(zope.interface.Interface):
    """Objects that manage transactional storage.

    These objects may manage data for other objects, or they may manage
    non-object storages, such as relational databases. For example,
    a ZODB.Connection.

    Note that when some data is modified, that data's data manager should
    join a transaction so that data can be committed when the user commits
    the transaction.
    """

    transaction_manager = zope.interface.Attribute(
        """The transaction manager (TM) used by this data manager.

        This is a public attribute, intended for read-only use. The value
        is an instance of ITransactionManager, typically set by the data
        manager's constructor.
        """)

    def abort(transaction):
        """Abort a transaction and forget all changes.

        Abort must be called outside of a two-phase commit.

        Abort is called by the transaction manager to abort
        transactions that are not yet in a two-phase commit. It may
        also be called when rolling back a savepoint made before the
        data manager joined the transaction.

        In any case, after abort is called, the data manager is no
        longer participating in the transaction. If there are new
        changes, the data manager must rejoin the transaction.
        """

    # Two-phase commit protocol. These methods are called by the ITransaction
    # object associated with the transaction being committed. The sequence
    # of calls normally follows this regular expression:
    #     tpc_begin commit tpc_vote (tpc_finish | tpc_abort)

    def tpc_begin(transaction):
        """Begin commit of a transaction, starting the two-phase commit.

        transaction is the ITransaction instance associated with the
        transaction being committed.
        """

    def commit(transaction):
        """Commit modifications to registered objects.

        Save changes to be made persistent if the transaction commits (if
        tpc_finish is called later). If tpc_abort is called later, changes
        must not persist.

        This includes conflict detection and handling. If no conflicts or
        errors occur, the data manager should be prepared to make the
        changes persist when tpc_finish is called.
        """

    def tpc_vote(transaction):
        """Verify that a data manager can commit the transaction.

        This is the last chance for a data manager to vote 'no'. A
        data manager votes 'no' by raising an exception.

        transaction is the ITransaction instance associated with the
        transaction being committed.
        """

    def tpc_finish(transaction):
        """Indicate confirmation that the transaction is done.

        Make all changes to objects modified by this transaction persist.

        transaction is the ITransaction instance associated with the
        transaction being committed.

        This should never fail. If this raises an exception, the
        database is not expected to maintain consistency; it's a
        serious error.
        """

    def tpc_abort(transaction):
        """Abort a transaction.

        This is called by a transaction manager to end a two-phase commit on
        the data manager. Abandon all changes to objects modified by this
        transaction.

        transaction is the ITransaction instance associated with the
        transaction being committed.

        This should never fail.
        """

    def sortKey():
        """Return a key to use for ordering registered DataManagers.

        ZODB uses a global sort order to prevent deadlock when it commits
        transactions involving multiple resource managers. The resource
        manager must define a sortKey() method that provides a global ordering
        for resource managers.
        """
        # Alternate version:
        # """Return a consistent sort key for this connection.
        #
        # This allows ordering multiple connections that use the same storage in
        # a consistent manner. This is unique for the lifetime of a connection,
        # which is good enough to avoid ZEO deadlocks.
        # """


Monday, June 28, 2010

Thinking about the structure of the book

Update: to avoid continuous posting and reposting of the outline, we have created a page where the latest version will always be available. A link to this page is located in the top right portlet on this blog.

At the start of this project we posted a suggested outline, but while the research phase is going on, the book structure will likely undergo several changes. Please remember that in this phase your comments and suggestions are especially valuable and can have more influence on the final result. We really look forward to getting more comments and ideas.

Right now the outline I propose for the book is the following (thanks to Shane Hathaway for his input on this):

Part one: getting started. This part will focus on getting an application up and running while making simple use of the ZODB. A developer who just needs to add a simple persistence layer to an application might find this part sufficient.

  • Introduction to ZODB.
    This will be a very short chapter, just to get things going. What is the ZODB? Maybe some bits about the NoSQL craze and how the ZODB has been doing that for more than 10 years. Why is the ZODB a nice tool to keep in your Python developer's arsenal, and when is it a good fit for your apps?

  • Your first ZODB application.
    Installation and running the first app. The objective of this chapter is to let the reader do something that works immediately. Just the basics to get an app running. Not a lot of details here.

  • Transactions.
    The ZODB depends on the transaction package, and understanding this package is essential to working effectively with the ZODB. This chapter introduces transactions, shows what happens when you commit or abort, describes what a conflict error is and explains why it's a good idea to avoid long-running transactions.

  • A more complex application.
    A bit more involved explanation of how the ZODB works and a more useful sample application. This chapter will build on our understanding of transactions.

  • Basic indexing and searching.
    The Catalog and indexes. I propose to use repoze.catalog here, which uses zope.index.

  • Maintenance.
    Packing, backups, etc.

  • Scaling.
    The ZODB cache, ZEO and replication services.

Part two: advanced topics. This will be a more in-depth review of techniques and concepts for ZODB development.

  • A more in-depth look at the ZODB internals.
    A little more information about how the ZODB works. At least enough to understand the later chapters about storages and debugging.

  • Advanced transaction management.
    How to create data managers for working with other storages in the same transaction, and how best to approach the need for well-behaved, long-running transactions.

  • ZODB Storages.
    Details about the FS storage and discussion of RelStorage and maybe DirectoryStorage.

  • Popular third party packages.
    Some of the most important packages for the ZODB will be described here.

  • Other indexing and searching strategies.
    Other catalog implementations, third party indexes and using external indexing solutions, like Solr.

  • Advanced ZODB.
    Evolving schemas, creating custom indexes, using the ZODB in an asynchronous framework like Twisted.

  • The debugging FAQ: frequent problems and suggested solutions.
    General debugging strategies and then a FAQ with common problems. For example, common traps such as attempting to load an object's state after the connection has been closed.

Part three: ZODB API. The official public API will be documented here. This could serve as a really quick reference for developers. We might include APIs for some other modules, like transaction.

Here we go

Alan Runyan told me that the donations are piling up and that I should get things going now, so here I go.

First of all, thanks to all the people and companies who have contributed so far. I've always admired the way the Zope community responds to these calls for action. I promise to make the most of your donations.

For those who may wonder who exactly will write the ZODB book: my name is Carlos de la Guardia and I've been part of the Zope community for more than 10 years. I recently wrote a book about web development with Grok, which actually includes a chapter on the ZODB and even a section about running it outside of Zope.

I am very happy to get the chance to write about the ZODB, and my aim is to write something that is useful to the community and also opens the door for more Python developers to benefit from the ZODB. That's why Zope and Plone will not be discussed in the book and the focus will not be on web development. We want to appeal to a more general Python audience here.

Throughout the writing process, we plan to use this blog to let people know how the book is going, as well as to publish small articles on the ZODB topics we'll cover. This is also a good place for volunteers to write about their ZODB experiences, favorite packages, tips, etc. If anybody wants to contribute to the blog, please say so in the comments and we'll get you started. If you don't want to write posts but have an idea or suggestion for us, that would be great as well.

I will follow this post with an updated table of contents. The research phase for the book has officially started and I promise to keep you posted. Thanks again for the opportunity.

Friday, June 25, 2010

Topics for Upcoming ZODB Book

Several aspects of using the ZODB need unusually in-depth coverage. What may surprise some readers is that many of these aspects concern the ZODB's supporting libraries. The most important of these libraries, and one that can also be used outside the context of the ZODB, is transaction. The transaction library used by the ZODB is the most robust generic transaction library available in the Python world. It is library-oriented and easier to include in any Python program than deeply integrated transaction frameworks such as MTS or JTA. Why Django and TurboGears do not use the transaction package has puzzled me.

Debugging is a vital topic. The majority of readers of this book will be familiar with RDBMS/SQL. Since the ZODB is NoSQL, this audience will need to learn a new tool chain for debugging. For example, if you are using Microsoft SQL Server, you can run SQL Profiler and watch the SQL statements being executed as your program sends them. The ZODB does not have a traditional client/server architecture like an SQL database, so debugging requires different tactics. (We are looking for experienced ZODB developers to blog about debugging approaches ;)

Another characteristic of the ZODB that will need thorough documentation is concurrency and conflict resolution. Currently the ZODB is transactional (are there rumors of making this optional?). This transaction requirement is at odds with most of NoSQL and is one of the reasons the ZODB community does not associate itself more closely with the movement. Real-world applications have concurrent updates, and newbies who do not use conflict-aware data structures can run into problems. This is a well-known problem set, and documentation that introduces people to the concepts incrementally can make it approachable. It is easy for experienced developers to dive into the inner workings of the technology and turn off new users. This is probably the topic that deserves the most focus.

Partitioning databases, BLOBs, and Import/Export are native ZODB topics that will also be documented. What other features of the ZODB do you think should be documented? What priority would you give each feature of the ZODB?

Thursday, June 24, 2010

Guiding Parameters for the Book

The ZODB (Zope Object DataBase) is arguably the most widely deployed object-oriented persistence system in the world. It is certainly the most widely deployed Python-centric persistence mechanism. Every Zope, Plone, and Zenoss installation (well over 3 million downloads) ships with the ZODB. The library is usable with all versions of Python except the 3.x series, and it has been hardened by over a decade of production use.

This post is an attempt to set some constraints around the upcoming ZODB book, specifically by discussing what will be omitted from it. I believe it is as important to talk about what is out of scope as about what will be covered. So here goes:

  • Zope or Plone usage will not be covered. The goal of the ZODB book is to cover the principles and usage of the persistence system in any Python application. Many people use the ZODB in numerous framework contexts: Django, repoze, Twisted. Talking about the ZODB in the context of a particular application or framework diverts energy from the primary task: explaining how to use the ZODB and its supporting libraries from a batteries-included Python distribution.

  • Private implementation details of the ZODB. I believe there is some sense of which class members are public and which are private, although this clarity is not consistent across all of the packages. Some details in packages like ZEO, such as the RPC mechanism, are certainly private implementation details, and ZODB consumers would not benefit from their thorough explanation or documentation.

  • Unused features. Some aspects of the ZODB exist and "could be used", but they lack wide adoption or consistent experience in the broader community. Examples include persistent modules and persistent weak references. As these features become more widely used, they will be revisited in subsequent updates of the ZODB book.

This is not an exhaustive list of what is out of scope for this book, but I hope it sets people's expectations about the kinds of things that are not initially considered for inclusion or believed to be high priority. A complementary post would discuss how to prioritize the work in the book: which aspects of the library should be given the highest priority for documentation.

Tuesday, June 22, 2010

Ready to accept donations

I just added the PayPal widget, so if you want to donate, send us some money. You can use PayPal or send us a check. Remember to review the contribution levels if you want a permanent placement in the ZODB book. I would also like to take this opportunity to invite experienced ZODB developers to write on this blog. I hope this blog can host thoughts and lessons learned from using various aspects of the ZODB. If a post is particularly good, I believe Carlos will have no problem pinching it and crediting the author in the upcoming ZODB book.

If you can contribute financially, there are no more roadblocks. Make it happen.

If you want to contribute by sharing your experiences with the ZODB, let me know. All you need is a Blogger account and I will give you access to this blog.

Carlos will be posting a revised book outline by the end of this week. Let's get him working. Contribute now.

Monday, June 14, 2010

ZODB Book Project

The Z Object DataBase (ZODB) has been in production for over a decade. It is very useful outside of the Zope application server. The ZODB offers low-friction, reliable object persistence. The ZODB is not SQL (it is hierarchical and does not require an ORM layer). Unlike many of its NoSQL cousins, the ZODB supports reliable (ACID) transactions. Arguably the ZODB is the most widely deployed object database in the world. The ZODB ships with every Plone, Zenoss, Grok, and Zope deployment.

Unfortunately, the available documentation for the ZODB does not reflect its maturity, robustness or feature set. The documentation is scattered, and most of the information is outdated. Many best practices have been learned over the past 10 years, but there has not yet been a concerted effort to generate and maintain ZODB documentation. This initiative aims to take the first step: generating documentation. The following topical overview is a draft of the material that may be covered:

  • Introduction to ZODB.
    What it is. Maybe some bits about the NoSQL craze and how the ZODB has been doing that for more than 10 years. When is the ZODB a good fit for your app?

  • Your first ZODB application.
    A short chapter dealing with installation and running the first app.

  • A more complex application.
    A bit more involved explanation of how it works and a more useful sample application.

  • Basic indexing and searching.
    The Catalog and indexes. I might use the repoze stuff, which is somewhat simpler.

  • A more in-depth look at the ZODB internals.
    A little more information about how the ZODB works. At least enough to understand the later chapters about storages and debugging.

  • ZODB Storages.
    Details about the FS storage and discussion of RelStorage and maybe DirectoryStorage.

  • Other indexing and searching strategies.
    Other catalog implementations, third party indexes and using external indexing solutions, like Solr.

  • Advanced ZODB.
    Evolving objects and other problems... not really sure about this one yet.

  • The debugging FAQ: frequent problems and suggested solutions.
    General debugging strategies and then a FAQ with common problems.

  • Scaling.
    The ZODB cache, ZEO and replication services.

  • Maintenance.
    Packing, backups, etc.

This project aims to collect money from the community to fund a seasoned ZODB developer and published writer, Carlos de la Guardia, to complete a Creative Commons licensed ZODB book. The effort is significant and we are looking to raise a relatively large amount of money. A professionally written and edited book will be a huge asset for the ZODB community. It will enable developers to evaluate the ZODB quickly and to understand the modern software design patterns that seasoned developers apply when using the ZODB in production.

Update: Jim Fulton has volunteered to technically review the book!

Individuals and organizations can contribute and will be recognized for their contribution in the following ways:
  • A printed version available on Amazon will include their name and/or logo in the book.

  • The ZODB website will provide links to their website.

  • Their contribution will be documented in the main ZODB documentation and will permanently reside there.

The following levels of support are available for sponsoring this project. The target goal is 10,000 USD.
  • $20-149: Individual name
  • $150-399: Company name
  • $400-999: Company name + logo + URL; link on ZODB.ORG
  • $1000-2499: Company name + half-page "spread"; link on ZODB.ORG
  • $2500 and up: Company name + full-page "spread"; link on ZODB.ORG

Some other aspects of this project:
  • Provide a modern, unique logo/design for the ZODB project.

  • We hope to flesh out and clarify the meaning of all public interfaces and methods in the ZODB source docstrings.

  • A Creative Commons licensed PDF will be available.

  • Put the new book online at zodb.org.

  • Possibly provide a hard copy published on Amazon.

The ZODB community greatly appreciates your contribution. We hope to raise enough money to provide the advance to Carlos de la Guardia by the end of July 2010. Once the advance is in place, Carlos will start work. We will continue taking contributions. The total amount of money we receive will dictate the depth and breadth of the documentation. Several volunteers/participants have already signed up. We will be providing updates in the near future.

The next steps:
  • Register the PayPal account to take contributions.

  • Send email to various mailing lists and companies using the ZODB, soliciting donations.

  • Get Carlos writing!