Monday, November 1, 2010
Now Updated Daily on zodb.org
Another killer feature of the Sphinx documentation is that Disqus is now integrated into the HTML rendering of the pages, so you can comment on each page directly. Very cool. You can also see the template that enables the Disqus/Sphinx integration on GitHub: http://github.com/cguardia/ZODB-Documentation/blob/master/book/_templates/page.html I have not seen Disqus used with Sphinx before, but I hope it catches on.
Chapter 3 draft ready, book text online
Tuesday, September 21, 2010
Chapter 2 draft and some words about the plan
A word of warning, though. People with ZODB experience will most likely find the material and tone far too entry-level to be useful. That is because the audience we are trying to reach with part one of the book is Python developers in general. We assume no knowledge of the ZODB, not even of its existence.
Wednesday, August 4, 2010
Introductory chapter draft available on GitHub
If you can spend a few minutes to read it and comment, that would be very helpful.
By the way, there is now a portlet at the top right of the blog that shows the latest commits to the repository. This may make it easier to keep track of the book's progress.
Wednesday, July 7, 2010
GitHub project created
Thursday, July 1, 2010
Tools for the writing process
Tuesday, June 29, 2010
Contributions
I have already been in contact with Adam Groszer and Matthias (Nitro) and have added them to the blog. Hopefully we will see some articles from these two in the near term. Anyone is welcome to post a mini-howto, a recipe or just their experience with the ZODB. All material is welcome.
Financial contributions will allow Carlos to focus on developing the book through to completion. We currently have the first round of money, enough to give Carlos an advance. We need significantly more money to finish the book. All donors will be listed on the blog. I have put the first batch of donors on the widget. It will be updated occasionally as we get more contributions.
If you have either money or experience with the ZODB, you can participate. Contact me.
All those who have contributed - thank you.
Hooking transactions
Prerequisites
This article assumes you are already familiar with the basics of the transaction module. If "commit", "abort", "savepoint" or "rollback" don't mean much to you, please make yourself familiar with the basics of transactions first.
You might also want to download the example code for this article from: http://pastebin.com/77H9ns28. It shows most of the things discussed here in action.
What are transaction hooks?
Transaction hooks are user-defined functions which are called at certain points during the processing of a transaction. For example, you can register hooks which are called after a commit.
Transaction hooks can be categorized into three methods:
- transaction synchronizers
- before/afterCommit hooks
- data managers
Each method has different characteristics and suits different use cases. Sometimes it might also be necessary to register a combination of these hooks to achieve the desired effect.
Use cases
When do you want to hook transactions? Example use cases include:
- sending an e-mail notification every time a commit fails
- synchronizing transactions with an external data store such as an RDBMS
- delaying object indexing until a transaction commits, rather than indexing every time an object is changed
- checking invariants after a set of operations
- instrumenting transactions
Choosing the right hook method for your use case
Transaction synchronizers
Registration:
Transaction synchronizers are registered and unregistered per transaction manager. Once registered they will be called for all subsequent transactions involving the transaction manager.
Points of invocation:
- beforeCompletion: In commit(), before the transaction status changes to "committing" and after the "beforeCommit" hooks have been called.
- afterCompletion: In commit(), after the commit has finished, whether successfully or unsuccessfully (e.g. due to a conflict error)
- beforeCompletion: In abort(), before aborting the transaction
- afterCompletion: In abort(), after the transaction was aborted
- newTransaction: Invoked when transaction.begin() is called.
- The synchronizers are not invoked for savepoint/rollback actions
When to use:
Transaction synchronizers are useful when you want to execute code before and after commit()/abort() for each transaction. For example, a synchronizer could send an e-mail notification for any commit that fails or is aborted. To check whether a commit failed, inspect the .status attribute of the transaction object passed to the afterCompletion hook.
The ZODB.Connection class, which you are likely using in your code, uses the beforeCompletion/afterCompletion/newTransaction hooks to synchronize with the underlying storage, for example.
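The e-mail notification use case could be sketched roughly as follows. This is a minimal sketch and not the example code linked above: FailureNotifier and its notify callback are hypothetical names, and the comparison against the string "Committed" is an assumption about the value the transaction package stores in .status after a successful commit. The demo() function needs the transaction package installed and is not run automatically.

```python
class FailureNotifier:
    """Hypothetical synchronizer that reports transactions which did not commit."""

    # Assumed value of txn.status after a successful commit.
    COMMITTED = "Committed"

    def __init__(self, notify):
        # notify is any callable taking one message string, e.g. a mail sender.
        self.notify = notify

    def newTransaction(self, txn):
        pass  # nothing to prepare when begin() is called

    def beforeCompletion(self, txn):
        pass  # runs before commit() or abort() does its work

    def afterCompletion(self, txn):
        # Called after commit() and after abort(); report anything that did
        # not end up in the committed state.
        if txn.status != self.COMMITTED:
            self.notify("transaction ended with status %r" % (txn.status,))


def demo():
    """Register the synchronizer on a private transaction manager."""
    import transaction  # third-party package, imported lazily

    messages = []
    synch = FailureNotifier(messages.append)
    manager = transaction.TransactionManager()
    manager.registerSynch(synch)

    manager.begin()
    manager.abort()    # afterCompletion fires and a message is recorded

    manager.begin()
    manager.commit()   # nothing joined, so the commit is expected to succeed

    manager.unregisterSynch(synch)
    return messages
```

Comparing against "Committed" rather than looking for one specific failure value keeps the check usable for aborts as well, since an aborted transaction never reaches the committed state.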
Interfaces:
class ISynchronizer(zope.interface.Interface):
    """Objects that participate in the transaction-boundary notification API.
    """

    def beforeCompletion(transaction):
        """Hook that is called by the transaction at the start of a commit.
        """

    def afterCompletion(transaction):
        """Hook that is called by the transaction after completing a commit.
        """

    def newTransaction(transaction):
        """Hook that is called at the start of a transaction.

        This hook is called when, and only when, a transaction manager's
        begin() method is called explicitly.
        """


class ITransactionManager(zope.interface.Interface):

    def registerSynch(synch):
        """Register an ISynchronizer.

        Synchronizers are notified about some major events in a transaction's
        life. See ISynchronizer for details.
        """

    def unregisterSynch(synch):
        """Unregister an ISynchronizer.

        Synchronizers are notified about some major events in a transaction's
        life. See ISynchronizer for details.
        """
Before/after commit hooks
Registration:
Before/after commit hooks are registered per transaction. When the transaction has finished, the hooks are cleared and will not be called for the next transaction.
Points of invocation:
- before commit hook: In commit(), before the transaction status changes to "committing" and before the synchronizer "beforeCompletion" hook is called.
- after commit hook: In commit(), after the commit has finished, whether successfully or unsuccessfully (e.g. due to a conflict error). Receives a "status" flag which tells you whether the commit was successful.
- The before/after commit hooks are not invoked for savepoint/rollback actions
- The before/after commit hooks are not invoked when aborting a transaction
When to use:
Before/after commit hooks are useful when you want to execute code before and after the commit() of a specific transaction. For example, you might want to send an e-mail if the commit() of a highly critical transaction fails.
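A minimal sketch of per-transaction hooks follows. The hook functions, the outbox list and the accounts dict are hypothetical; only addBeforeCommitHook() and addAfterCommitHook(), with the signatures listed in the interface, come from the transaction package. The demo() function needs that package installed and is not run automatically.

```python
def mail_on_failure(success, label, outbox):
    """After-commit hook: the first argument is the commit status flag."""
    if not success:
        outbox.append("commit of %s failed" % label)


def check_invariants(accounts):
    """Before-commit hook: an exception raised here propagates out of commit()."""
    if sum(accounts.values()) != 0:
        raise ValueError("ledger does not balance")


def demo():
    """Attach both hooks to one specific transaction."""
    import transaction  # third-party package, imported lazily

    outbox = []
    accounts = {"alice": -5, "bob": 5}

    txn = transaction.begin()
    # Extra positional arguments are passed through the args tuple.
    txn.addBeforeCommitHook(check_invariants, args=(accounts,))
    txn.addAfterCommitHook(mail_on_failure, args=("payroll run", outbox))
    transaction.commit()

    # The registrations are consumed by this commit; the next transaction
    # starts without any hooks.
    return outbox
```

Because the registrations do not persist, code that needs the same behaviour for every transaction is usually better served by a synchronizer.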
Interfaces:
class ITransaction(zope.interface.Interface):

    def addBeforeCommitHook(hook, args=(), kws=None):
        """Register a hook to call before the transaction is committed.

        The specified hook function will be called after the transaction's
        commit method has been called, but before the commit process has been
        started. The hook will be passed the specified positional (`args`)
        and keyword (`kws`) arguments. `args` is a sequence of positional
        arguments to be passed, defaulting to an empty tuple (no positional
        arguments are passed). `kws` is a dictionary of keyword argument
        names and values to be passed, or the default None (no keyword
        arguments are passed).

        Multiple hooks can be registered and will be called in the order they
        were registered (first registered, first called). This method can
        also be called from a hook: an executing hook can register more
        hooks. Applications should take care to avoid creating infinite loops
        by recursively registering hooks.

        Hooks are called only for a top-level commit. A
        savepoint creation does not call any hooks. If the
        transaction is aborted, hooks are not called, and are discarded.
        Calling a hook "consumes" its registration too: hook registrations
        do not persist across transactions. If it's desired to call the same
        hook on every transaction commit, then addBeforeCommitHook() must be
        called with that hook during every transaction; in such a case
        consider registering a synchronizer object via a TransactionManager's
        registerSynch() method instead.
        """

    def getBeforeCommitHooks():
        """Return iterable producing the registered addBeforeCommit hooks.

        A triple (hook, args, kws) is produced for each registered hook.
        The hooks are produced in the order in which they would be invoked
        by a top-level transaction commit.
        """

    def addAfterCommitHook(hook, args=(), kws=None):
        """Register a hook to call after a transaction commit attempt.

        The specified hook function will be called after the transaction
        commit succeeds or aborts. The first argument passed to the hook
        is a Boolean value, true if the commit succeeded, or false if the
        commit aborted. `args` specifies additional positional, and `kws`
        keyword, arguments to pass to the hook. `args` is a sequence of
        positional arguments to be passed, defaulting to an empty tuple
        (only the true/false success argument is passed). `kws` is a
        dictionary of keyword argument names and values to be passed, or
        the default None (no keyword arguments are passed).

        Multiple hooks can be registered and will be called in the order they
        were registered (first registered, first called). This method can
        also be called from a hook: an executing hook can register more
        hooks. Applications should take care to avoid creating infinite loops
        by recursively registering hooks.

        Hooks are called only for a top-level commit. A
        savepoint creation does not call any hooks. Calling a
        hook "consumes" its registration: hook registrations do not
        persist across transactions. If it's desired to call the same
        hook on every transaction commit, then addAfterCommitHook() must be
        called with that hook during every transaction; in such a case
        consider registering a synchronizer object via a TransactionManager's
        registerSynch() method instead.
        """

    def getAfterCommitHooks():
        """Return iterable producing the registered addAfterCommit hooks.

        A triple (hook, args, kws) is produced for each registered hook.
        The hooks are produced in the order in which they would be invoked
        by a top-level transaction commit.
        """
Data managers
Registration:
Data managers are registered for each transaction. This is called "joining". When the transaction has finished, the data manager will not be automatically called for the next transaction.
Points of invocation:
- commit
- abort
- savepoint
- rollback
When to use:
Data managers are the most flexible kind of hook. They are the workhorses of the transaction module as they provide the implementations for the two phase commit. Putting the label "hook" on them does not describe their use cases well.
The ZODB.Connection class which you are likely using in your code is an implementation of a data manager.
Since data managers are so flexible, hook deep into the two-phase commit and allow you to do many useful things, they deserve an article of their own.
Fortunately, this has already been done: http://repoze.org/tmdemo.html. While there are some repoze-specific things in there, the article explains the gist of data managers very well. Please read it if you want to know how to use data managers.
One thing that I'd like to see improved or clarified with regard to data managers is the order in which they are invoked. The sortKey() is currently only effective during commit, whereas a sorted data manager execution order would also be useful for savepoints and rollbacks.
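To make the two-phase commit sequence concrete, here is a minimal sketch of a data manager that stages writes to a plain dict. DictDataManager and drive_commit() are hypothetical names. drive_commit() only imitates, by hand, the call order tpc_begin, commit, tpc_vote, then tpc_finish or tpc_abort; it is a stand-in for illustration, not the transaction package's own commit logic.

```python
class DictDataManager:
    """Hypothetical data manager that stages writes to a plain dict."""

    transaction_manager = None  # a real data manager would point at its manager

    def __init__(self, target):
        self.target = target    # the "storage" being protected
        self.pending = {}       # changes made during the transaction
        self.prepared = None    # snapshot taken during the first phase

    def set(self, key, value):
        self.pending[key] = value

    def abort(self, txn):
        self.pending.clear()    # abort outside of a two-phase commit

    def tpc_begin(self, txn):
        self.prepared = None

    def commit(self, txn):
        self.prepared = dict(self.pending)  # nothing is visible in target yet

    def tpc_vote(self, txn):
        if self.prepared is None:           # vote "no" by raising
            raise RuntimeError("commit() was not called before tpc_vote()")

    def tpc_finish(self, txn):
        self.target.update(self.prepared)   # make the changes persist
        self.pending.clear()
        self.prepared = None

    def tpc_abort(self, txn):
        self.pending.clear()
        self.prepared = None

    def sortKey(self):
        return "dictdm:%d" % id(self)


def drive_commit(managers, txn=None):
    """Call the two-phase commit methods by hand, in sortKey() order."""
    ordered = sorted(managers, key=lambda dm: dm.sortKey())
    try:
        for dm in ordered:
            dm.tpc_begin(txn)
        for dm in ordered:
            dm.commit(txn)
        for dm in ordered:
            dm.tpc_vote(txn)
    except Exception:
        for dm in ordered:
            dm.tpc_abort(txn)
        raise
    for dm in ordered:
        dm.tpc_finish(txn)


store = {}
dm = DictDataManager(store)
dm.set("answer", 42)
drive_commit([dm])
print(store)  # {'answer': 42}
```

With the transaction package itself, the data manager would instead join the current transaction through the join() method shown in the interface below, and an ordinary commit would take the place of drive_commit().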
Interfaces:
class ITransaction(zope.interface.Interface):

    def join(datamanager):
        """Add a data manager to the transaction.

        `datamanager` must provide the transaction.interfaces.IDataManager
        interface.
        """


class IDataManager(zope.interface.Interface):
    """Objects that manage transactional storage.

    These objects may manage data for other objects, or they may manage
    non-object storages, such as relational databases. For example,
    a ZODB.Connection.

    Note that when some data is modified, that data's data manager should
    join a transaction so that data can be committed when the user commits
    the transaction.
    """

    transaction_manager = zope.interface.Attribute(
        """The transaction manager (TM) used by this data manager.

        This is a public attribute, intended for read-only use. The value
        is an instance of ITransactionManager, typically set by the data
        manager's constructor.
        """)

    def abort(transaction):
        """Abort a transaction and forget all changes.

        Abort must be called outside of a two-phase commit.

        Abort is called by the transaction manager to abort
        transactions that are not yet in a two-phase commit. It may
        also be called when rolling back a savepoint made before the
        data manager joined the transaction.

        In any case, after abort is called, the data manager is no
        longer participating in the transaction. If there are new
        changes, the data manager must rejoin the transaction.
        """

    # Two-phase commit protocol. These methods are called by the ITransaction
    # object associated with the transaction being committed. The sequence
    # of calls normally follows this regular expression:
    # tpc_begin commit tpc_vote (tpc_finish | tpc_abort)

    def tpc_begin(transaction):
        """Begin commit of a transaction, starting the two-phase commit.

        transaction is the ITransaction instance associated with the
        transaction being committed.
        """

    def commit(transaction):
        """Commit modifications to registered objects.

        Save changes to be made persistent if the transaction commits (if
        tpc_finish is called later). If tpc_abort is called later, changes
        must not persist.

        This includes conflict detection and handling. If no conflicts or
        errors occur, the data manager should be prepared to make the
        changes persist when tpc_finish is called.
        """

    def tpc_vote(transaction):
        """Verify that a data manager can commit the transaction.

        This is the last chance for a data manager to vote 'no'. A
        data manager votes 'no' by raising an exception.

        transaction is the ITransaction instance associated with the
        transaction being committed.
        """

    def tpc_finish(transaction):
        """Indicate confirmation that the transaction is done.

        Make all changes to objects modified by this transaction persist.

        transaction is the ITransaction instance associated with the
        transaction being committed.

        This should never fail. If this raises an exception, the
        database is not expected to maintain consistency; it's a
        serious error.
        """

    def tpc_abort(transaction):
        """Abort a transaction.

        This is called by a transaction manager to end a two-phase commit on
        the data manager. Abandon all changes to objects modified by this
        transaction.

        transaction is the ITransaction instance associated with the
        transaction being committed.

        This should never fail.
        """

    def sortKey():
        """Return a key to use for ordering registered DataManagers.

        ZODB uses a global sort order to prevent deadlock when it commits
        transactions involving multiple resource managers. The resource
        manager must define a sortKey() method that provides a global ordering
        for resource managers.
        """
        # Alternate version:
        #"""Return a consistent sort key for this connection.
        #
        #This allows ordering multiple connections that use the same storage in
        #a consistent manner. This is unique for the lifetime of a connection,
        #which is good enough to avoid ZEO deadlocks.
        #"""
Monday, June 28, 2010
Thinking about the structure of the book
This will be a very short chapter, just to get things going. What is the ZODB? Maybe some bits about the NoSQL craze and how the ZODB has been doing that for more than 10 years. Why is the ZODB a nice tool to keep in your Python developer's arsenal, and when is it a good fit for your apps?
Installation and running the first app. The objective of this chapter is to let the reader do something that works immediately. Just the basics to get an app running. Not a lot of details here.
The ZODB depends on the transaction package and understanding this package is very important to working effectively with it. This chapter introduces transactions, shows what happens when you commit or abort, describes what a conflict error is and explains why it's a good idea to avoid long running transactions.
A bit more involved explanation of how the ZODB works and a more useful sample application. This chapter will build on our understanding of transactions.
The Catalog and indexes. I propose to use repoze.catalog here, which uses zope.index.
Packing, backups, etc.
Part two - advanced topics. This will be a more in-depth review of techniques and concepts for ZODB development.
A little more information about how the ZODB works. At least enough stuff to understand the later chapters about storages and debugging.
Details about the FS storage and discussion of RelStorage and maybe DirectoryStorage.
Some of the most important packages for the ZODB will be described here.
Other catalog implementations, third party indexes and using external indexing solutions, like Solr.
Evolving schemas, creating custom indexes, using ZODB in an asynchronous framework like twisted.
General debugging strategies and then a FAQ with common problems. For example, common traps like attempting to load an object's state when the connection is closed.
Here we go
Friday, June 25, 2010
Topics for Upcoming ZODB Book
Documenting debugging approaches for the ZODB is vital. The majority of users reading this book will be familiar with RDBMS/SQL. Since the ZODB is NoSQL, this audience will need to understand how to use a new tool chain for debugging. For example, if you are using Microsoft SQL Server you can run the SQL Profiler and watch the SQL statements being executed as your program sends them. The ZODB does not have the traditional client/server architecture of SQL databases, which means that debugging requires a different tactic. (Looking for some experienced ZODB developers to blog on debugging approaches ;)
Another important characteristic of the ZODB which will need to be documented thoroughly is concurrency and conflict resolution. Currently the ZODB is transactional (rumors of making this optional?). The current transaction requirement is at odds with most of NoSQL and is one of the reasons the ZODB community does not associate itself more closely with the movement. Real-world applications have concurrent updates. Newbies who do not use conflict-aware data structures can run into problems. This is a well-known problem set and can be made approachable with documentation that introduces people to the concepts incrementally. It is easy for experienced developers to dive into the inner workings of the technology and turn off new users. This is probably the topic on which we want the most focus.
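As a small illustration of what "conflict-aware" means, here is a sketch of a three-way merge for a counter, written in plain Python with no ZODB imports. The function name resolve_counter is hypothetical. Its three arguments mirror the old, saved and new states that a persistent class's _p_resolveConflict() hook receives; how such a function would be wired into a real persistent class is left as an assumption here.

```python
def resolve_counter(old_state, saved_state, new_state):
    """Merge two concurrent updates of a counter.

    old_state   -- the value both transactions started from
    saved_state -- the value already committed by the other transaction
    new_state   -- the value this transaction is trying to commit
    """
    # Apply both increments to the common starting point.
    return saved_state + new_state - old_state


# Both transactions start from 10; one adds 2, the other adds 5.
print(resolve_counter(10, 12, 15))  # 17
```

A structure without such a merge step has no way to reconcile the two updates, so one of the transactions fails with a conflict error and has to be retried.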
Partitioning databases, BLOBs, and Import/Export are native ZODB topics that will also be documented. What other features of the ZODB do you think should be documented? What priorities would you give each feature of the ZODB?
Thursday, June 24, 2010
Guiding Parameters for the Book
This post is an attempt to provide some constraints around the upcoming ZODB book, and specifically to talk about what will be omitted from it. I believe it is as important to talk about what is out of scope as about what will be covered. So here goes:
- Zope or Plone usage will not be covered. The goal of the ZODB book is to cover the principles and usage of the persistence system in any Python application. Many people use the ZODB in numerous framework contexts: Django, repoze, Twisted. Talking about the ZODB in the context of an application or framework shifts energy from the primary task: explaining how to use the ZODB and its supporting libraries from a batteries-included Python distribution.
- Private implementation details of the ZODB. I believe there is some sense of which class members are public and which are private, although this clarity is not consistent across all of the packages. Some details in packages like ZEO, such as the RPC mechanism, are certainly private implementation, and ZODB consumers would not benefit from a thorough explanation of them.
- Unused features. There are some aspects of the ZODB that exist and "could be used", but these features may not have wide adoption or consistent experiences from the broader community. Examples include persistent modules and persistent weak references. As these features become more widely used they will be revisited in subsequent updates of the ZODB book.
This is not an exhaustive list of what is out of scope for this book, but I hope it sets people's expectations about the kinds of things that are not initially considered for inclusion or believed to be high priority. A complementary post would discuss how to prioritize the work in the book: which aspects of the library should be given the highest priority for documentation?
Tuesday, June 22, 2010
Ready to accept donations
Monday, June 14, 2010
ZODB Book Project
Unfortunately the lack of documentation for the ZODB does not reflect its maturity, robustness or feature set. The documentation is scattered. Most of the information is outdated. Many best practices have been learned over the past 10 years. There has not yet been a concerted effort to generate and maintain ZODB documentation. This initiative is set on fixing the first step, generating documentation. The following topical overview is a draft of what material may be covered:
- Introduction to ZODB.
  What it is. Maybe some bits about the NoSQL craze, how the ZODB has been doing that for more than 10 years. When is the ZODB a good fit for your app?
- Your first ZODB application.
  A short chapter dealing with installation and running the first app.
- A more complex application.
  A bit more involved explanation of how it works and a more useful sample application.
- Basic indexing and searching.
  The Catalog and indexes. I might use the repoze stuff, which is somewhat simpler.
- A more in-depth look at the ZODB internals.
  A little more information about how the ZODB works. At least enough stuff to understand the later chapters about storages and debugging.
- ZODB Storages.
  Details about the FS storage and discussion of RelStorage and maybe DirectoryStorage.
- Other indexing and searching strategies.
  Other catalog implementations, third party indexes and using external indexing solutions, like Solr.
- Advanced ZODB.
  Evolving objects and other problems...not really sure about this one yet.
- The debugging FAQ: frequent problems and suggested solutions.
  General debugging strategies and then a FAQ with common problems.
- Scaling.
  The ZODB cache, ZEO and replication services.
- Maintenance.
  Packing, backups, etc.
Update: Jim Fulton has volunteered to technically review the book!
Individuals and organizations can contribute and will be recognized for their contribution in the following ways:
- A printed version available on Amazon will have their names and/or logos in the book.
- The ZODB website will provide links to their website.
- Their contribution will be documented in the main ZODB documentation and will permanently reside in the documentation.
- $20-149, Individual Name
- $150-399, Company Name
- $400-999, Company Name + Logo + URL; link on ZODB.ORG
- $1000-2499, Company Name + .5 page "spread"; link on ZODB.ORG
- $2500-, Company Name + full page "spread"; link on ZODB.ORG
- Provide a modern unique logo/design for the ZODB project
- We hope to flesh out and clarify the meaning of all public interfaces and methods in ZODB source docstrings.
- A Creative Commons licensed PDF will be available.
- Put the new book online at zodb.org
- Possibly provide a hard copy published on Amazon.
The next steps:
- Register the PayPal account to take contributions.
- Send email to various mailing lists and companies using ZODB soliciting donations.
- Get Carlos writing!