
Django Track Database Object Changes

When building a web application, you will often need the ability to revert an object in your database to an earlier state. A good example is a blog post: you make updates over time, then accidentally delete something or simply want to see what a previous revision looked like. Keeping every revision lets you view or restore any of them. In this post I will demonstrate how to preserve revisions of a database object using Django's ORM.

 

Prerequisites

In writing this post I assume the following:

  • You are familiar with Django applications
  • You are familiar with the Django ORM
    • You do not need expert knowledge (otherwise you likely wouldn't be reading this), just an understanding of creating models and the various field types

 

Scenario

In this post I will use the example above: you are writing a blog application and need to keep revisions of posts over time. The approach uses 3 model classes and 2 database tables.

 

Starting Model

Let's say your blog application has the model defined below for its blog posts.

from django.contrib.auth.models import User
from django.db import models


class Post(models.Model):
    
    author = models.ForeignKey(User, on_delete=models.CASCADE)
    title = models.CharField(max_length=500)
    content = models.TextField()
    published = models.BooleanField()

As you can see, this is a very basic model of a post, but it has most of the fields you would expect a post to have.

 

To keep a revision of each post, you might think it best to create a second, independent model with the same fields and, each time a post is saved, copy its contents into a revision. That would call for another model similar to the one below.

class Revision(models.Model):
    
    author = models.ForeignKey(User, on_delete=models.CASCADE)
    title = models.CharField(max_length=500)
    content = models.TextField()
    published = models.BooleanField()

 

This approach has maintainability problems. First and foremost, every change you make to the Post model must also be made to the Revision model, and everyone who works on the project has to remember to do that. Sooner or later this leads to inconsistencies, even if you are the only person working on the project. In addition, nothing in the code ties the two models together, which decreases readability and increases complexity.

 

Solution

The solution is to use an abstract base class: a Django model that does not create a table in the database and instead serves as a base for other related models to inherit from.

Below I modify my Post model from above to be an abstract base class.

class BasePost(models.Model):

    author = models.ForeignKey(User, on_delete=models.CASCADE)
    title = models.CharField(max_length=500)
    content = models.TextField()
    published = models.BooleanField()
    
    class Meta:
        abstract = True

Notice I renamed the object to BasePost.

To be clear, this model will NOT create a database table, because we set abstract to True in its nested Meta class. That is the only thing needed to make a model abstract.

Now, to define our Post model, we need a second class, as defined below.

class Post(BasePost):
    pass

And now we have a Post model with all 4 fields that we defined in BasePost. Unlike BasePost, it will create a database table. This is functionally the same as our starting point but is more flexible from a code perspective.

 

Create the Revisions Object

Now we need to create the model that will hold our revisions. It also inherits from BasePost, which guarantees that it has the required fields in the database.

class PostRevision(BasePost):
    post = models.ForeignKey(Post, related_name='revisions', on_delete=models.CASCADE)

In order to relate this to our Post object for querying later on in the application, I add a many-to-one relationship to the Post object. The key thing to note here is that the "post" column will only exist in the PostRevision table, because it is not defined in the BasePost class.

At this point we now have our PostRevision object defined which will make a revision table in the database with all of the fields that a Post has.

 

Add Logic to Create a Revision Each Time a Post is Saved

At this point the models that we need exist and the database tables will be created when we run our migration. However, we still need to build the logic that creates a revision each time a Post object is saved.

To do this, we override the Post object's save() method as shown below.

class Post(BasePost):
    
    def save(self, *args, **kwargs):
        super().save(*args, **kwargs)
        
        revision_object = PostRevision()
        for common_field in BasePost._meta.get_fields():
            common_field_name = str(common_field.name)
            post_field_data = getattr(self, common_field_name)
            setattr(revision_object, common_field_name, post_field_data)
        
        revision_object.post = self
        revision_object.save()

To sum up what this does in plain English: whenever a post is saved, we first save the post to the database. We then instantiate a PostRevision object and copy the data in all of the common fields from the Post to the PostRevision instance. Finally, we point the PostRevision object's relationship at the post and save the PostRevision object to the database.

If the high-level explanation did not make sense, or some parts just aren't clear, let's take some time to step through the code and explain each piece.

super().save(*args, **kwargs) - First things first we save the Post object to the database.

revision_object = PostRevision() - We instantiate an empty PostRevision object which we will later populate and save to the database.

for common_field in BasePost._meta.get_fields(): - This starts a for loop that iterates over the database fields defined in the base object. This is the interesting piece: because both Post and PostRevision inherit from BasePost, we can guarantee that they each have at least the fields defined there, so iterating over the BasePost fields gives us exactly the fields that will be archived. That is a key point to understand: only fields defined in BasePost will be archived. Any fields defined directly in the Post object will not be copied over to the PostRevision object.

Another thing to note: even though we are accessing the _meta attribute of BasePost, whose leading underscore would normally indicate that it should not be accessed externally, it has a stable, documented API, which you can read more about in the Model _meta API section of the Django documentation.

common_field_name = str(common_field.name) - Because we are dealing with the field object, we only want the name, so we set a variable to the field name.

post_field_data = getattr(self, common_field_name) - We use the field name to get the contents of that field from the post object, which is self.

setattr(revision_object, common_field_name, post_field_data) - We set the corresponding field of our revision object to the value from the post object. So if we are currently iterating on the "title" field, we set the PostRevision object's "title" field to whatever the Post object's "title" field contains. And remember, because we are in a for loop, we iterate over the author, title, content, and published fields one at a time, populating the PostRevision object.

revision_object.post = self - We set the relationship up in the PostRevision object to point to the post we are saving.

revision_object.save() - Lastly, we save the PostRevision object.
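The getattr/setattr copy loop is not specific to Django. As a rough illustration of the same idea, here is a sketch using plain Python dataclasses instead of models. Every name in it (BaseRecord, Record, RecordSnapshot, snapshot) is hypothetical and exists only for this example; dataclasses.fields() stands in for BasePost._meta.get_fields().

```python
from dataclasses import dataclass, fields


@dataclass
class BaseRecord:
    # Shared fields, playing the role of BasePost.
    title: str = ""
    content: str = ""
    published: bool = False


@dataclass
class Record(BaseRecord):
    # Defined only on the "live" class, so the loop below never copies it.
    slug: str = ""


@dataclass
class RecordSnapshot(BaseRecord):
    # Points back at the record the snapshot was taken from.
    source: object = None


def snapshot(record):
    """Copy every field declared on BaseRecord from record into a new snapshot."""
    snap = RecordSnapshot()
    for common_field in fields(BaseRecord):
        value = getattr(record, common_field.name)
        setattr(snap, common_field.name, value)
    snap.source = record
    return snap


record = Record(title="Hello", content="First draft", published=True, slug="hello")
snap = snapshot(record)

# Later edits to the record do not change the snapshot.
record.content = "Second draft"
print(snap.title, "|", snap.content, "|", hasattr(snap, "slug"))
```

Because the loop only walks the fields of the shared base class, the slug attribute never reaches the snapshot, which mirrors how fields defined directly on Post are left out of PostRevision.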

 

At this point, all we need to know is that every time a Post object is created or updated, a PostRevision entry is added to the database as a snapshot of the Post at the time of the save, and no additional code needs to run anywhere else for this to happen. A new PostRevision entry is added on every save, so no matter how many times you save a Post object, you will be able to access any previous revision of it through the PostRevision objects related to that Post.

I did say this a few times earlier in the post but the key point here to understand is that the ONLY fields that will be saved in the PostRevision object are the fields that are defined in the BasePost object. If we define additional database fields in the Post model they WILL NOT be archived, so in the future any database fields that you want to be included in the archive must be defined in the BasePost model.

If you are curious why we don't copy over fields that are not defined in the BasePost model, it is because BasePost is what guarantees that both the Post and PostRevision objects have the same fields. If we instead copied everything from the Post model, any field defined only on Post would have no matching column in the PostRevision table, and the Post's own primary key would be copied onto the revision as well, so the copy would not work as intended.

 

Completed Model

Below I have the completed models as they would exist in your models.py file so it is easier to read or use in your own project.

from django.contrib.auth.models import User
from django.db import models


class BasePost(models.Model):
    """
    Base model for posts to allow for revision history in the database. This object WILL NOT create
    a table in the database and instead is only used as a base object to inherit from in code.
    """

    author = models.ForeignKey(User, on_delete=models.CASCADE)
    title = models.CharField(max_length=500)
    content = models.TextField()
    published = models.BooleanField()

    class Meta:
        abstract = True

class Post(BasePost):

    def save(self, *args, **kwargs):
        """
        Overrides the save method to first save the Post object and then create a PostRevision object,
        copy over the data from the Post to the PostRevision and save the Post Revision object.

        *NOTE* Only fields that are defined in BasePost will be saved into the PostRevision object
        """
        super().save(*args, **kwargs)

        revision_object = PostRevision()
        for common_field in BasePost._meta.get_fields():
            common_field_name = str(common_field.name)
            post_field_data = getattr(self, common_field_name)
            setattr(revision_object, common_field_name, post_field_data)

        revision_object.post = self
        revision_object.save()


class PostRevision(BasePost):
    """
    Stores Revisions of posts as the Post object is saved.
    """
    post = models.ForeignKey(Post, related_name='revisions', on_delete=models.CASCADE)

 

 

And that is it. After running a database migration, you will preserve every revision of the Post object in your application. The same pattern can be adapted to keep revisions of any database model you wish.

 

If you have any questions or anything needs clarification, please leave a comment below and I will clear up any confusion.


