Optimisation Techniques

Let’s talk about some techniques that will help to optimise your code when using ORMs with SQLAlchemy. You can checkout the step-3-orms branch to follow along this interactive section.

Relationship Loading Techniques

SQLAlchemy provides extensive configuration over how related objects mapped using relationship() get loaded when querying. We can configure this by using relationship.lazy parameter of relationship().

There are 3 categories of relationship loading techniques: lazy, eager, and no loading.

Lazy Loading

Lazy loading occurs when objects are returned from a query without loading the related objects initially. When the related object is first accessed, additional SELECT statements are emitted to access that particular related object. This is the default loading technique when using SQLAlchemy ORMs.

Thus in lazy loading, for a Parent and Child relationship, if we perform select(Parent), only 1 SELECT statement will be emmitted and data will be fetched for attributes of Parent. If we then try to access the child, say print(parent.child), another SELECT statement will be emmitted to fetch the attributes of Child.

Lazy loading is helpful when you know that you will not use all the related or child objects, or even if you will not require them instantly. It will also be beneficial on the “collection side” of a many-to-one relationship. Otherwise, if you use eager loading, SQLAlchemy will try to JOIN the tables to fetch all data at once, which can be expensive. We’ll discuss eager loading in detail in a bit.

We’ve used eager loading for the Customer and Address relationship. In order to configure lazy loading, use lazy="select" in the relationship() construct. Let’s convert it to lazy loading to see what effect it can have when fetching all customers.

marketsvc/db/customer.py
class Customer(Base):
    ...

    address: Mapped["Address"] = relationship(
        back_populates="customer", lazy="select"
    )  # one to one
marketsvc/db/address.py
class Address(Base):
    ...

    customer: Mapped["Customer"] = relationship(
        back_populates="address",
        lazy="select",
    )  # one to one

Let’s add a print statement in get_customers() to access the related object, Address, for each of our customers within the session.

marketsvc/db_accessor.py
def get_customers():
    with Session(engine) as session:
        stmt = select(Customer)
        result = session.execute(stmt)
        customers = result.scalars().all()
        for customer in customers:
            print(customer.address)

        return customers

You can run this by using ./run.sh customers.

SQLAlchemy logs
BEGIN (implicit)
SELECT customer.id, customer.name, customer.address_id  FROM customer
[generated in 0.00023s] ()
SELECT address.id AS address_id, address.flat_number AS address_flat_number, address.post_code AS address_post_code FROM address WHERE address.id = ?
[generated in 0.00016s] (1,)
SELECT address.id AS address_id, address.flat_number AS address_flat_number, address.post_code AS address_post_code  FROM address WHERE address.id = ?
[cached since 0.001042s ago] (2,)
SELECT address.id AS address_id, address.flat_number AS address_flat_number, address.post_code AS address_post_code  FROM address WHERE address.id = ?
[cached since 0.001606s ago] (3,)
ROLLBACK

Here, the first SELECT statement only fetches the attributes linked to Customer i.e. id, name, and address_id. Since we added print to print address details for each customer, we have 3 more SELECT statements to fetch address data for a particular address_id, as seen with the WHERE clause and the bound parameter. This occurs for each customer.

What if?

What happens if there are 100 customers? What about 1000? Or, what if we have 100,000 customers? Some real world applications have millions of customers. How many SELECT statements would be emitted to fetch customer and address information?

This is a well-known problem known as the N+1 problem. Initially, one SELECT statement is emitted to load the result collection of parent objects. Then, for each parent object, an additional SELECT is emitted to load attributes of each related child object. Consequently, for N parent objects, we would have N + 1 SELECT statements.

Test Your Understanding

For this scenario to fetch customer information, we require the Address information of all customers. Lazy loading is not a good strategy here. It becomes beneficial when we know we wouldn’t be needing all the child objects. Is there a place in our service where lazy loading might be helpful?

Eager Loading

We can fix the N+1 problem by using eager loading, where the related objects are loaded with the parent object up front. The child objects are automatically loaded along with its parent object.

The ORM accomplishes this by consolidating a JOIN to the SELECT statement to load the related objects simultaneously, or by emitting additional SELECT statements to load the related child objects.

Eager loading is advantageous when we want to fetch information about all or numerous child objects along with its parent object. It’s also a good practice to use this to reduce further queries and alleviate further load to the database.

As a result, eager loading is the right choice for Customer and Address relationship.

Joined Eager Loading is the most prominent type of eager loading. It applies a JOIN to the given SELECT statement to that related objects are loaded in the same result.

Let’s update relationship() in both Customer and Address to lazy="joined", and run the request to get the customer information.

Once again, run this by using ./run.sh customers.

SQLAlchemy logs
BEGIN (implicit)
SELECT customer.id, customer.name, customer.address_id, address_1.id AS id_1, address_1.flat_number, address_1.post_code
FROM customer LEFT OUTER JOIN address AS address_1 ON address_1.id = customer.address_id
[generated in 0.00075s] ()
ROLLBACK

Here, you can see how we only emit 1 query to the database. The query has a JOIN to join the two tables and fetch all related information at once. We’ve also successfully avoided the N+1 problem.

No Loading

No loading means to disable loading on a relationship. The child object is empty and is never loaded, or an error is raised with its accessed. No loading is used to prevent unwanted lazy loads. You can trigger this with the lazy="raise" parameter.

Try It Yourself

There are more relationship loading techniques, but the ones described above are the most popular. Try experimenting with different queries yourself and see the generated SQL statements in the logs.

SQL Functions

You can also use SQL functions including aggregate functions while working with ORMs. We can create Function objects by using the func object, which acts as a factory.

Let’s use the SUM() function to get the total cost of an order in the SQL query itself, rather than querying the individual item prices and summing them up in Python.

marketsvc/db_accessor.py
from sqlalchemy.sql import func

...

def get_total_cost_of_an_order(order_id):
    with Session(engine) as session:
        result = session.execute(
            select(func.sum(Item.price * OrderItems.quantity))
            .join(Orders.order_items)
            .join(OrderItems.item)
            .where(Orders.id == order_id)
        )
        total_cost = result.scalar()

        return total_cost

Hit the /api/order_total endpoint by using the shell script with a argument that indicates the order_id of the order you wish to get the total of.

./run.sh ordertotal 1
SQLAlchemy logs
BEGIN (implicit)
SELECT sum(item.price * order_items.quantity) AS sum_1
FROM orders JOIN order_items ON orders.id = order_items.order_id JOIN item ON item.id = order_items.item_id
WHERE orders.id = ?
[generated in 0.00024s] (1,)
ROLLBACK

Here, we’re directly using the SUM() aggregate function and returning a single value from our SQL query. Similarly, we can use any SQL functions while using ORMs depending on our use case.

Hybrid Attributes

Hybrid attributes have different behaviours at class and instance level, hence the name “hybrid”. This is best explained with an example.

Hybrid Properties

Let us consider the order_items table. The quantity of an order is defined here and the price of an item is defined in the items table. Whenever we want to get the total cost of an item, we have to multiply item price and quantity for each item in the order.

Alternatively, we could introduce a hybrid property that represents the total cost of an item. We can do this with the @hybrid_property decorator.

marketsvc/db/order_items.py
from sqlalchemy.ext.hybrid import hybrid_property

class OrderItems(Base):
    ...

    @hybrid_property
    def item_total(self):
        return self.item.price * self.quantity

Here, the item_total property returns the product of item’s price and the quantity of that item bought. On instances of OrderItems, the multiplication occurs in Python.

Thus, in as_dict() method of Orders, where we return the information of an order, we can directly get the item’s total by order_item.item_total. This makes it much easier to access the item total.

marketsvc/db/orders.py
def as_dict(self):
    return {
        ...
        "items": [
            {
                ...
                "total": order_item.item_total,
            }
            for order_item in self.order_items
        ],
    }

Hybrid Expressions

To use hybrid attributes in the SQL query, we need to use the hybrid_property.expression() modifier. This is because its SQL expression must be differentiated from the Python expression.

marketsvc/db/order_items.py
from sqlalchemy.ext.hybrid import hybrid_property

class OrderItems(Base):
    ...

    @hybrid_property
    def item_total(self):
        return self.item.price * self.quantity

    @item_total.expression
    @classmethod
    def item_total(cls):
        return Item.price * cls.quantity

Above, we’ve also added an expression modifier to the item_total hybrid property. Notice how this function is a class method and operates on the OrderItems and Item classes, similar to how we might do in the select() query.

Consequently, our query to get the total cost of an order becomes even simpler. Here, we have directly accessed the hybrid expression in the query.

marketsvc/db_accessor.py
def get_total_cost_of_an_order(order_id):
    with Session(engine) as session:
        result = session.execute(
            select(func.sum(OrderItems.item_total))
            .join(Orders.order_items)
            .join(OrderItems.item)
            .where(Orders.id == order_id)
        )
        total_cost = result.scalar()

        return total_cost

OrderItems.item_total would be equivalent to Item.price * OrderItems.quantity used earlier.

You might notice hybrid properties are similar to Python’s @property decorator. We can also define setters on hybrid properties but it doesn’t make sense to ‘set’ the item total in our case.

 

We now have explored some ways how we can optimise our code using ORMs. There are many more techniques that would be hard to cover in a short span of time. What we’ve discussed about SQLAlchemy is just the tip of the iceberg. Feel free to browse the SQLAlchemy documentation to find ways to make your life easier!

Having said that, let’s switch gears and look at asyncio, and how we can utilise it in our service.

Kudos!

🙌 You have now reached the step-4-optimisatinos part of the tutorial. If not, checkout that branch and continue from there:

git checkout step-4-optimisations

 

results matching ""

    No results matching ""