This article looks at the problems with one-to-many relationships in the JPA model, and at an alternative strategy that provides more efficient, fine-grained data access for building more robust and lightweight applications and web services.

A fairly typical use is to have one entity ‘owned’ by another, so that one entity is said to ‘have’ many instances of the other. A typical example is customers and orders:

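import java.util.Set;
import javax.persistence.*;

@Entity
class Customer {
  @Id private Long id;

  // Inverse side: the foreign key is held by Order.customer
  @OneToMany(mappedBy="customer")
  private Set<Order> orders;
}

@Entity
@Table(name="orders") // ORDER is a reserved word in SQL
class Order {
  @Id private Long id;

  // Owning side of the relationship
  @ManyToOne
  private Customer customer;
}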

In this trivial example, the order belongs to a customer, and the customer has a set of orders. We don’t have a problem with the ManyToOne relationship, especially as it is required in order to map the order back to the customer. When we load an order we will at most get a single reference to a customer.

No, our problem is with the value we get from customer.getOrders(). This set of order entities rarely serves a useful purpose and can cause more problems than it solves, for the following reasons:

  1. Dumb Relationship – It will contain every order for this particular customer when you usually only want a subset of the orders that match a set of criteria. You either have to read them all and filter the ones you don’t want manually (which is what SQL is for) or you end up having to make a call to a method to get the specific entities you are interested in.
  2. Unbounded dataset – The number of orders per customer varies, and you could end up with a customer with thousands of orders. Combine that with accidental eager fetching, and loading a simple list of 10 customers could mean loading thousands of entities.
  3. Unsecured Access – Sometimes we may want to restrict the items visible to the user based on their security rights. By making it available as a property controlled by JPA we lose that ability or have to implement it further down in the application stack.
  4. No Pagination – Similar to the unbounded dataset, you end up throwing the whole list into the pagination components and letting them sort out what to display. In most cases, you need to treat each dataset like it will eventually contain more than 30 records so you really need to consider pagination early.
  5. Overgrown object graph – When you request an entity, how much of the object graph do you need? How do you know which pieces to initialize so you can avoid lazy initialization exceptions (LIEs, as in Hibernate's LazyInitializationException)? This question comes up often with JPA, and it matters even more when you need to serialize object graphs to XML or JSON. Whether you need the relationships depends on the context in which you will use the data.
  6. Rife with pitfalls – Who saves and cascades what, and how do you bind one to the other? You create an order and assign the customer; do you then need to add it to the customer’s list of orders or not? What happens if you forget to add it to the customer and you save the customer? Whatever strategy you pick for dealing with this will no doubt end up being implemented inconsistently.

(Ok, the first four are really different facets of the same problem, that you can’t control the data you are getting back.)
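
The sixth point can be seen without any persistence provider at all. The sketch below uses plain Java stand-ins for the two entities (no JPA annotations), and the addOrder helper is a hypothetical convention, not something JPA provides. It shows that setting the customer on an order does nothing to the customer's in-memory set unless your own code keeps the two sides in step.

```java
import java.util.HashSet;
import java.util.Set;

// Plain Java stand-ins for the entities above; JPA annotations are left out
// so the sketch runs without a persistence provider.
class Customer {
    private final Set<Order> orders = new HashSet<>();

    Set<Order> getOrders() {
        return orders;
    }

    // Hypothetical helper that keeps both sides of the relationship in step.
    void addOrder(Order order) {
        orders.add(order);
        order.setCustomer(this);
    }
}

class Order {
    private Customer customer;

    Customer getCustomer() {
        return customer;
    }

    void setCustomer(Customer customer) {
        this.customer = customer;
    }
}

class BidirectionalPitfallDemo {
    public static void main(String[] args) {
        Customer customer = new Customer();

        // Only the order's side is set: the customer's set knows nothing about it.
        Order forgotten = new Order();
        forgotten.setCustomer(customer);
        System.out.println(customer.getOrders().contains(forgotten)); // false

        // Going through the helper updates both sides.
        Order linked = new Order();
        customer.addOrder(linked);
        System.out.println(customer.getOrders().contains(linked)); // true
        System.out.println(linked.getCustomer() == customer);      // true
    }
}
```

Every piece of code that creates an order has to remember to go through the helper, which is exactly the kind of rule that tends to be applied inconsistently.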

So what use are they? Well, they make it really tempting just to use customer.orders in the view, which is suitable for some sets of data. They also allow the relationship to be used in EJB-QL (JPQL) statements, although the inverse of the relationship can be used instead in most cases. Specifying this relationship also lets you cascade updates and deletes from the customer to the order, but then so can your database.

Going Granular

The best alternative I’ve found is to provide additional methods, separate from the model, for obtaining the relational information. This more granular approach gives you plenty of ways of getting data from the database without the dangers and temptations of bad practices. For example, the Order object still has the Customer reference on it, so the data access layer can use it to return lists of orders constrained by customer, time frame, or other criteria, depending on where the list is being used. It also allows data to be fetched when needed without having to define a single initialization strategy using annotations or mapping files: the code that knows which pieces of data it needs has the facilities to fetch exactly those pieces. The fetch methods can either be exposed directly as web services, or DTOs can be used to build a data payload that is returned from a single web service consolidating the calls. Either way, you don’t need to worry about setting the JPA fetch or XML/JSON serialization policy permanently in the model.
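
As one way to picture the DTO option, here is a minimal sketch. All of the names in it (CustomerOrdersDto, CustomerOrdersAssembler, the two lookup functions) are hypothetical, and the lookups are simple stand-ins for whatever data access methods your application provides.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongFunction;

// Hypothetical payload for one consolidated web service response.
class CustomerOrdersDto {
    final String customerName;
    final List<String> recentOrderRefs;

    CustomerOrdersDto(String customerName, List<String> recentOrderRefs) {
        this.customerName = customerName;
        this.recentOrderRefs = recentOrderRefs;
    }
}

// Hypothetical assembler. The two lookups stand in for data access methods;
// each is called only because this particular payload needs it.
class CustomerOrdersAssembler {
    private final LongFunction<String> customerNameLookup;
    private final LongFunction<List<String>> recentOrderLookup;

    CustomerOrdersAssembler(LongFunction<String> customerNameLookup,
                            LongFunction<List<String>> recentOrderLookup) {
        this.customerNameLookup = customerNameLookup;
        this.recentOrderLookup = recentOrderLookup;
    }

    CustomerOrdersDto assemble(long customerId) {
        return new CustomerOrdersDto(
                customerNameLookup.apply(customerId),
                recentOrderLookup.apply(customerId));
    }
}

class DtoAssemblyDemo {
    public static void main(String[] args) {
        CustomerOrdersAssembler assembler = new CustomerOrdersAssembler(
                id -> "Customer " + id,
                id -> Arrays.asList("ORD-1", "ORD-2"));
        CustomerOrdersDto dto = assembler.assemble(42L);
        System.out.println(dto.customerName + " " + dto.recentOrderRefs);
        // prints: Customer 42 [ORD-1, ORD-2]
    }
}
```

Because the DTO holds only what this caller asked for, serializing it never walks the entity graph, so no fetch or serialization policy has to be baked into the model.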

For example, you might fetch the orders for a customer in several different ways:

public List<Order> getOrders(Long customerId) {...}
public List<Order> getOrders(Long customerId, Date startDate, Date endDate) {...}
public List<Order> getOrders(SearchCriteria searchCriteria, int firstResult, int pageSize) {...}
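
To make the shape of such a method concrete, here is a minimal in-memory sketch that combines the customer, date range, and paging parameters in one call. It is not a JPA implementation: InMemoryOrderDao, the simplified Order class, and its placedOn field are assumptions made for illustration, and LocalDate is used in place of Date for brevity. A JPA-backed version would push the same filter into a query and page it with Query.setFirstResult and Query.setMaxResults.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Simplified stand-in for the Order entity; placedOn is a hypothetical field.
class Order {
    final long id;
    final long customerId;
    final LocalDate placedOn;

    Order(long id, long customerId, LocalDate placedOn) {
        this.id = id;
        this.customerId = customerId;
        this.placedOn = placedOn;
    }
}

// Hypothetical in-memory DAO standing in for a database-backed one.
class InMemoryOrderDao {
    private final List<Order> rows = new ArrayList<>();

    void save(Order order) {
        rows.add(order);
    }

    // Returns one page of a customer's orders placed between startDate and
    // endDate (both inclusive), ordered by id so that pages are stable.
    List<Order> getOrders(long customerId, LocalDate startDate, LocalDate endDate,
                          int firstResult, int pageSize) {
        return rows.stream()
                .filter(o -> o.customerId == customerId)
                .filter(o -> !o.placedOn.isBefore(startDate) && !o.placedOn.isAfter(endDate))
                .sorted(Comparator.comparingLong((Order o) -> o.id))
                .skip(firstResult)
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}

class GranularFetchDemo {
    public static void main(String[] args) {
        InMemoryOrderDao dao = new InMemoryOrderDao();
        for (long id = 1; id <= 5; id++) {
            dao.save(new Order(id, 1L, LocalDate.of(2024, 1, (int) id)));
        }
        dao.save(new Order(6L, 2L, LocalDate.of(2024, 1, 3)));

        // Second page (page size 2) of customer 1's January orders.
        List<Order> page = dao.getOrders(1L, LocalDate.of(2024, 1, 1),
                LocalDate.of(2024, 1, 31), 2, 2);
        for (Order o : page) {
            System.out.println(o.id); // 3, then 4
        }
    }
}
```

The caller decides how much data it wants on each call, so the unbounded dataset and pagination problems never reach the view.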

What about @ManyToMany?

Good question. In most cases I find that what starts as a many-to-many relationship is better modeled as a separate entity, because there is usually additional information stored with the relationship. For example, in a Users and Groups ManyToMany relationship, many users belong to many groups and vice versa. The membership, however, probably also has start and end dates, and maybe a role within the group. This also exhibits one of the earlier problems, in that user.getGroupMemberships() would return all group memberships, past and present, whereas you probably only want the active ones. Modeling the membership as a separate entity means it becomes an entity with two ManyToOne relationships, one to the user and one to the group (each of which sees it as a OneToMany).
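
The users-and-groups membership described above might look something like the following sketch. It is plain Java with ids standing in for the entity references (in a JPA model they would be @ManyToOne references to the user and group entities), and the class names, fields, and getActiveMemberships method are all assumptions made for illustration.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Stand-in for a membership entity that carries its own data.
class Membership {
    final long userId;
    final long groupId;
    final String role;
    final LocalDate startDate;
    final LocalDate endDate; // null while the membership has no end date

    Membership(long userId, long groupId, String role,
               LocalDate startDate, LocalDate endDate) {
        this.userId = userId;
        this.groupId = groupId;
        this.role = role;
        this.startDate = startDate;
        this.endDate = endDate;
    }

    // Active if the day falls on or after the start and on or before the end.
    boolean isActiveOn(LocalDate day) {
        return !day.isBefore(startDate) && (endDate == null || !day.isAfter(endDate));
    }
}

// Hypothetical in-memory DAO; a real one would filter in the query instead.
class InMemoryMembershipDao {
    private final List<Membership> rows = new ArrayList<>();

    void save(Membership membership) {
        rows.add(membership);
    }

    List<Membership> getActiveMemberships(long userId, LocalDate day) {
        return rows.stream()
                .filter(m -> m.userId == userId && m.isActiveOn(day))
                .collect(Collectors.toList());
    }
}

class MembershipDemo {
    public static void main(String[] args) {
        InMemoryMembershipDao dao = new InMemoryMembershipDao();
        dao.save(new Membership(1L, 10L, "admin",
                LocalDate.of(2020, 1, 1), LocalDate.of(2021, 12, 31)));
        dao.save(new Membership(1L, 20L, "member",
                LocalDate.of(2023, 6, 1), null));

        List<Membership> active = dao.getActiveMemberships(1L, LocalDate.of(2024, 1, 1));
        System.out.println(active.size());         // 1
        System.out.println(active.get(0).groupId); // 20
    }
}
```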

While there are cases where the many-to-many relationship is literally just a pair of IDs (think blog post tags, many tags to many posts), using an entity from the start could pay off later if you decide to add information to the relationship.

In summary, moving relational fetches out of the data model and into the data layer means you remove some of the temptations of bad practices and create a library of reusable functions for fetching the data that can be used from different code points.