Summary :
I am trying to minimize the number of queries my JPA based java application makes to the database. I specified the @BatchFetch(BatchFetchType.IN)
optimization hint, but I still see some extra queries that I would consider as redundant and unnecessary.
Details:
Considering a simple domain model: We have Invoice mgmt system. Invoice has OneToOne relationship with Order. We also have customer, which has OneToMany relationship with Orders. (Customer 1->M Order 1<-1 Invoice). Find more details here. Find the complete source code here.
Here is the entity definition, as it currently stands:
Client.java (Excluding getters and setter):
@Entity(name = "CUSTOMER")
public class Customer {
@Id //signifies the primary key
@Column(name = "CUST_ID", nullable = false)
@GeneratedValue(strategy = GenerationType.AUTO)
private long custId;
@Column(name = "FIRST_NAME", length = 50)
private String firstName;
@OneToMany(mappedBy="customer",targetEntity=Order.class,
fetch=FetchType.LAZY)
private Collection<Order> orders;
}
Order.java (Excluding getters and setter):
@Entity(name = "ORDERS")
public class Order {
@Id
@Column(name = "ORDER_ID", nullable = false)
@GeneratedValue(strategy = GenerationType.AUTO)
private long orderId;
@Column(name = "TOTAL_PRICE", precision = 2)
private double totPrice;
@OneToOne(fetch = FetchType.LAZY, optional = false, cascade = CascadeType.ALL, mappedBy = "order")
private Invoice invoice;
@ManyToOne(optional = false)
@JoinColumn(name = "CUST_ID", referencedColumnName = "CUST_ID")
private Customer customer;
@ManyToMany(fetch = FetchType.LAZY)
@JoinTable(name = "ORDER_DETAIL", joinColumns = @JoinColumn(name = "ORDER_ID", referencedColumnName = "ORDER_ID"), inverseJoinColumns = @JoinColumn(name = "PROD_ID", referencedColumnName = "PROD_ID"))
private List<Product> productList;
}
Invoice.java (Excluding getters and setter):
@Entity(name = "ORDER_INVOICE")
public class Invoice {
@Id
// signifies the primary key
@Column(name = "INVOICE_ID", nullable = false)
@GeneratedValue(strategy = GenerationType.AUTO)
private long invoiceId;
@Column(name = "AMOUNT_DUE", precision = 2)
private double amountDue;
@OneToOne(optional = false, fetch = FetchType.LAZY)
@JoinColumn(name = "ORDER_ID")
private Order order;
}
With this model in place, I ran a simple test to fetch all the orders of a customer.
EntityManagerFactory entityManagerFactory = Persistence.createEntityManagerFactory("testjpa");
EntityManager em = entityManagerFactory.createEntityManager();
Customer customer = em.find(Customer.class, 100L);
Collection<Order> orders = customer.getOrders();
for(Order order: orders){
System.out.println(order.getInvoice().getInvoiceId());
}
em.close();
Since everything was lazy fetched, we got four queries, as shown:
1398882535950|1|1|statement|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = ?)|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = 100)
1398882535981|0|1|statement|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = ?)|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = 100)
1398882535995|1|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = ?)|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = 111)
1398882536004|0|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = ?)|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = 222)
Since I don't want N+1 calls for getting Invoices, I thought of using batch fetch and reduce the total queries to 4 (One query to fetch Invoices for all the Orders of Customer). To do the same, I updated my Order entity, as shown:
Updated - Order.java, Adding BatchFetch for Invoice. (Excluding getters and setter):
@Entity(name = "ORDERS")
public class Order {
@Id
@Column(name = "ORDER_ID", nullable = false)
@GeneratedValue(strategy = GenerationType.AUTO)
private long orderId;
@Column(name = "TOTAL_PRICE", precision = 2)
private double totPrice;
@BatchFetch(BatchFetchType.IN)
@OneToOne(fetch = FetchType.LAZY, optional = false, cascade = CascadeType.ALL, mappedBy = "order")
private Invoice invoice;
@ManyToOne(optional = false)
@JoinColumn(name = "CUST_ID", referencedColumnName = "CUST_ID")
private Customer customer;
@ManyToMany(fetch = FetchType.LAZY)
@JoinTable(name = "ORDER_DETAIL", joinColumns = @JoinColumn(name = "ORDER_ID", referencedColumnName = "ORDER_ID"), inverseJoinColumns = @JoinColumn(name = "PROD_ID", referencedColumnName = "PROD_ID"))
private List<Product> productList;
}
I ran the same test again, and was assuming there will be 3 queries to fetch the data.(One for customer, One for Orders and One to batch fetch Invoices). However, eclipselink
generates 5 queries for the same. Here are the queries:
1398883197009|1|1|statement|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = ?)|SELECT CUST_ID, APPT, CITY, EMAIL, FIRST_NAME, LAST_NAME, STREET, LAST_UPDATED_TIME, ZIP_CODE FROM CUSTOMER WHERE (CUST_ID = 100)
1398883197030|0|1|statement|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = ?)|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (CUST_ID = 100)
1398883197037|1|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID IN (?,?))|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID IN (111,222))
1398883197042|1|1|statement|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (ORDER_ID = ?)|SELECT ORDER_ID, OREDER_DESC, ORDER_DATE, TOTAL_PRICE, LAST_UPDATED_TIME, CUST_ID FROM ORDERS WHERE (ORDER_ID = 222)
1398883197045|0|1|statement|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = ?)|SELECT INVOICE_ID, AMOUNT_DUE, DATE_CANCELLED, DATE_RAISED, DATE_SETTLED, LAST_UPDATED_TIME, ORDER_ID FROM ORDER_INVOICE WHERE (ORDER_ID = 222)
I didn't understand why the last two queries are being generated. Any help to explain what's going on would be helpful.
Thanks!