EAV Data Model

One problem many developers encounter while defining and analysing data requirements is the situation where a number of different attributes can be used to describe an object, but only few attributes actually apply to each one. One option is to create a table with a column representing each attribute; this is suitable for objects with a fixed number of attributes, where all or most attributes have values for a most objects. However, in our case we would end up with records where majority of columns would be empty, because attributes may be unknown or inapplicable.

To solve the above problem you can apply the EAV (Entity, Attribute, Value) model. In the EAV data model only non-empty values are stored in database, where each attribute-value (or key-value) pair describes one attribute of a given entity.

  • Entity : An Entity is a ‘thing’, and object. Examples could be ‘Person’, ‘Employee’, ‘Contract’.
  • Attribute : An Attribute describes a part of the Entity. If we take the Entity ‘Person’ as an example, attributes could be ‘Name’, ‘Birthday’, ‘Nationality’, etc.
  • Value : The Value of the Attribute, i.e. ‘John’, ’12-12-1970’, ‘English’.


Following table is a representation of a very simple EAV dataset.

Table: Entity

EntityID Entity
1 Employee
33 Contract

Table: Attribute

AttributeID Attribute
100 Name
101 Birthday
102 Nationality
360 Type
361 Startdate

Table: EAV_Data

ID EntityID AttributeID Value
1234 1 100 John
1234 1 101 12-12-1970
1234 1 102 English
5678 33 360 Fulltime
5678 33 361 01-01-2009

In the above example, we notice a few things.
First of all, in a relational datamodel we probably would have had a ‘Employee’ table and a ‘Contract’ table.
In our EAV example, most the data is stored in 1 single table. Both the tables (Entities) and columns (Attributes) are stored in a row-wise manner, therefore there is no ‘fixed’ datastructure. Second, all the ‘Values’ are stored in the same column, which implicates that we have 1 datatype to store everything (varchars, datetimes, ints, etc.).
To keep this example simple, ill ignore this ‘datatype-problem’ for now.

EAV-modelling works exceptionally well for systems that need a great amount of flexibility.
Both ‘tables’ and ‘columns’ can be dynamically added without changing the underlying model.
Systems that need this amount of flexibility are i.e. clinical applications.
So instead of having several (fixed) smaller tables, you end up with one (flexible) huge table.

While this is all fun and games for systems that storing information, things can get a bit challenging when you need to build reports.
First challenge: How do you retrieve EAV data?
To demonstrate the differences between a relational model and a EAV model when building queries, I’ll try to make an easy selection on all employee’s with the name ‘John’ that have the ‘English’ nationality.

The relation employee table in this example has the following columns: ID, Name, Last Name, Birthday, Nationality and Marriagal status.

Example 1: Relational table

SELECT * 
FROM Employee 
WHERE Name = 'John' 
AND Nationality = 'English'

Result

ID Name Last name Birthday Nationality Marriagal Status
1234 John Doe 12-12-1970 English Married
365 John Cook 01-08-1983 English Single

All the information from an Employee is stored in 1 record, which makes it a very easy task to select all the information we need.
In contrary to the Employee table, the EAV table has every attribute (column) stored in a new record.

Example 3: EAV table with selfjoins

SELECT  t1.ID as 'ID'
, t1.Value AS 'Name' 
, t2.Value AS 'Nationality'
, t3.Value AS Birthday
FROM EAV_Data t1
LEFT JOIN EAV_Data t2
ON t1.ID = t2.ID
LEFT JOIN EAV_Data t3
ON t1.ID = t3.ID
WHERE t1.Attribute = 'Name'
AND t1.Value = 'John'
AND t2.Attribute = 'Nationality'
AND t2.Value = 'English'
AND t3.Attribute = 'Birthday'

Result

ID Name Nationality Birthday
1234 John English 12-12-1970
365 John English 01-08-1983

While this works to some extent, you’ll notice that the query for the EAV-data is much more complex that the relational one. On top of that, in order to get the same amount of columns as the relational query, we need even more selfjoins! Needless to say, using lots and lots of selfjoins is stressfull for the database and has a negative impact on the performance.

Why is EAV used?

It’s used because of scalability. You can add almost anything into DB without changing it’s structure. That would be the main pron of it.

Major problem with EAV systems is that each of them is much slower than a custom made solution because of SQL complexity. You need quite a few joins on DB just to retrieve one single Entity (or “data item”) in opposite to, let’s say, one select query on a custom made solution.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s