Inheritance Strategy in Entity Framework Core 5

Prior to .NET 3.5, we (developers) often used to write ADO.NET code or Enterprise Data Access Block to save or retrieve application data from the underlying database. We used to open a connection to the database, create a DataSet to fetch or submit the data to the database, convert data from the DataSet to .NET objects or vice-versa to apply business rules. This was a cumbersome and error prone process. Microsoft has provided a framework called “Entity Framework” to automate all these database related activities for your application. Entity Framework (EF) Core is a lightweight, extensible, open source and cross-platform version of Entity Framework data access technology.

Entity Framework is an object-relational mapper (O/RM) that enables .NET developers to work with a database using .NET objects. It eliminates the need for most of the data-access code that developers usually need to write.

Obvoiusly today we’re not going to talk about Entity Framework Core or how to use it in your applications. Today we’re going to see what are the inheritance strategies that you can implement in Code First using EF Core 5.

According to the Code-First Conventions, EF creates database tables for each concrete domain class. However, you can design your domain classes using inheritance. Object-oriented techniques include “has a” and “is a” relationships, whereas SQL-based relational model has only a “has a” relationship between tables. SQL database management systems don’t support type inheritance. So, how would you map object-oriented domain classes with the relational database?

Before we start, you can get the full source code for the demo project from here.

Basic understanding of Dotnet Core and EF Core with SQL Server is expected from the readers.

Below are three different approaches to represent an inheritance hierarchy in Code-First:

  • Table per Hierarchy (TPH): This approach suggests one table for the entire class inheritance hierarchy. The table includes a discriminator column which distinguishes between inheritance classes. This is a default inheritance mapping strategy in Entity Framework.
  • Table per Type (TPT): This approach suggests a separate table for each domain class.
  • Table per Concrete Class (TPC): This approach suggests one table for one concrete class, but not for the abstract class. So, if you inherit the abstract class in multiple concrete classes, then the properties of the abstract class will be part of each table of the concrete class.

Eventhough Entity Framework 6 supports all 3 of these strategies, EF Core only (currently) supports two techniques for mapping an inheritance model to a relational database: Table-per-hierarchy (TPH) and Table-per-type (TPT).

The table-per-type (TPT) feature was introduced in EF Core 5.0. Table-per-concrete-type (TPC) is supported by EF6, but is not yet supported by EF Core.

But we will see how to implement TPC in EF Core as well using a small hack.

Let’s go through them one by one. Let’s take 3 entities with inheritance and apply these strategies to understand them.

Table per Hierarchy (TPH)

By default, EF maps the inheritance using the table-per-hierarchy (TPH) pattern. TPH uses a single table to store the data for all types in the hierarchy, and a discriminator column is used to identify which type each row represents.

Now if you were to create your models as follows and created a migration you will notice that EFCore has created one table TPHUserand added all the properties in TPHTeacher and TPHStudent into it, also a new property called, Discriminator which identifies which type of TPHUser is stored in each row.

EF added the discriminator implicitly as a shadow property on the base entity of the hierarchy. Shadow properties are properties that aren’t defined in your .NET entity class but are defined for that entity type in the EF Core model. The value and state of these properties is maintained purely in the Change Tracker. Since this discriminator is not listed in our model you might ask how can we access it.

Shadow property values can be obtained and changed through the ChangeTracker API:

// tphUser is returned from the database.
context.Entry(tphUser).Property("Disicriminator").CurrentValue = someValue;

Shadow properties can be referenced in LINQ queries via the EF.Property static method: [Read more]

var users = context.TPHUsers
.OrderBy(b => EF.Property<string>(b, "Disicriminator"));

Shadow properties cannot be accessed after a no-tracking query since the entities returned are not tracked by the change tracker.

You can configure the name and type of the discriminator column and the values that are used to identify each type in the hierarchy using fluent API as follows:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<TPHUser>()
.HasDiscriminator<string>("UserType")
.HasValue<TPHTeacher>("Teacher")
.HasValue<TPHStudent>("Student");
}

And resulting migration will be,

Migration
Generated query

If you want to keep this property in the Entity you can do this as well, it can be mapped to a regular .NET property in your entity so you can use it to filter rows etc:

This property can be configured like any other using fluent API like this.

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<TPHUser>()
.HasDiscriminator(b => b.UserType);
modelBuilder.Entity<TPHUser>()
.Property(e => e.UserType)
.HasMaxLength(200)
.HasColumnName("UserType");
}
Table created with table-per-hierarchy

Table per Type (TPT)

In the TPT mapping pattern, all the types are mapped to individual tables. Properties that belong solely to a base type or derived type are stored in a table that maps to that type. Tables that map to derived types also store a foreign key that joins the derived table with the base table.

To configure this there there are two ways. First one is using mapping attributes.

After adding the mapping attributes you can create a migration and see that it mapped each entity type to a different table. But I would recommend using fluent API to do this instead of mapping attributes.

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
modelBuilder.Entity<TPTStudent>().ToTable("TPTStudents");
modelBuilder.Entity<TPTTeacher>().ToTable("TPTTeachers");
}

Above is the generated sql script for TPT mapping pattern, observe tables that map to derived types also store a foreign key that joins the derived table with the base table. Migration look like this.

Table created with table-per-type

Table per Concrete Type (TPC)

As I mentioned earlier eventhough the table-per-concrete-type (TPC) is supported by EF6, is not yet supported by EF Core. It may not even come to EF Core 6, Shay Rojansky from Entity Framework team comented on the issue saying it probably will roll out in EF Core 7.

But there is a way to do this. Without knowing, you might have done it as well in your previous projects.

First thing we have done here is make the base class (`TPCUser` in this case) an abstract class. Then using data annotations we made all the attributes in the base class Required and made Id as primary key. Now you should only include child classes in the DbContext , don’t include TPCUser

public DbSet<TPCStudent> TPCStudents { get; set; }
public DbSet<TPCTeacher> TPCTeachers { get; set; }

This will result in this migration and generated query is,

You can see that both tables have properties from the base entity/ parent class, Id, Username and Email. For now this is the simplest way to achieve TPC in EF Core 5.

Performance

The choice of inheritance mapping technique can have a considerable impact on application performance — it’s recommended to carefully measure before committing to a choice.

People sometimes choose TPT because it appears to be the “cleaner” technique; a separate table for each .NET type makes the database schema look similar to the .NET type hierarchy.

However, measuring shows that TPT is in most cases the inferior mapping technique from a performance standpoint; where all data in TPH comes from a single table, TPT queries must join together multiple tables, and joins are one of the primary sources of performance issues in relational databases.

As an example see this benchmark,

It sets up a simple model with a 7-type hierarchy; 5000 rows are seeded for each type — totalling 35000 rows — and the benchmark simply loads all rows from the database:

As can be seen, TPH is considerably more efficient than TPT for this scenario. Note that actual results always depend on the specific query being executed and the number of tables in the hierarchy, so other queries may show a different performance gap.

That’s it for now, full source code for the project can be found here. I have added some controller methods to create and retrieve Users as well. If intereseted you can go through them as well.

I’ll see you in the next. Stay safe. 🖖

References -

Systems Design • Social Innovation • Cloud • ML