- Functionality should be delivered first, and performance can be optimized later.
- We should constantly strive to test and improve application performance throughout development. This is particularly important when dealing with new or unproven technologies.
Performance approaches aside, one of my clients recently had a performance issue with a system based on the Entity Framework 3.5. Many of the general EF performance issues are well documented and I will not detail them here. However, there are some golden rules that apply to any database-driven application:
- Minimize the amount of data you bring across the network
- Minimize network "chattiness" as each round-trip has an overhead. You can batch up requests to resolve this issue.
- JOIN and Filter your queries to minimize the number of records that SQL needs to process in order to return results.
- Index your DB properly and use Indexed (SQL Server) or Materialized (Oracle) Views for the most common JOINs.
- Cache Data and HTML that is static so you don't have to hit the database or the ORM model in the first place
- Denormalize your application if performance is suffering due to "over-normalization"
- Reduce the number of dynamically generated objects where possible as they incur an overhead.
- Load entities explicitly rather than through the ORM (e.g. via an ObjectQuery in Entity Framework) when the ORM generates poorly performing JOINs or UNIONs.
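The first two rules can be sketched outside of Entity Framework. The following is a minimal illustration in Python using the standard-library `sqlite3` module, not EF itself; the `system_file` table, its columns, and both helper functions are hypothetical names invented for this example. An in-memory database has no real network cost, so the sketch only shows the shape of the queries: one statement per id versus a single batched statement that selects only the column it needs.

```python
import sqlite3

# Hypothetical schema: a file table with a small name and a large binary column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE system_file ("
    "file_id INTEGER PRIMARY KEY, name TEXT, contents BLOB)"
)
conn.executemany(
    "INSERT INTO system_file (file_id, name, contents) VALUES (?, ?, ?)",
    [(i, f"file{i}.bin", bytes(1024)) for i in range(1, 6)],
)
conn.commit()


def fetch_names_chatty(conn, ids):
    """Chatty version: one query (one round-trip) per id."""
    names = []
    for file_id in ids:
        row = conn.execute(
            "SELECT name FROM system_file WHERE file_id = ?", (file_id,)
        ).fetchone()
        if row is not None:
            names.append(row[0])
    return names


def fetch_names_batched(conn, ids):
    """Batched version: a single query for all ids.

    Only the `name` column is selected, so the large `contents`
    column is never read or transferred.
    """
    ids = list(ids)
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT name FROM system_file "
        f"WHERE file_id IN ({placeholders}) ORDER BY file_id",
        ids,
    ).fetchall()
    return [name for (name,) in rows]
```

Both functions return the same names for a sorted list of ids; the difference is that the batched version issues one statement regardless of how many ids are requested.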
Using this concept, the most obvious steps seemed to me to be:
- Remove the "Contents" field from the "SystemFile" entity so it didn't get automatically loaded when the EF entity was referenced in a LINQ2E query.
- Create an inherited entity "SystemFileContents" that just had the contents of the file so the application can load it up only when needed.
This, however, produced the following mapping error:
Error 3034: Problem in Mapping Fragments starting at lines 6872, 6884: Two entities with different keys are mapped to the same row. Ensure these two mapping fragments do not map two groups of entities with overlapping keys to the same group of rows.
After a little investigation, I found there are a few different approaches to this error:
- Implement a Table Per Hierarchy (TPH) as described at http://msdn.microsoft.com/en-us/library/bb738443(v=VS.90).aspx. This would mean I could just make some database changes and move the file binary contents into a separate table. After that I could make the parent "SystemFile" class abstract, and only refer to two new child classes, "SystemFileWithContents" and "SystemFileWithoutContents".
- I could simply split the table into two different entities with a 1:1 association rather than an inheritance relationship in the Entity Framework Model.
The designer in Visual Studio 2008 doesn't support this arrangement (though the designer in Visual Studio 2010 does, as per http://thedatafarm.com/blog/data-access/leveraging-vs2010-rsquo-s-designer-for-net-3-5-projects/), so you have to modify the XML file directly and add a "ReferentialConstraint" node to correctly relate the two entities:
We add the referential constraint to the association to inform the model that the ids of these two types are tied to each other:

    <Association Name="SystemFileSystemFileContent">
      <End Type="SampleModel.SystemFile" Role="SystemFile" Multiplicity="1" />
      <End Type="SampleModel.SystemFileContent" Role="SystemFileContent" Multiplicity="1" />
      <ReferentialConstraint>
        <Principal Role="SystemFile">
          <PropertyRef Name="FileId" />
        </Principal>
        <Dependent Role="SystemFileContent">
          <PropertyRef Name="FileId" />
        </Dependent>
      </ReferentialConstraint>
    </Association>
This reduced the load on SQL and the web server, as the file contents no longer had to be dragged across on every call to the SystemFile table. Any performance improvement must be measurable, so the team confirmed this with scripted Visual Studio 2008 load tests, which used a customer-validated test mix based on their expected usage of the system.
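The effect of the 1:1 split can also be sketched outside of Entity Framework. The following is a minimal illustration in Python with the standard-library `sqlite3` module, not the EF model itself; the table names, the `SystemFile` class, the `list_files` helper, and the `content_queries` counter are all hypothetical and exist only for this example. The metadata table and the content table share the same key, and the large binary column is fetched only when it is first accessed.

```python
import sqlite3


def create_schema(conn):
    """Two tables sharing the same primary key (a 1:1 relationship)."""
    conn.executescript(
        """
        CREATE TABLE system_file (
            file_id INTEGER PRIMARY KEY,
            name    TEXT NOT NULL
        );
        CREATE TABLE system_file_content (
            file_id  INTEGER PRIMARY KEY REFERENCES system_file(file_id),
            contents BLOB NOT NULL
        );
        """
    )


class SystemFile:
    """File metadata; the binary contents are loaded on first access only."""

    def __init__(self, conn, file_id, name):
        self._conn = conn
        self.file_id = file_id
        self.name = name
        self._contents = None
        self.content_queries = 0  # counts queries against the content table

    @property
    def contents(self):
        if self._contents is None:
            self.content_queries += 1
            row = self._conn.execute(
                "SELECT contents FROM system_file_content WHERE file_id = ?",
                (self.file_id,),
            ).fetchone()
            self._contents = row[0] if row is not None else b""
        return self._contents


def list_files(conn):
    """List files without touching the content table at all."""
    rows = conn.execute(
        "SELECT file_id, name FROM system_file ORDER BY file_id"
    ).fetchall()
    return [SystemFile(conn, file_id, name) for file_id, name in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.execute("INSERT INTO system_file VALUES (1, 'report.pdf')")
conn.execute(
    "INSERT INTO system_file_content VALUES (1, ?)", (b"%PDF-1.4 ...",)
)
files = list_files(conn)
```

Listing the files reads only the small metadata table. The content table is queried once per file, and only if `contents` is actually read, which mirrors the goal of keeping the binary column out of routine queries.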