
The ORM Foundation


Performance steadily degrading...

Last post Wed, Jan 14 2009 17:00 by Anonymous. 14 replies.
  • Fri, Jan 9 2009 19:50

    • Steve Miller
    • Top 50 Contributor
      Male
    • Joined on Thu, Jan 1 2009
    • Portland, Oregon USA
    • Posts 18

    Performance steadily degrading...

    Greetings,

    I am entering a fairly significant database design (~150 tables and probably 10 times that many columns).  I am only about 10% of the way through and am finding that entering new fact types and constraints takes longer and longer with each new entry...

    I currently have 95 objects, 151 facts, and probably 3-5 times that many constraints.

    I cannot figure out if these designs can/should be spread over multiple .orm files.  If so, how can references in one file be made to another (so that the output SQL stays correctly synchronized)?

    Thanks in advance,

    Steve Miller (steve@miller.com)

  • Fri, Jan 9 2009 21:24 In reply to

    • Steve Miller
    • Top 50 Contributor
      Male
    • Joined on Thu, Jan 1 2009
    • Portland, Oregon USA
    • Posts 18

    Re: Performance steadily degrading...

    I thought I should add that adding a new fact now takes 75 seconds, and adding a unique constraint takes about an equal amount of time.  I suspect that the maintenance of the supporting xml files is taking the brunt of the update slowdown...

     Steve

  • Fri, Jan 9 2009 22:36 In reply to

    Re: Performance steadily degrading...

    Steve, 

    First, apologies for the performance hit. However, we've run significantly larger ORM models than this with no performance slowdowns. There is also significant incremental work currently in progress that should alleviate any regeneration performance issues as the model gets larger.

    Here are the things that take a long time:

    1. The relational view is extremely slow as it gets larger. There is a reason it is marked as 'slow and temporary' in the Extension Manager dialog. Essentially, the diagram and line routing is fully rebuilt whenever the relational model changes, which happens on the changes you describe. Note that the 'Relational Schema' expansion in the 'ORM Model Browser' offers you all of the same selection opportunities as the relational view.
    2. Very large diagrams have a tendency to slow down, but the biggest hit is generally on file load, not during normal editing. If you use multiple diagrams you should see an improvement. You can also use the new 'Diagram Management' extension to help manage your diagrams.
    3. The xml files are regenerated when you save the .orm file or activate another document window (tool windows don't count). Generation is relatively slow the first time per VS session because the transform files are loaded and compiled. After that, however, you shouldn't see more than 2-3 seconds delay, even on large models. Often the largest delay is reloading the files in the VS text editors, so it will be faster if the generated files are left closed.
    4. The absorption algorithm is hitting a very long chain of 1-1 relationships, which are the performance pinch point. Generally, you have to work pretty hard to get a model that is this pathological.
    5. Something else that I won't guess at.

    I'd recommend copying the model file, opening it standalone (without the generated files), and selectively turning off extension models (Relational View, Map to Relational Model, Map to Abstraction Model), with a save and test between removing each extension. This should give you an idea of which component is causing the problem.

    If this is anything other than the Relational View, then I'd very much like to get the file and run it under the debugger to see where the problem is. There isn't much we can do with the Relational View until we get the incremental work done. This is the first case I've heard of this big of a slowdown on this size file, so I'm very interested in determining where the slowdown is.

    Please let me know what you find.

    -Matt

  • Fri, Jan 9 2009 22:57 In reply to

    • Steve Miller
    • Top 50 Contributor
      Male
    • Joined on Thu, Jan 1 2009
    • Portland, Oregon USA
    • Posts 18

    Re: Performance steadily degrading...

    Matt, Thanks for such a quick response! First, I’m really impressed with what you’ve done versus the way it used to work in Visio EA.  While VEA was still more refined than what is currently available in NORMA, I’m still very pleased.

    If there were only an Excel export/import utility, that would be the cat’s meow for me.  I usually like to verbalize using Excel.  Being able to have a “gang import” would just be the biggest time saver of all.  The old Orthogonal tool was great but it isn’t being updated any longer, so, alas…

    I did have the Relational View turned off, but the other two extensions were turned on.  I’ll do the toggling as you suggest and let you know the outcome.  As for sharing the orm file, I’d be more than happy to do so – I’ll decide tomorrow what to do. Thanks again!

    Steve

  • Sat, Jan 10 2009 7:46 In reply to

    • Ken Evans
    • Top 10 Contributor
      Male
    • Joined on Sun, Nov 18 2007
    • Stickford, UK
    • Posts 805

    Re: Performance steadily degrading...

    Hi Steve,
    Thought you would like to know that I tested the file that you posted elsewhere and I found no problem in adding a unique constraint. (4GB Vista Laptop)
    However, Visual Studio did throw an "out of memory" exception when I tried to generate the relational model.

    It took about 25 seconds to get to the exception.
    When I look at the memory usage in task manager, I see the memory in use climbing steeply until it hits 2.5 GB - which I guess is the practical limit for an "in memory" application on my machine. So the symptom of the problem is that your model is causing NORMA to gobble up memory until it hits the limit of what is available.

    I'm sure that Matt will have an explanation (and solution) for this behaviour.

    Ken 

     

  • Sat, Jan 10 2009 10:42 In reply to

    Re: Performance steadily degrading...

    The issue we're having is with the repeated patterns you're using on a general reference mode. With one of these patterns (repeated use of an objectName general reference mode in 1-1 object types) I'm not surprised you're having problems. The other similar pattern (description) I expected to clear up when I deleted the (apparently old, there is no shape for it) ElementTypeHasDescription FactType, but it didn't. In any case, you can get back to normal performance if you use popular reference modes instead of general with these two repeated patterns. Basically, you end up with a chain of about 30 1-1 FactTypes (note this includes the reference mode FactTypes that are collapsed by default) involving the objectName value (and even more for description), and the hangup is trying to figure out the most efficient direction to map the 1-1 relationships. Obviously, we need to take a serious look at the range of choices we attempt to optimize here.
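
    As a rough back-of-the-envelope illustration (Python, purely illustrative, not NORMA's actual code or bound), the problem is chain length: treating each 1-1 FactType's mapping direction as an independent binary choice gives 2**n combinations in the naive worst case, so one shared chain of ~30 is vastly more work than 30 independent chains of length 1.

        # Illustrative arithmetic only (not NORMA source); 2**n is the naive
        # worst case for choosing a mapping direction per 1-1 FactType.

        def naive_direction_choices(chain_length):
            return 2 ** chain_length

        # General reference mode: ~30 1-1 FactTypes all involve the single
        # shared objectName ValueType, forming one long chain.
        print(naive_direction_choices(30))        # 1,073,741,824 combinations

        # Popular reference mode: each EntityType gets its own collapsed
        # reference mode FactType, so there is no shared chain to optimize.
        print(30 * naive_direction_choices(1))    # 60 combinations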

    To make the model more performant, do the following. I'll mark things that will change over time in [square brackets]:

    1. (To save time later) Go to the 'ORM Designer' page on the Tools/Options dialog.
      • Change 'Initial Data Type' to 'TextVariableLength' [We'll eventually add facet options (length/size/precision) here as well; their absence adds a step later]
      • Change 'Final Shape Deletion' to 'DeleteShapeOnly'
      • (optionally) Change 'Delete Key Behavior' to 'DeleteElement' (this makes the Delete key delete an element, and Ctrl-Delete delete a shape)
      • (optionally, tangent) You have a lot of definitions in place, so you might want to turn on 'Show Definition Tooltips'
    2. Back in the model, delete the 'ElementTypeHasDescription' FactType using the ORM Model Browser
    3. Add a new diagram (temporary, we'll delete it later)
    4. Open the 'ORM Reference Mode Editor' tool window
    5. Expand the 'Custom Reference Modes' branch and click on <add new>
    6. Enter objectName (be sure to match the case) and then change the Kind to Popular. [You'll eventually be able to set an initial data type here as well.]
    7. [The appearance of these shapes is a bug, but we'll leverage it to change the DataTypeLength.] A big blob of shapes will appear in the diagram.
      • Activate the diagram 
      • Ctrl-A to Select All
      • Choose 'Auto Layout' on the context menu to get a stack of shapes.
      • Start a lasso select to the right of the left edge of the ValueType shapes and make sure you pick up all of the FactType shapes. Delete the shapes (make sure you don't delete the elements; the keystrokes depend on your key mappings).
      • Ctrl-A to get all of the ValueTypes and change the DataTypeLength to 50
      • Delete the ValueType shapes
    8. Repeat steps 5-7 for description instead of objectName, setting DataTypeLength to 250 instead of 50
    9. Delete the temporary diagram

    Your model will now perform as you expect. You can see the remapping speed by expanding 'Relational Schema/Schema/Tables', making a significant FactType change (expand a single-role internal uniqueness to a spanning uniqueness, for example), and watching how long it takes for the Tables node to collapse. [This test will not apply long-term because we won't be fully regenerating.]

    Basically, the issue here is that every time you added a new use of objectName or description as a general reference mode pattern, the possible mapping combinations increased exponentially, and the performance went the other direction. From a modeling perspective, by using a general reference mode pattern instead of a popular pattern, you've effectively stated that you want the objectName to be unique across most of your EntityTypes, which I don't think is what you were after. However, the pattern you used should not run the machine out of memory.

    If you don't mind, I'll isolate the issue with a smaller file and keep the original (locally, not as a submitted test scenario) as a performance check.

    -Matt

  • Sat, Jan 10 2009 12:16 In reply to

    • Steve Miller
    • Top 50 Contributor
      Male
    • Joined on Thu, Jan 1 2009
    • Portland, Oregon USA
    • Posts 18

    Re: Performance steadily degrading...

    This post is chronologically out of sequence.  It is a repost of a response I made directly to the admin account...

     

    Hi Matt,

    I found that by turning off all of the items in the Extension Manager, I could suddenly enter facts and constraints with good performance.  However, when I went to generate a SQL Server SQL file, I needed to turn on at least the two Map to… items to get the SQL generation to work.  Whenever I toggle these off and on, I get double and triple copies of the xxx.DDIL.xml and xxx.DCil.xml files.  ???  Also, now the model will not generate any SQL – it throws an out of memory error.

    The duplicate files are mostly empty or just have the xml header elements.

    I am using VS2008 with all of the most recent patches and service packs.  I am also using Vista with all of the most recent service packs (VS just aborted…).  I had downloaded the version ‘a’ you released today and installed it…

    Hence, I just may be in the pathological state you mention below!  I am sending the orm file to you.

    What else should/do I need to do?

    Thanks for your help!

    Steve

    BTW: I have the “Big Brown Book” and have been working through this as a learning exercise.  I’ve been working on a “closet project” for the past six years looking to generate the next manufacturing “killer app” and know/believe that ORM is what I need to use to get the database correctly designed.  I’m probably still miles away from knowing ORM to any helpful degree, but at least I’m trying!!  What I have sent you is merely the tip of the iceberg.

  • Sat, Jan 10 2009 12:27 In reply to

    • Steve Miller
    • Top 50 Contributor
      Male
    • Joined on Thu, Jan 1 2009
    • Portland, Oregon USA
    • Posts 18

    Re: Performance steadily degrading...

    Matt,

    Wonderful!  Thanks for being so detailed in your response.  We newbies will always find ways to break things in the most unexpected fashion!  Here, I thought I had found a shortcut in defining data types.  I had considered doing as you suggest above and setting a default data type, but I found I still had to set the length, and there will be many times ahead where I'll need differing datatypes...  So much for being "lazily clever".

    I'll make your suggested changes and continue on!

    Best regards,

    Steve

  • Tue, Jan 13 2009 21:22 In reply to

    Re: Performance steadily degrading...

    For what it's worth, the relational composition (mapper) of ActiveFacts takes a different approach from NORMA in regard to one-to-ones, by the sounds of Matt's descriptions.

    In particular, it doesn't try to perform an exhaustive search and optimisation of the possible options, which is an O(2**N) problem. Instead, it takes an initial guess as to which way around to map each one, taking into account things like whether the one-to-one is used in identification, and creates a directed Reference from one to the other. Then, it iteratively passes over the objects for which a firm decision (table or not) has not yet been made, and applies a number of rules to see if it can make a firm decision one way or the other. In some of these rules, a Reference may be flipped to point in the reverse direction. If, after taking one complete pass, no new firm decisions can be made, the process finishes, even though some decisions that could have gone either way have no strong reason to go a particular way. Finally, if a Reference points to an object that is a table, a foreign key is generated. Otherwise, the object is absorbed into this object.

    This algorithm is O(N*log(N)), which means it scales much better.
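
    A minimal Python sketch of the pass-based approach just described (the rules and names here are invented for illustration; the actual ActiveFacts implementation is in Ruby and applies a much richer rule set):

        from dataclasses import dataclass
        from typing import Optional

        @dataclass
        class ObjectType:
            name: str
            functional_roles: int = 0
            identified_through_one_to_one: bool = False
            decision: Optional[str] = None        # "table", "absorbed", or undecided

        @dataclass
        class Reference:                          # directed initial guess for a 1-1
            source: ObjectType
            target: ObjectType
            def flip(self):
                self.source, self.target = self.target, self.source

        def decide_tables(objects, references):
            """Apply cheap rules to undecided objects until a full pass makes
            no new firm decision, instead of exhaustively searching 2**N
            direction assignments."""
            changed = True
            while changed:
                changed = False
                for obj in objects:
                    if obj.decision is not None:
                        continue
                    # Illustrative rule: several functional roles and not
                    # identified through a 1-1 => make it a table.
                    if obj.functional_roles >= 2 and not obj.identified_through_one_to_one:
                        obj.decision = "table"
                        changed = True
                    # Illustrative rule: a 1-1 pointing at a decided table =>
                    # flip the Reference toward the table and absorb.
                    elif obj.identified_through_one_to_one:
                        for ref in references:
                            if ref.source is obj and ref.target.decision == "table":
                                ref.flip()
                                obj.decision = "absorbed"
                                changed = True
                                break
            # Anything still undecided defaults to absorption; a Reference into
            # a table later becomes a foreign key, otherwise its object is absorbed.
            for obj in objects:
                obj.decision = obj.decision or "absorbed"
            return objects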

    In addition, it seems to make the right decisions in a number of cases where NORMA gets it wrong. For example, I have a Blog model where an Author is identified by AuthorID (an Autocounter field), but also has a one-to-one Name (VariableLengthText). NORMA creates a table called "Name" instead of the obvious Author table.

    Another error NORMA sometimes makes is to absorb two AutoCounter fields that identify two objects into a single table. This creates SQL which doesn't work. ActiveFacts has a rudimentary mechanism which usually prevents this, but I plan to extend that to allow more kinds of auto-allocated identifiers, such as Ordinal values inside a multi-part PK, and maybe even define a generic method for allowing user-defined data types to provide their own auto-allocation algorithms.

  • Tue, Jan 13 2009 22:46 In reply to

    Re: Performance steadily degrading...

    I don't have your blog model, so I can't comment on it. However, NORMA will prefer the preferred identifier and map towards Author unless Name is doing something else. For example, if Name has a 1-1 with both an Author and an Editor (to the same ValueType), then you are likely to get a Name table instead of separate Author and Editor tables (unless, of course, you mark for independence). We will be adding a KeyTotal property to complement the IsIndependent property, which will allow the user to force tables to be created that they currently cannot.

    Obviously, there is a large amount of preprocessing that happens before we start evaluating chains of 1-1 elements, which is why you don't get into trouble very often. However, what we do not do is catch a pathological case and drop back the expectations on how much we can optimize. The order is also not nearly as high as you state. In normal cases, n is rarely above 3 or 4, and I believe the algorithm is bounded by n^4, and is rarely anywhere near that (the exponent is usually closer to 2). Steve had n at 87 (based on the debugger, not analysing the model), and this happened because he was using a general reference mode pattern to save on entering DataType information instead of using a popular reference mode pattern. Even when things were taking him 75 seconds he probably had n at 20-30, and it was still running. Clearly not great, but definitely not 2^n.
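
    For a sense of scale, here is the quick arithmetic behind the two positions, using the n values quoted in this thread (illustrative only; as noted above, NORMA's effective exponent is usually closer to 2):

        import math

        # n = a typical model, the ~75-second point (estimated 20-30 above), Steve's 87
        for n in (4, 25, 87):
            print(n,
                  round(n * math.log2(n)),   # an N*log(N)-style bound (ActiveFacts)
                  n ** 4,                    # the ~n^4 practical bound described for NORMA
                  2 ** n)                    # a naive exhaustive 2^n direction search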

    I'll freely admit that the multiple AutoCounter fields are a problem. If you encounter this at this point, just separate the tables. This is technically an issue with the generator, which could produce valid SQL with this construct but does not:

    • The IDENTITY DDL directive marker is really just a shortcut for a SEQUENCE, but not all of the DBs (SQL Server, for example) support SEQUENCE, so you have to look at other options. You don't need the UDT if you use SEQUENCE.
    • You can make a separate table to track just the ids or realize a view to do similar work. This could help somewhat to reduce the number of joins (the main reason to absorb subtypes), but basically you might as well just separate at this point.
    • One option is to just copy the primary identity column. However, you still need trigger code to do this when the other columns are set. This will be done as we add formal subtype definitions, etc.
    • IDENTITY columns are always mandatory, so you can't use them on optional (absorbed) columns anyway.
    • Another option is to add a broader increment to the autoincremented field and use a value offset from the primary identity value. This doesn't help much more than just use the primary value directly, though.
    • You can just ignore the AutoCounter and force the user to fill in their own value.
    • All of this has to be tempered in DB design with future data migration scenarios. For example, if an absorbed table is later separated, how well do the ids migrate (and vice versa)? The 'based on the primary identifier' approach breaks down here because the existing and new identifiers might overlap.
    • etc.

    So, obviously this is a non-trivial problem. Right now, if you really need a separate IDENTITY column then separate the table. NORMA will generate foreign keys to the smaller table.

    It's a work in progress.

    -Matt

  • Tue, Jan 13 2009 23:10 In reply to

    Re: Performance steadily degrading...

    Matthew Curland:
    I don't have your blog model, so I can't comment on it.
     

    As always, all my models are in the examples directory under my project homepage, at http://dataconstellation.com/ActiveFacts/examples

    This one has Topic(.Id) with Topic Name as well as Author(.Id) with Author Name, which matches your scenario. In this case though, forming a Name table doesn't help, as Topic has its own Id anyhow.

    Glad to hear that the algorithm performance isn't normally a problem for you.

    Matthew Curland:
    IDENTITY columns are always mandatory, so you can't use them on optional (absorbed) columns anyway.

    Actually, I've seen it - not sure which model. I think it was a subclass with a separate reference Id.

    Matthew Curland:
    So, obviously this is a non-trivial problem.

    Indeed. I'd like to invite anyone to submit a case where they think my algorithm might not work, however - it would give me a chance to improve it.

  • Wed, Jan 14 2009 4:20 In reply to

    Re: Performance steadily degrading...

    Clifford Heath:
    As always, all my models are in the examples directory under my project homepage, at http://dataconstellation.com/ActiveFacts/examples

    Great, thanks for the link. In the Blog model, the NORMA-generated DDL you're objecting to is illustrating an issue with your model. When you specify a popular reference mode (Author(.Id), Topic(.Id)), you get two separate ValueTypes (Author_Id and Topic_Id). If you want to emulate this pattern with additional identifiers, then you need to use separate ValueTypes (AuthorName and TopicName), not the shared Name ValueType. What your model says is that if you call an Author 'Bob' and a Topic 'Bob', then the 'Bob' instances must be the same.

    Author has no functional roles and is not independent, so it is an immediate candidate to be absorbed into another table. If your RMAP implementation is not creating a Name table, then you either cannot create tables for ValueTypes, or you're making an artificial ValueType/EntityType distinction.

    Clifford Heath:

    Actually, I've seen it - not sure which model. I think it was a subclass with a separate reference Id.

    I'm not saying that NORMA doesn't incorrectly spit this out; I'm just saying that IDENTITY columns must be mandatory. You can't have a nullable IDENTITY column because the DB will interpret it as mandatory. So, we actually have a problem even if the supertype identifier is not an IDENTITY, namely that it is not possible to make the identifier column of the absorbed subtype nullable if it is an identity. Given that absorbed subtype identifiers are always nullable (otherwise the subtype would always be a mandatory part of the supertype and there would be no proper subtype), I think we're concentrating on the wrong symptom here (two IDENTITY columns, as opposed to the bogus nature of having any IDENTITY column on an absorbed subtype identifier).

    -Matt

  • Wed, Jan 14 2009 6:25 In reply to

    Re: Performance steadily degrading...

    Matthew Curland:
    your model says is that if you call an Author 'Bob' and a Topic 'Bob', then the 'Bob' instances must be the same.
     

    Yes, that's admittedly a non-semantic feature of this model. It's useful because it shows this difference in behaviour though.

    Matthew Curland:
    If your RMAP implementation is not creating a Name table, then you either cannot create tables for ValueTypes, or you're making an artificial ValueType/EntityType distinction.
     

    Or (which is the case) I'm treating the AuthorId field as needing an Author table. I don't attempt to ensure that all distinct Names should be in one table, in just the same way that my Address model doesn't insist that all Addresses be in one table - and NORMA treats that the same way.

    My MonthInSeason and OilSupply models show tables for ValueTypes. That's one of the schema modifications that's injected midway through the RMap process.

    I create Reference objects for all functional relationships (but only one for a one-to-one or subtype, not one for each direction). Then I decide which objects will be tables, possibly flipping some one-to-one References. Then I apply schema transformations (ValueType column injection, and in future, surrogate key injection, and possibly automatic temporal modeling support), and finally I create the lists of columns, indices and foreign keys. So far I don't create mandatory role groups, but that's an isolated bit of functionality.

    Each column derives its name from one or more References, and each Reference knows the to and from objects and roles, with special handling for the phantom roles of objectified fact types and unary fact types - so I never need to create explicit roles for those, which keeps the mapping between the relational and fact-based views as thin as reasonably possible. The entire RMap layer is only 900 lines of code, and I'm very happy with that.
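
    A compact Python skeleton of the staged pipeline just described (every name and data structure below is invented for illustration; the real RMap layer is Ruby and far richer):

        def rmap(model):
            """Sketch of the stages described above, over a toy model of the
            form {"object_types": [...], "relationships": [...]}."""
            # 1. One Reference per functional relationship; a single Reference
            #    per one-to-one or subtype link, not one for each direction.
            references = [dict(rel) for rel in model["relationships"]]

            # 2. Decide which object types become tables, possibly flipping
            #    some one-to-one References (see the earlier pass-based sketch).
            tables = [o["name"] for o in model["object_types"]
                      if o.get("functional_roles", 0) > 0 or o.get("independent")]

            # 3. Schema transformations: ValueType column injection here;
            #    later, surrogate keys and perhaps temporal modelling support.
            for ref in references:
                ref.setdefault("column_name", f'{ref["from"]}_{ref["to"]}')

            # 4. Emit columns, indices and foreign keys. A Reference into a
            #    table becomes a foreign key; otherwise its object is absorbed.
            schema = {t: {"columns": [], "foreign_keys": []} for t in tables}
            for ref in references:
                if ref["from"] in schema:
                    schema[ref["from"]]["columns"].append(ref["column_name"])
                    if ref["to"] in schema:
                        schema[ref["from"]]["foreign_keys"].append(ref["to"])
            return schema

        # Toy usage: Author(.Id) with a 1-1 Name, as in the Blog example.
        print(rmap({
            "object_types": [{"name": "Author", "functional_roles": 2},
                             {"name": "Name"}],
            "relationships": [{"from": "Author", "to": "Name"}],
        }))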

    At some point, I want to have automatic population of role values for types other than AutoCounter types. For example, in my Metamodel, a number of objects include Ordinal in their identifiers. I'd like to be able to create a new object without specifying a value (just a value for the other parts of the identifier), and have a trigger allocate the next Ordinal.

  • Wed, Jan 14 2009 15:37 In reply to

    • Ken Evans
    • Top 10 Contributor
      Male
    • Joined on Sun, Nov 18 2007
    • Stickford, UK
    • Posts 805

    Re: Performance steadily degrading...

    Clifford Heath:
    Yes, that's admittedly a non-semantic feature of this model. It's useful because it shows this difference in behaviour though.

    Hi Clifford,
    A "non semantic feature" ???Confused
    Forgive me but isn't it the case that the whole purpose of what we are doing is to facilitate the creation of "semantically accurate" models?

    You can see the "name" problem by inference from Matt's point: "your model says is that if you call an Author 'Bob' and a Topic 'Bob', then the 'Bob' instances must be the same." Since we are trying to group instances of things that are the same into sets, Matt's point implies that the set of things called "Topic Names" is not at all similar to the set of things called "People Names".  Matt's solution of using separate value types called "TopicName" and "PersonName" goes some way towards resolving the problem. My example takes this a step further in order to illustrate two main points.
    1: The need for a skilled analyst to build object-role models
    2: The risks associated with using what I think of as the "programmer's syntactically focused mindset" to create object-role models.

    In order to provide an example of what I mean, I studied the models on your website (great website!) and re-coded your "Blog" model. (shown below)
     

    The need for a skilled analyst: During my analysis, it became clear to me that the semantic notion of "Topic Name" is quite different from the semantic notion of "Person Name". AFAIK, there is no particular convention regarding "Topic Names", whereas we do have conventions about the way we name people (admittedly different in different parts of the world).  For example, it would be most unusual to have facts called "Topic has FirstName" and "Topic has LastName". Thus my model has facts called "Topic has Name", "Author has First Name" and "Author has LastName".

    Risks associated with using a programmer's mind set to create object-role models: At the "non-semantic syntax level" there is clearly no problem with lumping all the instances of "Name" together. After all, "Name" is just another text variable, isn't it? And it may allow more "efficient" and more "elegant" code to be written. But it seems to me that what is OK at a syntax level may not be OK at the semantic level. So (without any intention to cause offence) I was wondering if your term "a non-semantic feature" might be related to this issue. And before you go off the deep end, it is not my intention to "accuse you" of having a programmer's mind-set. I'm just trying to understand the meaning and etymology of the term "a non-semantic feature" within our present context.

    Ken
    [Image: Ken's re-coded Blog model]


  • Wed, Jan 14 2009 17:00 In reply to

    Re: Performance steadily degrading...

    1. This is not relevant to a discussion of the performance of a relational mapping algorithm. Please start a new thread, don't hijack this one.
    2. CQL adopts the principles of SBVR in using compound names. In ORM, hyphen-binding is simply a lexical assistance to the verbaliser, whereas in CQL, it's an adjective that creates a compound term which is used for reference when later invoking a fact type. Here, "Name" is a "General Concept" (SBVR) and "Author Name" is an "Individual Concept". This usage reflects natural linguistics, though not the more limited terminology of ORM. In this respect, CQL has a richer metamodel than ORM, though not as rich as SBVR. You'll notice that both my uses of Name have such adjectives. I had this discussion with Terry some months back, where my contention was that an optional role with a role name might be considered in some cases as creating a vestigial subtype (e.g. Director), where subtype migration can occur (the role can be played, or not, and this can change). The same subtype argument can apply to adjectival forms. This discussion was in the context of elevating role names to avoid ambiguity with concept names. The result was unresolved, and in CQL at present, a role name has scope within (and must be distinct only within) a single statement; it cannot be used later to refer to a fact type as adjectives can.
    3. By "non-semantic", I mean "doesn't describe as fully as possible the natural semantics". As Simsion showed, description is not a primary mode, nor is it the intention here, rather the pragmatic design of an effective model of part of the real world. In this context, my use (elsewhere) of Given Name and Family Name being derived from the same Name type is intentional; I wish to know if someone had given-name Heath, when it's my family-name. This is a valid design decision that you've removed. You've also removed the very reason I created this model, which was to propose a Blog where Comments may be made on a Paragraph, not just on a whole Post.
    4. Your comments about programmers' minds are accurate, but entirely spurious.
    5. Please don't respond to this message. Start a new thread if you want to have this discussion. Perhaps even post a library of your models that we can critique? I'm sure that if you have any, we can find similar warts on them. After all, it's design, not description, isn't it?