Domain analysis of an application database

18 Mar 2012

A relational database is not only a source of content. It can also be mined for information about the domain of the application that uses it. The schema, if it is well maintained, will give you an overview of the problem domain. If the schema is not well maintained, you can work out the conventions that particular database uses for relations (for example, how columns that point at other tables are named) and create a patched schema with the real relations. Once you have generated a graph of the database, you can start filtering out details that are important for the business logic but not for the big picture. Some of what the graph shows will be trivial, but it can also give you a better understanding of the big picture. By plotting this graph, you can get an overview that is helpful for identifying subdomains.
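
The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses SQLite through the standard `sqlite3` module, the example schema is made up, and the `<table>_id` naming convention is just one hypothetical convention a database might follow. The function names (`foreign_key_graph`, `add_convention_edges`, `connected_components`) are invented for this sketch. Declared foreign keys give the first graph, the convention patches in the undeclared relations, and the connected components are rough candidates for subdomains.

```python
import sqlite3
from collections import defaultdict

# Made-up example schema; order_line.product_id is deliberately NOT declared
# as a foreign key, to mimic a schema that is not well maintained.
SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customer(id));
CREATE TABLE order_line (id INTEGER PRIMARY KEY,
                         orders_id INTEGER REFERENCES orders(id),
                         product_id INTEGER);
CREATE TABLE audit_log (id INTEGER PRIMARY KEY, message TEXT);
"""


def foreign_key_graph(conn):
    """Return {table: set of referenced tables} from declared foreign keys."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    graph = {table: set() for table in tables}
    for table in tables:
        # Each row is (id, seq, table, from, to, on_update, on_delete, match).
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            graph[table].add(row[2])
    return graph


def add_convention_edges(conn, graph, suffix="_id"):
    """Patch the graph, assuming a column '<table>_id' points at '<table>'."""
    for table in graph:
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        for row in conn.execute(f'PRAGMA table_info("{table}")'):
            column = row[1]
            if column.endswith(suffix):
                target = column[:-len(suffix)]
                if target in graph and target != table:
                    graph[table].add(target)


def connected_components(graph):
    """Group tables that are linked, ignoring the direction of the relation."""
    undirected = defaultdict(set)
    for source, targets in graph.items():
        undirected[source]  # make sure isolated tables show up too
        for target in targets:
            undirected[source].add(target)
            undirected[target].add(source)
    seen, components = set(), []
    for start in sorted(undirected):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(undirected[node] - component)
        seen |= component
        components.append(component)
    return components


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
graph = foreign_key_graph(conn)
add_convention_edges(conn, graph)
components = connected_components(graph)
for component in components:
    print(sorted(component))
```

In this made-up schema, `product` would look unrelated to everything else if only the declared foreign keys were used. The convention-based patch is what connects it to `order_line`. On a real database the convention has to be worked out by reading the schema, and connected components are only a crude first cut at subdomains.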

If we have a sufficiently large body of text describing the domain, we may be able to use a statistical approach to identify the words relevant to the domain. Identifying these words will probably be easier if we also have a larger, general body of text in the same language to compare word frequencies against. That makes it possible to filter out the words that only serve to construct sentences (in English, words like ‘the’, ‘for’, ‘he’, ‘she’, etc.).
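
A minimal sketch of that comparison, assuming plain English text and a very naive tokenizer: a word counts as a candidate domain term when its relative frequency in the domain text is clearly higher than in the reference text. The function names, the `min_ratio` threshold and the `smoothing` constant are arbitrary choices made for illustration, and the two example texts are made up and far too small for real use.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Relative frequency of each lower-cased word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {}
    total = len(words)
    return {word: count / total for word, count in Counter(words).items()}


def domain_terms(domain_text, reference_text, min_ratio=2.0, smoothing=1e-3):
    """Words that are over-represented in domain_text, most distinctive first.

    smoothing keeps the ratio finite for words missing from the reference
    text; min_ratio is an arbitrary cut-off for this sketch.
    """
    domain = word_frequencies(domain_text)
    reference = word_frequencies(reference_text)
    scores = {word: freq / (reference.get(word, 0.0) + smoothing)
              for word, freq in domain.items()}
    candidates = [word for word, score in scores.items() if score >= min_ratio]
    return sorted(candidates, key=lambda word: -scores[word])


# Tiny made-up texts, only to show the mechanics.
domain_text = ("the customer pays the invoice and the invoice is sent "
               "to the customer for the order")
reference_text = ("the cat sat on the mat and the dog is happy "
                  "to go for a walk in the park")

terms = domain_terms(domain_text, reference_text)
print(terms)
```

Words such as ‘the’ and ‘for’ occur at a similar rate in both texts, so their ratio stays close to 1 and they are dropped, while ‘customer’ and ‘invoice’ never occur in the reference text and come out on top.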

By combining these simple domain mining techniques with talking to people fluent in the domain, you can get a shallow technical understanding of the domain. That understanding is a helpful starting point for getting to grips with the details.


