I’m about to start building a C# library for English and French morphology as a side project. The library will later be merged with other linguistic aspects (phonology, sentence parsing, etc.) for other languages (Japanese and Russian).
Unfortunately, I have come across an issue in my design: my namespace hierarchy is a little crazy (in my opinion).
Here’s the namespace for English sentence parsing (lexing stage):
Linguistics.English.Syntax.Parsing.Lex
The above namespace would include the necessary classes for using an existing English lexicon to parse a particular text.
The Japanese equivalent would be:
Linguistics.Japanese.Syntax.Parsing.Lex
They would be separate because the lexing process for each language is different. A more extreme example, the indicative present of French verb conjugations (inflections), would be:
Linguistics.French.Inflection.Verb.Indicative.Present
I know that I can just do using Linguistics.French.Inflection.Verb
if I’m working on generating verb conjugation tables, but is the organization too specific?
More specifically, if/when do namespaces become too specific? Am I being silly, or is my concern actually a potential problem?
Your question is opinion-based and open-ended, so I can’t offer a definitive answer, but:
- Certainly, good naming is hard ← emphasis.
  I’ve found many times that coming up with good names, or a good system of names, is more difficult than just writing code within the margins hinted at by the coding conventions, especially when the names can’t easily be changed with refactoring tools later on, once the pilot has been released into the wild.
  For instance, renaming database tables and database fields while preserving the data and not breaking any stored procedures that use dynamic queries is a very tough technical task, best avoided. So good naming from the beginning can sometimes be very important.
- That’s why my advice is: before fixing any namespacing rules, draw some inspiration from the naming work already done by other developers of linguistic libraries in programming languages that support namespaces, like Python, Java, C#, …
  Some lists of linguistic libraries:
  - Stack Overflow: Pluralize – Singularize
  - Stack Overflow: Java or Python for Natural Language Processing
  From the above lists, NLTK (Python) and Stanford NLP (Java) look like libraries that should not be underestimated.
- From the software design perspective, if you hide the language-specific implementation details behind interfaces and behind a factory pattern, then the language-specific libraries can use whatever internal naming convention they like, without exposing the names to the outside and without the risk of breaking anything.
  That’s what the System.Data.Entity.Design.PluralizationServices.PluralizationService.CreateService method does.
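To make the interface-plus-factory idea concrete, here is a minimal sketch in C#. Every name in it (IPluralizer, PluralizerFactory, the language codes, and the nested namespaces) is hypothetical and chosen only for illustration, and the pluralization rules are deliberately toy rules, not real morphology. The point is only that callers see one interface and one factory, while the deep language-specific namespaces stay internal.

```csharp
using System;

namespace Linguistics
{
    // Hypothetical public contract: the only type callers depend on.
    public interface IPluralizer
    {
        string Pluralize(string noun);
    }

    // Hypothetical factory: maps a language code to a hidden implementation.
    public static class PluralizerFactory
    {
        public static IPluralizer Create(string languageCode)
        {
            switch (languageCode)
            {
                case "en": return new English.Inflection.Noun.EnglishPluralizer();
                case "fr": return new French.Inflection.Noun.FrenchPluralizer();
                default:
                    throw new NotSupportedException("No pluralizer for language: " + languageCode);
            }
        }
    }
}

// The deep, language-specific namespaces hold only internal types,
// so they can be renamed later without breaking callers.
namespace Linguistics.English.Inflection.Noun
{
    internal sealed class EnglishPluralizer : IPluralizer
    {
        // Toy rule (assumption for illustration): always append "s".
        public string Pluralize(string noun) => noun + "s";
    }
}

namespace Linguistics.French.Inflection.Noun
{
    internal sealed class FrenchPluralizer : IPluralizer
    {
        // Toy rule (assumption for illustration): nouns ending in s, x or z
        // stay unchanged; everything else gets "s".
        public string Pluralize(string noun)
        {
            bool invariable = noun.EndsWith("s", StringComparison.Ordinal)
                || noun.EndsWith("x", StringComparison.Ordinal)
                || noun.EndsWith("z", StringComparison.Ordinal);
            return invariable ? noun : noun + "s";
        }
    }
}
```

A caller would write something like Linguistics.PluralizerFactory.Create("fr").Pluralize("chat") and never reference Linguistics.French.Inflection.Noun at all.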
Having the language (e.g., English, French, Japanese) so far up in your namespace is a smell. If I were you, I’d keep the language identifiers in the class names only.
The recommended structure for a namespace is something like
<company>.<project>.<namespace>.<subNamespace>.<andSoOn>
So, in your case, you might consider something like:
CC.Linguistics.Syntax.Parsing.Lex
In which you would have the classes EnglishLexicon, ItalianLexicon, etc.
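As a rough sketch of what that layout could look like in C#: the namespace and the two class names come from the suggestion above, while the ILexicon interface, its Contains method, and the tiny word lists are assumptions added purely for illustration.

```csharp
using System;
using System.Collections.Generic;

namespace CC.Linguistics.Syntax.Parsing.Lex
{
    // Hypothetical shared contract for all lexicons (not part of the original suggestion).
    public interface ILexicon
    {
        bool Contains(string word);
    }

    // The language lives in the class name, not in the namespace.
    public sealed class EnglishLexicon : ILexicon
    {
        // Placeholder word list, for illustration only.
        private static readonly HashSet<string> Words =
            new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "the", "cat" };

        public bool Contains(string word) => Words.Contains(word);
    }

    public sealed class ItalianLexicon : ILexicon
    {
        // Placeholder word list, for illustration only.
        private static readonly HashSet<string> Words =
            new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "il", "gatto" };

        public bool Contains(string word) => Words.Contains(word);
    }
}
```

With this arrangement a single using CC.Linguistics.Syntax.Parsing.Lex; directive brings every language’s lexicon into scope, and code that should work for any language can be written against ILexicon.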