A Good Solution for "Magic String" Data


By Chris Falter
  

Dealing with vendor data (or your own) in the form of "codes" can pose significant challenges. You must ensure that your source code remains readable, that data are properly validated, and that data can be displayed as user-friendly descriptions. The built-in solutions (named constants and enums) help, but they have some significant shortcomings. If you derive a class of named constants from the MagicStringTranslator class, though, you can vanquish all 3 challenges in one fell swoop!



Most software systems have to deal with "magic strings."  A magic string is a code that represents, rather than describes, the state of some entity. Allow me to illustrate with an example: many insurance companies obtain (with a consumer's fully informed consent) some personal data about the insured so that they can more accurately quantify the risk of underwriting a policy (and thus set the appropriate premiums).  A vendor might return the marital status of the insured as a code from the following list:

Marital Status Codes
Code    Description
M       Married
D       Divorced
X       Separated
S       Single
W       Widowed
U       Unknown

The code "M" would be the magic string that represents the "Married" status, and so forth.   

Three Problems Associated With Magic Strings

The first problem associated with magic strings is how to make the source code that handles them easy to understand. Woe to the lazy programmer who uses those magic strings in source code like this: 

switch (maritalStatus) // maritalStatus is a one-byte string we got from the vendor and stored in our DB
{
    case "X":
        DoA();
        break;
    case "S":
        DoB();
        break;
    // etc.
}

Could you understand this code at a glance?  Probably not.  Does "X" signify "Unknown" or "Divorced" (as in "My kid is with my ex this weekend") or "Separated"?  Does "S" signify "Single" or "Separated"?  If you have to read the vendor documentation in order to understand source code, you're in a bad place.  (And good luck finding the vendor documentation!)

Data validation is also a problem when you handle magic strings.  You might want, for example, to verify at compile time that you are passing appropriate string data to a method.  If you want to update an instance of the Insured class with the marital status from the vendor, you might write a method like this:

public class Insured
{
    // object construction
    public static Insured CreateInstance() { return new Insured(); }
    protected Insured() { }

    public void UpdateStatus(string maritalStatus)
    {
        // implementation goes here...
    }
}

You would hope that by naming the string parameter appropriately ("maritalStatus") you would keep the careful colleague from entering inappropriate data.  However, from the compiler's perspective it would be completely legal to misuse the method like this:

Insured ins = Insured.CreateInstance();
ins.UpdateStatus("Widowed");

Those of you who have unit tests with 100% code coverage and rich boundary-checking would catch this quickly.  The other 99% of you (and this unfortunately includes me too often) would, at best, waste a lot of time debugging this error in a test environment.  And who knows, it could even move into production by accident--oh, the horror!

Worse yet, you could misuse the method in a way that is very difficult to detect.  If you have a set of codes for workflow status, and they happen to resemble the codes for marital status, you could plug them in--and create a truly insidious bug.

Insured ins = Insured.CreateInstance();
string status = myWorkflow.GetStatus(); // S = started, W = work-in-process, U = unknown, D = deleted, X = completed
ins.UpdateStatus(status);

This code would compile and never throw an exception.  You could only pray to catch this one before it hits production.

And of course you will probably want to validate the data a vendor sends you at the point at which you receive it, as well.  If you are receiving data in XML format and it is required to conform to a comprehensive XML schema, you can simply validate the data against the schema.  Unfortunately, even in this age of XML web services, your vendors may not publish an XML schema--and they may not even use XML.  Certainly ours do not.
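When no schema is available, the check at the point of receipt has to be written by hand. The following is a minimal sketch of such a check, not part of any vendor API: the VendorCodeValidator class and its method names are hypothetical, and the list of codes is simply copied from the Marital Status Codes table above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: validates raw vendor codes at the point of receipt.
public static class VendorCodeValidator
{
    // The codes from the Marital Status Codes table; the comparison is case-sensitive.
    private static readonly HashSet<string> MaritalStatusCodes =
        new HashSet<string>(new[] { "M", "D", "X", "S", "W", "U" }, StringComparer.Ordinal);

    // Returns true only when the value is one of the documented codes.
    public static bool IsValidMaritalStatus(string code)
    {
        return code != null && MaritalStatusCodes.Contains(code);
    }

    // Throws at the boundary so that bad data never reaches the rest of the system.
    public static string RequireMaritalStatus(string code)
    {
        if (!IsValidMaritalStatus(code))
            throw new ArgumentException("Unrecognized marital status code: " + (code ?? "(null)"), "code");
        return code;
    }
}
```

A check like this has to be repeated, and kept in sync, for every set of codes you receive, which is part of what makes magic strings tedious to handle.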

A third problem associated with data encoded as a magic string is that your system somehow has to translate it into an understandable format in order to display it to a user.  Twenty years ago, green screen systems could get away with displaying something like "Marital Status: X."  The cost of training users to understand the meaning of 'X' was regarded as a cost of doing business.  Today businesses know that they do not have to accept such inferior application design; they expect systems to be easy to learn and use.

The Standard Solutions

A typical recommendation is to create (and use) named constants to represent the magic strings.  Ideally, you would group related codes into a common class, so in C# you might create the following class:

public static class MaritalStatus
{
    public const string Married = "M";
    public const string Divorced = "D";
    public const string Separated = "X";
    public const string Single = "S";
    public const string Widowed = "W";
    public const string Unknown = "U";
}

Then you would use the named constants in place of the magic strings to make your code more readable:

switch (maritalStatus) // maritalStatus is a one-byte string we got from the vendor and stored in our DB
{
    case MaritalStatus.Separated:
        DoA();
        break;
    case MaritalStatus.Single:
        DoB();
        break;
    // etc.
}

While named constants solve the readability problem, they do nothing to help us validate data or provide a human readable description.  So let's continue our search for a solution by taking a look at a datatype built into the .NET Framework: enums.  You could define an enum...

public enum MaritalStatus
{
  Married = 1,
  Divorced = 2,
  //....
}

...then store enum values in your database.  When you want to display a description, the enum can help there as well; you can obtain the name associated with an enum's value by simply calling its ToString() method.  The following code will write "Married" to the console:

MaritalStatus ms = MaritalStatus.Married;
Console.WriteLine(ms.ToString());

An enum still has some drawbacks, however.  You must write some additional logic that translates between the magic string and the appropriate enum value, for one thing: 

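using System;

public class VendorCodeTranslator
{
  public static MaritalStatus MaritalStatusVendorToOurs(string vendorCode)
  {
    switch (vendorCode)
    {
      case "D": return MaritalStatus.Divorced;
      case "S": return MaritalStatus.Single;
      // and so forth
      default:
        // without a default branch the method would not compile (not all code paths return a value)
        throw new ArgumentException("Unrecognized vendor code: " + vendorCode, "vendorCode");
    }
  }
}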

In addition, if you want your description to include a blank, or a character (such as "&" or "*") that is illegal in an identifier, you cannot get it from an enum's ToString() method.
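The description limitation is easy to see in a small sketch. The MaritalStatusEnum type, its Unknown_Status member, and the EnumDemo helper are names invented for this illustration; the only behavior relied upon is that ToString() on an enum value returns the member's identifier.

```csharp
using System;

// Hypothetical enum; an identifier cannot contain a blank, so an underscore stands in for one.
public enum MaritalStatusEnum
{
    Married = 1,
    Divorced = 2,
    Unknown_Status = 3
}

public static class EnumDemo
{
    // ToString() can only return the identifier, so the underscore survives in the output.
    public static string Describe(MaritalStatusEnum status)
    {
        return status.ToString();
    }
}
```

Describe(MaritalStatusEnum.Unknown_Status) returns "Unknown_Status", so any friendlier text ("Unknown Status", or a description containing "&") needs yet another hand-written mapping.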

Kill Three Birds With One Stone

As a programmer, you want to be able to define a class that associates a set of user-friendly names to a set of magic strings, and then let the class handle the 3 responsibilities of code clarity, data validation, and user-friendly display.  In fact, this is possible if your class inherits from the MagicStringTranslator class that I am about to define.  

    using System;
    using System.Collections.Generic;
    using System.Reflection;
    using System.Runtime.Serialization;

    [Serializable()]
    public abstract class MagicStringTranslator : ISerializable
    {
        #region fields

        private string m_magicCode;

        // one code/description dictionary per subclass, keyed by subclass name
        private static readonly Dictionary<string, Dictionary<string, string>> m_dictTable =
            new Dictionary<string, Dictionary<string, string>>();

        // static, so that every instance contends for the same lock
        private static readonly object m_syncRoot = new object();

        #endregion

        #region construction / initialization

        public MagicStringTranslator(string code) : this(code, false) { }

        public MagicStringTranslator(string initString, bool parmIsDescription)
        {
            EnsureDictionary();

            // set the magic code for this instance
            this.m_magicCode = null;
            if (!parmIsDescription)
            {
                // verify that the code is one of the codes in our dictionary
                if (initString == null || !LookupDict.ContainsKey(initString))
                    throw new MagicStringBadValueException(initString);
                this.m_magicCode = initString;
            }
            else
            {
                // find the matching description in our list, then set m_magicCode to the description's corresponding code
                foreach (KeyValuePair<string, string> kvp in LookupDict)
                {
                    if (initString == kvp.Value)
                    {
                        this.m_magicCode = kvp.Key;
                        break;
                    }
                }
                if (this.m_magicCode == null)
                    throw new MagicStringBadDescException(initString);
            }
        }

        private void EnsureDictionary()
        {
            // initialize the lookup dictionary using the double-checked locking pattern
            // note: the dictionary is published before it is populated, so a concurrent first use can still see it partially filled
            if (LookupDict == null)
            {
                lock (m_syncRoot)
                {
                    if (LookupDict == null)
                    {
                        // instantiate the lookup dictionary
                        LookupDict = new Dictionary<string, string>();
                        InitializeDictionary();
                    }
                }
            }
        }

        protected virtual void InitializeDictionary()
        {
            // populate the dictionary with key-value pairs.  Key = string assigned to a public const string, Value = its description
            FieldInfo[] fields = this.GetType().GetFields();
            foreach (FieldInfo fi in fields)
            {
                string key = (string)fi.GetValue(this);
                string val;
                object[] attribs = fi.GetCustomAttributes(typeof(CustomDescriptionAttribute), false);

                // use the CustomDescription attribute if it exists
                if (attribs.Length != 0)
                    val = ((CustomDescriptionAttribute)attribs[0]).Description;
                // else use the name of the field
                else
                    val = fi.Name.Replace("_", " "); // Substitute a blank for an underscore in order to improve readability
                LookupDict.Add(key, val);
            }
        }

        #endregion

        #region properties

        public string Value
        {
            get { return this.m_magicCode; }
        }

        public string Description
        {
            get { return LookupDict[this.m_magicCode]; }
        }

        protected virtual Dictionary<string, string> LookupDict
        {
            get
            {
                Dictionary<string, string> myDict;
                m_dictTable.TryGetValue(this.MyKey, out myDict);
                return myDict;
            }
            set
            {
                m_dictTable.Add(this.MyKey, value);
            }
        }

        // the full name keeps same-named subclasses in different namespaces apart
        private string MyKey { get { return this.GetType().FullName; } }

        #endregion

        #region ISerializable Members

        protected MagicStringTranslator(SerializationInfo info, StreamingContext context)
        {
            this.m_magicCode = info.GetString("magicCode");
            // the dictionary may not exist yet in the receiving process
            EnsureDictionary();
        }

        void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
        {
            info.AddValue("magicCode", this.m_magicCode);
        }

        #endregion

        #region overrides

        public override string ToString()
        {
            return this.Description;
        }

        #endregion
    }

When you subclass MagicStringTranslator, you declare a set of named constants, similar to the constants we saw in the MaritalStatus class.  The base class uses reflection to discover the named constants and values, and populates a dictionary of keys (magic strings) and values (descriptions) with them.  The base class then uses the dictionary to validate data (by verifying that they are in the dictionary) and to map the relationship between magic strings and descriptions.  In this fashion, MagicStringTranslator allows you to address all three of the magic string responsibilities (code clarity, data validation, and user-friendly display) simply by declaring a set of named constants.  Let's take a look at some sample code to see how this works.
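To isolate the reflection step described above, here is a stripped-down sketch of the discovery logic on its own. It is an illustration rather than the base class itself: SampleCodes and ConstantScanner are hypothetical names, and the sketch leaves out the CustomDescription handling.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for a class of named constants.
public class SampleCodes
{
    public const string Married = "M";
    public const string Unknown_Status = "U";
}

public static class ConstantScanner
{
    // Builds a code -> description dictionary from the public const string fields of a type.
    public static Dictionary<string, string> Scan(Type type)
    {
        Dictionary<string, string> result = new Dictionary<string, string>();
        foreach (FieldInfo fi in type.GetFields(BindingFlags.Public | BindingFlags.Static))
        {
            // IsLiteral is true for const fields; skip anything that is not a const string
            if (!fi.IsLiteral || fi.FieldType != typeof(string))
                continue;

            // a const is static, so no instance is needed to read its value
            string code = (string)fi.GetValue(null);
            result.Add(code, fi.Name.Replace("_", " "));
        }
        return result;
    }
}
```

ConstantScanner.Scan(typeof(SampleCodes)) yields two entries, "M" mapped to "Married" and "U" mapped to "Unknown Status". Validation is then a ContainsKey call and display is an indexer lookup.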

Sample Code

First, define the MaritalStatus class as a subclass of MagicStringTranslator:

    public class MaritalStatus : MagicStringTranslator
    {
        [CustomDescription("Living happily ever after")]
        public const string Married = "M";
        public const string Divorced = "D";
        public const string Separated = "X";
        public const string Single = "S";
        public const string Widowed = "W";
        public const string Unknown_Status = "U";

        public MaritalStatus(string code) : base(code) { }
    }

You just need to remove the static qualifier on the previous MaritalStatus class,  make the class inherit from MagicStringTranslator, and define a constructor.  Note that we have decorated the Married name with a CustomDescriptionAttribute; this means that the description associated with the magic string "M" is "Living happily ever after" rather than the default (the constant name ["Married"]).  In addition, the description associated with magic string "U" is "Unknown Status," since MagicStringTranslator replaces any underscore in the constant name with a blank.

This example assumes that the vendor is passing data as codes, not descriptions.  If you receive data in the form of descriptions (believe it or not, we do), you would need to provide a second constructor with an extra boolean parameter that indicates whether the first parameter is a description or a value.

Now you can strongly type the parameter of your UpdateStatus method by defining it as an instance of class MaritalStatus:

public class Insured
{
    // object construction
    public static Insured CreateInstance() { return new Insured(); }
    protected Insured() { }

    public void UpdateStatus(MaritalStatus status)
    {
        // implementation goes here...
    }
}

And the code that calls the method can hardly go wrong:

Insured ins = Insured.CreateInstance();
MaritalStatus status = new MaritalStatus(MaritalStatus.Widowed);
ins.UpdateStatus(status);

Bear in mind that we haven't lost the ability to use the named constants for the sake of code clarity.  This code still works fine:

switch (maritalStatus.Value) // maritalStatus is an instance of the MaritalStatus class
{
    case MaritalStatus.Separated:
        DoA();
        break;
    case MaritalStatus.Single:
        DoB();
        break;
    // etc.
}
// Trace.WriteLine has no composite-format overload, so format the message first
Trace.WriteLine(string.Format("Magic string value is {0} and description is {1}", maritalStatus.Value, maritalStatus.Description));

The last line of code shows that we are also able to use the built-in capabilities of MagicStringTranslator to display a user-friendly description in place of a magic string, without having to write any additional logic.

As a bonus, the MagicStringTranslator class has other useful capabilities:

  • MagicStringTranslator implements the ISerializable interface in such a way that only the code/magic string is passed.  This makes using a subclass as a remote method's parameter type both simple and efficient.
  • The class overrides object.ToString() to return the description associated with the magic string used to initialize the instance.  This proves very useful when the polymorphic ToString() method is used to display an object's properties and one or more of those properties is typed as a subclass of MagicStringTranslator.
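The ToString() point can be illustrated in isolation. In the sketch below, StatusStub is a hypothetical stand-in for a MagicStringTranslator subclass (it simply holds one code and one description), and InsuredSummary is an invented display helper; the only behavior relied upon is that string.Format calls ToString() on its arguments.

```csharp
using System;

// Hypothetical stand-in for a MagicStringTranslator subclass: ToString() returns the description.
public class StatusStub
{
    private readonly string code;
    private readonly string description;

    public StatusStub(string code, string description)
    {
        this.code = code;
        this.description = description;
    }

    public string Value { get { return code; } }

    public override string ToString() { return description; }
}

public static class InsuredSummary
{
    // Generic display logic: it knows nothing about magic strings, yet it shows the description.
    public static string Render(string label, object propertyValue)
    {
        return string.Format("{0}: {1}", label, propertyValue);
    }
}
```

InsuredSummary.Render("Marital Status", new StatusStub("W", "Widowed")) returns "Marital Status: Widowed", while the Value property still exposes the raw code for storage.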

Make Wrong Code Look Wrong

So we have achieved code clarity, data validation, and the ability to translate a magic string into a user-friendly description.  Of course, it is still possible to fall asleep at the wheel and write a bug:

Insured ins = Insured.CreateInstance();
MaritalStatus status = new MaritalStatus(myWorkflow.GetStatus());
ins.UpdateStatus(status);

Yes, this will compile.  However, the mix-up between marital status and workflow status should hit you right between the eyes; this source code obeys the "Make Wrong Code Look Wrong" principle propounded by my favorite software development blogger, Joel Spolsky.  A developer who would write this bug needs to take a long vacation, for sure.

Implementation Options

Performance Improvements.  The base MagicStringTranslator class manages all the dictionaries of subclasses via an internal dictionary of dictionaries.  This allows the subclass to be coded as simply and clearly as possible.  If you are trying to squeeze every last cycle out of your CPU, though, your subclass can override the default LookupDict property by referencing its own static dictionary:

    public class MaritalStatus : MagicStringTranslator
    {
        public const string Married = "M";
        public const string Divorced = "D";
        public const string Separated = "X";
        public const string Single = "S";
        public const string Widowed = "W";
        public const string Unknown = "U";

        public MaritalStatus(string code) : base(code) { }

        // we are maintaining our own Dictionary<string, string> to increase processing efficiency
        private static Dictionary<string, string> myDict;

        // overrides
        protected override Dictionary<string, string> LookupDict
        {
            get { return myDict; }
            set { myDict = value; }
        }
    }

At run-time, the subclass' key/value dictionary is a simple return statement away; the base class' dictionary of dictionaries (and its reliance on discovering the name of the subclass by reflection) is bypassed.  In my testing, this has improved the performance of using the subclass' Value or Description property by over 100%.  However, the performance of the default implementation is already quite good; on my modest desktop box, test code can instantiate and use the properties of the MaritalStatus class (in a loop) 1000 times per millisecond.  Increasing this performance to 2400 times per millisecond does not justify the increased code complexity, in my opinion.  Of course, your situation may differ, so I offer the option just in case.
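If you want to repeat this kind of measurement on your own hardware, a small timing harness is enough. The sketch below is one possible harness, not the test code behind the figures above; ThroughputTimer and its method name are hypothetical.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing harness for comparing the two LookupDict implementations.
public static class ThroughputTimer
{
    // Runs the action 'iterations' times and returns iterations per elapsed millisecond.
    public static double IterationsPerMillisecond(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException("action");
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");

        // warm-up call, so that one-time work (such as building the dictionary) is not timed
        action();

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        sw.Stop();

        double ms = sw.Elapsed.TotalMilliseconds;
        return ms > 0 ? iterations / ms : double.PositiveInfinity;
    }
}
```

With the article's classes in scope, a call might look like ThroughputTimer.IterationsPerMillisecond(delegate { string d = new MaritalStatus(MaritalStatus.Married).Description; }, 1000000). Results will vary with the machine, the runtime, and the build configuration.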

Dictionary Re-use.  You may also want to re-use a subclass' Dictionary to populate a drop-down list with descriptions and values.  Why should you duplicate any effort, anywhere in your application?  Since the responsibility for knowing how to populate lists would complicate the design of a MagicStringTranslator subclass, you would write a helper method in a utility class instead:

using System.Collections.Generic;
using System.Web.UI.WebControls;

public class UiHelper
{
    /// <summary>
    /// Uses a Dictionary of key/value pairs as the source of ListItems for a drop-down list
    /// </summary>
    public static void PopulateDropDownList(DropDownList ddl, Dictionary<string, string> dict)
    {
        foreach (KeyValuePair<string, string> kvp in dict)
        {
            // the description is the visible text, the magic string is the posted value
            ListItem li = new ListItem(kvp.Value, kvp.Key);
            ddl.Items.Add(li);
        }
    }
}

Then add a static property to your subclass that furnishes a copy of its Dictionary.  (Of course you wouldn't hand the original to a stranger, who might add or remove key/value pairs at whim.)

    public class MaritalStatus : MagicStringTranslator
    {
        public const string Married = "M";
        public const string Divorced = "D";
        public const string Separated = "X";
        public const string Single = "S";
        public const string Widowed = "W";
        public const string Unknown = "U";

        public MaritalStatus(string code) : base(code) { }

        public static Dictionary<string, string> KeyValuePairs
        {
            get
            {
                MaritalStatus ms = new MaritalStatus(MaritalStatus.Married);
                Dictionary<string, string> result = new Dictionary<string, string>(ms.LookupDict.Count);
                foreach (KeyValuePair<string, string> kvp in ms.LookupDict)
                {
                    result.Add(kvp.Key, kvp.Value);
                }
                return result;
            }
        }
    }
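The copy loop can also be expressed with the copy constructor of Dictionary<TKey, TValue>, which produces an independent dictionary holding the same pairs. A minimal sketch, in which DictionaryCopier and its method are hypothetical names:

```csharp
using System.Collections.Generic;

// Hypothetical helper showing a defensive copy of a lookup dictionary.
public static class DictionaryCopier
{
    // Returns a new dictionary with the same pairs; changes to the copy do not touch the source.
    public static Dictionary<string, string> CopyOf(IDictionary<string, string> source)
    {
        return new Dictionary<string, string>(source);
    }
}
```

Inside the KeyValuePairs getter, the body could then shrink to a single return of DictionaryCopier.CopyOf(ms.LookupDict). Because the strings themselves are immutable, copying the pairs is enough to protect the original.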

Dictionary Initialization.  You may choose to override the InitializeDictionary method, perhaps by consulting a lookup table in the database.  However, you would forfeit the ability to define and use named constants if you follow this route.

Conclusion

Dealing with magic strings can pose a significant challenge when developing a .NET application.  The built-in solutions (named constants and enums) certainly help, but they have some shortcomings. If you derive a class of named constants from the MagicStringTranslator abstract class, though, you can get data validation and the mapping of magic string data to user-friendly descriptions at essentially no extra cost.

Appendix: Source Code for Helper Classes

    #region CustomDescriptionAttribute

    [AttributeUsage(AttributeTargets.Field)]
    public class CustomDescriptionAttribute : Attribute
    {
        private string desc;

        public CustomDescriptionAttribute(string desc)
        {
            this.desc = desc;
        }

        public string Description
        {
            get { return desc; }
        }
    }

    #endregion

    #region Exceptions

    public class MagicStringException : Exception
    {
        public MagicStringException(string data) : base(data + " passed to MagicStringTranslator class") { }
    }

    public class MagicStringBadValueException : MagicStringException
    {
        public MagicStringBadValueException(string val) : base("Unrecognized value (" + val + ")") { }
    }

    public class MagicStringBadDescException : MagicStringException
    {
        public MagicStringBadDescException(string desc) : base("Unrecognized description (" + desc + ")") { }
    }

    #endregion

Acknowledgements

I would like to thank the members of the Lean Programming Yahoo Group who helped me sharpen my analysis by offering feedback on an earlier draft of this article that I posted to the group.  I encourage anyone who is interested in improving his or her ability to write good code in today's business environment to come join us.


Biography
Chris has spent about 14 years in software development, most recently serving as a lead developer for an insurer in South Carolina. He previously served as an Escalation Engineer and as an Application Development Consultant at Microsoft. A graduate of Princeton University's class of 1983, Chris has also taught high-school students, baked bagels, and lived among the West African poor.

 
Article Discussion: A Good Solution for "Magic String" Data
Chris Falter posted at 06-Mar-08 01:51

 
Yo, Chris!
Peter Bromberg replied to Chris Falter at 06-Mar-08 07:32
Nice article! I was worried that you might have fallen off the face of the earth, or were working in Kenya to resolve political problems.

 
a few suggestions
Rune Funch Søltoft replied to Chris Falter at 24-Apr-08 05:23
First of all, nice article. We've been using the named consts and enum solutions quite often, but this might be our way of solving Magic String problems in the future.
Reading the article, I came to think of a few things that I might change when we're using it.
I would make the constructor and the constants private and expose a few "named singletons" like:
 private static readonly MarriedStatus _married = new MarriedStatus(MarriedConst);
 public static MarriedStatus Married{ get {return _married;}}

I don't see any reason why it should be possible to call the constructor, since we already know which calls would be appropriate, and with this change we have now changed a possible runtime error into a compile-time error.

The above will of course only work in a scenario like the one given in the example, where all calls to the constructor are of the format MarriedStatus(namedconstant). It will not work if calls like MarriedStatus(AMethodReturningStatusMagicString()) are needed; in such scenarios the above change will of course only complicate things.

The second change I would make is to the part of the code that exposes the dictionary. I think I would write that with a yield return instead (exposing an IEnumerable<KeyValuePair<x,y>>). This way we'll only need to iterate the inner dictionary once (when the third party is using it) instead of at least twice: when created and when used.

As I started by saying, I like the idea and we might very well use it, so thx for the heads up on this approach

 
Re: a few suggestions
Chris Falter replied to Rune Funch Søltoft at 24-Apr-08 01:05
Rune,

Thanks for the kind words.  I'm delighted that you found the MagicStringTranslator to be useful.  One of the things I love about publishing articles is that I get to see terrific ideas, like yours, that improve and extend what I have done.  I hope you will add another comment with your actual code when you finish the IEnumerable<KeyValuePair<x,y>> implementation.  As for the named singletons concept, the preference in our shop has been to keep the MagicStringTranslator subclasses as simple as possible; they do nothing but declare the named constants.  I certainly see the appeal of your named singletons idea, though, and would encourage you to try it out and let us know how it works.



 
IEnumerable<KeyValuePair<string, string>> implementation
Rune Funch Søltoft replied to Chris Falter at 25-Apr-08 04:02
The implementation with yield return could look like this

public static IEnumerable<KeyValuePair<string, string>> KeyValuePairs {
    get {
        MaritalStatus ms = new MaritalStatus(MaritalStatus.Married);
        foreach (KeyValuePair<string, string> kvp in ms.LookupDict)
            yield return kvp;
    }
}
There are two main differences from the implementation in the article. First, it's documented more clearly that the result is "readonly": there's no way to change the elements in the IEnumerable (unless u really try hard, figuring out the actual implementation, casting/using reflection, and then ur on ur own :-) ). Second, the implementation here uses lazy evaluation.

writing code like:
var pairs = MaritalStatus.KeyValuePairs;
will not actually execute the foreach loop.
if u later write something like

var enumerator = pairs.GetEnumerator();
var firstKey  = enumerator.MoveNext() ? enumerator.Current.Key : "not found";
u still haven't iterated the entire LookupDict, only the first element

hope this clarifies my point from the above post