[Previous] [Next]

Maps

Of all the MFC collection types, maps might be the most interesting. A map, also known as a dictionary, is a table of items keyed by other items. A simple example of a map is a list of the 50 states keyed by each state's two-letter abbreviation. Given a key such as CA, the corresponding state name (California) can be retrieved with a simple function call. Maps are designed so that given a key, the corresponding item can be found in the table very quickly—often with just one lookup. Maps are ideal containers for large amounts of data when lookup performance is of paramount importance. MFC uses maps to implement handle maps (tables that correlate HWNDs to CWnds, HPENs to CPens, and so on) and other internal data structures. It also makes its map classes public, so you can use them to create maps of your own.

The MFC Map Classes

In addition to the template-based map class CMap, which can be specialized to handle specific data types, MFC provides the following type-specific (and non-template-based) map classes. Each class includes member functions for adding and removing items, retrieving items by key, and enumerating all the items in the map.

Type-Specific MFC Map Classes

Class Name Description
CMapWordToPtr Stores void pointers keyed by WORDs
CMapPtrToWord Stores WORDs keyed by void pointers
CMapPtrToPtr Stores void pointers keyed by other void pointers
CMapWordToOb Stores CObject pointers keyed by WORDs
CMapStringToOb Stores CObject pointers keyed by strings
CMapStringToPtr Stores void pointers keyed by strings
CMapStringToString Stores strings keyed by other strings

To demonstrate the semantics of map usage, let's use CMapStringToString to build a simple English-French dictionary containing the names of the days in the week. The following statements build the map.

CMapStringToString map;
map[_T ("Sunday")]    = _T ("Dimanche");
map[_T ("Monday")]    = _T ("Lundi");
map[_T ("Tuesday")]   = _T ("Mardi");
map[_T ("Wednesday")] = _T ("Mercredi");
map[_T ("Thursday")]  = _T ("Jeudi");
map[_T ("Friday")]    = _T ("Vendredi");
map[_T ("Saturday")]  = _T ("Samedi");

In this example, the items stored in the map are the French names for the days of the week. Each item is keyed by a string specifying its English-language equivalent. The [] operator inserts an item and its key into the map. Because CMapStringToString stores keys and items in CString objects, inserting an item copies both its text and the key text to CStrings.

With the map initialized like this, a simple lookup retrieves the French word for Thursday. You perform lookups by calling the map's Lookup function and specifying the key:

CString string;
if (map.Lookup (_T ("Thursday"), string))
    TRACE (_T ("Thursday in English = %s in French\n"), string);

A nonzero return from Lookup indicates that the item was successfully retrieved. A 0 return means that no such item exists—that is, that no item is keyed by the key specified in Lookup's first parameter.

You can remove items from a map with RemoveKey and RemoveAll. GetCount returns the number of items in the map, and IsEmpty indicates whether the map contains any items at all. GetStartPosition and GetNextAssoc permit you to enumerate the map's contents item by item:

POSITION pos = map.GetStartPosition ();
while (pos != NULL) {
    CString strKey, strItem;
    map.GetNextAssoc (pos, strKey, strItem);
    TRACE (_T ("Key=%s, Item=%s\n"), strKey, strItem);
}

Run on the CMapStringToString object shown above, this code produces the following output:

Key=Tuesday, Item=Mardi
Key=Saturday, Item=Samedi
Key=Wednesday, Item=Mercredi
Key=Thursday, Item=Jeudi
Key=Friday, Item=Vendredi
Key=Monday, Item=Lundi
Key=Sunday, Item=Dimanche

As this listing shows, items aren't necessarily stored in the order in which they are added. This is a natural consequence of the fact that maps are not designed to preserve order, but to enable items to be retrieved as quickly as possible. Map architecture is described in the next section.

Incidentally, if you insert an item into a map and that item has the same key as an item that was previously inserted, the new item will replace the old one. It's not possible for an MFC map to contain two or more items identified by the same key.

How Maps Work

Maps wouldn't be very remarkable if it weren't for the fact that lookups are so fast. The key to maximizing performance is minimizing the number of items examined during the search. Sequential searches are the worst, because if the map contains n items, up to n individual lookups could be required. Binary searches are better but require ordered items. The best algorithm is one that can go directly to the requested item without having to do any searching, regardless of the number of items present. Sounds impossible? It's not. If a map is set up properly, MFC's Lookup function can normally find any item with a single lookup. Rarely, in fact, are more than two or three lookups required. Here's why.

Soon after a map is created (usually at the moment the first item is added, but occasionally before), it allocates memory for a hash table, which is actually an array of pointers to CAssoc structures. MFC uses CAssoc structures to represent the items (and keys) that you add to a map. CAssoc is defined this way for CMapStringToString:

struct CAssoc
{
    CAssoc* pNext;
    UINT nHashValue;
    CString key;
    CString value;
};

Whenever an item is added to the map, a new CAssoc structure is created, a hash value is computed from the item's key, and a pointer to the CAssoc structure is copied to the hash table at index i, where i is computed using the following formula:

i = nHashValue % nHashTableSize

nHashValue is the hash value computed from the key; nHashTableSize is the number of elements in the hash table. The default hash table size is 17 entries; I'll discuss how (and why) you change the size in just a moment. If perchance the element at index i already holds a CAssoc pointer, MFC builds a singly linked list of CAssoc structures. The address of the first CAssoc structure in the list is stored in the hash table. The address of the second CAssoc structure is stored in the first CAssoc structure's pNext field, and so on. Figure 5-1 illustrates how the hash table might look after 10 items are added. In this example, five of the items' addresses are stored at unique locations in the hash table, and five others are split between two linked lists whose lengths are 2 and 3, respectively.

Click to view at full size.

Figure 5-1. A hash table containing a combination of unique items and linked lists.

When a map's Lookup function is called, MFC computes a hash value from the input key, converts the hash into an index into the hash table using the formula described in the previous paragraph, and retrieves the CAssoc pointer from the corresponding location in the hash table. Under ideal conditions, there is just one CAssoc pointer at that location, and not a linked list of CAssoc pointers. If that's the case, the item has been found with just one lookup in the map, and its value is retrieved from the CAssoc object. If the CAssoc pointer retrieved from the hash table is the head of a linked list, MFC walks the list until it finds the key it's looking for. A properly built map will never have more than two or three items in a list of CAssoc structures, which means a lookup should never require more than two or three items to be examined.

Optimizing Lookup Efficiency

The efficiency with which lookups are performed depends on two factors:

The hash table size is important. If a map contains 1,000 items but the hash table has room for only 17 CAssoc pointers, the best case is that each entry in the hash table stores the address of the first CAssoc structure in a linked list of 58 or 59 CAssoc structures. This arrangement greatly impedes lookup performance. The hashing algorithm is important too, because no matter how many CAssoc pointers the hash table can hold, if the hashing algorithm generates only a handful of different hash values (and therefore a handful of different hash table indexes), lookup performance is similarly diminished.

The best way to optimize lookup efficiency is to make the hash table as large as possible to minimize the number of collisions. A collision occurs when dissimilar input keys yield the same hash table index. Microsoft recommends setting the hash table size to a value 10 to 20 percent larger than the number of items in the map to strike a reasonable balance between memory consumption and lookup efficiency. To specify the hash table size, call the map's InitHashTable function:

// Assume the map will hold about 1,000 items.
map.InitHashTable (1200); // 1200 = 1000 + 20 percent

For statistical reasons, using a prime number for the hash table size also helps to minimize collisions. Therefore, an even better way to initialize a hash table for 1,000 items is to call InitHashTable this way:

map.InitHashTable (1201);

You should call InitHashTable before adding any items to the map. Attempting to resize the hash table when the map contains one or more items causes an assertion error.

Although the algorithms that MFC uses to generate hash values are adequate for most purposes, you can replace them with your own if you want to. To hash an input key, MFC calls a global template function named HashKey. For most data types, HashKey is implemented this way:

AFX_INLINE UINT AFXAPI HashKey(ARG_KEY key)
{
    // default identity hash - works for most primitive values
    return ((UINT)(void*)(DWORD)key) >> 4;
}
For strings, however, it's implemented this way:
UINT AFXAPI HashKey(LPCWSTR key) // Unicode strings
{
    UINT nHash = 0;
    while (*key)
        nHash = (nHash<<5) + nHash + *key++;
    return nHash;
}

UINT AFXAPI HashKey(LPCSTR key) // ANSI strings
{
    UINT nHash = 0;
    while (*key)
        nHash = (nHash<<5) + nHash + *key++;
    return nHash;
}

To implement your own algorithm for a particular data type, simply write a type-specific HashKey function. You can use the string versions of HashKey shown above as a model.

Creating Type-Safe Map Classes with CMap

As you probably suspected, you can use MFC's CMap template class to create maps for data types that aren't supported by the type-specific map classes. The following example creates a collection of CPoint objects keyed by CStrings and then performs a lookup:

CMap<CString, CString&, CPoint, CPoint&> map;
map[CString (_T ("Vertex1"))] = CPoint (  0,   0);
map[CString (_T ("Vertex2"))] = CPoint (100,   0);
map[CString (_T ("Vertex3"))] = CPoint (100, 100);
map[CString (_T ("Vertex4"))] = CPoint (  0, 100);

CPoint point;
if (map.Lookup (CString (_T ("Vertex3")), point))
    TRACE (_T ("Vertex 3 = (%d,%d)\n"), point.x, point.y);

Because CString is used as a key, this code won't compile unless you override HashKey with a version that is specifically designed to hash CStrings. Here's one possibility:

UINT AFXAPI HashKey(CString& string)
{
    LPCTSTR key = (LPCTSTR) string;
    UINT nHash = 0;
    while (*key)
        nHash = (nHash<<5) + nHash + *key++;
    return nHash;
}

After converting the CString reference into a conventional string pointer, this code hashes the string the same way MFC's LPCSTR/LPCWSTR HashKey functions hash a string.

Like the CList class's Find function, CMap::Lookup uses the CompareElements template function to compare elements. Because CompareElements uses the == operator to perform comparisons, the default implementation is fine for primitive data types and classes that overload the == operator. If you use classes of your own devising as keys in a map, however, you must either overload the == operator in those classes or override CompareElements for individual data types. Refer to the section "Creating Type-Safe List Classes with CList" earlier in this chapter for examples of how to do both.

The CHM file was converted to HTML by chm2web software.