C and C++ are block-structured programming languages that allow nested scopes and shadowing: an identifier declared in an inner scope hides any identifier of the same name declared in an outer scope. For every use of an identifier, a compiler has to determine which declaration it corresponds to:
1 int alpha; 2 int beta; 3 int* p1 = α // the alpha declared in line 1 4 int* p2 = β // the beta declared in line 2 5 int* p3 = γ // Error! gamma hasn't been declared. 6 void f() { // Enter a scope 7 int beta; // This beta shadows the one declared in line 2. 8 int gamma; 9 alpha = 0; // the alpha declared in line 1 10 beta = 0; // the beta declared in line 7 11 gamma = 0; // the gamma declared in line 8 12 { // Enter a scope 13 int alpha; // This alpha shadows the one declared in line 1. 14 int beta; // This beta shadows the one declared in line 7. 15 int beta; // Error! beta already declared in this scope. 16 alpha = 0; // the alpha declared in line 13 17 } // Exit a scope 18 alpha = 0; // the alpha declared in line 1 19 beta = 0; // the beta declared in line 7 20 { // Enter a scope 21 int beta; // This beta shadows the one declared in line 7. 22 beta = 0; // the beta declared in line 21 23 } // Exit a scope 24 } // Exit a scope 25 int* p4 = α // the alpha declared in line 1 26 int* p5 = β // the beta declared in line 2 27 int* p6 = γ // Error! gamma is not in scope. 28 int main() { // Enter a scope 29 int beta; // This beta shadows the one declared in line 2. 30 beta = 0; // the beta declared in line 29 31 f(); // the f declared in line 6 32 } // Exit a scope
Every compiler has a symbol table, a component that keeps track of the identifiers in a program as well as information about them, such as their types. Your assignment is to write a good implementation of a symbol table class. For our purposes, the only information you need to keep track of for the identifiers are the numbers of the lines in which they are declared.
An object of type SymbolTable is intended to be used as follows as the compiler processes the text of your program from top to bottom:
Whenever a scope is entered, the enterScope member function is called.
Whenever a scope is exited, the exitScope member function is called. It returns true unless there have been more calls to exitScope than prior calls to enterScope. (I.e., you can’t leave a scope that hasn’t been entered.)
Whenever an identifier is declared, the declare member function is called to record the line number associated with that declaration. It returns true unless that identifier has already been declared in the current scope or that so-called identifier is the empty string. We’ll consider any non-empty string to be a valid identifier.
Whenever an identifier is used, the find member function is called to determine the line number of the declaration corresponding to that use. It returns that line number, or -1 if the identifier has no active declaration.
For the example above, if the compiler created a SymbolTable named st, it would make the following sequence of calls. We will use assert to show you what the function return values are.
assert(st.declare("alpha", 1)); assert(st.declare("beta", 2)); assert(st.declare("p1", 3)); assert(st.find("alpha") == 1); // the alpha declared in line 1 assert(st.declare("p2", 4)); assert(st.find("beta") == 2); // the beta declared in line 2 assert(st.declare("p3", 5)); assert(st.find("gamma") == -1); // Error! gamma hasn't been declared assert(st.declare("f", 6)); st.enterScope(); assert(st.declare("beta", 7)); assert(st.declare("gamma", 8)); assert(st.find("alpha") == 1); // the alpha declared in line 1 assert(st.find("beta") == 7); // the beta declared in line 7 assert(st.find("gamma") == 8); // the gamma declared in line 8 st.enterScope(); assert(st.declare("alpha", 13)); assert(st.declare("beta", 14)); assert(!st.declare("beta", 15)); // Error! beta already declared assert(st.find("alpha") == 13); // the alpha declared in line 13 assert(st.exitScope()); assert(st.find("alpha") == 1); // the alpha declared in line 1 assert(st.find("beta") == 7); // the beta declared in line 7 st.enterScope(); assert(st.declare("beta", 21)); assert(st.find("beta") == 21); // the beta declared in line 21 assert(st.exitScope()); assert(st.exitScope()); assert(st.declare("p4", 25)); assert(st.find("alpha") == 1); // the alpha declared in line 1 assert(st.declare("p5", 26)); assert(st.find("beta") == 2); // the beta declared in line 2 assert(st.declare("p6", 27)); assert(st.find("gamma") == -1); // Error! gamma is not in scope assert(st.declare("main", 28)); st.enterScope(); assert(st.declare("beta", 29)); assert(st.find("beta") == 29); // the beta declared in line 29 assert(st.find("f") == 6); // the f declared in line 6 assert(st.exitScope());
Here is a declaration of the interface for the SymbolTable class:
class SymbolTable { public: SymbolTable(); ~SymbolTable(); void enterScope(); bool exitScope(); bool declare(std::string id, int lineNum); int find(std::string id) const; };
We have written a correct but inefficient SymbolTable implementation in SymbolTable.slow.cpp. Your assignment is to write a more efficient correct implementation. If you wish, you can do this by starting with our implementation, and changing the data structures and algorithms that implement the class and its member functions. You may add new classes and/or functions if you need to. Your implementation, though, must behave just like the one given, except that it should be faster. Correctness will count for 40% of your score, although if you turn in a correct implementation that is no faster than the inefficient one we gave you, you will get zero correctness points (since you could just as well have turned in that same inefficient implementation).
Of course, we have to give you some assumptions about the way clients will use the SymbolTable so that you know what needs to be faster. There may be thousands of declarations and tens of thousands of uses of identifiers. Scopes may be nested several dozen layers deep. Most identifiers used inside a deeply nested scope are declared many layers outside of that scope.
Performance will count for 50% of your score (and your report for the remaining 10%). To earn any performance points, your implementation must be correct. (Who cares how fast a program is if it produces incorrect results?) This is a critical requirement — you must be certain your implementation produces the correct results and does not do anything undefined. Given that you are starting from a correct implementation, this should be easier than if you had to start from scratch. The faster your program performs on the tests we make, the better your performance score.
You’ll have to come up with your own test cases to assure yourself your program is correct. (The example above is the basic correctness test that the testSymbolTable.cpp file contained in our correct but inefficient implementation performs.) To build your confidence that your program is correct for much larger data sets, the thorough correctness test intestSymbolTable.cpp reads a file named commands.txt that you provide. Each line of that file requests a call to one of the SymbolTable member functions; the four kinds of lines are
{, which requests a call to enterScope
}, which requests a call to exitScope
identifier number, which requests a call to declare(identifier, number)
identifier, which requests a call to find(identifier)
Because the tool we’re about to describe generates large files like these for you, you don’t really need to understand their format, but as an example for the curious, here’s a test filethat produces the same sequence of calls as the example above.
The thorough correctness test compares your functions’ return values against those of a correct (but slow) solution. To run the test, create a project with our SymbolTable.h, ourtestSymbolTable.cpp, and yourSymbolTable.cpp.
The file generateTests.cpp produces an executable that generates a random test file suitable for use for correctness and performance testing. You can specify the file name it should produce and the number of lines that file should have. (You’d copy that file to commands.txt to run the tests.) For example, one run we did specifying 200000 lines produced this test file (1.6MB).
Our performance test takes the same commands.txt file and times how long your implementation takes to process its commands. As an example, we used that test file (1.6MB) to time our correct but inefficient implementation of SymbolTable. We then replaced that SymbolTable.cpp with a better one. By the evening of June 1, this italicized sentence will be replaced by timing results.
Around line 18 of testSymbolTable.cpp is a declaration that specifies the name of the test file, commands.txt. If you leave that string alone, then be aware of where that file must be located: If you launched your program from Visual Studio, it must be in the same folder as your C++ source file; if you’re using Linux, the file must be in the current directory; if you’re using Xcode, the file must be in theyourProject/DerivedData/yourProject/Build/Products/Debug or …/Products/Release folder.
So many students waste time on trying to figure out where to put input files, so it might be easier to just change the “commands.txt” string literal to a full path name like “C:/CS32/P4/commands.txt” (note the forward slashes) or “/Users/yourUserName/CS32/P4/commands.txt”. You don’t even have to use the name commands.txt for the file.
By the evening of June 1, this italicized sentence will be replaced by instructions on how to build your program to run faster than the default build configuration.
Here are some requirements you must meet:
You must not change SymbolTable.h in any way. (In fact, you will not turn that file in; when we test your program, we’ll use ours.) You can change SymbolTable.cpphowever you want, subject to this spec. (Notice that by giving SymbolTable just one private data member, a pointer to a SymbolTableImpl object (which you can define however you want in SymbolTable.cpp), we have given you free rein for the implementation while still preventing you from changing the interface in any way.
You may design interesting data structures of your own, but the only containers from the C++ library you may use are vector, list, stack, and queue (and string). If you want anything fancier, implement it yourself. (It’ll be a good exercise for you.) Although we’re limiting your use of containers from the library, you are free to use library algorithms (e.g., sort).
If you choose to implement a hash table, it must have no more than 20000 buckets. A simple way to test that you handle collisions correctly is to set your number of buckets to 2 or 3 and run a small correctness test.
A project consisting of your SymbolTable.cpp and our SymbolTable.h and testSymbolTable.cpp from our inefficient implementation must build correctly, and when executed, must run to completion without error.
During execution, your program must not perform any undefined actions, such as dereferencing a null or uninitialized pointer.
Expert Answer
// SymbolTable tester
//
// The file commands.txt should contain a sequence of lines of the form
// { which requests a call to enterScope
// } which requests a call to exitScope
// identifier number which requests a call to declare(identifier,number)
// identifier which requests a call to find(identifier)
#include “SymbolTable.h”
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <cstdlib>
using namespace std;
const char* commandFileName = “/Users/pat/Desktop/command.txt”;
class SlowSymbolTable
{
public:
void enterScope();
bool exitScope();
bool declare(const string& id, int lineNum);
int find(const string& id) const;
private:
vector<string> m_ids;
vector<int> m_lines;
};
struct Command
{
static Command* create(string line, int lineno);
Command(string line, int lineno) : m_line(line), m_lineno(lineno) {}
virtual ~Command() {}
virtual void execute(SymbolTable& st) const = 0;
virtual bool executeAndCheck(SymbolTable& st, SlowSymbolTable& sst) const = 0;
string m_line;
int m_lineno;
};
void extractCommands(istream& dataf, vector<Command*>& commands);
string testCorrectness(const vector<Command*>& commands);
void testPerformance(const vector<Command*>& commands);
int main()
{
vector<Command*> commands;
// Basic correctness test
// Notice that this test string contains an excess “}” command at the
// end. This tests whether you correctly return false if you call
// exitScope more times than you call enterScope.
istringstream basicf(
“alpha 1nbeta 2nalphanbetangamman{nbeta 7ngamma 8nalphanbetan”
“gamman{nalpha 13nbeta 14nbeta 15nalphan}nalphanbetan{n”
“beta 21nbetan}n}nalphanbetangamman}n”
);
extractCommands(basicf, commands);
cout << “Basic correctness test: ” << flush;
cout << testCorrectness(commands) << endl;
for (size_t k = 0; k < commands.size(); k++)
delete commands[k];
commands.clear();
// Thorough correctness and performance tests
ifstream thoroughf(commandFileName);
if ( ! thoroughf)
{
cout << “Cannot open ” << commandFileName
<< “, so cannot do thorough correctness or performance tests!”
<< endl;
return 1;
}
extractCommands(thoroughf, commands);
cout << “Thorough correctness test: ” << flush;
cout << testCorrectness(commands) << endl;
cout << “Performance test on ” << commands.size() << ” commands: ” << flush;
testPerformance(commands);
for (size_t k = 0; k < commands.size(); k++)
delete commands[k];
}
struct EnterScopeCmd : public Command
{
EnterScopeCmd(string line, int lineno)
: Command(line, lineno)
{}
virtual void execute(SymbolTable& st) const
{
st.enterScope();
}
virtual bool executeAndCheck(SymbolTable& st, SlowSymbolTable& sst) const
{
st.enterScope();
sst.enterScope();
return true;
}
};
struct ExitScopeCmd : public Command
{
ExitScopeCmd(string line, int lineno)
: Command(line, lineno)
{}
virtual void execute(SymbolTable& st) const
{
st.exitScope();
}
virtual bool executeAndCheck(SymbolTable& st, SlowSymbolTable& sst) const
{
return st.exitScope() == sst.exitScope();
}
};
struct DeclareCmd : public Command
{
DeclareCmd(string id, int lineNum, string line, int lineno)
: Command(line, lineno), m_id(id), m_lineNum(lineNum)
{}
virtual void execute(SymbolTable& st) const
{
st.declare(m_id, m_lineNum);
}
virtual bool executeAndCheck(SymbolTable& st, SlowSymbolTable& sst) const
{
return st.declare(m_id, m_lineNum) == sst.declare(m_id, m_lineNum);
}
string m_id;
int m_lineNum;
};
struct FindCmd : public Command
{
FindCmd(string id, string line, int lineno)
: Command(line, lineno), m_id(id)
{}
virtual void execute(SymbolTable& st) const
{
st.find(m_id);
}
virtual bool executeAndCheck(SymbolTable& st, SlowSymbolTable& sst) const
{
return st.find(m_id) == sst.find(m_id);
}
string m_id;
};
Command* Command::create(string line, int lineno)
{
istringstream iss(line);
string field1;
if (!(iss >> field1))
return NULL;
if (field1 == “{“)
return new EnterScopeCmd(line, lineno);
if (field1 == “}”)
return new ExitScopeCmd(line, lineno);
int field2;
if (!(iss >> field2))
return new FindCmd(field1, line, lineno);
return new DeclareCmd(field1, field2, line, lineno);
}
void extractCommands(istream& dataf, vector<Command*>& commands)
{
string commandLine;
int commandNumber = 0;
while (getline(dataf, commandLine))
{
commandNumber++;
Command* cmd = Command::create(commandLine, commandNumber);
if (cmd != NULL)
commands.push_back(cmd);
}
}
string testCorrectness(const vector<Command*>& commands)
{
SymbolTable st;
SlowSymbolTable sst;
for (size_t k = 0; k < commands.size(); k++)
{
// Check if command agrees with our behavior
if (!commands[k]->executeAndCheck(st, sst))
{
ostringstream msg;
msg << “*** FAILED *** line ” << commands[k]->m_lineno
<< “: “” << commands[k]->m_line << “””;
return msg.str();
}
}
return “Passed”;
}
//========================================================================
// Timer t; // create a timer and start it
// t.start(); // (re)start the timer
// double d = t.elapsed(); // milliseconds since timer was last started
//========================================================================
#if __cplusplus == 201103L // C++11
#include <chrono>
class Timer
{
public:
Timer()
{
start();
}
void start()
{
m_time = std::chrono::high_resolution_clock::now();
}
double elapsed() const
{
std::chrono::duration<double,std::milli> diff =
std::chrono::high_resolution_clock::now() – m_time;
return diff.count();
}
private:
std::chrono::high_resolution_clock::time_point m_time;
};
#elif defined(_MSC_VER) // not C++11, but Windows
#include <windows.h>
class Timer
{
public:
Timer()
{
QueryPerformanceFrequency(&ticksPerSecond);
start();
}
void start()
{
QueryPerformanceCounter(&m_time);
}
double elapsed() const
{
LARGE_INTEGER now;
QueryPerformanceCounter(&now);
return (1000.0 * (now.QuadPart – m_time.QuadPart)) / ticksPerSecond.QuadPart;
}
private:
LARGE_INTEGER m_time;
LARGE_INTEGER ticksPerSecond;
};
#else // not C++11 or Windows, so C++98
#include <ctime>
class Timer
{
public:
Timer()
{
start();
}
void start()
{
m_time = std::clock();
}
double elapsed() const
{
return (1000.0 * (std::clock() – m_time)) / CLOCKS_PER_SEC;
}
private:
std::clock_t m_time;
};
#endif
void testPerformance(const vector<Command*>& commands)
{
double endConstruction;
double endCommands;
Timer timer;
{
SymbolTable st;
endConstruction = timer.elapsed();
for (size_t k = 0; k < commands.size(); k++)
commands[k]->execute(st);
endCommands = timer.elapsed();
}
double end = timer.elapsed();
cout << end << ” milliseconds.” << endl
<< ” Construction: ” << endConstruction << ” msec.” << endl
<< ” Commands: ” << (endCommands – endConstruction) << ” msec.” << endl
<< ” Destruction: ” << (end – endCommands) << ” msec.” << endl;
}
void SlowSymbolTable::enterScope()
{
// Extend the id vector with an empty string that
// serves as a scope entry marker.
m_ids.push_back(“”);
m_lines.push_back(0);
}
bool SlowSymbolTable::exitScope()
{
// Remove ids back to the last scope entry.
while (!m_ids.empty() && m_ids.back() != “”)
{
m_ids.pop_back();
m_lines.pop_back();
}
if (m_ids.empty())
return false;
// Remove the scope entry marker itself.
m_ids.pop_back();
m_lines.pop_back();
return true;
}
bool SlowSymbolTable::declare(const string& id, int lineNum)
{
if (id.empty())
return false;
// Check for another declaration in the same scope.
// Return if found, break out if encounter the scope
// entry marker.
size_t k = m_ids.size();
while (k > 0)
{
k–;
if (m_ids[k].empty())
break;
if (m_ids[k] == id)
return false;
}
// Save the declaration
m_ids.push_back(id);
m_lines.push_back(lineNum);
return true;
}
int SlowSymbolTable::find(const string& id) const
{
if (id.empty())
return -1;
// Search back for the most recent declaration still
// available.
size_t k = m_ids.size();
while (k > 0)
{
k–;
if (m_ids[k] == id)
return m_lines[k];
}
return -1;
}
SymbolTable.cpp
// SymbolTable.cpp
// This is a correct but inefficient implementation of
// the SymbolTable functionality.
#include “SymbolTable.h”
#include <string>
#include <list>
using namespace std;
#define NUM_BUCK 500
// This class does the real work of the implementation.
struct Node
{
Node(const string& id, int lineNum, int scope)
{
m_id = id;
m_line = lineNum;
m_scope = scope;
}
string m_id;
int m_line;
int m_scope;
};
int hashFunc(const string &id)
{
int total = 0;
for(int i=0; i<id.length(); i++)
total = id[i];
int index = (total % NUM_BUCK);
return(index);
}
class SymbolTableImpl
{
public:
SymbolTableImpl() {m_AtScope = 0;}
void enterScope();
bool exitScope();
bool declare(const string& id, int lineNum);
int find(const string& id);
private:
list<Node> m_buckets[NUM_BUCK];
int m_AtScope;
};
void SymbolTableImpl::enterScope()
{
m_AtScope++;
}
bool SymbolTableImpl::exitScope()
{
if (m_AtScope == 0)
return false;
for(int j=0; j<NUM_BUCK; j++)
{
if(!m_buckets[j].empty())
for (list<Node>::iterator n = m_buckets[j].begin(); n != m_buckets[j].end();)
{
if(n->m_scope == m_AtScope)
n = m_buckets[j].erase(n);
else
n++;
}
}
m_AtScope–;
return true;
}
bool SymbolTableImpl::declare(const string& id, int lineNum)
{
int slot = hashFunc(id);
list<Node>* b = &m_buckets[slot];
Node* n = new Node(id, lineNum, m_AtScope);
if(b->empty())
{
b->push_front(*n);
return true;
}
for(list<Node>::iterator p = b->begin(); p != b->end(); p++)
{
if(n->m_id == p->m_id && p->m_scope == m_AtScope)
return false;
}
b->push_front(*n);
return true;
}
int SymbolTableImpl::find(const string& id)
{
if (id.empty())
return -1;
int slot = hashFunc(id);
list<Node>* b = &m_buckets[slot];
for(list<Node>::iterator p = b->begin(); p != b->end(); p++)
{
if(p->m_id == id && p->m_scope <= m_AtScope)
return p->m_line;
}
return -1;
}
//*********** SymbolTable functions **************
// For the most part, these functions simply delegate to SymbolTableImpl’s
// functions.
SymbolTable::SymbolTable()
{
m_impl = new SymbolTableImpl;
}
SymbolTable::~SymbolTable()
{
delete m_impl;
}
void SymbolTable::enterScope()
{
m_impl->enterScope();
}
bool SymbolTable::exitScope()
{
return m_impl->exitScope();
}
bool SymbolTable::declare(const string& id, int lineNum)
{
return m_impl->declare(id, lineNum);
}
int SymbolTable::find(const string& id) const
{
return m_impl->find(id);
}
SymbolTable.h
#ifndef SYMBOLTABLE_INCLUDED
#define SYMBOLTABLE_INCLUDED
#include <string>
class SymbolTableImpl;
class SymbolTable
{
public:
SymbolTable();
~SymbolTable();
void enterScope();
bool exitScope();
bool declare(const std::string& id, int lineNum);
int find(const std::string& id) const;
private:
SymbolTableImpl* m_impl;
// SymbolTables can not be copied or assigned. We enforce this
// by declaring the copy constructor and assignment operator private and
// not implementing them.
SymbolTable(const SymbolTable&);
SymbolTable& operator=(const SymbolTable&);
};
#endif // SYMBOLTABLE_INCLUDED