OHMS Lessons Learned

July 10th, 2016

Note: I found the following post as an almost complete draft as I was reading some of my unpublished posts. I wrote it around October 1st, 2013, at the beginning of what would end up being my last year at Stanford. Later that quarter I learned a lot more lessons from OHMS, including--the hard way, at 12:30AM the night before a homework was due--not to run SQLite over a network file system. Wanting to write up some of those additional lessons probably contributed to my never publishing this until now. I've corrected typos or completed sentences in four or five places to make this post publishable, but the rest of it is exactly as it was in October 2013:

Recently I've had the pleasure and aggravation of building an interactive course website for the class I'm TAing for this quarter, with my friend and classmate Dennis Sun. We made a lot, a lot of mistakes, but also managed to do a few things right. I thought I'd write down some of the lessons I've learned the hard way over the past few weeks as a reminder to myself and some data for others.

First, a little background: I'm the head TA for the introductory-level statistics class at Stanford, called STATS 60. STATS 60 has about 150 students, five TAs, three graders, and one instructor. Our website is a Python Flask application with a SQLite backend and with SQLAlchemy as our ORM. We write homeworks in XML, with a schema we've defined, and then parse these questions into the database. The course website is hosted on servers run by Stanford, because this allows us to authenticate students with their Stanford credentials. Once authenticated, students can view their homeworks and submit answers to them.
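To make the XML-to-database step concrete, here is a minimal sketch of what parsing a homework might look like. It is not our actual code: it uses the standard library's sqlite3 module instead of SQLAlchemy so that it runs on its own, and everything about the file layout other than the <question> and <item> tag names (the <homework> root, the name and points attributes, the table and column names) is a hypothetical stand-in for our real schema.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical homework file. Only the <question> and <item> tag names are
# real; the root element, attributes, and prompts are made up for illustration.
SAMPLE_XML = """
<homework name="Homework 1">
  <question>
    <item points="2">What is the mean of 1, 2, and 3?</item>
    <item points="3">What is the median of 1, 2, and 10?</item>
  </question>
  <question>
    <item points="5">Explain the difference between mean and median.</item>
  </question>
</homework>
"""


def load_homework(conn, xml_text):
    """Parse one homework, store its questions and items, return the item count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS questions "
        "(id INTEGER PRIMARY KEY, homework TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(id INTEGER PRIMARY KEY, question_id INTEGER, points INTEGER, prompt TEXT)"
    )
    root = ET.fromstring(xml_text)
    count = 0
    for question in root.findall("question"):
        # One row per <question>, tagged with the homework it belongs to.
        cur = conn.execute(
            "INSERT INTO questions (homework) VALUES (?)", (root.get("name"),)
        )
        for item in question.findall("item"):
            # One row per <item>, linked back to its parent question.
            conn.execute(
                "INSERT INTO items (question_id, points, prompt) VALUES (?, ?, ?)",
                (cur.lastrowid, int(item.get("points", "0")), (item.text or "").strip()),
            )
            count += 1
    conn.commit()
    return count
```

Calling load_homework(sqlite3.connect(":memory:"), SAMPLE_XML) is enough to try it out without touching a real database file.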

1. Long-distance collaboration is hard

Dennis was in France for an internship and then in England for a conference while we were developing the main framework. Meanwhile, I was living in Palo Alto and working at Twitter in San Francisco. In fact, Dennis didn't return until the end of the first week of classes--we actually shipped the website for the first time before he got back.

Collaborating at a distance was quite a challenge, and I definitely missed the benefits of working physically together. We would have most of our discussions over email, GChat, or Bitbucket comments, though sometimes over Google Hangouts. When we did try to use Google Hangouts, inevitably there would be connection issues, and in the spells between connection issues, not being able to write something on a whiteboard or look through code together would make things hard. When I needed to contact Dennis, inevitably it would be 4AM in France.

It's possible that long-distance collaboration would have worked better had the project been further along, and large enough that we could work on separate parts without stepping on each other's toes. But we were separated from when we started the project all the way through to when we first made the website available to students.

2. Complementary strengths are awesome

Dennis and I have some complementary strengths: He has far more experience than me as a TA, knows his way around JavaScript better, and has more experience working with the Stanford servers. I think I edge him out on Python, SQLite, and Git. The fact that we have these complementary skills is awesome--it means that together we could do a much better job than I would have done with a clone of myself, or Dennis would have done with a clone of himself. It is really a tremendous feeling when you've been stumped for a while on something, and then your partner comes along and solves your problem in a few minutes. Some problems that were really hard for me were easy for Dennis, and vice versa.

3. It's okay to ship something imperfect

...because believe me, when I pushed the ship button our website was far, far from perfect. Most amusingly, a few hours before we shipped I realized that when we rendered the HTML of a homework, a bug left <item> and <question> tags in it, carried over from the original XML homeworks. Despite these non-HTML tags, the homework page still rendered fine in Chrome and Firefox on both my machine and Dennis's. We were under time pressure to get the first homework out, so instead of fixing it we shipped it anyway.
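For what it's worth, a quick patch for the stray tags could have looked something like the sketch below. This is an assumption about one possible fix, not what we actually did: it uses a regular expression to drop the <item> and <question> tags while keeping whatever is inside them, and the function name strip_leftover_tags is made up for this example. A regex is fine for two known tag names, but it is no substitute for fixing the rendering bug itself.

```python
import re

# Matches opening and closing <item> and <question> tags, with or without
# attributes. The \b keeps it from touching longer names such as <items>.
LEFTOVER_TAG = re.compile(r"</?(?:item|question)\b[^>]*>")


def strip_leftover_tags(html):
    """Remove leftover XML wrapper tags but keep the content between them."""
    return LEFTOVER_TAG.sub("", html)
```

For example, strip_leftover_tags('<question><p>Q1</p><item>What is 2+2?</item></question>') gives back '<p>Q1</p>What is 2+2?'.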

There were a number of other issues as well: It's reasonably likely that our application has some security vulnerabilities in it, as we didn't spend much time testing or trying to break it. But with only 150 students enrolled, only Stanford affiliates able to access the course website, and database backups every half hour, we have not had any security issues so far.
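As an aside on those backups: the sketch below shows one way to snapshot a SQLite database safely while it may be in use. It is not how our backups actually worked. It assumes a modern Python (3.7 or later, where sqlite3.Connection.backup exists), the function name and paths are hypothetical, and the half-hourly scheduling is left out entirely.

```python
import sqlite3


def backup_database(source_path, dest_path):
    """Copy a SQLite database to dest_path using SQLite's online backup API."""
    src = sqlite3.connect(source_path)
    dst = sqlite3.connect(dest_path)
    try:
        # Copies every page of the source database into the destination,
        # which gives a consistent snapshot even if the source is open elsewhere.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
```

The advantage over simply copying the file is that the copy cannot catch the database halfway through a write.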

4. ...but it depends what part is imperfect

In our rush to get the first assignment out, we left some confusing elements in the user interface. Even worse, a few of the answers to the homework problems were actually flat-out wrong. This was a major source of confusion for the students and felt just terrible for me--our first responsibility was to help students learn, and in that case we blew it.

5. Test your code in a production-like environment!

One of the nice things about Flask is that you can run a local development server very easily on your machine. But frankly, this lulled us into a false sense of security, because the local development server ended up being very, very different from the environment we got on Stanford's machines. When we ssh'd into the Stanford servers and ran code there, it would execute with the permissions associated with our personal users, like "naftali". But when we ran our webserver, it would run as the "reallylongname.cgi" user, which had substantially fewer permissions than the "naftali" user.

Moreover, the code would actually execute on a different machine that had far less software installed on it. In particular, the webserver machine had Python 2.6 (while we depended heavily on some features in 2.7), and it was far more challenging to install Python packages there. In the end, we managed to figure out how to get Flask deployed in CGI mode, and managed to install the Python packages we needed ourselves.

I'm not saying that I don't like Flask's local development mode, because I actually love it and think it's awesome to be able to test out your code locally. It's just that testing locally can never capture the integration difficulties that may arise when you try to deploy on production servers.
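One cheap way to surface this kind of mismatch early is a small preflight script that you run as the webserver user on the production machine before deploying anything real. The sketch below is only an illustration of the idea, not something we had at the time: the function name preflight and its arguments are hypothetical, and it checks just the three things that bit us (Python version, importable packages, and write permission on a directory).

```python
import importlib
import os
import sys


def preflight(min_version, modules, writable_dir):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Check the interpreter version, e.g. (2, 7) when the code relies on 2.7 features.
    if sys.version_info[:2] < tuple(min_version):
        problems.append(
            "Python %d.%d found, need >= %d.%d"
            % (sys.version_info[:2] + tuple(min_version))
        )
    # Check that each required package can be imported by this user.
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            problems.append("cannot import %s" % name)
    # Check that the current user may write to the given directory.
    if not os.access(writable_dir, os.W_OK):
        problems.append("%s is not writable by this user" % writable_dir)
    return problems


if __name__ == "__main__":
    # Example invocation; the module names and directory are placeholders.
    for problem in preflight((2, 7), ["flask", "sqlalchemy"], os.getcwd()):
        print("PREFLIGHT: " + problem)
```

The point is to run it under the same user and on the same machine as the real webserver, since running it from your own login shell only tells you about your own login shell.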

6. Don't let yourself be blocked by things outside your control

...in this case, Stanford's servers and IT. We got lucky on this project--it turned out that on our own we were able to hack Stanford's systems into doing roughly what we needed them to do. But it's also very possible to imagine a situation where we would have needed Stanford's IT people to make some kind of change for us, and where we would not have been able to convince them to do it. Such a scenario would have meant project failure. Being in a situation where the success of your project depends on factors substantially outside of your control is not fun, and you should strive to avoid it.

To be fair, we got some good use out of Stanford's systems--running our application there handled authentication and some security issues for us. But this use came at the cost of sacrificing our independence.