CS124 at Harvard

name: cs124.md
category: school
date: 7/13/2023

I'm generally of the opinion that the best way to learn things is to be thrown in the deep end. When I was in like 4th grade I was pretty decent at piano, but no where near great. The thing was, one of my friends was learning the Kreutzer Sonata on the violin and I thought it'd be cool if I could accompany him. The Kreutzer Sonata is an incredibly difficult piece that wasn't really suited for 12 year old me, but that didn't stop me. I very slowly learned (the first 8 mins/35 of) the piece and that experience I believe took me from being okay at piano to being pretty good.

I took Data Structures and Algorithms at Harvard in my Junior year of highschool and it was pretty brutal.

I've told this story before

in fact I wrote my common app essay about it. It's definitely one of if not the most pivotal moments in my CS career at least thus far. It's definitely a story of throwing myself in the deep end, considering until that point the most difficult CS class I had taken until then was APCS in my Freshman year.

I didn't go into it thinking like that though--up until this point there hadn't been a single algorithm I could think of that I couldn't implement with relative ease, though that's more a testament to the difficulty of code I had been writing rather than my own skill as a programmer. The first homework assignment I did manage to complete on my own, but honestly every single one of the subsequent assignments I completed with my fiend who was taking the class with me, Pratyush.

prim's algorithm

The first thing to pose a hurdle was the first of three programming assignments. We had to first randomly generate a graph in 1D, 2D, 3D and 4D coordinate spaces in the unit... space? Unit line, unit square, unit cube and unit hypercube respectively. We would then run a minimum spanning tree algorithm on it--your choice of Prim's or Kruskal's. I chose Prim's and was off to the races. We were advised to use a fast language since we would be running the MST algorithm on 2^18 points. I chose C. I did not know C.

learning C

Until that point all my experience had been with higher-level, object oriented languages. It was almost comical how out of my depth I was at first. When I was creating the min-heap I tried doing it by creating a variable-length array where I would just reallocate the entire array every time I modified the length (lol) and then wondered why it was taking 3 years to run on 1024 points.

This was all happening in the span of two weeks, by the way, while I was also balancing two APs and Linear Algebra at Harvard as well. I don't think I could do this now--I feel like highschoolers have almost a limitless tolerance for sleep deprivation.

writing the algorithm

My first try at the algorithm was essentially to directly transcribe the pseudocode into C. I would go through every edge in E and if either the first or second vertex in E was the same as the vertex from extract_min and also wasn’t visited, it would update the dist of the new vertex and add it to the heap.

Once this version of Prim’s algorithm worked for smaller values, I went to slightly higher values. It preformed devastatingly bad at anything above 100. This is because my runtime was absolutely abhorrent. I was doing O(n^2) iterations through the edges, and O(n) for every insert, since I was calling realloc each time I inserted onto the heap which I later realized was an O(n) function.

I'm going to skip most of the optimizations because I eventually hammered it into shape as far as time-complexity goes, bringing it down to the O(n log n) that it's supposed to be. I got it working for up to 2^17 points, albeit very slowly.

I believe the slowdown came from the ram being completely used at all times. At this point I realized it would take 134GB of ram to be able to store n(n+1)/2 edges as doubles so I had to look towards alternate solutions for storing the edges.

At that point, I believed I had two option:

I didn't really like the former because for whatever reason I was convinced I couldn't represent the graph as an adjacency matrix anymore which would make accesses O(n) which would be a no-go

The latter would actually work because computers are very fast at doing simple calculations. My only issue was that this wouldn't work with a 1-dimensional graph because you couldn't translate that into a coordinate plane, so instead of actually calculating a real distance I'd just generate it randomly lol. This is legitimately working code.

double get_distance(const double *vertices, unsigned int p1, unsigned int p2, int dim) {
  if (dim == 0) return drand48();
  double total = 0;
  for (int i = 0; i < dim; i++) {
    double difference = vertices[dim*p1+i] - vertices[dim*p2+i];
    total += difference * difference;
  }
  return sqrtl(total);
}

After all that I brought it down to a space complexity of O(n). Once I got all this fixed I realized my heap code was completely broken and so fast forwards another day or two and I've finally got something that I believe is working perfectly

of course it wasn't working perfectly

What more would you expect? I knew what the correct result should be for this project, it should be ~1.2 as the number of points approaches infinity for dimension 0. It was not--it was always some small factor of 0.0002620

See, until now I had been writing all my code on Windows. I used Cygwin to compile C. See, rand() on windows returns a value from 0 to 2^15-1 (15 bits), which I would then divide RAND_MAX to get my double between 0 and 1. The issue was, with massive amounts of points came massive chances to roll incredibly small distances. This resulted in the minimum distance found by the MST algorithm to be 0 or close to it (most likely 1/(2^15-1)) for each edge it selects.

While now I know I could get a better random by bitwise or-ing a bunch of rand()s together, but at the time I had barely two weeks of experience in C and didn't even know what bitwise operators were. Today I'd do it something like this:

double num = (rand() << 32 | rand() << 16 | rand()) / 1 << 48;

But back then I didn't know I could even do that. My one saving grace was that I read on a forum somewhere that I could get 48 random bits with drand48(), but I couldn't seem to get that on Windows. So naturally with like 2 days left before the deadline, at 10:00PM I'm dual-booting my laptop with ubuntu.

I replaced the code and it worked. It was as simple as changing my entire operating system. I ended up getting a 97/100, which is basically a perfect score since the extra 3pts were for completely wowing the instructors with something incredibly unique. And I wasn't complaining

Everything else

I really don't feel like condensing an entire semester into a single markdown file, but the general trend was I slightly overperformed on the programming assignments and slightly underperformed on the homeworks. It's like that to this day, actually--I'm generally better at implementing the algorithms than creating them, but I can proudly say I'm a little better at the academic bit now than in my Junior year of highschool.

In the end I got a B- which didn't count towards my highschool GPA (though it probably wouldn't have made too much of a difference seeing as I had a pretty bad GPA my Freshman and Sophomore years anyways) and honestly that's about all I could ask for

This class definitely made it easier for me to coast through my Freshman classes at UMass Amherst, since CS187 was basically just "this is what a hashmap is", and CS250, while significantly more difficult, had a lot of the content already covered by CS124 minus all the programming assignments so I was definitely able to cruise for most of the class (except for the bit where I got a 65 on one of the midterms before regrade requests, where I proceeded to fill out a regrade request for basically every single question I got points off on, and got 26 points back bringing it up to a 91, but that's a story for another markdown file).

generic conclusion paragraph

This class was not only the first class I actually found crushingly difficult basically ever. I've gotten Bs in classes before and tried hard in classes before, but until that point I'd never gotten a bad grade in a class I actually tried in (and I don't think I have since?). It felt pretty great, to be honest. Don't get me wrong I definitely hated it in the moment and if I could have, like, fast forwarded through it, I definitely would have. But it's not that moment, huh. Sucks to be me two years ago lol, massive L for junior year me