gretchenleigh
@gretchenleigh
@parttimerobot asked:

What's engineering at the Internet Archive like?

This is going to be purposefully vague, for 1) a lot of obvious reasons and 2) the fact that we aren't a monolithic engineering organization: practices vary a lot between teams and departments. Everything here is stuff you could largely glean from a little sleuthing.

That being said:

  • The overwhelming majority of our infrastructure is run in our own data centers. It's very different from working with cloud-based infra.
  • I work on Archive-It, which is an earned-income service best summed up as a SaaS version of the Wayback Machine. I'm the team lead and wear a lot of different hats: I handle a lot of the traditional full-stack web development, but I also spend a lot of time working on crawling, processing pipelines, databases, and operations.
  • Archive-It is built on a lot of open-source projects that the Archive either leads or makes significant contributions to, most notably two web crawler projects: Heritrix and Brozzler.
  • My team's web stack is pretty standard: we use Django and Flask for a lot of our web backend, but there's some other stuff mixed in there. The frontend is a hodgepodge, but we like TypeScript + Web Components + Lit.
  • My team currently deploys with Ansible straight to VMs. Some other teams are using different tooling and more containerization, but we really only use containers for local dev environments.
