I’ve been doing a Coursera Specialization called Bioinformatics (Link to course page).
The course shows us how to apply computational solutions to non-computational but highly complex problems. The complexity of Bioinformatics comes from the huge amount of data (huge meaning billions of data points), and from how different permutations and combinations of those data points produce vastly different outcomes.
The data points in question are called nucleotides, which are the building blocks of DNA. (Check out the videos here for a quick crash course in DNA and stuff). A lot of the problems are solved using pattern-finding algorithms, with a huge emphasis on Big-O complexity and optimization. Given a huge collection of nucleotides, we want to find patterns in them that tell us interesting stories. An example would be trying to find the pattern within the DNA that codes for the protein that regulates your sleep cycle.
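To give a feel for what "pattern finding" means here, below is a minimal sketch of two classic warm-up problems: counting how often a pattern occurs in a DNA string (overlaps included), and finding the most frequent substrings of length k (k-mers). The function names are made up for illustration, and this is not necessarily how the course or my own solutions do it.

```python
from collections import Counter


def count_pattern(genome: str, pattern: str) -> int:
    """Count occurrences of pattern in genome, including overlapping ones."""
    k = len(pattern)
    return sum(
        1 for i in range(len(genome) - k + 1) if genome[i:i + k] == pattern
    )


def most_frequent_kmers(genome: str, k: int) -> list[str]:
    """Return all k-mers that occur most often in genome, sorted alphabetically."""
    # Slide a window of length k over the genome and tally every substring.
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    if not counts:
        return []
    best = max(counts.values())
    return sorted(kmer for kmer, c in counts.items() if c == best)


if __name__ == "__main__":
    print(count_pattern("GCGCG", "GCG"))
    print(most_frequent_kmers("ACGTTGCATGTCGCATGATGCATGAGAGCT", 4))
```

Both functions look at every window once, so they run in roughly linear time in the length of the genome (for a fixed k). That matters a lot once the input is a whole genome rather than a toy string.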
Some of the Computer Science concepts used in the course include Graph Theory (Eulerian cycles and paths, De Bruijn graphs) and matrix manipulations (for example, profile matrices smoothed with Laplace’s Rule of Succession). Aside from those, there are other complex solutions that are specific to Bioinformatics (Motif Finding, Profile Generation, Mass Spectrum Consistency).
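As a rough illustration of Profile Generation with Laplace's Rule of Succession, here is a small sketch that builds a profile matrix from a set of equal-length motifs. Each nucleotide count starts at 1 instead of 0 (a pseudocount), so no probability in the profile is ever exactly zero. The function name and the dict-of-lists layout are my own choices for this example, not something prescribed by the course.

```python
def profile_with_pseudocounts(motifs: list[str]) -> dict[str, list[float]]:
    """Build a profile matrix from equal-length motifs using pseudocounts.

    profile[n][j] is the estimated probability of nucleotide n at position j.
    """
    k = len(motifs[0])
    if any(len(m) != k for m in motifs):
        raise ValueError("all motifs must have the same length")

    # Laplace's rule: start every count at 1 rather than 0.
    counts = {n: [1.0] * k for n in "ACGT"}
    for motif in motifs:
        for j, nucleotide in enumerate(motif):
            counts[nucleotide][j] += 1

    # Each column now sums to len(motifs) + 4 (one pseudocount per nucleotide).
    total = len(motifs) + 4
    return {n: [c / total for c in column] for n, column in counts.items()}


if __name__ == "__main__":
    for nucleotide, row in profile_with_pseudocounts(["ACG", "ACT"]).items():
        print(nucleotide, [round(p, 3) for p in row])
```

Without the pseudocounts, a nucleotide that never appears at some position would get probability 0 there, and any candidate motif containing it would be ruled out completely.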
The language I chose was Python 3 (emphasis on 3), mainly for its flexibility and my familiarity with it. I believe my life would have been much easier if I had adopted data science packages such as NumPy and pandas for large data manipulations.
The coding problems were fun and extremely challenging, which was what I was looking for. I’ve been building web applications, Java applications, kernel modules and a little bit of mobile programming, all of which had very similar goals: building an end-product for consumers. Bioinformatics, on the other hand, showed me a whole new set of problems that required a very different way of thinking.
It showed me that Python is not only for scripting to automate mundane tasks, or for building web applications with a framework like Django. It can also be used to find mutations in your DNA, or to sequence antibiotics, which is all pretty darn cool.
I’ve completed the first course: Finding Hidden Messages in DNA (Bioinformatics I), and I’m finishing up the second course: Genome Sequencing (Bioinformatics II). There are 7 courses in total, but I don’t think I’ll be doing all of them. Those two courses alone were enough to satisfy my curiosity about other uses of Python.
The code I wrote is on my GitHub page here: Bioinformatics
I must admit, it’s not very clean or optimized code, but only because I did it in one iteration. In actual practice, code is meant to be destroyed and rewritten, because the first iteration gives you the idea, and the subsequent ones optimize it.
I get lazy sometimes 🙂