This page contains some advice for non-CS majors who are interested in programming or have some need for programming in their coursework or research. Note that this advice also applies to CS majors who are working on interdisciplinary projects.
The most popular open-source programming languages for science and mathematics are currently Python and R. Both owe much of that popularity to their heavy use in data science. Many other languages have been popular in the past or are still useful in particular situations, but if you are starting fresh, Python or R is likely your best bet. Another language/tool of note is GNU Octave, a free, open-source alternative to MATLAB.
For people new to Python (or to programming in general) there are many tutorials at python.org's Beginner's Tutorial Page. Python code is grouped into "packages" for easy distribution. The most used packages for data analysis are listed here, along with links to places to learn them:
- numpy: Numpy Absolute Beginner's Guide
- scipy: Scipy Documentation
- pandas: 10 Minutes to Pandas
- matplotlib: Pyplot Tutorial
- pytorch: Tutorials
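To give a feel for what using one of these packages looks like, here is a minimal sketch with numpy. It assumes numpy is already installed (for example with `pip install numpy`), and the temperature values are made up purely for illustration.

```python
import numpy as np

# Hypothetical measurements in degrees Celsius (illustrative data only).
temperatures_c = np.array([21.5, 22.1, 19.8, 23.4, 20.2])

# Vectorized arithmetic: the whole array is converted at once, with no explicit loop.
temperatures_f = temperatures_c * 9 / 5 + 32

# Summary statistics are available as array methods.
mean_c = temperatures_c.mean()
max_f = temperatures_f.max()

print(f"mean: {mean_c:.2f} C, max: {max_f:.2f} F")
```

The other packages in the list follow the same pattern: import the package, load the data into its main data type, then call functions or methods on it.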
When working with large datasets, the bottleneck is usually how efficiently the data is structured and iterated over. It is therefore useful to learn basic algorithms and data structures as well. Tutorialspoint is a good starting point for learning basic data structures and search/sort algorithms.
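As a rough illustration of why the choice of data structure matters, the sketch below checks whether a value is present in a collection in three ways: a linear scan of a list, a binary search on a sorted list, and a lookup in a set. The function names and the `sample_ids` data are hypothetical, chosen only for this example.

```python
from bisect import bisect_left


def contains_linear(items, target):
    """Check every element in turn; time grows in proportion to len(items)."""
    for item in items:
        if item == target:
            return True
    return False


def contains_sorted(sorted_items, target):
    """Binary search; only valid if sorted_items is sorted in ascending order."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


# Hypothetical sample IDs: the even numbers below one million, already sorted.
sample_ids = list(range(0, 1_000_000, 2))
id_set = set(sample_ids)  # hash-based lookup, roughly constant time on average

print(contains_linear(sample_ids, 999_998))  # scans the entire list
print(contains_sorted(sample_ids, 999_998))  # needs about 20 comparisons
print(999_998 in id_set)                     # a single hash lookup
```

All three give the same answer, but on large datasets the second and third approaches are typically far faster when many lookups are needed.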
Mention Bioconductor, R information linked from GH 101 page, R for Data Science
R is a programming language used primarily by mathematicians and data analysts. It is used for statistical computing and graphics, though it can be used for data mining as well. R can be run from the command line or through third-party programs such as RStudio. A good tutorial on the language can be found at Tutorialspoint.
Other Resources for R: