Should I learn Python before R?


So, this has turned out a little longer.

I acquired my first BASIC knowledge when computers still came from Commodore and had 64 KB of memory, and that was enough for every task. Except for me, because my father did not allow me to do such a wicked thing. (All those computer kids would later turn into sociopaths and die of too little sunlight, too little fresh air and coffee poisoning; that was foreseen in the 80s, and people were duly warned of the risks.) Later, a completely different BASIC ran on my Amiga 500: it came without line numbers, you did not actually need a GOTO command, and I rubbed my eyes in disbelief. Instead there were these fancy "functions", which I also got to know in Pascal at school. For my Amiga I dug into my then rather shallow pockets and bought a C compiler. That was something completely different again and felt a lot better. During my studies I had other things to do, and programming was no longer on the agenda. I also no longer had a compiler or interpreter for any language, and I missed programming. Then as now, I had allergic reactions to Java, and for C there was something free but complicated and command-line-based from the Linux world, which was too unwieldy for me and, for the little time I had, too close to the machine.


The German Python Forum does not lose its data, so I can reconstruct that I registered there in 2006. My first awkward and unsuccessful steps with R date from around the same time. Back then I mistook R for a calculator for statistical tests rather than a programming language, which was an error on my part. Python, on the other hand, I learned with enthusiasm, but it would never have occurred to me to do statistical data analysis with it. I did not really have any data to analyze at the time either. Then, inspired by Python, I started exploring odd programming languages and did not touch Python for a long time, but only because I temporarily found the other languages more interesting, not because I was dissatisfied with Python. If I had something serious to program that also had to be maintained over a longer period, I would probably brush up my Python knowledge and rely on it.


Not for data analysis, but seen through the eyes of someone who wants to learn a language that is as versatile as possible: one that is powerful but not overloaded, has a clear syntax and nudges you towards good habits, really does come with plenty of batteries included, has a really good German-speaking community in the form of the forum mentioned above, offers reasonable GUI toolkits, and guarantees a long learning curve, i.e. does not get boring. Everything is an object, but anyone who wants to program imperatively may do so, anyone who wants to program in an object-oriented style may do so, and anyone who wants to program functionally can do that too. Only those who are sloppy about indenting their code get an immediate rap on the knuckles from the language. On the other hand, if you keep your indentation tidy you do not need curly braces, and everyone here knows how awkward those are to type on a German keyboard.
I now know that much of the beauty of the syntax was stolen from Haskell. In its pure form in Haskell it is even more elegant, but who uses Haskell? Python has arrived in real life. Python has passed the critical mass of users. Learning Python is a future-proof investment. Period. Python is great.
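To make the point about programming styles concrete, here is a minimal sketch that computes the same sum of squares three times: imperatively, in an object-oriented style and functionally. The list and the class name SquareSummer are invented purely for illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Imperative: spell out the loop and mutate an accumulator.
total = 0
for n in numbers:
    total += n * n

# Object-oriented: wrap the data and the behaviour in a class.
# (SquareSummer is a hypothetical name used only for this sketch.)
class SquareSummer:
    def __init__(self, values):
        self.values = list(values)

    def result(self):
        return sum(v * v for v in self.values)

# Functional: combine map and reduce without mutating anything.
functional_total = reduce(
    lambda acc, square: acc + square,
    map(lambda n: n * n, numbers),
    0,
)

print(total, SquareSummer(numbers).result(), functional_total)  # 30 30 30
```

None of the three variants is "the" Python way; the language simply does not force the choice.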


After this hymn of praise (and I could go on like this for quite a while), why R then? Quite simply because right now I do not need a general-purpose programming language; I want to work on data processing. That is what R was made for, and it goes smoothly in R. I do not want to import the scipy.stats module for every t-test and then have to type out long, module-qualified function calls. After all, I do not write structure.conditional.if() for every decision but simply if (), so I would also like to write just t.test() ...
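For illustration, a minimal sketch of what a two-sample t-test via scipy.stats can look like, assuming SciPy is installed. The group names and values are made up, and this is not necessarily the call the post originally showed.

```python
from scipy import stats

# Hypothetical measurements for two independent groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
group_b = [6.2, 6.8, 6.1, 7.0, 6.5, 6.9]

# equal_var=False requests Welch's t-test, which is also
# what R's t.test() does by default for two samples.
result = stats.ttest_ind(group_a, group_b, equal_var=False)

print("t =", result.statistic)
print("p =", result.pvalue)
```

In R the corresponding call would simply be t.test(group_a, group_b), with no import beforehand.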
OK, that was just petty. How about the following example instead, a question on Stack Overflow:

What is python's equivalent of R's NA?
To be more specific: R has NaN, NA, NULL, Inf and -Inf. NA is generally used when there is missing data. What is python's equivalent?

A legitimate question. The fact that every data type can also take the value NA shows how deeply R is rooted in statistical data analysis. And what is the accepted answer?

Scikit-learn doesn't handle missing values currently. For most machine learning algorithms, it is unclear how to handle missing values, and so we rely on the user of handling them prior to giving them to the algorithm. Numpy doesn't have a "missing" value. Pandas uses NaN, but inside numeric algorithms that might lead to confusion. ... t-of-rs-na
So apparently there is a value nan in NumPy and a NaN in pandas. This is handled about as elegantly as object orientation is in R, except that I care much more about dealing with missing values than about object orientation. Machine learning without NA data? Do I have to do everything myself? I do not really know my way around there, but from an outsider's perspective these data processing modules all look rather bolted on and ill-fitting.
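A minimal sketch of what "doing it yourself" can mean, using only the Python standard library (no NumPy or pandas). The measurements and the helper name mean_skipping_missing are invented for illustration.

```python
import math

# Hypothetical measurements; the gap is encoded in two common ways.
with_none = [4.0, None, 5.0, 3.0]
with_nan = [4.0, math.nan, 5.0, 3.0]

def mean_skipping_missing(values):
    """Average the values, ignoring None and NaN.

    Roughly what mean(x, na.rm = TRUE) does in R.
    Assumes at least one value is present.
    """
    present = [v for v in values
               if v is not None and not math.isnan(v)]
    return sum(present) / len(present)

# NaN is an ordinary float that silently contaminates arithmetic ...
naive = sum(with_nan) / len(with_nan)
print(math.isnan(naive))          # True

# ... and it is not equal to itself, so "== nan" checks never match.
print(math.nan == math.nan)       # False

print(mean_skipping_missing(with_none))  # 4.0
print(mean_skipping_missing(with_nan))   # 4.0
```

NaN only exists for floats, and None is a separate object that breaks arithmetic outright, so neither is a typed missing value in the sense of R's NA.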
What probably matters much more to me than these language aspects, though it need not matter to you, is that I learned statistics and R at the same time. There is simply a lot of learning material for statistics with R, and my impression is that this is harder with Python. Looking at Cross Validated, 18 questions with the R tag have been posted in the last 24 hours as I write this, but only one with the Python tag. Search Amazon for "Statistics with Python" and "Statistics with R": the result speaks volumes. Personally, my bottleneck is understanding statistics and machine learning rather than programming, and I believe that with R I am backing the right horse. Especially since, with my small data sets, I need real statistics rather than "data science".

For the time being, I will stick strictly to R for data analysis. Given the unmistakable trend towards computationally intensive methods, I could imagine learning a compiled language again as my next language for statistics. After reading half a book about it, I like Scala a lot, but I am not aware of any statistics-related development there. Clojure is certainly more than a passing hype, but it is probably not really suited to the mainstream. Julia is still too young and immature for me, but a language that has been geared towards data analysis from the start and can be compiled into really fast code sounds very tempting. Somebody makes interesting comments about JavaScript here: ... am-i-nuts / JavaScript has been criticized about as much as it is unsuitable, yet it certainly still has a long future ahead of it. Haskell would be great for mathematical computations, and something like NA could be represented elegantly with the Maybe monad, for example. Haskell is also really fast, but an integration with R, an "RHaskell", is unlikely to exist.
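The Maybe idea can be roughly approximated in Python with Optional values, where None plays the role of Nothing. This is only a sketch of the concept, not how Haskell implements it, and the helper names bind, safe_sqrt and safe_reciprocal are hypothetical.

```python
import math
from typing import Callable, Optional

def bind(value: Optional[float],
         func: Callable[[float], Optional[float]]) -> Optional[float]:
    """Apply func only if a value is present; otherwise pass the gap on."""
    if value is None:
        return None
    return func(value)

def safe_sqrt(x: float) -> Optional[float]:
    # Undefined for negative input, so report "missing" instead of failing.
    return math.sqrt(x) if x >= 0 else None

def safe_reciprocal(x: float) -> Optional[float]:
    # Undefined for zero, so report "missing" instead of failing.
    return 1.0 / x if x != 0 else None

# Hypothetical measurements; None marks a value that was never recorded.
measurements = [4.0, None, -9.0, 0.0]

results = [bind(bind(m, safe_sqrt), safe_reciprocal) for m in measurements]
print(results)  # [0.5, None, None, None]
```

A missing value, whether it was missing from the start or arises in an intermediate step, travels through the whole chain without any explicit check at the call site.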


Always program in such a way that the maxim of your programming style could serve as the basis of universal legislation.