Discussion replies.
I’m studying for my Computer Science class and need an explanation.
Reply to the two posts below, in APA format, 100-150 words per post.
Discussion 1:
Statistical programming languages meet a real need: they let us model, manipulate, and reason about large quantities of data. Statistical programming is part of the data and statistical sciences, and in the pharmaceutical industry it provides statistical programming services to the global research and development division. These services include summarizing and reporting data, summarizing aggregate data, and preparing documents for submission to regulatory agencies (https://www.nceas.ucsb.edu/files/scicomp/Dloads/RProgramming/BestFirstRTutorial.pdf).
Data science is an exciting field to work in, combining advanced statistical and quantitative skills with real-world programming ability. An aspiring data scientist can choose among many programming languages, and typically picks one and starts specializing in it. The success of a data scientist depends on the specificity, generality, productivity, and performance of that language. Among the many statistical programming languages, R is widely used in data science. R is a programming language and software environment for statistical analysis, graphical representation, and reporting. R can be used like a calculator, and the same principles extend to complex mathematical and statistical calculations, so it handles simple and complex calculations alike (Everitt & Hothorn, 2005).
Advantages
-High-quality open source packages are available for almost every quantitative and statistical application.
-The base installation comes with built-in statistical functions and methods.
-Data visualization is a key strength, thanks to its libraries.
Disadvantages
-Domain specificity: R is fantastic for statistics and data science purposes but less so for general-purpose programming.
-R commands give little thought to memory management, so R can consume all available memory.
-Compared to other statistical programming languages, the quality of some R packages is less than perfect.
Works Cited
Everitt, B. S., & Hothorn, T. (2005). A Handbook of Statistical Analyses Using R. Retrieved from http://www.ecostat.unical.it/tarsitano/Didattica/L…
https://www.nceas.ucsb.edu/files/scicomp/Dloads/RP…
Discussion 2:
Importance of statistics in Data science:
The field of statistics is the science of learning from data. Statistics is one of the most important disciplines for providing tools and methods that find structure in data and give deeper insight into it, and the most important discipline for analyzing and quantifying uncertainty.
Cao (2017) gave a comprehensive definition of data science in the following formula:
“Data science = (statistics + informatics + computing + communication + sociology + management) | (data + environment + thinking).”
This shows that statistics is a key component of data science. Finding structure in data and making predictions are some of the most important steps in data science, so statistical methods are essential because they can handle many different analytical tasks. Hypothesis testing, regression, and time series analysis are a few important examples of statistical data analysis methods used in data science. Insurance is an everyday example: most of us have some kind of insurance, whether medical, home, or another type. Based on an individual's application, some businesses use statistical models to calculate the risk of providing insurance, and that is how they decide premiums and deliver their services.
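The insurance example can be made concrete with a minimal sketch. It is written in Python purely for illustration and is not taken from the post or from any insurer's actual method. It assumes the simplest possible risk model: the expected loss is the observed claim frequency multiplied by the average claim size, and the premium adds a fixed loading on top. The function names, the 25% loading, and the claim history are all hypothetical.

```python
from statistics import mean


def claim_probability(history):
    """Estimate the chance of a claim as the observed claim frequency."""
    return sum(1 for amount in history if amount > 0) / len(history)


def average_claim_size(history):
    """Average size of the claims that actually occurred (0.0 if none)."""
    claims = [amount for amount in history if amount > 0]
    return mean(claims) if claims else 0.0


def premium(history, loading=0.25):
    """Expected loss plus an assumed loading for costs and profit."""
    expected_loss = claim_probability(history) * average_claim_size(history)
    return expected_loss * (1 + loading)


# Hypothetical yearly claim amounts for ten comparable applicants (0 = no claim).
history = [0, 0, 1200, 0, 0, 0, 800, 0, 0, 0]
print(premium(history))
```

With this made-up history, two of ten applicants claimed and the average claim was 1000, so the expected loss per applicant is 200 and the 25% loading brings the premium to 250. Real pricing models are far richer, but the idea of turning past data into a risk estimate is the same.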
What is R:
R is a programming language made by statisticians and data miners for statistical analysis and graphics, supported by the R Foundation for Statistical Computing. R provides high-quality graphics and has popular libraries that help with analytical work, such as R Markdown and Shiny.
Characteristics of R:
- Open source.
- Interpreted language.
- Data analysis.
- Large community.
- Object-oriented.
- Advanced analytics.
- High-end graphics.
Advantages of R over SAS:
- R is open-source software, which means anyone can use and change it, whereas SAS is an expensive tool.
- Algorithms used in R are open to the public, so you can study them, whereas SAS procedures are not open to the public.
- R has advanced graphical capabilities and supports various professional graphics templates.
- New statistical and machine learning techniques are implemented in R much more quickly than in SAS.
- R has the most comprehensive statistical analysis packages, and new ideas often appear in R first because many statisticians contribute to its community, whereas in SAS they may take some time to arrive.
- There are commercial implementations, such as those from Oracle and Teradata, that allow the core R routines to be executed in the database, which can eliminate the need for extra coding.
Disadvantages of R compared with SAS:
- SAS is easier to learn than the R programming language.
- It is easier to debug code in SAS, as its error messages are usually more comprehensible than R's.
- SAS has dedicated Customer Service Support.
- R is not a good programming language for big data analysis.
- The R programming language does not provide a tabular or spreadsheet view of data, whereas SAS can display data in tabular format.
- One of the limitations of R is that objects must generally be stored in physical memory, meaning one may not be able to work with huge datasets.
Works Cited
Cao, L. (2017). Data science: A comprehensive overview. ACM Computing Surveys, 50(3), 43.
Paradis, E. (2005, September 12). R for Beginners. Retrieved from https://cran.r-project.org/doc/contrib/Paradis-rde…
Peng, R. D. (2016). R programming for data science (p. 471). Leanpub.