Training: R Basics and Statistical Learning in Action

895.00 EUR excl. VAT

Duration: 30 training hours (4 days);

Instructor: Borislava Borisova;

Format: Intensive, hands-on, during business hours, 8 training hours per day;

Delivery dates:  09 – 12.10.2018 (TUE – FRI);

Registration deadline: 01.10.2018 (MON);

Maximum participants: 12;

Classroom location: TBA;

Regular Price: 895 EUR excl. VAT. EU-registered businesses in EU countries other than Bulgaria will not be charged VAT;

Discount for early payment: YES, until June 1st 2018.

Description

Overview

The course aims to build basic knowledge and skills in programming in R and in using R for effective data analysis.

Participants will learn generic programming language concepts as they are implemented in the high-level statistical language R. The course covers practical issues in statistical computing, including programming in R, reading data into R, accessing R packages, writing R functions, debugging, profiling R code, and organizing and commenting R code. The second part of the course (R in action) covers predictive data analysis through worked examples. Participants will be introduced to regression concepts: basic linear regression, extensions such as logistic regression, and the widely used ideas of Lasso and Ridge penalization. Classification and Regression Trees (CART) will also be presented with an example.

The course then guides participants through the unsupervised side of statistical learning: clustering. Practice examples cover two fundamental approaches, K-means and hierarchical clustering. To wrap up, the training ends with a recap of the material to reinforce what has been learned and to discuss new ideas and perspectives for future learning.

All parts and topics are complemented with in-class examples.

Prerequisites

  • Basic programming knowledge;
  • Understanding of data structures is recommended;
  • Familiarity with the data preparation and exploratory analysis stage of a predictive task is recommended.

Course Outline

Introduction to R programming. (7 h)

  • History of R.
  • The RStudio project.
  • Starting and quitting R.
  • Basic features of R.
  • Built-in functions and online help.
  • Logical vectors and relational operators.
  • Data input and output.

Programming statistical graphics (3 h)

  • High-level plots.
  • Choosing a high-level graphic.
  • Low-level graphics functions.

Programming with R. Loops. (4 h)

  • The for() loop.
  • The if() statement.
  • The while() loop.
  • The repeat loop, and the break and next statements.
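
As a taste of the control-flow topics listed above, here is a minimal sketch in base R. The variable names and the toy computations are assumptions made up for illustration; they are not taken from the course material.

```r
# Illustrative sketch only: all names and values are invented for this example.

# for() loop with an if() statement: sum the even numbers from 1 to 10.
even_sum <- 0
for (i in 1:10) {
  if (i %% 2 == 0) {
    even_sum <- even_sum + i
  }
}

# while() loop: count the halvings needed to bring 100 below 1.
x <- 100
halvings <- 0
while (x >= 1) {
  x <- x / 2
  halvings <- halvings + 1
}

# repeat loop with next and break: collect the odd numbers below 10.
odds <- c()
k <- 0
repeat {
  k <- k + 1
  if (k >= 10) break      # leave the loop entirely
  if (k %% 2 == 0) next   # skip even values and continue with the next k
  odds <- c(odds, k)
}
```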

Managing complexity through functions. (3 h)

  • What are functions?
  • Scope of variables.
  • Fixing functions.
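
The idea of variable scope can be sketched with a small base R function. The function name `scale_by` and its arguments are hypothetical and serve only as an illustration.

```r
# Illustrative sketch only: scale_by() is a hypothetical helper.

scale_by <- function(values, factor = 2) {
  # 'result' is created inside the function, so it is local to this call.
  result <- values * factor
  result  # the last evaluated expression is the return value
}

# A variable with the same name in the calling environment is not affected.
result <- "untouched"
doubled <- scale_by(c(1, 2, 3))
tripled <- scale_by(c(1, 2, 3), factor = 3)
```

After running this, the global `result` still holds its original value, because the assignment inside `scale_by()` only touched the function's own local variable.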

Some general programming guidelines (1 h)

  • Documentation using #
  • Some general programming guidelines.
  • Top-down design.

Debugging and maintenance (1.5 h)

  • Recognizing that a bug exists.
  • Make the bug reproducible. Identify the cause of the bug.
  • Fixing errors and testing. Look for similar errors elsewhere
  • The browser() and debug() functions.

Efficient programming (0.5 h)

  • Learn your tools.
  • Use efficient algorithms.
  • Measure the time your program takes.
  • Be willing to use different tools, and optimize with care.

Predictive analysis – case-based approach (6 h)

  • Linear regression or Predicting the price of used cars (2 h)
    • Data visualization and Preprocessing;
    • Train and test concept;
    • Assessing linear regression models;
    • Residual analysis;
    • Comparing different regression models;
    • Test set performance;
    • Problems with linear regression;
    • Feature selection;
    • Regularization – Ridge and Lasso;
    • Final conclusion.
  • Logistic regression or Predicting heart disease (2 h)
    • Data visualization and Preprocessing;
    • Classifying with linear regression;
    • Assumptions of logistic regression;
    • Assessing logistic regression models;
    • Test set performance;
    • Regularization with the lasso;
    • Classification metrics and final conclusion;
    • Extensions of the binary logistic classifier;
  • Tree-based Methods or Predicting the authenticity of banknotes (2 h)
    • The idea of tree models;
    • CART;
    • Data visualization and Preprocessing;
    • Tuning model parameters in CART trees;
    • Variable importance in tree models;
    • Final conclusion.
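
The train and test concept from the linear regression case can be sketched in base R as follows. The built-in `mtcars` data set, the 70/30 split, and the choice of predictors are assumptions used as a stand-in; the course's actual used-car data and models are not shown here.

```r
# Illustrative sketch only: mtcars stands in for the course's used-car data.

set.seed(42)                                   # make the random split reproducible
n <- nrow(mtcars)
train_idx <- sample(n, size = round(0.7 * n))  # roughly 70% of rows for training
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Fit a linear regression on the training set only.
fit <- lm(mpg ~ wt + hp, data = train)

# Residuals on the training data, the starting point for residual analysis.
train_resid <- residuals(fit)

# Test set performance: root mean squared error on unseen rows.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
```

Comparing `rmse` across competing model formulas, each fitted on the same training rows, is one simple way to compare regression models on held-out data.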

Clustering the data or Customer Segmentation (3 h)

  • K-means clustering
    • Data visualization and Preprocessing;
    • Clustering process;
    • Number of clusters;
    • Visualization of the final solution.
  • Hierarchical agglomerative clustering
    • The dendrogram;
    • Cutting the tree;
    • Visualization of the final solution.
  • Validating cluster solutions.
  • Exercise (case study): Social network clustering analysis.
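
Both clustering approaches in this part of the outline can be sketched with base R functions. The built-in `iris` measurements, the choice of three clusters, and complete linkage are assumptions for illustration, not the customer data or settings used in class.

```r
# Illustrative sketch only: iris stands in for the course's customer data.

set.seed(1)               # k-means starts from random centers
x <- scale(iris[, 1:4])   # standardize the four numeric columns

# K-means with an assumed number of clusters and several random restarts.
km <- kmeans(x, centers = 3, nstart = 20)

# Agglomerative hierarchical clustering on Euclidean distances.
hc <- hclust(dist(x), method = "complete")
groups <- cutree(hc, k = 3)   # cut the tree into three groups

# Cross-tabulate the two solutions to see how far they agree.
agreement <- table(kmeans = km$cluster, hierarchical = groups)
```

Calling `plot(hc)` draws the dendrogram, which is the usual starting point for deciding where to cut the tree.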

Recap (1 h)

About the Instructor, Borislava Borisova

Skilled in R, SQL, Applied Mathematics, and the complete analytics cycle: descriptive, diagnostic, predictive, and prescriptive. Strong problem-solving skills complemented by a Bachelor’s degree in Statistics.

Highly passionate about working with all kinds of data, structured and unstructured. Always on the hunt for new ways to make data analysis more effective and more efficient. Passionate about delivering complicated results the right way to the right audience.

About The Authors, Deyan Lazarov and Nikolay Nikolov

This course is developed by SeedSet Ltd. No part of this course can be reproduced without the written permission of the company and its authors.

Deyan Lazarov, PhD, Author

Deep theoretical knowledge (PhD) in performing advanced data analysis. Turning complex problems into achievable analytical solutions. Extensive hands-on experience in the entire life cycle of data analysis: business understanding, data gathering, exploratory data analysis, data cleaning, advanced statistical modelling, model validation, technical implementation and deployment, communication of results to the end client, and ongoing post-implementation monitoring of the analytical deliverables. Highly skilled at delivering results given only a business goal. Advanced user of R (7+ years), SPSS (15+ years), Amos, SAS, STATISTICA (15+ years), SQL and MySQL (7+ years) and others.

Nikolay Nikolov, MS, Author

Senior analytics professional helping the business make strategic and day-to-day decisions by leveraging data. 10 years of extensive experience in turning data into relevant and actionable insight. Strong, practical knowledge of SAS (5+ years). Deep knowledge of R. Experience with customer credit life cycle strategies. Comprehensive knowledge of scorecard model development and monitoring. Highly passionate about using advanced technologies within the field of data science, and keen on developing and promoting innovative solutions. Client-facing consultancy experience, including participation in client workshops and the preparation and delivery of training. Holds a Master’s Degree in Data Science.