DATAmAInd: June 2020

Monday, June 15, 2020

Which Machine Learning Algorithms Require Feature Scaling (Standardization and Normalization), and Which Do Not?

Hi folks,

Feature scaling is one of the most important steps in data preparation. Whether to use it or not depends on the algorithm you are using.

Many of us still wonder why feature scaling is required. Why do we need to scale the variables?

1. Having features on the same scale lets them contribute equally to the result, which can improve the performance of machine learning algorithms.

2. If you don't scale the features, large-scale variables will dominate small-scale ones.
Example: Suppose the dataset contains an X variable (say, a 2-digit number) and a Y variable (say, a 5-6 digit number). There is a huge gap in scale. We don't want our algorithm to be biased towards one feature, because this can hurt the accuracy of the model or the performance of the algorithm, or lead to wrong predictions.
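
To make the X and Y example concrete, here is a minimal sketch in plain Python. The three records and their values are made up for illustration. It measures how much of the Euclidean distance between two records comes from the large-scale Y variable alone.

```python
import math

# Hypothetical toy records: (X, Y), with X a 2-digit value and Y a 5-digit value.
a = (25, 50_000)
b = (60, 51_000)   # very different X, similar Y
c = (26, 58_000)   # almost the same X, different Y

def euclidean(p, q):
    """Plain Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

d_ab = euclidean(a, b)
d_ac = euclidean(a, c)

# Share of the squared distance between a and b that comes from Y alone.
y_share_ab = (a[1] - b[1]) ** 2 / d_ab ** 2

print(f"d(a, b) = {d_ab:.1f}, d(a, c) = {d_ac:.1f}")
print(f"Y accounts for {y_share_ab:.1%} of the squared distance a-b")
```

On the raw values, record a looks closer to b than to c, even though a and c have almost the same X. The distance is driven almost entirely by Y.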

Certain machine learning algorithms are sensitive to feature scaling (standardization and normalization of numerical variables): distance-based algorithms, curve-based algorithms, matrix factorization, decomposition and dimensionality reduction methods, and gradient descent based algorithms.

Certain tree-based algorithms, on the other hand, are insensitive to feature scaling because they are rule-based: for example, Classification and Regression Trees, Random Forests and Gradient Boosted Decision Trees.
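
Why are rule-based splits insensitive to scaling? A rough sketch, not a real tree implementation: the helper `best_split` below is hypothetical and the salary data is made up. It searches for the best single threshold on one feature and returns which samples fall on the left side. Because Min-Max scaling keeps the order of the values, the chosen partition is the same before and after scaling.

```python
def best_split(values, labels):
    """Return the indices sent left by the single-threshold split with the
    fewest misclassifications (each side predicts its majority label)."""
    def side_errors(idx):
        ones = sum(labels[i] for i in idx)
        return min(ones, len(idx) - ones)

    best_left, best_err = None, None
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(distinct, distinct[1:]):
        thr = (lo + hi) / 2
        left = frozenset(i for i, v in enumerate(values) if v <= thr)
        right = frozenset(range(len(values))) - left
        err = side_errors(left) + side_errors(right)
        if best_err is None or err < best_err:
            best_left, best_err = left, err
    return best_left

# Hypothetical feature (salary) and binary labels.
salaries = [30_000, 45_000, 52_000, 110_000, 150_000, 240_000]
labels = [0, 0, 0, 1, 1, 1]

# Min-Max scaling is monotonic, so the ordering of samples is unchanged.
lo, hi = min(salaries), max(salaries)
scaled = [(s - lo) / (hi - lo) for s in salaries]

split_raw = best_split(salaries, labels)
split_scaled = best_split(scaled, labels)
print(sorted(split_raw), sorted(split_scaled))
```

The threshold value itself changes with the scale, but the rule "these samples go left, those go right" does not.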

Cons of Feature Scaling: You lose the original values when transforming them to other values, so the values become harder to interpret.

Standardization v/s Normalization

Standardization:
The idea behind standardizing before applying a machine learning algorithm is to transform your data so that its distribution has a mean of 0 and a standard deviation of 1:

Mu=0
Sd=1
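
A minimal sketch of standardization using only the Python standard library. The income values are made up, and using the population standard deviation (rather than the sample one) is an assumption on my part.

```python
import statistics

def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)  # assumption: population SD, not sample SD
    if sd == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sd for v in values]

# Hypothetical feature values.
incomes = [42_000, 55_000, 61_000, 120_000, 250_000]
z = standardize(incomes)

print([round(v, 3) for v in z])
print("mean:", round(statistics.fmean(z), 6), "sd:", round(statistics.pstdev(z), 6))
```

After the transformation, the mean of the values is 0 and their standard deviation is 1, up to floating-point rounding.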

Normalization:
This method rescales the data to the range between 0 and 1. It is also called Min-Max scaling.

Cons: This method is sensitive to outliers. A single extreme value stretches the range, so the remaining values get squeezed into a narrow band and some information in the data is lost.
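
A small sketch of Min-Max scaling in plain Python, with made-up age values, including what happens when one extreme value is added to the data.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale a constant feature")
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature values.
ages = [18, 25, 32, 47, 60]
scaled = min_max_scale(ages)

# Same data plus one extreme (made-up) value.
with_outlier = min_max_scale(ages + [600])

print([round(v, 3) for v in scaled])
print([round(v, 3) for v in with_outlier])
```

Without the extreme value, the ages spread over the whole 0 to 1 range. With it, the original five ages all end up below 0.1, so the differences between them are much harder to see.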

For most applications, standardization performs better than normalization.
**Note: For the best possible results, fit the model on the raw data (the default), on the normalized data and on the standardized data, and compare the results.
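
The comparison suggested in the note can be sketched like this. Everything here is an assumption for illustration: the toy data is made up, the "model" is a simple 1-nearest-neighbour classifier scored with leave-one-out accuracy, and the scalers are fitted on the whole dataset to keep the sketch short (in real work you would fit them on the training split only).

```python
import math
import statistics

# Hypothetical toy data: (small-scale feature, large-scale feature, label).
# The label depends only on the small-scale feature.
rows = [
    (1, 50_000, 0), (2, 90_000, 0), (3, 10_000, 0), (4, 70_000, 0),
    (11, 52_000, 1), (12, 88_000, 1), (13, 12_000, 1), (14, 68_000, 1),
]
labels = [r[2] for r in rows]
columns = [[r[0] for r in rows], [r[1] for r in rows]]

def identity(col):
    return list(col)

def normalize(col):
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]

def standardize(col):
    mu, sd = statistics.fmean(col), statistics.pstdev(col)
    return [(v - mu) / sd for v in col]

def loo_accuracy(scale):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    points = list(zip(*[scale(col) for col in columns]))
    correct = 0
    for i, p in enumerate(points):
        # Nearest other point by Euclidean distance.
        nearest = min((j for j in range(len(points)) if j != i),
                      key=lambda j: math.dist(p, points[j]))
        correct += labels[nearest] == labels[i]
    return correct / len(points)

results = {
    "raw": loo_accuracy(identity),
    "normalized": loo_accuracy(normalize),
    "standardized": loo_accuracy(standardize),
}
print(results)
```

On this toy data the raw version does worse than both scaled versions, because the large-scale feature decides who the nearest neighbour is. On your own data the ranking may differ, which is exactly why it is worth comparing all three.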

Hope this is useful!

Happy Learning!😊🙋

Saturday, June 6, 2020

Tips for getting SAS Certified Specialist: Base Programming Using SAS 9.4 (The latest updated exam: Performance Based)

Hi folks,

Are you preparing for SAS Base exam?

Then here, I am...

I appeared for the SAS Base exam and scored 969 out of 1000 points on my very first attempt.

Yeah! Now I’m a SAS Certified Specialist.

Here, I would like to share my experience with those who are preparing for this exam.

Initially, they will verify your details, i.e. your ID and your location.

They will ask you to take pictures of your current location, i.e. your desk, and of your ID proof; a driving license or passport will work.

Please complete these mandatory steps before the scheduled exam time.

At the scheduled time, the proctor will arrive, ask you for some details and check for any compliance issues. During the exam, there should be no issues with your internet, mic or webcam.

You will get two and a half hours for this exam.

Then the proctor will give you 5 minutes to explore the virtual machine they provide for the online examination and ask you to read the guidelines. If you have any questions, this is the time to raise them.

** I would suggest adjusting the SAS coding window on the screen before starting the exam.

Watch this video for more details: https://www.sas.com/en_us/certification/credentials/foundation-tools/base-programming-specialist.html

You may come across around 40-42 questions in total.

These questions are divided into lab questions and MCQs.

Each lab question carries about 2-3 marks and each MCQ carries 1 mark. There is NO negative marking.

So it makes sense that the lab questions are important for scoring good marks.

Watch this video for more details: https://www.youtube.com/watch?v=OpQ0SMNXiYE&t=1s

They will provide 4 folders named CERT, INPUT, OUTPUT and RESULT.

You must use the LIBNAME statement with a libref to reference the input data files. Without a good command of the LIBNAME statement, you will be in serious trouble.

So get a good command of the LIBNAME statement and PROC IMPORT.

They might even ask you to import a particular sheet of an Excel file, and you are not allowed to download the file and inspect it on your own system.

You have to use SAS code to figure out the names of the sheets within the Excel file.

So, for such questions, use PROC DATASETS.

Find the sheet name using the two-level naming method.

** Questions were quite simple.

E.g. what is the value in observation 42, or for customer_id = 11011, etc.

Data manipulation is important.

They will ask you about match-merging two datasets, one-to-one merging and concatenating datasets.

They may even ask IF-THEN/DO based questions where you subset a dataset based on conditions.

There are some error-solving questions. So be prepared for syntax and execution/semantic errors and how to debug them. The PUTLOG or PUT statement will be helpful here.

They asked me to report the encoding, label and file size of the imported Excel file.

So, use proc contents data=' ';

Most of the questions use proc means and proc freq, often with maxdec=2;

One lab question comprises 2-3 questions, and some questions continue from the previous one.

**The examiner/proctor will check your coding style as well.

**You are allowed to search the SAS documentation and forums, which are available in the bookmarks. But don’t rely on this, because it might take a long time to reach the page you want. But yes, you can use it if you get stuck.

**Be prepared: some of the features were not working as desired. For me, it was the FIRST. and LAST. variables.

They will ask you approx. 20 lab questions and 20 MCQs.

The MCQs were straightforward.

**I would highly recommend going through this practice exam: https://vle.sas.com/course/view.php?id=436

You can go through SAS documentation too for more in-depth knowledge.

**Get completely familiar with SAS Certified Specialist Prep Guide Base Programming using SAS 9.4

E-Book is also available on SAS website.

Practice the questions and the practice set at the end of the book.

This E-book and the practice questions are more than enough to crack the SAS Base Certification Exam using SAS 9.4.

All the Best….

Hope this guidance helps you crack this exam.

Mastering Algorithmic Programming: Know Which Algorithm to use WHEN

Hi folks,

I'm sharing this generic and practical approach that can be applied to most of the machine learning algorithms...

Don't get confused!!!

Road map towards understanding WHICH algorithm to use WHEN

First of all, we need to understand the nature of the data:

There are two major types of data:
1. Quantitative, i.e. Numeric (comprises Integer and Continuous)
2. Qualitative, i.e. Categorical
Categorical data is further divided into 2 types: 1. Nominal 2. Ordinal

The algorithm is chosen according to the nature of the data:

Hope this is useful to you!
