Archive for September 2015

Learning SAS: Character Functions

Like various numeric functions that help us to process numeric data, SAS has a bunch of Character Functions to easily process Characters.

Problem:

Clean the input data where Weight & Height are characters but remove
the units after the numbers & type convert them to numbers

Solution:

/* Clean the input data where Weight & Height are characters but remove 
the units after the numbers & type convert them to numbers */
 
data A15001.A01_nuts;
	input weight $ height $;
	datalines;
	100kgs. 59cm
	180KG 60CM.
	88kg. 160cms
	50kgs 100cm
	;
 
Title 'Source Data';	
proc print data=A15001.A01_nuts; run;Title;
* find() to find 'kg' and 'cm' in the input set
returns the position of the matched string;
 
data A15001.A01_fixit;
	set A15001.A01_nuts(rename = (weight = char_wt
						height = char_ht));				
	if find(char_wt,'kg','i') then	
			weight = input(compress(char_wt,,'kd'),8.);
	if find(char_ht,'cm','i') then	
			height= input(compress(char_ht,,'kd'),8.);
	drop char_:; 
run;
 
Title 'Formatted Data';
proc print data=A15001.A01_fixit; run;Title;

Output:
Learning: 


  1. Understanding Basic character functions 
  2. Converting Character variable to Numeric variable using input (put for the opposite of it)
Saturday, September 19, 2015
Posted by Netbloggy

Learning SAS: Tweaking PROC PRINT to display the data

PROC PRINT in SAS has a lot of options which can help us tweak the way we display our data.

Options like n = caption of summary, firstobs = no. of first observations & obs = last observation to be processed.

Problem 1:

List the first 6 observations in data set Blood. Include only the variables Subject,
WBC (white blood cell), RBC (red blood cell), and Chol. Label the last three
variables “White Blood Cells,” “Red Blood Cells,” and “Cholesterol,” respectively.
Omit the Obs column, and place Subject in the first column. Be sure the column
headings are the variable labels, not the variable names.

Solution: 
 
proc print data=A15001.A01_Blood 
			(obs=6) 
			n = 'No. of Observations = '
			label; /* printing only 6 obs */
	label WBC = 'White Blood Cells'
		RBC = 'Red Blood Cells'
			Chol = 'Cholesterol';
		ID Subject;	 /* omit obs & place the id variable at first */
		var WBC RBC Chol;
run;
 

Output:
Learning:

  1. Different options of PROC PRINT

Problem 2:

List the four observations from data set Blood (3-6). Also group the output report by
BloodType.

Solution:

/* sorting before using grouping in PROC PRINT */
proc sort data=A15001.A01_Blood; by BloodType; run;
 
proc print data=A15001.A01_Blood 
			(firstobs=3 obs=6) /* printing only 4 obs */
			n = 'No. of Observations = '; 
	 	by BloodType;
		ID Subject;	 /* omit obs & place the id variable at first */
		var WBC RBC Chol;
run;
 

Output:


Learning:

  1. Using BY to group the output in PROC PRINT
  2. Using FirstObs to select a start point while displaying the output
Posted by Netbloggy

Learning SAS: Arrays in SAS

Unlike many other programming languages, Arrays in SAS don't store the values but just the pointers to the actual values (very much like C Pointers).

Problem:

Using the SAS data set Scores, create a new, temporary SAS data set (reversed)
where the values of the variables test1–test5 are reversed as follows: 1 -> 5; 2
-> 4; 3 -> 3; 4 -> 2; 5 -> 1.

Solution:


data A15001.A01_scores;
 input ID $ test1-test5;
 datalines;
 001 90 88 92 95 90
 002 64 64 77 72 71
 003 68 69 80 75 70
 004 88 77 66 77 67
 ;
run;
 
data A15001.A01_reversed;
 set A15001.A01_scores;
 array test{5};
 array ques{5} ques1-ques5;
 j = 5;
 do i = 1 to 5;
  ques{j} = test{i};
  put j= i= ques{j}= test{i}=;
  j + (-1);
 end;
 
 do i = 1 to 5;
  test{i} = ques{i};
 end; 
 drop ques1-ques5 i j;
run;
 
Title 'The actual scores';
proc print data=A15001.A01_scores;run;
title;
Title 'The reversed scores';
proc print data=A15001.A01_reversed; run;
title;

Output:
Learning:


  1. How to efficiently use Arrays to reverse a series of numbers
  2. Drop not-required Arrays


Problem:

The passing score on each of five tests is 65, 70, 60, 62, and 68. Using the data here, use a temporary array to count the number of tests passed by each student.

Solution:

data A15001.A01_passing;
 set A15001.A01_scores;
 array passscore{5} _temporary_ (65,70,60,62,68); 
 array test[5];
 rawscore = 0;
 do tst = 1 to 5;
  rawscore + (test{tst} ge passscore{tst});
 end;
 drop tst;
run;
 
proc print data=A15001.A01_passing; run;

Output:
Learning:


  1. Creating temporary array using _temporary_


Download the code here
Posted by Netbloggy

Learning SAS: Numeric Functions

Sometimes it's just easier to use in-built SAS functions while processing the data rather than writing many complex statements in the DATA step. This is a simple peek into some of the numeric functions of SAS.

Problem:

Count the number of missing values for WBC, RBC, and Chol in the Blood data set.
Use the MISSING function to detect missing values.

Solution:
 
data A15001.A01_MissingVals;
 set A15001.A01_Blood;
 if missing(WBC) then Miss_WBC + 1; /* missing() to detect missing and + sum function to increment */
 if missing(RBC) then Miss_RBC + 1;
 if missing(Chol) then Miss_Chol + 1;
run;
 
proc print data=A15001.A01_MissingVals noobs; run;


Output:


Learning:

How to use Missing() function to processing numeric data with missing values

Problem:

Create a new, temporary SAS data set (Miss_Blood) based on the SAS data set
Blood. Set Gender, RBC, and Chol to a missing value if WBC is missing. Use the
MISSING and CALL MISSING functions in this program.

Solution:

data A15001.A01_Miss_Blood;
 set A15001.A01_Blood;
 if missing(WBC) then CALL Missing(of Gender RBC Chol);
run;
 
proc print data=A15001.A01_Miss_Blood; run;

Output:

Learning: 

How to use CALL Routine with Missing function

Download the code here

Learning SAS: Subsetting & Combining SAS Data sets

As we've learnt how to process our input data with conditions and loops, it's also equally important for us to process our SAS datasets - more prominent of which are Subsetting & Combining SAS Data sets.

Problem:

Using the SAS data set Blood, create two temporary SAS data sets by selecting all
subjects with cholesterol levels (Chol) below 100.  Do this using a single DATA step.

Input:

libname A15001 '/folders/myfolders/iSAS/Assignment';
 
data A15001.A01_Blood;
   infile '/folders/myfolders/iSAS/blood.txt' truncover;
   length Gender $ 6 BloodType $ 2 AgeGroup $ 5;
   input Subject 
         Gender 
         BloodType 
         AgeGroup
         WBC 
         RBC 
         Chol;
   label Gender = "Gender"
         BloodType = "Blood Type"
         AgeGroup = "Age Group"
         Chol = "Cholesterol";
run;

Solution:

data A15001.A01_LowCholMale A15001.A01_LowCholFemale;
 set A15001.A01_Blood;
 where Chol lt 100 and not missing(Chol); *only entries of Blood with low chol are processed;
  if Gender = 'Female' then output A15001.A01_LowCholMale;
 else if Gender = 'Male' then output A15001.A01_LowCholFemale; /* else if because it's not mutually exclusive */
run;
 
title 'List of Low Cholestrol Male'; 
proc print data=A15001.A01_LowCholMale noobs;run;
title;
 
title 'List of Low Cholestrol Female'; 
proc print data=A15001.A01_LowCholFemale noobs;run;
title;

Output:




Learning:

  1. Using WHERE to subset SAS Datasets
  2. Creating multiple SAS Datasets in a single DATA step
  3. Controlling the implicit OUTPUT to write in specific dataset based on specific conditions



Learning SAS - Working with Dates

SAS offers a lot of flexible functions in reading and writing Dates. Let's explore some of it.

Problem 1: 

Compute a person's age in years

Input:

001 10/21/1950 05122003 08/10/65 23Dec2005

002 01/01/1960 11122009 09/13/02 02Jan1960

Solution:

data A15001.A01_Four_Dates;
infile '/folders/myfolders/iSAS/a01_dates.txt';
input @1 Subject $3.
@5 DOB mmddyy10.
@16 VisitDate mmddyy8.
@26 TwoDigit mmddyy8.
@34 LastDate date9.;
 
format DOB VisitDate LastDate mmddyy10.
  TwoDigit date9.;
run;
 
proc print data=A15001.A01_Four_Dates; 
*format DOB VisitDate LastDate mmddyy10.
  TwoDigit date9.;
run;
*yearcutoff is to cutoff all those previous years;*options yearcutoff=1910; 
data A15001.A01_Ages;
 set A15001.A01_Four_Dates;
 age1 = yrdif(DOB,VisitDate,'Actual'); *as of visitdate;
 age2 = yrdif(DOB,'05SEP2015'd,'Actual');*as of this date - date constant; 
 age3 = yrdif(DOB,today(),'Actual'); *as of today;
run;
 
title "Listing of Age";
Proc print data=A15001.A01_Ages;
 id subject;
 var DOB VisitDate age1 age2 age3; 
run; 

Output:
Learning:


  • Reading date with date informat and write it in the dataset in a different date format
  • Understanding the global option YEARCUTOFF to change the default yearcutoff to handle two-digit years
  • Calculating Age (difference of years) with yrdif function
  • Understanding Date constant - '19SEP2015'd
  • Understanding today() function that outputs today's date
Problem 2:

Use mdy() & other such function to combine & extract month,week and day from a date and also calculate intervals between two dates.

Solution:

*extracting weekday, day, month and year;
 
data A01_Extract;
 set A15001.A01_Four_Dates;
 day = weekday(DOB);
 dayofmonth= day(DOB);
 Month = month(DOB);
 year = year(DOB);
run;
 
proc print data=A01_Extract noobs;
 var DOB Day -- Year;
run;  
 
*creating date from month,day,year values;
 
data A01_MDY_Ex;
 set A01_Extract;
 dob_date = mdy(month,dayofmonth,year);
 format dob_date mmddyy10.;
run;
 
proc print data=A01_MDY_Ex;
var dob_date;
run;
 
*INTCK to calculate interval bw two dates;
 
data A01_Interval;
 set A01_MDY_Ex;
 yr_interval = INTCK('year',dob_date,today());
 qtr_interval = INTCK('qtr',dob_date,today());
run;
 
proc print data=A01_Interval;;
var dob_date yr_interval qtr_interval;
run; 
 
Output:
Learning:


  • Understanding date functions like mdy(), day(), month(), year()
  • Using INTCK function to calculate intervals (qtr, year, day, week) between two dates


Download the code from here
Friday, September 18, 2015
Posted by Netbloggy

Learning SAS - Looping with Condition (DO WHILE & DO UNTIL)

We learnt about Conditional processing and Iterative Processing, but what if we could combine both of them and use effectively? Hence comes DO WHILE & DO UNTIL.

Problem 1:

Generate a table of integers and squares starting at 1 and ending when the square
value is greater than 100. Use either a DO UNTIL or DO WHILE statement to
accomplish this.

Solution:

data A01_intsqrs;   
 integer = 1;
 do while (square <= 100); /*do until (square > 100);*/
   square = integer ** 2;
   output;
   integer + 1;
 end;
run;
 
proc print data=A01_intsqrs; run;   
 


Output:
Learning:

  • How DO-WHILE & DO-UNTIL works
  • Controlling the loop with a condition

Download the code here

Learning SAS - Iterative Processing : Looping

The primary objective of computer is to compute mathematical computations of big scale faster than human beings. We humans can write down multiplication table of number 5 till 10th level or  maybe 100th level, but we can make computer write down the same till 1000th or more too. And Iterative processing (looping) opens doors for such computing.

Problem 1: 

Using this (Vital) data set, create a new data set (NewVitals) with the following new
variables:
For subjects less than 50 years of age:
If Pulse is less than 70, set PulseGroup equal to Low;
otherwise, set PulseGroup equal to High.
If SBP is less than 130, set SBPGroup equal to Low;
otherwise, set SBPGroup equal to High.
For subjects greater than or equal to 50 years of age:
If Pulse is less than 74, set PulseGroup equal to Low;
otherwise, set PulseGroup equal to High.
If SBP is less than 140, set SBPGroup equal to Low;
otherwise, set SBPGroup equal to High.


libname A15001 '/folders/myfolders/iSAS/Assignment';
 
data A15001.A01_vitals;
 input ID : $3.
 Age
 Pulse
 SBP
 DBP;
 label SBP = "Systolic Blood Pressure"
 DBP = "Diastolic Blood Pressure";
datalines;
001 23 68 120 80
002 55 72 188 96
003 78 82 200 100
004 18 58 110 70
005 43 52 120 82
006 37 74 150 98
007 . 82 140 100
;
 
Solution:


 
data A15001.A01_new_vitals; 
 set A15001.A01_vitals;
 if missing(age) then delete;
 else if age lt 50 then do;
  if pulse lt 70 then PulseGroup = 'Low';
  else PulseGroup = 'High';
  if SBP lt 130 then SBPGroup = 'Low';
  else SBPGroup = 'High'; 
 end; 
 else if age ge 50 then do;
  if pulse lt 74 then PulseGroup ='Low';
  else PulseGroup = 'High';
  if SBP lt 140 then SBPGroup= 'Low';
  else SBPGroup = 'High';
 end; 
run;
 
Title "Listing of New Vitals";
proc print data=A15001.A01_new_vitals noobs;
run;  
Title;

Output:
Learning: 


  • How to read input from another dataset
  • How to use missing() function to identify missing value
  • Using DO-END along with IF-ELSE-ELSE IF to do a set of tasks in more than one step





Problem 2: 

Modify this program so that a new variable, SumSales, representing Sales to date, is
added to the data set. Be sure that the missing value for Sales in month 3 does not

result in a missing value for SumSales.

Solution:

data A01_monthsales;
 input month sales @@;
 SumSales + Sales; * sum function ignores missing value;
datalines;
1 4000 2 5000 3 . 4 5500 5 5000 6 6000 7 6500 8 4500
9 5100 10 5700 11 6500 12 7500
;
 
Title "Monthly Sales Report";
proc print data=A01_monthsales; run;
Title;

Output:





Learning:


  • An Arithmetic computation can result in Missing value if at least one missing value is encountered
  • Sum function -> Variable + Increment can increment the Variable without any initialization and also can handle missing value
  • Sum function is also a good counter to use
  • @@ - Double Trailing to hold more than one line to receive input



Problem 3: Create a temporary SAS data set from these three lines of data. Each observation should contain Method (A, B, or C), and Score. There should be 30 observations in this data set. Use a DO loop to create the Method variable and remember to use a single trailing @ in your INPUT statement. Provide a listing of this data set using PROC PRINT.

Solution:
data A15001.A01_speed;
 do method='A','B','C';
  do i = 1 to 10;
    input score @;
    output;
   end; 
 end; 
 drop i;
datalines;
250 255 256 300 244 268 301 322 256 333
267 275 256 320 250 340 345 290 280 300
350 350 340 290 377 401 380 310 299 399
;  
run;

Output:
Learning:


  • Looping with Categorical Variable (Characters) as Index
  • Nested Looping (Loop inside another Loop)
  • @ - Single Trailing to hold the line for input
  • Controlling the implicit output in the DATA step with OUTPUT statement


Download the code here

Learning SAS - Conditional Processing - IF-THEN-ELSE

The most important part of any programming language is conditional processing. The general syntax goes like this:

if condition then
   --statements;
else if condition then
    -- more statements;
else if condition then
    -- more statements;
...
else
    -- other statements;

Problem 1 : 

Create a temporary SAS data set called School. Using IF and ELSE IF statements, compute two new variables as follows: Grade (numeric), with a value of 6 if Age is 12 and a value of 8 if Age is 13. The quiz grades have numerical equivalents as follows: A = 95, B = 85, C = 75, D = 70, and F = 65. Using this information, compute a course grade (Course) as a weighted average of the Quiz (20%), Midterm (30%) and Final (50%).

 Input Data:

12 A 92 95
12 B 88 88
13 C 78 75
13 A 92 93
12 F 55 62
13 B 88 82

Solution:


data A01_school;
 input Age Quiz : $1. Midterm Final;
 if Age = 12 then Grade = 6;
 else if Age = 13 then Grade = 8;
 Select(Quiz);
  when ('A') Quiz_Grade = 95; 
  when ('B') Quiz_Grade = 85;
  when ('C') Quiz_Grade = 75;
  when ('D') Quiz_Grade = 70;
  when ('F') Quiz_Grade = 65;
  otherwise;
 end;
 Course_Grade = Quiz_Grade*.2 + Midterm*.3 + Final*.5;
 drop Quiz_Grade;  
datalines;
12 A 92 95
12 B 88 88
13 C 78 75
13 A 92 93
12 F 55 62
13 B 88 82
;
run;
 
proc print data=A01_school;
run;

Output:


Learning:



  • Setting Length of a character when it's less than 8 bytes to save memory
  • How to use IF ELSE
  • How to efficiently use SELECT to reduce repetitive IF ELSE
  • Necessary to drop Not-Required Variables 

Time to learn SAS with Free SAS University Edition & Ron Cody's book

In the age of Rs and Pythons, SAS is still the unconquered leader at least in the domain of Banking and Finance. SAS as a statistical package shares many similarities with Mainframe systems. Even after the rise of cloud computing and Distributed systems, Banking & Finance domain couldn't completely move away from Mainframe systems and the same is true with SAS too.

Hence it's important for an aspiring analyst to learn SAS (at least Base SAS) but how do we do that. SAS is an expensive package and that actually makes it tough to get your hands-on with SAS. But as open source counterparts like R & Python are growing, SAS has come down with a Free University Edition to attract help students. Let's use it and get started with SAS.

University Edition Up to Date

Learn How to install SAS University Edition on VirtualBox or VM Player

Fine, we've got the tool the play with but what to do with it? That's exactly what Ron Cody's book teaches us.
Learning SAS by Example: A Programmer's Guide


This is an excellent book with amazing flow of chapters. There couldn't be any better book other than this to get started with SAS. Happy learning :-)


Solving "InternetOpenUrl failed: 'A connection with the server could not be established'" Error in RStudio

Just when I was trying to install a new package in my RStudio using install.package() or the GUI way, my RStudio suddenly threw a long error where "InternetOpenUrl failed: 'A connection with the server could not be established'" repeated often. That was funny because my internet connection was perfect and so googling about the same ended up in a Stack overflow page that had a huge set of instructions which made me worry more. Well, there is always a second search result when the first one is not working and that really helped.

To Solve this error, do this in your RStudio: 

Tools -> Global Options -> Packages and unchecking the "Use Internet Explorer library/proxy for HTTP" option.



It was simply my internal proxy settings to be blamed. After all what would happen if you wanted to watch a cricket match live when it wasn't available in your country :-/







Sunday, September 13, 2015
Posted by Netbloggy

Popular Post

Blogger templates

Total Pageviews

Powered by Blogger.

- Copyright © nulldata -Metrominimalist- Powered by Blogger - Designed by Johanes Djogan -