Importing several files into SAS

I have a folder containing several CSV files. I wish to write a small SAS program (based on macro) that reads all the files into SAS datasets. My input is a folder with CSV files, my output should be the same set of SAS datasets, stored in a library called, let's say c, or any other name (not in WORK as I want them permanent).

I have tried something but stuck in the middle. First I read all the names of the files in the folder into a datset, which is the list of all file names. Then I create macro variables giving each file an index and another macro variable counting the total number of files. Now I need to write a small macro that will use proc import to import each file into SAS and store it in a dataset. I have a bit of trouble defining the different names (as it is fairly easy to run in a loop and just import a file).

Re: Importing several files into SAS

I've made a few modifications to your code. If the names of your CSV files are valid SAS dataset names, you can omit the VALIDMEMNAME option and the NLITERAL function. I have tested the code with the following "ugly" file names:

Re: Importing several files into SAS

I've made a few modifications to your code. If the names of your CSV files are valid SAS dataset names, you can omit the VALIDMEMNAME option and the NLITERAL function. I have tested the code with the following "ugly" file names:

Re: Importing several files into SAS

I inserted that line in order to prepare names for the output datasets. This could be simplified if you were happy with dataset names such as DS1, DS2, etc. In this case one could write the original CSV file names into dataset labels in order to provide the link between the dataset and the corresponding raw data file.

My suggestion aimed at taking the CSV file names as dataset names (i.e. abc.csv --> abc.sas7bdat), as far as possible, so that it is obvious which raw data file belongs to a given dataset. But there is the potential issue that some of the CSV file names do not comply with the rules for valid SAS dataset names. Problematic cases would include names which start with a digit, contain special characters other than the underscore or are longer than 32 characters.

The expression substr(File,1,min(32, length(File)-4)) results in the CSV file name without the ".CSV" suffix, truncated to 32 characters, if the original name was longer. The NLITERAL function converts this string to a so called SAS name literal. Examples: NLITERAL('abc')='abc' (i.e. valid SAS names are left unchanged), NLITERAL('ab$c')='"ab$c"N' (in general, the dollar sign is not allowed in a dataset name, but in the special syntax "..."N, it is acceptable). If datasets with non-standard names are created, the option setting VALIDMEMNAME=extend is necessary to let SAS accept them. Honestly, I have never used such non-standard dataset names in practice and I would rather recommend to avoid them. So, in your situation I would actually look through the list of CSV file names and see how non-compliant names (if any) could be transformed into valid SAS names. For example, if blanks were the only special characters in the names, I would simply replace them with underscores in the dataset names.

Yes, if the first line in each CSV file contains the variable names, GETNAMES=yes would instruct SAS to name the variables correspondingly. SAS would automatically modify the names appropriately if they were not valid variable names (depending on the setting of system option VALIDVARNAME).

If you have licensed SAS/ACCESS Interface to PC File Formats, you can specify dbms=excel (or dbms=xls) in the PROC IMPORT step to import Excel 97-2003 files. In addition, there are SHEET= and RANGE= options to specify the spreadsheet (if there are more than one) and a range of cells (if only part of the spreadsheet is to be imported). Alternatively, you could use a LIBNAME statement of the form libname myxls excel path="&path\Data\test.xls" to access an Excel workbook as if it was a SAS library. For more details please seethe documentation: http://support.sas.com/documentation/cdl/en/acpcref/67382/PDF/default/acpcref.pdfor some of the papers on this subject, e.g. http://support.sas.com/resources/papers/proceedings10/144-2010.pdf.

I don't have that SAS/ACCESS license (and I don't have Excel installed on my SAS workstation), so I can't test this. When I imported Excel sheets in the past (primarily with SAS 6 and SAS 8), I mostly used DDE (dynamic data exchange), which is an "old", partly outdated technology, but very flexible. For example, I was able to let SAS detect the text color used in an Excel sheet (which carried some information), but that was extremely complicated.

Re: Importing several files into SAS

One of many cautions with using Proc Import and reading variable names from column headers: The discussion that FreelanceReinhard mentions for data set names also appies to variables with the added complication that if you have two or more column headings that start with the same text that proc import may truncate the first occurance and the next could get a very generic varaible name of VARxx where xx refers to the column number.

Example:

Column 10: This is a very long column header related to the annual values of Product X

Column 11: This is a very long column header related to the annual values of Product Y

With maximum variable name length of 32 characters these tow column headers "look the same" to the Proc Import processor.

Re: Importing several files into SAS

@Babloo: The macro processor resolves expressions like &&name&i in two passes:

&& resolves to & and &i is resolved to the content of macro variable i, e.g., to 1 in the first iteration of the "%do i=1 %to ..." loop. Result (in the first iteration of the %DO loop): &name1

&name1 is resolved to the content of macro variable name1.

So, whenever you loop through a list of numbered macro variables with a %DO loop you will probably use this pattern.

If the expression was written as &name&i, the macro processor would try to resolve &name to the content of a macro variable name in the first pass, but this is not what we want in the above situation. There is no macro variable name, hence we would get a warning "WARNING: Apparent symbolic reference NAME not resolved."