I want to group & order by unique & new_var but I get an error that the new_var does not exist , I want this to be done in PROC SQL and not a data step as it has over 24million records and space is an issue.

Next off your "test" datastep does not work. You input new_var from what I assume to be your datalines (although datalines is missing), as a numeric, which JAN is not for instance. So the variable gets created as numeric with no data.

Next up is the substr, which is being put into a numeric variable - that will also fail.

Now onto your question. You say you want this to be done in SQL, there is only one technical reason why you would have to use proc sql, and that is if you are passing the SQL code through to a database. Otherwise there is no requirement to use SQL, and in some cases the resource usage will be higher than if you use datastep. So your statement that space is an issue actually works against the use SQL.

Now you say you get an error in the code you are running, please present the code you are running and sufficient test data to be able to check, as running the one I present above will have a new_var variable.

Finally, if your working with 24 million records, then its likely you need to approach the problem differently, use some big data techniques, filter the data down as much as you can, add flags and groups rather than creating multiple datasets, sort only when absolutely necessary. Just switching to using proc sql will not solve these things for you.

Since you read new_var on the input statement without a $ it is assumed to be numeric. Then values such as Jan May Apr are invalid numeric values. Also the assignment

NEW_VAR = SUBSTR(DATE,3,7)

subsequently fails because New_var is numeric.

So new_var is always missing.

Show the actual Proc SQL that your are attempting to see if you are doing something similar in your SQL.

You may need to use

group by calculated new_var

to tell SAS that the variable used for grouping is calculated on the Select clause.

However if you use actual SAS Date values instead of character many things can be done with date functions or formats that may not require adding an additional variable at all, depending on why you need the month abbreviation.