2008/2009 User Survey Results

Response Summary

Many thanks to the 421 users who responded to this year's User Survey. The response rate is comparable to last year's, and both are significantly higher than in previous years:

77.4 percent of users who had used more than 250,000 XT4-based hours when the survey opened responded

36.6 percent of users who had used between 10,000 and 250,000 XT4-based hours responded

The overall response rate for the 3,134 authorized users during the survey period was 13.4%.

The MPP hours used by the survey respondents represent 70.2 percent of total NERSC MPP usage as of the end of the survey period.

The PDSF hours used by the PDSF survey respondents represent 36.8 percent of total NERSC PDSF usage as of the end of the survey period.

The respondents represent all six DOE Science Offices and a variety of home institutions: see Respondent Demographics.

The survey responses provide feedback about every aspect of NERSC's operation, help us judge the quality of our services, give DOE information on how well NERSC is doing, and point us to areas we can improve. The survey results are listed below.

You can see the 2008/2009 User Survey text, in which users rated us on a 7-point satisfaction scale. Some areas were also rated on a 3-point importance scale or a 3-point usefulness scale.

| Satisfaction Score | Meaning | Number of Times Selected |
|---|---|---|
| 7 | Very Satisfied | 8,053 |
| 6 | Mostly Satisfied | 6,219 |
| 5 | Somewhat Satisfied | 1,488 |
| 4 | Neutral | 1,032 |
| 3 | Somewhat Dissatisfied | 366 |
| 2 | Mostly Dissatisfied | 100 |
| 1 | Very Dissatisfied | 88 |

| Importance Score | Meaning |
|---|---|
| 3 | Very Important |
| 2 | Somewhat Important |
| 1 | Not Important |

| Usefulness Score | Meaning |
|---|---|
| 3 | Very Useful |
| 2 | Somewhat Useful |
| 1 | Not at All Useful |

The average satisfaction scores from this year's survey ranged from a high of 6.68 (very satisfied) to a low of 4.71 (somewhat satisfied). Across 94 questions, users chose the Very Satisfied rating 8,053 times and the Very Dissatisfied rating 88 times. The scores for all questions averaged 6.15, and the average score for overall satisfaction with NERSC was 6.21. See All Satisfaction Ratings.
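As a rough check on the 6.15 figure, the rating counts from the satisfaction score table can be pooled into a single weighted mean. This is only a sketch: it assumes the overall average is a pooled mean over all ratings, which the survey text does not state (it could equally be a mean of per-question averages), and the names `counts` and `weighted_mean` are illustrative.

```python
# Rating counts copied from the satisfaction score table (score -> times selected).
counts = {7: 8053, 6: 6219, 5: 1488, 4: 1032, 3: 366, 2: 100, 1: 88}


def weighted_mean(counts):
    """Mean score, weighting each score by how many times it was selected."""
    total = sum(counts.values())
    return sum(score * n for score, n in counts.items()) / total


total_ratings = sum(counts.values())
mean_score = weighted_mean(counts)
print(total_ratings, round(mean_score, 2))
```

Under that pooling assumption, the result rounds to the 6.15 reported above.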

For questions that spanned previous surveys, the change in rating was tested for significance (using the t test at the 90% confidence level). Significant increases in satisfaction are shown in blue; significant decreases in satisfaction are shown in red.

Significance of Change

significant increase (change from 2007)

significant decrease (change from 2007)

not significant
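The significance test described above can be sketched as follows. The survey text says only that a t test at the 90% confidence level was used, so the specific choices here are assumptions: Welch's unequal-variance form, a two-sided test, and a normal approximation to the t distribution (reasonable only for large samples). The current-year values are taken from the Franklin disk configuration and I/O performance row of the active MPP respondents table, and the prior-year mean of 5.15 is the score reported for the 2007/2008 survey; the prior-year standard deviation and sample size are hypothetical placeholders.

```python
from math import sqrt
from statistics import NormalDist


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for the difference mean_a - mean_b."""
    standard_error = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / standard_error


def is_significant(t_stat, confidence=0.90):
    """Two-sided test using a normal approximation to the t distribution."""
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return abs(t_stat) > critical


# Current year: mean 5.58, std. dev. 1.46, 259 responses (from the table).
# Prior year: mean 5.15 (reported); std. dev. 1.50 and n = 250 are HYPOTHETICAL.
t_stat = welch_t(5.58, 1.46, 259, 5.15, 1.50, 250)
print(round(t_stat, 2), is_significant(t_stat))
```

With small samples, such as the PDSF items, the normal approximation is too loose and an exact t critical value with the appropriate degrees of freedom would be needed.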

Highlights of the 2009 user survey responses include:

Areas with Highest User Satisfaction

Areas with Lowest User Satisfaction

Largest Increases in Satisfaction

Largest Decreases in Satisfaction

Satisfaction Patterns for Different MPP Respondents

Changes in Satisfaction for Active MPP Respondents

Changes in Satisfaction for PDSF Respondents

Survey Results Lead to Changes at NERSC

Users Provide Overall Comments about NERSC

The complete survey results are listed below.

Areas with Lowest User Satisfaction

Areas with the lowest user satisfaction are Bassi queue wait times and Franklin uptime. This year only two questions received average scores lower than 5.5, and there were no average scores lower than 4.5. This compares with last year, when 1 average score was lower than 4.5 (Bassi wait time) and 9 were between 4.5 and 5.5.

Largest Increases in Satisfaction

The largest increases in satisfaction over last year's survey are for PDSF interactive services, grid job monitoring, Franklin I/O performance, the PDSF and Jacquard batch queue structure, and network connectivity.

Satisfaction Patterns for Different MPP Respondents

The MPP respondents were classified as "large" (if their usage was over 250,000 hours), "medium" (usage between 10,000 and 250,000 hours) and "small". Satisfaction differences between these three groups are shown in the table below. Comparing their scores with the scores of all the 2007/2008 respondents, this year's smaller users were the most satisfied, and the larger users the least satisfied.

| Item | Large MPP Users: Num Resp | Large MPP Users: Avg Score | Large MPP Users: Change 2007 | Medium MPP Users: Num Resp | Medium MPP Users: Avg Score | Medium MPP Users: Change 2007 | Small MPP Users: Num Resp | Small MPP Users: Avg Score | Small MPP Users: Change 2007 |
|---|---|---|---|---|---|---|---|---|---|
| GRID: Job Monitoring | 13 | 6.54 | -0.04 | 26 | 6.54 | 0.46 | 11 | 6.64 | 0.56 |
| SERVICES: Account support | 67 | 6.54 | -0.17 | 130 | 6.63 | -0.07 | 77 | 6.79 | 0.09 |
| OVERALL: Security | 72 | 6.12 | -0.23 | 145 | 6.44 | 0.07 | 82 | 6.55 | 0.19 |
| WEB SERVICES: NIM web interface | 71 | 6.35 | 0.07 | 135 | 6.44 | 0.16 | 76 | 6.49 | 0.21 |
| OVERALL: Network connectivity | 74 | 6.08 | -0.05 | 147 | 6.35 | 0.22 | 84 | 6.40 | 0.28 |
| SERVICES: Computer and network operations support (24x7) | 67 | 5.96 | -0.39 | 128 | 6.14 | -0.21 | 68 | 6.37 | 0.02 |
| Jacquard: Batch queue structure | 14 | 5.50 | -0.42 | 36 | 6.17 | 0.25 | 31 | 6.39 | 0.47 |
| NETWORK: Remote network performance to/from NERSC | 67 | 5.94 | -0.12 | 90 | 6.19 | 0.13 | 51 | 6.37 | 0.32 |
| Jacquard: Disk configuration and I/O performance | 13 | 5.31 | -0.67 | 33 | 6.30 | 0.32 | 31 | 5.97 | -0.01 |
| HPSS: User interface | 44 | 5.82 | -0.14 | 53 | 6.02 | 0.06 | 29 | 6.38 | 0.42 |
| OVERALL: Available Computing Hardware | 73 | 5.62 | -0.51 | 151 | 5.98 | -0.14 | 86 | 6.20 | 0.07 |
| OVERALL: Hardware management and configuration | 72 | 5.64 | -0.34 | 142 | 5.75 | -0.23 | 79 | 5.89 | -0.09 |
| Franklin: Ability to run interactively | 56 | 5.75 | 0.17 | 108 | 5.67 | 0.09 | 46 | 5.93 | 0.36 |
| Bassi: Batch queue structure | 18 | 5.17 | -0.40 | 58 | 5.53 | -0.03 | 33 | 5.94 | 0.37 |
| OVERALL: Data analysis and visualization facilities | 42 | 5.40 | -0.08 | 75 | 5.51 | 0.03 | 43 | 6.00 | 0.50 |
| Franklin: Disk configuration and I/O performance | 70 | 5.41 | 0.27 | 133 | 5.60 | 0.46 | 56 | 5.71 | 0.57 |
| Jacquard: Batch wait time | 15 | 4.60 | -0.87 | 38 | 5.37 | -0.10 | 33 | 5.91 | 0.44 |
| Franklin: Batch wait time | 73 | 5.45 | -0.40 | 142 | 5.49 | -0.36 | 61 | 5.70 | -0.14 |
| Bassi: Batch wait time | 18 | 3.61 | -0.85 | 64 | 4.48 | 0.03 | 35 | 5.23 | 0.77 |

Changes in Satisfaction for Active MPP Respondents

The table below includes only those users who have run batch jobs on the MPP systems. It does not include interactive-only users or project managers who do not compute. This group of users showed an increase in satisfaction for the NERSC Information Management (NIM) web interface, which did not show up in the pool of all respondents. This group also showed a decrease in satisfaction for available computing hardware and hardware management and for two of the Jacquard questions.

| Item | Num who rated this item as 1 to 7 (ascending; ratings with no responses omitted) | Total Responses | Average Score | Std. Dev. | Change from 2007 |
|---|---|---|---|---|---|
| GRID: Job Monitoring | 2, 2, 12, 34 | 50 | 6.56 | 0.76 | 0.48 |
| Franklin: Disk configuration and I/O performance | 7, 5, 13, 31, 26, 105, 72 | 259 | 5.58 | 1.46 | 0.43 |
| OVERALL: Network connectivity | 1, 6, 11, 23, 105, 159 | 305 | 6.30 | 0.94 | 0.17 |
| WEB SERVICES: NIM web interface | 3, 4, 15, 106, 154 | 282 | 6.43 | 0.75 | 0.15 |
| OVERALL: Available Computing Hardware | 2, 4, 7, 17, 42, 129, 109 | 310 | 5.95 | 1.13 | -0.17 |
| SERVICES: Computer and network operations support (24x7) | 3, 10, 14, 21, 84, 131 | 263 | 6.15 | 1.14 | -0.20 |
| Jacquard: Uptime (availability) | 1, 1, 2, 4, 36, 45 | 89 | 6.28 | 0.86 | -0.21 |
| OVERALL: Hardware management and configuration | 3, 1, 11, 18, 57, 129, 74 | 293 | 5.76 | 1.13 | -0.22 |
| Jacquard: Overall | 1, 1, 2, 6, 5, 46, 29 | 90 | 5.97 | 1.15 | -0.29 |
| Franklin: Batch wait time | 4, 5, 18, 21, 55, 112, 61 | 276 | 5.53 | 1.32 | -0.32 |

Changes in Satisfaction for PDSF Respondents

The PDSF users are clearly less satisfied with web services at NERSC compared with the MPP users.

| Item | Num who rated this item as 1 to 7 (ascending; ratings with no responses omitted) | Total Responses | Average Score | Std. Dev. | Change from 2007 |
|---|---|---|---|---|---|
| PDSF: Ability to run interactively | 1, 1, 1, 3, 19, 15 | 39 | 6.08 | 1.20 | 0.53 |
| PDSF: Batch queue structure | 1, 5, 14, 17 | 37 | 6.24 | 0.89 | 0.36 |
| WEB SERVICES: NIM web interface | 2, 4, 2, 17, 14 | 39 | 5.95 | 1.15 | -0.33 |
| WEB SERVICES: www.nersc.gov overall | 1, 1, 7, 15, 10 | 34 | 5.94 | 0.95 | -0.44 |
| WEB SERVICES: Ease of finding information | 3, 2, 7, 14, 6 | 32 | 5.56 | 1.16 | -0.49 |
| SERVICES: Allocations process | 4, 1, 4, 11, 8 | 28 | 5.64 | 1.34 | -0.53 |
| TRAINING: Web tutorials | 1, 4, 3, 7, 4 | 19 | 5.47 | 1.22 | -0.67 |

Survey Results Lead to Changes at NERSC

Every year we institute changes based on the previous year's survey. In 2008 and early 2009 NERSC took a number of actions in response to suggestions from the 2007/2008 user survey.

2007/2008 user survey: On the 2007/2008 survey Franklin's Disk configuration and I/O performance received the third lowest average score (5.15).

NERSC response: In the past year NERSC and Cray staff worked extensively on benchmarking and profiling collective I/O performance on Franklin, conducting a detailed exploration into the source of the low performance (less than 1 GB/s write bandwidth) reported by several individual researchers.

A number of issues were explored at various levels of the system/software stack, from the high-level NetCDF calls to MPI-IO optimizations and hints, block and buffer size allocations on individual nodes, Lustre striping parameters, and the underlying I/O hardware.

These metrics were instrumental in making the case for increased I/O hardware and for making software and configuration changes. Once implemented, the cumulative effect of the hardware, software and middleware improvements is that a class of applications is now able to achieve I/O bandwidths in the 6 GB/s range.

On the 2009 survey Franklin's Disk configuration and I/O performance received an average score of 5.60, a statistically significant increase over the previous year by 0.46 points.

2007/2008 user survey: On the 2007/2008 survey Franklin uptime received the second lowest average score (5.04).

NERSC response: In the past year NERSC and Cray assembled a team of about 20 people to thoroughly analyze system component layouts, cross interactions and settings; to review and analyze past causes of failures; and to propose and test software and hardware changes. Intense stabilization efforts took place between March and May, with improvements implemented throughout April and May.

As a result of these efforts, Franklin's overall availability went from an average of 87.6 percent in the six months prior to April to an average of 94.97 percent in the April through July 2009 period. In the same period, Mean Time Between Interrupts improved from an average of 1 day 22 hours 39 minutes to 3 days 20 hours 36 minutes.

The Franklin uptime score in the 2009 survey (which opened in May) did not reflect these improvements. NERSC anticipates an improved score on next year's survey.

2007/2008 user survey: On the 2007/2008 survey the two lowest PDSF scores were "Ability to run interactively" and "Disk configuration and I/O performance".

NERSC response: In 2008 NERSC upgraded the interactive PDSF nodes to more powerful, larger-memory nodes. In early 2009, we reorganized the user file systems on PDSF to allow for failover, reducing the impact of hardware failures on the system. We also upgraded the network connectivity to the file system server nodes to allow for greater bandwidth. In addition, NERSC added a queue to allow for short debug jobs.

On the 2009 survey the PDSF "Ability to run interactively" score increased significantly by 0.60 points and moved into the "mostly satisfied - high" range. The PDSF "Disk configuration and I/O performance" score increased by 0.41 points, but this increase was not statistically significant (at the 90 percent confidence level).

Users Provide Overall Comments about NERSC

130 users answered the question What does NERSC do best? How does NERSC distinguish itself from other computing centers you have used?

65 respondents mentioned good consulting, staff support and communications;

Nersc is good at communicating with its users, provides large amounts of resources, and is generally one of the most professional centers I've used.

EVERYTHING !!! From the computing centers that I have used NERSC is clearly a leader.

Very easy to use. The excellent website is very helpful as a new user. Ability to run different jobsizes, not only 2048*x as on the BG/P. In an ideal world I'd only run at NERSC!

NERSC tends to be more attuned to the scientific community than other computer centers. Although it has taken years of complaining to achieve, NERSC is better at providing 'permanent' disk storage on its systems than other places.

NERSC's documentation is very good and the consultants are very helpful. A nice thing about NERSC is that they provide a number of machines of different scale with a relatively uniform environment which can be accessed from a global allocation. This gives NERSC a large degree of flexibility compared to other computational facilities.

As a user of PDSF, I have at NERSC all the resources to analyze the STAR data in a speedy and reliable way, knowing that NERSC keep the latest version of data analysis software like ROOT. Thank you for the support.

NERSC has very reliable hardware, excellent administration, and a high throughput. Consultants there have helped me very much with projects and problems and responded with thoughtful messages for me and my problem, as opposed to terse or cryptic pointers to information elsewhere. The HPSS staff helped me set up one of the earliest data sharing archives in 1998, now part of a larger national effort toward Science Gateways. (see: http://www.lbl.gov/cs/Archive/news052609b.html) This archive has a venerable place in the lattice community and is known throughout the community as "The NERSC Archive". In fact until recently, the lingua franca for exchanging lattice QCD data was "NERSC format", a protocol developed for the archive at NERSC.

The quality of the technical staff is outstanding. They are competent, professional, and they can answer questions ranging from the trivial to the complex.

Getting users started! it can take months on other systems.

Very good documentation of systems and available software. Important information readily available on single web page that also contains links to the original documentation.

113 users responded to What can NERSC do to make you more productive?

The top two areas of concern were Franklin stability and performance, and the need for more computing resources. Users made suggestions in the areas of data storage, job scheduling, software and allocations support, services, PDSF support and networking.

Some of the comments from this section are:

A few months ago I would have said "Fix Franklin please!!" but this has been done since then and Franklin is a LOT more stable. Thanks...

For any users needing multiple processors, Franklin is the only system. The instability, both planned and unplanned downtimes, of Franklin is *incredibly* frustrating. Add in the 24 hour run time limit, it is amazing that anyone can get any work done.

have more machines of different specialties to reduce the queue (waiting) time

Highly reliable, very stable, high performance architectures like Bassi and Jacquard.

When purchasing new systems, there are obviously many factors to consider. I believe that more weight should be given to continuity of architecture and OS. For example, the transition from Seaborg to Bassi was almost seemless for me, whereas the transition from Bassi to Franklin is causing a large drop in productivity, ie porting codes and learning how to work with the new system. I estimate my productivity has dropped by 50% for 6 months. To be clear, this is NOT a problem with Franklin, but rather the cost of porting, and learning how to work on a different architecture.

Put more memory per core on large-scale machines (>8 GB/core). Increase allowed wall clock times to 48 or 96 hours.

Enhance the computing power to meet the constrained the needs of high performance computation.

Save scratch files still longer

Get the compute nodes on Franklin to see NGF or get a new box.

Make more permanent disk space available on Franklin. It needs something line the project disk space to be visible to the compute nodes. The policies need to be changed to be more friendly to the user whose jobs use 10's or 100's pf processors, and stop making those of us who can't allocate 1000's of processors to a single job feel like second-class users. It should be at least as easy to run 100 50 CPU jobs as one 5000 CPU job. The current queue structure makes it difficult if not impossible for some of us to use our allocations.

Enable longer interactive jobs on Franklin login nodes. Some compile jobs require more than 60 minutes, making building a large code base -- or diagnosing problems with the build process -- difficult. Also, it would be useful to be able to run longer visualization jobs without copying large data sets from one systems /scratch to another. This would be for running visualization code that can't be run on compute nodes; for instance, some python packages require shared libraries.

it would be useful if it was easier to see why a job crashed. I find the output tends to be a little terse.

NERSC does an excellent job in adding new software as it becomes available. It is important to continue doing so.

Allocate more time!

We can always use man power to improve the performance and scaling of our codes.

Keep doing what you are doing. I'm particularly interested in the development of the Science Gateways.

23 users responded to If there is anything important to you that is not covered in this survey, please tell us about it.

The consulting support at NERSC is very good in comparison with other DOE supercomputer centers. I like this very much!

Excellent consulting service. The best among all HPC centers that i know. Thank you and thanks in particular to Zhengji, Katie and Woo-Sun!

I am extremely satisfied with the consulting support - namely Woo-Sun Yang and Zhengji Zhao were very helpful in resolving all my problems with my model runs.

Very satisfied. Keep up the good work.

keep up the longstanding excellent work

Thank you very much for your great job. ... Thanks again your great job to maintain supercomputers so that I can most concentrate on my research project.

keep up the good work!

Excellent job. Keep up the high-quality work.

A great organization devoted to provide SERVICE to the users. Just keep up the excellent work that you are doing.

No suggestions for improvement. I've always thought the NERSC consulting is great!

As the STAR/RNC liaison, I deal with the NERSC consulting services on almost a daily basis. I couldn't be happier with the professionalism, helpfulness, and dedication of the NERSC consulting staff. ...

No. The consulting support team does a great job!

Mixed evaluation / requests and suggestions: 14 responses

The consultants are always friendly and do their best, but it is a tough job.

Hire more consultants...

Rewrite the "New Users Guide." Make sure it is up to date and highly accessible. Having recently started using NERSC systems, I found that I was expected to know a lot of things that I didn't necessarily realize I needed to know. Include better sample programs and scripts.

Consultant support quality varies with the person involved. Some people are excellent and most try to respond quickly. In general response is good, but I have also had important questions sit for months until a general software change, maybe unrelated to the problem, finally solved the problem. Some of the MPP problems/software bugs are larger than just NERSC, so some of the dissatisfaction can't be helped. The lack of a usable interactive debugger on Franklin is a big problem, both for communicating with the consultants and for developing MPP code.

twice this past year i have been stymied, something not working. i explain it clearly. the consultant has been unable to figure it out, suggest things to try to isolate or identify the problem. "must be your system is preventing things," they say. these were both connection problems, from LLNL, behind a firewall. there was no further help the consultants could offer. then, both times, help arrived from someone here who had dealt with the same problem. in both cases there were simple solutions. it would have been a major negative if those two problems had not been solved. my opinion is you need a consultant who is much more savvy in this area. apart from these cases, i am very satisfied with the quality of technical advice.

I think it would be nice if someone could be on duty on holidays and weekends. Normally, I found the queues are not that busy. So I can run quite a few jobs. The problem is that if nobody is on duty, if problems occur, I can not solve it. The computer time is waste.

... The online help desk could be more useful if the answers to all (or most) questions - not just mine - were available to me.

maybe this suggestion is not fair which might be a Cray problem, but I do need your help. I always have problem when I try to use Openmp and PETSc together.

more computing time

Please reduce the times of maintenance and increase the time of users.

... In general, I most use Franklin. Although sometimes I experienced some unexpected problems but I really like it. I felt very satisfied. One thing that I concern is the limited 15Gb space. Somehow, it is too small for me to run a nanocluster calculation. Hopefully, in future there are some chances to improve/enlarge this limited space.

please install gromacs

Make sure the hard disk in the pdsf are safe.

Again, I would suggest to use rsync instead of HSI for backup.

Unhappy: 6 responses

Sometimes tickets seem to get lost after the initial response. It would be good issues are better followed to resolution.

My impression is that at least some of the consultants are not super-strong in Fortran issues. Also, it appears you no longer have any experts in HDF5 IO library support (but I amy be wrong there).

Consultants were not able to help setting up AMBER jobs. Someone must have done it on NERSC. How did they do it?

I often get answers that put me off. For example, if I'm having trouble with X, they'll ask me if I've tried Y. It often takes a week to answer my question. If you could have more consultants with climate expertise that would help. The franklin error messages are misleading at best. Upgrading them would work wonders towards helping me fix my own problems. For example, I get "out of memory" errors for everything from a memory leak to missing restart files to inability to write output data files.

Consulting support asked me to recompile my software. I asked for help, and my request was ignored. I later received an email to ask if I was able to get my software recompiled, as CS wanted to get some performance data. I replied, yes, I recompiled my software, but it was due to me spending much of my time trying to find where certain libraries were located that CS would have known quite easily. This software (AMBER) was the most recent version (10 - available April 2009), yet NERSC still does not offer that version.

It has been several months since I used NERSC computers. The main reason I stopped was frequent crashes of my simulations, which to me seemed unrelated to my software -- i.e. hardware-related. In past experience with NERSC support, resolution of issues was so slow that I either abandoned the project temporarily, or found another computing center to run at.

Services and Communications

Legend

Satisfaction with NERSC Services

How Important are NERSC Services to You?

How useful are NERSC Services to You?

Where do you perform data analysis and visualization of data produced at NERSC?

Where do you perform data analysis and visualization of data produced at NERSC?

| Location | Responses | Percent |
|---|---|---|
| All at NERSC | 23 | 6.4% |
| Most at NERSC | 44 | 12.2% |
| Half at NERSC, half elsewhere | 57 | 15.8% |
| Most elsewhere | 107 | 29.6% |
| All elsewhere | 106 | 29.4% |
| I don't need data analysis or visualization | 24 | 6.6% |

Are you well informed of changes?

Do you feel you are adequately informed about NERSC changes?

| Answer | Responses | Percent |
|---|---|---|
| Yes | 362 | 97.6% |
| No | 9 | 2.4% |

Are you aware of major changes at least one month in advance?

| Answer | Responses | Percent |
|---|---|---|
| Yes | 334 | 91.5% |
| No | 31 | 8.5% |

Are you aware of planned outages 24 hours in advance?

| Answer | Responses | Percent |
|---|---|---|
| Yes | 347 | 94.0% |
| No | 22 | 6.0% |
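The percentages in the Yes/No tables above can be recomputed from the response counts. This is a minimal sketch that assumes each percentage is an answer's share of the responses to that one question, rounded to one decimal place; the helper name `percent_shares` is illustrative and not part of any NERSC tooling.

```python
def percent_shares(responses):
    """Return each answer's share of all responses, as a percentage."""
    total = sum(responses.values())
    return {answer: round(100 * n / total, 1) for answer, n in responses.items()}


# Response counts copied from the tables above.
informed = percent_shares({"Yes": 362, "No": 9})
advance_notice = percent_shares({"Yes": 334, "No": 31})
outages = percent_shares({"Yes": 347, "No": 22})

for name, shares in [("informed", informed),
                     ("advance_notice", advance_notice),
                     ("outages", outages)]:
    print(name, shares)
```

Because each table has only two answers, the two shares should sum to 100 percent up to rounding, which makes transcription slips easy to spot.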

Comments about Services: 18 respondents

Analytics Comments and Suggestions: 4 responses

NERSC visualization tutorials would be nice.

Many years back I felt NERSC was somewhat weak in providing information concerning movie making. My impression is that things improved but I do not know the current situation. In the last couple years i have used Quicktime Pro to create movies but wonder if tools at NRSC provide more or better capability . Again, I am ignorant of the current situation and whether there are good tutorials on the NERSC website.

Improved visualization and analysis for large scale data. I would like to see parallel AVS support. I use the existing AVS Express on davinci extensively, but it is already becoming too small and too slow for the size of (M3D nonlinear plasma simulations) run on Franklin, even for jobs that are small by Franklin standards. One reason I am not pushing too hard to increase the size of my Franklin jobs beyond a few hundred cpus is that it will be very difficult to visualize the data at the higher resolution. I don't really know Visit, but from what I have seen the experts do with it, it just does not compare in quality to a developed AVS Express interface.

I need to catch up by using more analysis at NERSC.

Software Comments and Suggestions: 4 responses

it would be great to have python, numpy and sm all linked on the nersc computers.

please install gromacs

Matlab installed by NERSC does not provide complete math support compare to the matlab installed in my own personal computer.

The latest version of NCL (5.1.1) *correctly installed* on davinci ...

Hardware Comments and Suggestions: 3 responses

Many core SMP machines are extremely useful for many of the theories we use, which are necessarily very communication-heavy.

... A new davinci with more standard hardware (not Itanium) and much more of it

clusters with lots of memory and fast turn-around for short jobs

No additional services needed: 3 responses

Again, NERSC is the flag ship of DOE supercomputing. A great and valuable resource.

I cannot think of anything more than what NERSC offers now.

None right now.

Data Storage Comments and Suggestions: 2 responses

When I analyze my data, I need a place to do it and computer time to do it with. In theory, davinci would be the place. However davinci has very small disk allocations, so I cannot store the data on davinci for long enough to finish a run. I've been using /scratch, but I keep getting messages about needing to reduce my holdings there. Alternatively, I can work on franklin:/project. But that's not accessible from batch jobs, and it doesn't accomplish your goal of having me process my data on your data processing machine.

larger space

Other Suggestions: 3 responses

Lots more help porting code to new platforms. Codes like NIMROD have 5-10 different version, so realize that 5-10 versions need to be ported to new platforms. I think lots of consulting time needs to be allocated to porting code when a new system is brought up.

It would be useful to me if the MOTD were available as a news feed (RSS/ATOM).

I appreciate the generous allocation to our project, but I just found out that our allocation is reduced from 1,000,000 to 750,000. I guess this is due to the policy in place that if some percentage is not used, it will be returned, which I can understand. I also understand that NERSC may return some portion to me, if I request. However, I would suggest that NERSC should consider this carefully. For instance, I normally test and develop my codes during a first few months of the year, and I do teach classes. But in the summer, I have lots of time. Secondly, we are fortunate to have access NERSC computers, so we use the computing time very carefully and save some time for later big runs. In some cases, the referees will ask us to provide more data, but we do not have time to run it. This is the caution that we have to build in when we use NERSC computers. I have more to say about this, but I suggest to consider this. One option would be to let the users choose a time frame when their peak time period would be during the year.

Comments

What does NERSC do best? How does NERSC distinguish itself from other computing centers you have used?

In their comments:

65 users mentioned ease of use, good consulting, good staff support and communications;

NERSC tends to be more attuned to the scientific community than other computer centers. Although it has taken years of complaining to achieve, NERSC is better at providing 'permanent' disk storage on its systems than other places.

The machines are very well-run and well documented. There is a wealth of chemistry software available and compiling our own is easy; the support is great. Allocations are both fair and simple, and we are given plenty of hours to support our projects. The large pool of memory and CPUs per node on Bassi makes it a great machine for the software we use.

Franklin is a superior machine, with lots of cycles for its users. That is, given you have time on the machine, the wait queue is reasonable. The consultant staff is almost always available during their stated time frame, is courteous and evidently aims to please. In my opinion, this is very important for the success of the institution.

Provides a stable long-term environment with hassle-free continuation of the allocation from year to year.

Writing as the PI of a moderate sized repo, NERSC provides a vital computational resource with lightweight admin/management overhead: we are able to get on with our science. User support is very good compared to other centers.

NERSC's documentation is very good and the consultants are very helpful. A nice thing about NERSC is that they provide a number of machines of different scale with a relatively uniform environment which can be accessed from a global allocation. This gives NERSC a large degree of flexibility compared to other computational facilities.

As a user of PDSF, I have at NERSC all the resources to analyze the STAR data in a speedy and reliable way, knowing that NERSC keep the latest version of data analysis software like ROOT. Thank you for the support.

Speed, both in terms of computing performances and in terms of technical support

Fair and balancing queuing on a robust platform (bassi), and the support for technical questions is good.

Customer support is the best. And NERSC has much more resources for access than other computing centers.

NERSC has very reliable hardware, excellent administration, and a high throughput. Consultants there have helped me very much with projects and problems and responded with thoughtful messages for me and my problem, as opposed to terse or cryptic pointers to information elsewhere. The HPSS staff helped me set up one of the earliest data sharing archives in 1998, now part of a larger national effort toward Science Gateways. (see: http://www.lbl.gov/cs/Archive/news052609b.html) This archive has a venerable place in the lattice community and is known throughout the community as "The NERSC Archive". In fact until recently, the lingua franca for exchanging lattice QCD data was "NERSC format", a protocol developed for the archive at NERSC.

Resources and software are superior.

I mostly used franklin for my computing. Franklin was stable most of the time except that period when it changed duel core to qual core. I think nersc has done a great job to keep the supercomputers stable 24x7 which is very important to increase our production. Also nersc consulting support is great in comparison with other computing centers.

I have been using NERSC facilities for over a decade and I acknowledge gratefully that NERSC facility is sine quo non for my research in the investigation of Physics and Chemistry of Superheavy elements. The Relativistic coupled-Cluster calculations carried out by us at NERSC for the atomic and molecular systems of the superheavy elements(SHE) Rutherfordium ( Z=104) through Eka-plutonium element 126 are well nigh impossible to perform at any other computing facility.This is due to extraordinary demands not only on CPU but also on disk storage and Memory requirements. We have published some of our recent results on the various SHE and this has been possible only due to the untiring efforts , help and advice of David Turner and most generous grants of additional CPU times by Dr. Sid Coon and currently by Dr. Ted Barnes. Ms. Francesca Verdier has been a tower of strength and always willing and ready to iron out when we ran into problems . Last but not least I am most grateful to my Principal Investigator and distinguished colleague Prof. Walter Loveland who has most generously supported my theoretical research in the SHE. It is impossible for me to pay my debt to Prof. Loveland except by expressing once again my sincerest thanks to him for his guidance, advice and encouragement throughout our research supported by the US DOE Division of Theory of Nuclear Physics. In conclusion, I express my sincerest thanks especially to all those mentioned above and other very kind and helpful men and women who have made NERSC a most user-friendly place to work in.I look forward to continue using the state-of the art second to none NERSC Supercomputing facility for my research for many years.

NERSC generally provides a reliable computing environment with expert consultants. The hardware is more reliable than NCCS and the consultants are more informed.

Provide the state-of-the-art computing facilities and necessary scientific software for the purpose of conducting frontier research.

Provides good machines and cycles

Top of the line production cycle provider in a high performance supercomputing environment

access to a range of systems (Bassi, Jacquard, Franklin) suitable for relatively small jobs (a handful of cores) to large jobs that need (tens of) thousands of cores.

I had a very pleasant computing experience at NERSC, especially on Franklin. I admire how well and reliably I can run both small and large (several thousand procs) jobs on Franklin. A good thing is that it is also convenient to run smaller jobs (8-128 procs), which is advantageous for development and testing or for running lots of small jobs, each with very good parallel performance. Also, the available time a job can spend in the queue varies on a reasonably large scale. There is practically no limit on the number of jobs I can submit for consecutive execution, each taking a relatively short time, utilizing temporarily available processors.

NERSC provides resources that would not otherwise be available.

The NERSC machines are more reliable in terms of uptime.

Size of the clusters.

I like HPSS.

Providing me with the computational resources I need. NERSC is the best managed supercomputing center I know.

I am mostly satisfied with NERSC. Please keep on running the servers well.

NERSC has the most powerful computers I have access to; therefore my research works can't be done without NERSC.

I mostly use PDSF. There, the focus is on data analysis/production, and the computing emphasis reflects this: availability and uptime, which (in my opinion) are excellent.

NERSC provides exceptional computing power and remote data storage. However, these resources still (over the last year) have not reached an acceptable level of reliability. I have not used other computing centers.

Providing large amount of computer power difficult to find elsewhere in a relatively stable fashion. I think the queues work very efficiently, at least compared to other systems I've used.

It's quicker than other computers I have used.

With NERSC I have access to larger machines (franklin) than anywhere else.

This is the only computing center I use. I am pleased with the resources I can use, although uptime on Franklin can be an issue.

Excellent management of the Franklin computing system along with rapid turnaround on medium to large jobs. Scratch files are saved longer than on comparable computers elsewhere.

The connection with PDSF and the disk space seem best.

I like Bassi most; it is very good for my shared-memory parallel jobs with some MPI.

The best thing is the power of clustering in terms of numbers of processors, resources, ...

allow me to run jobs that would be impossible to run on a local machine

convenience of getting an allocation if one works for DOE

NERSC provides accessible large-scale (>2000 core) machines.

The software I am using is well optimized to help fast calculations for my project. It is also much faster than local resource available so that I can get results soon.

Short queue times! Teragrid queues are at least 3+ days. I've never waited more than 12 hours for a job to start on Jacquard.

NERSC has high quality machines and plenty of options for interactive debugging and development.

a fantastic system!!!

I have been very pleased with the queue times on Franklin.

For me, the main distinguishing feature is that big jobs (thousands of processors) go through the queue much faster at NERSC than at other centers.

Support and turnaround times for 'medium size' MPP jobs of a few hundred to a few thousand cpus. (So far, I have just used the few hundred). Since understanding the physics requires parameter scans, this is much more useful than one very large job. Also, since I run highly nonlinear fluid-based simulations, the time step is closely related to the spatial resolution and the medium resolution at this size job runs in a reasonable wall clock time. A large job would require proportionately more time to cover the same simulated interval (changing from a few weeks for a fairly complete run to several months). This is not really affordable. Running several smaller, faster jobs that are designed to be compared against each other also means that software bugs and other problems are more quickly recognized and solved. This is important for the continuing improvement of MPP computer systems. This is a very important computational service that NERSC should continue to support.

Good support services and staff

Very competent and timely user support.

Very helpful, knowledgeable support staff and consultants.

NERSC consulting is the most responsive of any computing center I have used.

NERSC is very responsive to both individual questions and problems and system issues. I get the feeling that there is a team of people trying very hard to keep the computers up and running and the users able to use them.

The best technical and consulting support !

Consulting. Advice. And software updates.

user support

The quality of the technical staff is outstanding. They are competent, professional, and they can answer questions ranging from the trivial to the complex.

Local resource. In general very adaptive to specific needs.

NERSC is user-friendly, its web-site is good (though not great), its staff is very knowledgeable.

NERSC is a great example for user support and outreach.

Getting users started! it can take months on other systems.

Very good at providing access to HPC. Very helpful staff.

Far and away it is the people that work for NERSC and the service they provide, from data analytics to the help desk and everything in between.

It seems like the account support is very helpful and quick to respond.

NERSC is doing a super job on supporting the users. It is this user-friendly environment that keeps me with them all the time. I should add that their action is all for the users, even if that means more complications for them. I appreciate what they are doing.

The support team at NERSC is great, far better than other computing centers I have used!

Mostly satisfied. By its quality of service.

The consultants are very friendly and very helpful.

I feel like the response time is very quick and professional. The fact that it's in the same timezone probably helps on the quick response.

The support is the best that I've experienced, absolutely fantastic. Keep up the good work.

Consulting support! The best among all HPC centers that I know.

better consulting services than others. Generally easier to use.

Resource Management. Advance messages about any updates or downtime of the system. MOTD is important and useful. Can easily find the status of all system in one click.

I am grateful that it is easy to get accounts for new users quickly, even if they are not U.S. citizens. I have always found that NERSC responds very quickly to all requests for assistance, including help desk requests and also requests to Francesca Verdier for information on how to get additional allocations.

Serves a range of users.

NERSC is doing excellent jobs on account support!

NERSC has been very responsive to comments.

NERSC is very user friendly and the staff is excellent, in striking contrast to most other computing centers.

NERSC excels in support, and in active engagement with users. I have not only received responses to my questions but have been called by technical staff actively looking for ways to streamline our computing process which has been very helpful. We have a very productive collaboration with the visualization staff.

MOTD. Consulting. Keeping users informed. More than one batch queue.

NERSC is excellent at responding to service requests and being flexible about dealing with problems. They are better at communicating with users than other centers.

NERSC provides quick feedback on issues, regarding information from a team of experts.

Nersc is very good at responding when I email them with concerns or questions.

good user support, good on-line documents

Your technical support staff is really on the ball!

good technical support, good user support

service/consulting is helpful and prompt; appears to be efficient at solving and dealing with problems (e.g., system failures)

Good web documentation

The information on the website and the reliability of that information.

Great tutorials and user guides on the web pages

A nice and clear website. A strong and quick response team of support.

Very good documentation of systems and available software. Important information readily available on single web page that also contains links to the original documentation.

Things seem to be well organized in NERSC. The web interface is very user-friendly and well maintained.

Web page is nice compared to other computer centers I have used.

user friendly web site service and comprehensive information. efficient use and allocation of computer resource

NERSC provides excellent information on its website on how to use its resources. Further, whenever I've called for help, the staff have been fantastic at helping me track down problems. Both of these features help NERSC to distinguish itself from other computing facilities I've used.

Good software / easy to use environment

NERSC has great tech support and supports their software well. Other computing centers don't install anything and leave you to suffer through building MPI libraries and ScaLAPACK and all kinds of painful things like that. NERSC always has that nasty-to-build, nasty-to-install stuff already built and installed for you, which is very helpful. They are also impressively available at all hours of the day and night for account issues.

NERSC does an excellent job providing state-of-the-art popular application software for most common uses, such as in quantum-chemistry and materials simulations. NERSC also does a good job communicating upgrades/problems with users. Account support is very customer-oriented.

Software support. Disk space management. Consultant support.

Nearby. Good for code development.

NERSC has consistently been a more stable place for development than other computing centers I use. Unfortunately, NERSC is a victim of their own success because then a lot of people try to use the resources, which results in slow turn-around.

Good maintenance about software.

more useful software and STAR environment.

Well-compiled quantum mechanics codes, stable math libraries.

Programs can be easily compiled.

NERSC is a mature system and is relatively easy to use.

Good networking and security

It is the convenience of access that makes NERSC distinguished from other centers. Still maintaining the overall security, NERSC provides excellent points of access that makes users more comfortable with their use of computing facilities. An excellent institute indeed is NERSC.

The non-firewalled network configuration at NERSC is extremely valuable. I can always use scp on my laptop to get results from PDSF disks. Compare this to e.g. the BNL cluster. User home or data disks are not visible from the "gateway" nodes that are the only externally accessible ones. If a laptop is also behind a firewall there is no easy way to get data from BNL to the laptop.

Ease of login without a SecureID or equivalent makes using NERSC machines much more enjoyable. It also greatly simplifies data transfers when the home institution (PNNL) has very tight security that can get in the way if both sides have very tight security (such as when doing transfers to/from NCAR).

Network.

NERSC is much easier to use than other centers, in particular because of the absence of key fobs.

Relatively open and easy to use. No crazy security hoops to jump through, which is nice.

Other comments

not spelling. it's spelling is undistinguished. i use only one other large computing center, at LLNL, and that not enough to draw a meaningful comparison.

I am ok with current status.

One of the things that NERSC has been doing extremely well is the emphasis on the scientific aspects of the research projects the center supports.

NERSC has been the best computing center I have used. However, the I/O issues on Franklin and the fact that the Fortran compilers on Franklin are not fully F95 compliant make life difficult.

It's the only one I have!

In the past I have found franklin to be unreliable (crashes). In addition, before I stopped using Franklin, my jobs would sit in queues for days. I think that queues/allocation should be such that most jobs begin within 1 day.

What can NERSC do to make you more productive? 113 respondents

Less problems with franklin. Speedier resolution when it goes offline.

Improve stability in Franklin.

If FRANKLIN could be more stable and require much less frequent hardware maintenance, my efficiency would be much more improved.

more stable system and ...

[A batch queues with less QC (< 64 nodes, <256 processors) and larger Max Wallclock (3~7 days).] Of course, firstly, the system should be stable enough.

Improve Franklin's runtime! It's incredibly unreliable, and there is at least a shutdown per week...it might be a very fast machine, but you can't trust it, because it goes down unexpectedly so often...The bad functioning of Franklin has seriously affected the performance of my work, and of many other users of Franklin that I've talked to.

More system stable.

Less downtime.

More stability of the Franklin system

Franklin uptime may be improved.

Improve Franklin uptime, ...

keep improving machine stability and decreasing down time.

Less downtime and ...

The beginning of the year had a lot of down time that got in the way of productivity.

Job failure rates on franklin have been crippling. I know you're doing what you can to mitigate this, but I'm still seeing very high failure rates. ...

A few months ago I would have said "Fix Franklin please!!" but this has been done since then and Franklin is a LOT more stable. Thanks...

Franklin up-time has been a bit a stumbling block, but that's obviously not a NERSC-only problem.

Stable machine up time

Continue `hardening' Franklin (I probably did not have to write this.)

The only problem I have is that Franklin was often down when I needed to use it, but that has gotten better.

... better uptimes ...

[Increase the memory size on franklin] and improve her stability

Franklin stability has been largely improved, which is most critical to the productivity.

For any users needing multiple processors, Franklin is the only system. The instability, both planned and unplanned downtimes, of Franklin is *incredibly* frustrating. Add in the 24 hour run time limit, it is amazing that anyone can get any work done.

1) Stop jobs from crashing (maybe you already did this) ...

avoid system and hardware crashes

Avoid node failures!

Franklin is a terrible computer. I often have jobs die and the solution is to resubmit them with no changes. ...

... Better MPP computational reliability. Although it has improved since the worst levels earlier this year, I still regularly have jobs fail periodically for unknown and irreproducible reasons. I write restart files very frequently, particularly for large size jobs, which is probably not very efficient even with parallel I/O. This is roughly 4x more than in seaborg/bassi days (every 500 time steps versus every 2000), while 6-8 hr wallclock jobs now run 4000-6000 time steps, at higher resolution instead of 2000-4000.

improve stability, I/O, and performance:

Improve uptime and file I/O performance of Franklin, [and make these top priorities for the next supercomputer procurement.]

It would be nice to see higher stability and scalability

Fix the I/O issues on Franklin ...

... The login nodes are very underpowered, I had issues in April with two htars overloading the node. I have often found myself waiting for an 'ls' to complete. I put htars into batch scripts because they will exceed the interactive time limit.

scheduled downtime issues:

Keep scheduled maintenance to a minimum. It's nice that Franklin is getting more stable finally.

... much less frequent hardware maintenance

... and having maintenance on monday instead. thank you

NERSC is about to take out Bassi and Jacquard, but Franklin is down for maintenance most of the time, so the only reliable computer that will be left is Davinci. Can you do something to fix the Franklin maintenance schedule? The maintenance frequency is too high....it happens too often, and this is not good for the long run.

The main problem was the long waiting queue time esp. on bassi, faster turn around time in queue would increase our productivity.

Reduce the waiting time of the scheduled jobs. [Bassi user]

The time wait on queue is too long. ...[Bassi user]

the chief limit for me is allocation and batch wait time. i do not see how you can make improvements here. [Bassi user]

PLEASE!!!! Change the Queue system on Bassi. It is not only slow, but I can't put enough jobs into the queue to make working there at all useful. I much prefer the system on Franklin, which allows me to run more jobs more quickly.

have more machines of different specialties to reduce the queue (waiting) time [Franklin / Jacquard / DaVinci user]

As the number of users inevitably increases, I hope that the queuing time goes inversely proportional with the increasing user number counterintuitively. [Franklin / Jacquard / Bassi user]

decrease the queue time per job. [Franklin user]

... and somehow reduce queue wait time for the average user on Franklin.

Fix the batch and queue system. The queues in the past have been absurdly long..forcing me to use the debug queue over and over and limiting what I can run at NERSC. [Franklin user]

... faster turnaround ... [Franklin user]

... 2) Decrease pending job time [Franklin user]

Architecture suggestions:

Highly reliable, very stable, high performance architectures like Bassi and Jacquard.

Provide more resources that have 95+% uptime.

[Improve uptime and file I/O performance of Franklin,] and make these top priorities for the next supercomputer procurement.

Keep BASSI.

Most of our codes will port seamlessly to Franklin, but decommissioning Bassi will inevitably hit our projects hard.

have more machines of different specialties to reduce the queue (waiting) time

The majority of our cpu cycles are spent on ab initio electronic structure calculations. In principle Jacquard and Franklin would be very attractive systems for us to run on. Unfortunately, these applications are very I/O intensive. The global scratch space on these clusters makes running these electronic structure codes on them very inefficient. We have attempted to run these codes (primarily Molpro) in parallel across more than one node on Franklin and Jacquard, and this has proved to be extremely inefficient on Franklin. Trying to do this on Jacquard crashed the compute nodes. This makes running big jobs at NERSC largely counterproductive.

When purchasing new systems, there are obviously many factors to consider. I believe that more weight should be given to continuity of architecture and OS. For example, the transition from Seaborg to Bassi was almost seamless for me, whereas the transition from Bassi to Franklin is causing a large drop in productivity, i.e., porting codes and learning how to work with the new system. I estimate my productivity has dropped by 50% for 6 months. To be clear, this is NOT a problem with Franklin, but rather the cost of porting, and learning how to work on a different architecture.

At the moment, my group has shifted most of our supercomputing to NASA Ames, where the available systems (Columbia, Pleiades, and Schirra) and the visualization hardware and staff are better suited to our needs. I hope that NERSC will upgrade to more powerful systems like these soon.

Get some data processing machines [and tools] that actually work

NERSC needs a large vector processor machine to go with the Cray XT multi-core machine

provide more memory:

Larger memory quota for HOME directory. ... [Jacquard user]

Increase the per-core memory of the machines.

Put more memory per core on large-scale machines (>8 GB/core). ...

... , increase memory per processor. [Franklin / Bassi user]

Increase the memory size on franklin ...

provide more cycles:

Enhance the computing power to meet the constrained needs of high performance computation.

Make more permanent disk space available on Franklin. It needs something like the project disk space to be visible to the compute nodes. ...

Get the compute nodes on Franklin to see NGF or get a new box.

Improved Franklin/NGF integration. [Better remote download services]

... I am very much looking forward to universal home directories and to having franklin:/project accessible during batch job runs.

Improve HPSS interfaces:

... better interface to the mass storage system [Franklin user]

The slowing of network access to NERSC may be understandable as the result of increased usage, but the denial of service attack by archival storage that has recently interfered with my work is not so readily explained. Apparently, archive now refuses service for any more than one hsi session (to my UID). This eliminates (as though by design) the option of uploads to archive from multiple UIUC machines. This also potentially eliminates NERSC archival usability for access to outputs from our projects, whether or not generated on NERSC. The failure of hsi on data transfer nodes dtn0[1,2].nersc.gov is an additional unpleasant surprise from NERSC. The Web pages indicate that this should work, but it does not.

... Data storage tools. htar does not work on longer file names, which are the easiest way to transparently index different simulations. The old pipeline commands from hsi to tar no longer seem to work to read files out of hpss. The entire tar file is read out to disk, so I have had to cut down the size of tar files and try to avoid accessing some of the larger old files. Different computers (e.g., Franklin and davinci) have problems with each other's hsi/tar utilities, so the file has to be read out on the computer it was stored on, then transferred. This requires both computers to be up simultaneously. ...

Job scheduling suggestions: 16 comments

more support for mid range jobs / longer wall times:

... The policies need to be changed to be more friendly to the user whose jobs use 10's or 100's of processors, and stop making those of us who can't allocate 1000's of processors to a single job feel like second-class users. It should be at least as easy to run 100 50 CPU jobs as one 5000 CPU job. The current queue structure makes it difficult if not impossible for some of us to use our allocations. [Franklin / Bassi user]

It would be nice if a subset of nodes allowed wallclock times up to 3 or 4 days. [Franklin / Jacquard user]

Since Franklin is now stabilized, longer wall-time limits for queues will attract more jobs.

[Put more memory per core on large-scale machines (>8 GB/core).] Increase allowed wall clock times to 48 or 96 hours.

... Add in the 24 hour run time limit, it is amazing that anyone can get any work done. [Franklin user]

Many of the cases I simulate have to run for a longer time (several days) but do not use a tremendous amount of nodes (say 500). I wonder whether it is feasible to have a queue for such long-time runs. [Franklin user]

More simultaneous aprun commands. [Franklin user]

more interactive / debug support:

It can be difficult to get interactive time for tests & debugging, particularly if more than a few nodes are needed. The 30 minute limit is fine, but more nodes should be available. ... [Franklin / Bassi user]

... Also, it would be useful to be able to run longer visualization jobs without copying large data sets from one system's /scratch to another. This would be for running visualization code that can't be run on compute nodes; for instance, some python packages require shared libraries. [Franklin user]

Enable longer interactive jobs on Franklin login nodes. Some compile jobs require more than 60 minutes, making building a large code base -- or diagnosing problems with the build process -- difficult. ...

The login nodes are very underpowered, ... I put htars into batch scripts because they will exceed the interactive time limit. [Franklin user]

better job information:

it would be useful if it was easier to see why a job crashed. I find the output tends to be a little terse. [Franklin user]

Queue wait times are not always consistent. I suggest that an estimated wait time be given after a job is queued, either on the website queue list or with the qstat command. Also, it would make my life easier if the job list invoked by the qstat command on franklin showed the number of cores for each job. Right now that column is blank. [Franklin / Jacquard user]

some more featured job/queue monitors [Franklin user]

Software suggestions: 13 comments

... and more support for various computational chemistry codes. [Franklin user]

NERSC does an excellent job in adding new software as it becomes available. It is important to continue doing so.

I would like for NERSC to add Gaussview 4. [Bassi / Franklin / Jacquard user]

Install more development tools, like Git and Valgrind.

i would like to be able to use numpy, python and sm all together at nersc. [DaVinci user]

... Better remote download services

I'd like to see a fully developed gfortran environment. This would be compatible with the linux, open software systems many of us use, and I think could be more stable and responsive than what is available from some of the for-profits. gcc is at the heart of a great deal of our OSs, so it seems like gfortran might satisfy our scientific computing needs. [Franklin user]

The whole compiling paradigm is not as productive as it could be (as it is in other computing centers I have used). Compilers themselves are great but module loading should be easier and more effective. [Bassi / Jacquard user]

Continue and resolve work with HDF5 developers on parallel I/O issues with flexible domains. ...

My only complaint so far is the lack of distributed version control software, such as Mercurial: http://www.selenic.com/mercurial/wiki/ It's pretty ridiculous that the best repository option you have is subversion. [Jacquard user]

Get some data processing [machines and] tools that actually work

... and install the Intel Compilers on Franklin

(All of these are also more general limitations of MPP computation.) Better interactive MPP debugging tools for MPP codes !!!! The only really usable method on Franklin is print statements, since much of the code development requires testing on multiple processors. DDT, especially recent versions (last year or so), is not very informative. Totalview was impossible over the web. (NX tunneling works very well on davinci to speed up interactive guis - consider installing it on the MPP computers). Better visualization and data analysis for larger MPP jobs. The tools I use now are at their limits. I have a lot more data that I am unable to digest for presentation. For example, I would like to make movies of a number of quantities from my simulations, but I would have to extract and match the frames by hand from many different files, one for each time slice. ...

Allocations suggestions: 11 comments

need a larger allocation:

the chief limit for me is allocation and ...

More allocation and more effective use of the allocation.

Increase my allocation... Seriously, you're very good.

Larger allocation?

Please allocate more CPU time to me.

provide larger allocations

the only thing that NERSC can do is give me unlimited time, but I know that is impossible, and I'm very satisfied with NERSC

Allocate more time!

Faster allocation: Currently, we do not have enough available computational time on NERSC, and we need a new allocation. NERSC provides excellent computational resources, and we will be happy to use them as soon as the allocation process is successfully finished.

improve allocation management:

Get me a grant :^). More flexibility in the allocation use would be nice. Sometimes we front-load our research and other times it's towards the end. We are punished for not using the resource at a constant rate and sometimes research just doesn't work that way.

for long term users with proven productivity make the allocation process easier. If people are producing peer reviewed papers using NERSC resources, make it QUICKER to get allocations when possible.

More or Better Services: 10 comments

improved web and communications services:

Up to date help pages!

It is becoming significantly less important now, but NERSC could have done much better at easing the learning curve of using the systems. I could have accomplished a great deal more already if I had known exactly how each system worked. Make sure all the information pages are up to date, and include comprehensive information, not just random tidbits.

NERSC could improve user's manuals.

Keep doing what you are doing. I'm particularly interested in the development of the Science Gateways.

It would be good if the NIM website and the www.nersc.gov website did not require separate login to go between them. ...

... The search tool for the web site does not work very well.

They should allow the users (1) to upload their published papers on-line and (2) to have an annual user conference to communicate with each other and explain their work to the general public.

1. Make this survey shorter! 2. Send announcements (e.g. for maintenance) per email with an attached iCal or ics file such that it can easily be imported in a calendar program. Can you create an online calendar to which one can subscribe with common calendar programs?

more consulting help:

More willing to provide consulting help that requires more than 5 minutes of a consultant's attention. We are all pretty good with computers, so the problems that plague us may take several or many hours to resolve and your help is much appreciated. [Bassi / Franklin user]

We can always use man power to improve the performance and scaling of our codes.

The main problem I have is not being able to run any batch jobs when usage is heavy. It seems that when both STAR and ATLAS are running jobs, it is impossible for my group's jobs to be run. This means that our jobs may sit in the queue for days at a time without any progress. It is very frustrating for our work to be brought to a complete standstill when other groups are using the system, especially since our needs are very small in comparison.

Shorter procurement cycles at PDSF.

Ensure the safety of the hard disks where the data are saved.

The interactive session to the PDSF is usually extremely slow. I'm not sure this is due to the network connectivity since it has also been seen when I've connected from the LBNL site. The connection is even slower than when I've connected to other places, like RCF at BNL. It would be very helpful for our data analysis at the PDSF if the slowness of the interactive session were improved.

extend hard disk storage capability

larger disk space

faster and smarter NERSC: My 1st hope is to make PDSF more efficient. Because I am an Asian user, I hope I can run more jobs during our night. When I get up in the morning, I can deal with the results. Could the server align the jobs by world time zone? On the other hand, I feel the data transfer rate is not fast enough for me when I transfer big files from the US to China. So my other hope is for it to be faster some day. Anyway, I hope NERSC keeps going dynamically.

Network suggestions: 4 comments

... increased file transfer speed for the case of very large files between NERSC and ANL would also be a nice feature, though I usually extract from large files what I need and transfer only that.

Help users overcome the impact of high-latency network connections for terminal settings. Home connections, hotel connections, etc. all become close to unusable because of latency.

If there is anything important to you that is not covered in this survey, please tell us about it. 23 respondents

Areas not covered by the survey: 6 comments

pdsf website

More survey about allocation time will be important to users like us.

No survey on the individual software satisfaction.

Runtimes on Franklin vary a lot for the same job (this is after the upgrade).

I think it would be wise to ask the users about what they would like to see in the next procurements, from the next gen viz machine to replace davinci to the big iron.

Changes in service over the last year, five years. What did you do well before that you don't do well now? What are you doing well now that was a problem before? New acquisitions and the transition from old to new can be addressed in detail. This is a big computer science type issue that physical scientists need help with.

Additional feedback - Franklin: 6 comments

I've been using NERSC for 12 years, and this is first time when the whole Scratch file system was lost!!! And you've lost it on both Franklin and Jacquard! What are you doing there? It will take me a lot of time to recover all lost data and code upgrades.

Franklin has vastly improved over the last year, I hope the stability gains continue.

Franklin is the only one I am using now. It is not always stable. I don't know why.

I do not know where to vent my frustration with the poor performing franklin login nodes. The login nodes are relatively slow compared to basic workstations as well as being highly used, which of course makes them even slower. Very difficult to compile C++ code and do other basic tasks... Of course, some of this login node slowness, but not all, is likely due to lustre. And I didn't see where I could report that lustre has been difficult to work with. Losing all of my data on /scratch was particularly painful. The amount of space (and especially inodes) given to users is simply too small. I realize users can request more space (and I do), but I don't feel that the work I'm doing is particularly special with regard to disk space. It just seems unbalanced to have such a powerful machine and such a small amount of space to work with. I wish there was an easier way to give/take files to users on the machine. Creating a /project directory is too much overhead for simple give/takes.

Congratulations for Franklin and the general maintenance of NERSC systems!

There was a period that franklin was quite unstable. I am satisfied with franklin except for this problem.

Additional feedback - allocations: 4 comments

It is very important to renew allocation time (get more resources).

It would be much better to accept applications for large allocations quarterly. With the annual application currently used, one has to guess what funding will be in place to be able to use the allocation and plan ahead. Then, if the funding does not match up with expectations, one is left with a lot of left over cycles.

About the allocations process - I am a junior faculty member at a University, and I would like to comment about the allocation reductions. I realize that it is important to have a program whereby unused or underused hours should be reallocated. However, as a junior faculty member who is establishing a research group, a good portion of my computational resources will be used over the summer months, at least until my students get established in research. As such, I am finding that I am becoming susceptible to the first and second quarter allocation reductions. The current system negatively impacts junior faculty members disproportionately. I am not sure what to suggest to make it better, though perhaps some lenience could be given to junior faculty members as their research groups are established. Thanks!

Could I get more CPU time from other PIs who have a lot of surplus during the first half year? And this CPU time may be just specified that to Franklin so that such a management would not affect others.

Additional feedback - other: 7 comments

The ticket system is designed to support individual users, but fails badly when there is a group-wide issue. One should be able to make it possible for others to add comments on one's ticket, but currently there is no way to even make a ticket visible to other users.

I am running NCAR climate models, and I guess there are other people who do that. I wish there is a web page (I think there used to be a web page but I cannot find it any more) so that we can get some help from it.

Thanks for such a fantastic resource (people and systems)!!! Mike Barad

NERSC is the BEST!

PDSF is pricing itself out of the market.

Many of the services are used by others in my group, I am a low level user so my answers may not be the most informative for some categories.

We do almost all of our post-processing using NCL which does not work well on davinci at all right now. This one fact renders NERSC practically useless to me.