Hi all,
Matt asked me to outline how to get parallel HOP (in the hg devel version) working. In easy to follow list format:
0. Build yt like you did already (you're on this list, no?).
1. Get the latest hg, choose the 'yt' branch, which is the default.
2. Install Forthon in the standard python way.
http://hifweb.lbl.gov/Forthon/
3. Go into yt-devel/yt/extensions/kdtree and I think 'make' will work on most machines. If not, let me know. It should make 'fKDpy.so' in that same dir.
4. Go to the top level of yt-devel, and type 'python setup.py install' like usual, but then do 'cp -r yt/extensions/kdtree ../../lib/python2.6/site-packages/yt-version-num-sys-etc/yt/extensions/'. I haven't added the kdtree stuff to setup utils yet, so this step isn't automated.
5. One runs parallel HOP very similarly to old HOP:
h = yt.lagos.HaloFinding.parallelHF(pf, threshold=160.0, safety=1.5,
    dm_only=False, resize=True, fancy_padding=True, rearrange=True)
- safety - A padding multiplier that adjusts the size of the padding around each subvolume. 1.5 is usually sufficient, and 1.35 probably is too. There's no reason to go below 1.5 unless memory is a concern.
- dm_only - I currently have things a little messed up in HaloFinding.py in the dev, so this needs to be set unless you have a ['creation_time'] field.
- resize - Load balancing. This is a recursive bisection that subdivides the domains like a kd tree, but it can do arbitrary numbers of cuts, so you can run with any number of tasks. Generally, I turn it on for datasets smaller than 300 Mpc, and for ones larger, the particles are already pretty evenly balanced and it doesn't help.
- fancy_padding - Each subvolume has six different faces, and six different values for the padding, one for each face. There is really no reason to turn this off.
- rearrange - The fortran kdtree can make an internal copy of the point data which increases the speed of the searches by almost 20%. It however increases the memory by 3 fields, so if memory is a concern, turn this off.
6. The usual halo convenience functions work as well, but the output is a bit different.
h.write_out('dist-chain.out') - Just like the normal one.
h.write_particle_lists("chain") - One hdf5 file per task, that contains the particle data for all the haloes on that task.
h.write_particle_lists_txt("chain") - One text file that lists which tasks own a piece of a halo. Since haloes can exist on more than one task, each halo may have several entries on its line.
7. There is a rough memory scaling law: it takes 1 MB of RAM per 4,500 particles, with rearrange turned on, regardless of task count. This does not include the yt hierarchy costs, so if you have a dataset like L7, which uses 1GB per task, you will need to take that into account, too.
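As a rough illustration of that scaling law, the estimate can be written out as below. The function name and its arguments are made up for this sketch and are not part of yt; only the 4,500 particles per MB figure and the 1 GB per task hierarchy example come from the numbers above. The per-task value assumes the particles are spread evenly over the tasks, which is only approximately true.

```python
# Rough memory estimate for parallel HOP, based on the rule of thumb above.
# estimate_hop_memory_mb is a hypothetical helper, not a yt function.

PARTICLES_PER_MB = 4500.0  # rule of thumb with rearrange turned on


def estimate_hop_memory_mb(n_particles, n_tasks=1, hierarchy_mb_per_task=0.0):
    """Return (total_mb, per_task_mb) for a parallel HOP run.

    The halo finder cost depends only on the particle count; the yt
    hierarchy cost is paid once on every task and is added on top.
    """
    hop_mb = n_particles / PARTICLES_PER_MB
    total_mb = hop_mb + n_tasks * hierarchy_mb_per_task
    # Assumes an even split of particles across tasks.
    return total_mb, total_mb / n_tasks


if __name__ == "__main__":
    # Example: a 512^3 particle run on 16 tasks, 1 GB of hierarchy per task.
    total, per_task = estimate_hop_memory_mb(
        512 ** 3, n_tasks=16, hierarchy_mb_per_task=1024.0
    )
    print("total: %.0f MB, per task: %.0f MB" % (total, per_task))
```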
Good luck!
_______________________________________________________
sskory(a)physics.ucsd.edu o__ Stephen Skory
http://physics.ucsd.edu/~sskory/ _.>/ _Graduate Student
________________________________(_)_\(_)_______________

Hey everyone,
The following error pertains to the yt branch of the hg repo.
I'm getting a particle io error when using the various yt halo finders in
parallel. It doesn't appear to have anything to do with the halo finders,
but I don't know what else uses ParticleIO.py. This only happens when
running in parallel.
P000 yt INFO 2009-11-27 15:16:59,775 Getting ParticleMassMsun using ParticleIO
Setting period equal to 1.000000
Setting period equal to 1.000000
Setting period equal to 1.000000
Traceback (most recent call last):
  File "do_hop.py", line 5, in <module>
    h = FOFHaloFinder(pf)
  File "/Users/britton/Documents/work/yt-hg/yt/lagos/HaloFinding.py", line 1024, in __init__
    self._parse_halolist(1.)
  File "/Users/britton/Documents/work/yt-hg/yt/lagos/HaloFinding.py", line 747, in _parse_halolist
    this_max_dens = halo.maximum_density_location()
  File "/Users/britton/Documents/work/yt-hg/yt/lagos/ParallelTools.py", line 130, in single_proc_results
    return func(self, *args, **kwargs)
  File "/Users/britton/Documents/work/yt-hg/yt/lagos/HaloFinding.py", line 322, in maximum_density_location
    return self.center_of_mass()
  File "/Users/britton/Documents/work/yt-hg/yt/lagos/ParallelTools.py", line 130, in single_proc_results
    return func(self, *args, **kwargs)
  File "/Users/britton/Documents/work/yt-hg/yt/lagos/HaloFinding.py", line 306, in center_of_mass
    pm = self["ParticleMassMsun"]
  File "/Users/britton/Documents/work/yt-hg/yt/lagos/ParallelTools.py", line 130, in single_proc_results
    return func(self, *args, **kwargs)
  File "/Users/britton/Documents/work/yt-hg/yt/lagos/HaloFinding.py", line 136, in __getitem__
    return self.data.particles[key][self.indices]
  File "/Users/britton/Documents/work/yt-hg/yt/lagos/ParticleIO.py", line 46, in __getitem__
    self.get_data(key)
  File "/Users/britton/Documents/work/yt-hg/yt/lagos/ParticleIO.py", line 104, in get_data
    if len(to_add) != 1: raise KeyError
KeyError
I checked the contents of to_add, and it was a list with two items (hence
the exception):
['particle_mass', 'particle_mass']
I was able to get around this by removing non-unique entries in to_add, but
I don't think that really fixes the underlying problem. Anyone have any
ideas?
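The workaround amounts to something like the sketch below. The helper name is made up for illustration and this is not the actual code in ParticleIO.py; it only shows dropping repeated field names while keeping their order, so that the length check on to_add passes. It does not explain why the field was requested twice in the first place.

```python
def unique_in_order(fields):
    """Return the field names with duplicates removed, keeping first occurrences."""
    seen = set()
    result = []
    for field in fields:
        if field not in seen:
            seen.add(field)
            result.append(field)
    return result


# The list that triggered the KeyError in the traceback above.
to_add = ["particle_mass", "particle_mass"]
to_add = unique_in_order(to_add)

# Same check as in get_data; it no longer raises for this input.
if len(to_add) != 1:
    raise KeyError(to_add)
```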
Regards,
Britton

Hi there,
Tonight I'm merging a bunch of changes from me, Britton Smith, John
Wise and Stephen Skory back into the trunk from the mercurial
repository. If you're running on the yt-1.5 source tree, this won't
affect you -- but if you're running on trunk, the next time you update
you will have to rerun "python2.6 setup.py develop" or "python2.6
setup.py install" depending on how you installed. If you used the
install script, you should also be able to re-run it at any time to
get an updated source tree and installation.
There are no pressing fixes in this, only speed and memory
improvements, so don't feel it's necessary to make the jump just now
if you're happy with where you are. If you do update, here's a minor
changelog:
* Hierarchy is much faster, and the .yt file now only stores data
products; all hierarchy info goes into a .harrays file (for forward
compatibility with Enzo)
* The grid attributes on the hierarchy, gridLeftEdge and
gridRightEdge, are now grid_left_edge and grid_right_edge, in keeping
with our coding standards
* Hierarchy has been rewritten for *clarity*
* The bug with DM-only sims has been fixed
* IO routines have been refactored and substantially cleaned up
* Preliminary particle IO readers have been added -- much, much
faster, but not yet production-ready
* New cloud-in-cell deposit function without needing fortran
* Some other minor things to support coming features
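For scripts that have to run against both the old and the new attribute names, one option is a small shim like the sketch below. The helper and the two stand-in classes are hypothetical and exist only for this example; only the four attribute names come from the changelog above.

```python
def get_grid_edges(hierarchy):
    """Return (left_edges, right_edges) under either naming scheme."""
    # Prefer the new names, fall back to the pre-merge ones.
    if hasattr(hierarchy, "grid_left_edge"):
        return hierarchy.grid_left_edge, hierarchy.grid_right_edge
    return hierarchy.gridLeftEdge, hierarchy.gridRightEdge


class OldStyleHierarchy(object):
    """Stand-in for a hierarchy object from before the merge."""
    gridLeftEdge = [[0.0, 0.0, 0.0]]
    gridRightEdge = [[1.0, 1.0, 1.0]]


class NewStyleHierarchy(object):
    """Stand-in for a hierarchy object from after the merge."""
    grid_left_edge = [[0.0, 0.0, 0.0]]
    grid_right_edge = [[1.0, 1.0, 1.0]]


if __name__ == "__main__":
    for h in (OldStyleHierarchy(), NewStyleHierarchy()):
        left, right = get_grid_edges(h)
        print(type(h).__name__, left, right)
```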
All the public and primary functions, classes etc should be the same.
Thanks, and let us know if there are ANY problems!
-Matt

Hi all,
I saw this post today:
http://morepypy.blogspot.com/2009/11/some-benchmarking.html
It's about benchmarks for the PyPy project, with a just-in-time
compiler enabled; PyPy is a reimplementation of Python in a restricted
version of Python, which aims to be far more flexible as well as
faster. (Also, some other language-engineer goals I don't really know
much about!) It's very impressive, although I'm pretty sure it's not yet a
target for deployment. Maybe some day; either way, it's becoming clear
that Python's speed will improve with time.
-Matt

Just as a note, I noticed a problem with the unit tests today and
traced it back to a bug I introduced in the profiling code about two
weeks ago. It's now fixed, but please be sure to update trunk to
r1527.
---------- Forwarded message ----------
From: <mturk(a)wrangler.dreamhost.com>
Date: Mon, Nov 16, 2009 at 1:16 PM
Subject: [Yt-svn] yt-commit r1527 - trunk/yt/lagos
To: yt-svn(a)lists.spacepope.org
Author: mturk
Date: Mon Nov 16 13:16:27 2009
New Revision: 1527
URL: http://yt.enzotools.org/changeset/1527
Log:
This fixes a regression in the lazy profiling! For a very brief window of
time, child cells were not being masked in the profiling method.
Modified:
trunk/yt/lagos/Profiles.py
Modified: trunk/yt/lagos/Profiles.py
==============================================================================
--- trunk/yt/lagos/Profiles.py (original)
+++ trunk/yt/lagos/Profiles.py Mon Nov 16 13:16:27 2009
@@ -160,9 +160,14 @@
data = []
for field in _field_mapping.get(this_field, (this_field,)):
pointI = None
- if check_cut and not self._data_source._is_fully_enclosed(source):
+ if check_cut:
+ # This conditional is so that we can have variable-length
+ # particle fields. Note that we can't apply the
+ # is_fully_enclosed to baryon fields, because child cells get
+ # in the way.
if field in self.pf.field_info \
- and self.pf.field_info[field].particle_type:
+ and self.pf.field_info[field].particle_type \
+ and not self._data_source._is_fully_enclosed(source):
pointI = self._data_source._get_particle_indices(source)
else:
pointI = self._data_source._get_point_indices(source)
_______________________________________________
Yt-svn mailing list
Yt-svn(a)lists.spacepope.org
http://lists.spacepope.org/listinfo.cgi/yt-svn-spacepope.org

Hi guys,
(For all of these performance indicators, I've used the 512^3 L7
amr-everywhere run called the "LightCone." This particular dataset
has ~380,000 grids and is a great place to find the )
Last weekend I did a little bit of benchmarking and saw that the
parallel projections (and likely several other parallel operations)
all sat inside an MPI_Barrier for far too long. I converted (I
think!) this process to be an MPI_Alltoallv operation, following on an
MPI_Allreduce to get the final array size and the offsets into an
ordered array, and I think it is working. I saw pretty good
performance improvements, but it's tough to quantify those right now
-- for projecting "Ones" (no disk-access) it sped things up by ~15%.
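To make the size-and-offset step concrete, here is a small sketch that runs without MPI. It is one plausible reading of the approach, not the code in the branch: every rank writes its local size into its own slot of a zeroed vector, a sum-reduction gives all ranks the full list of sizes, and an exclusive prefix sum turns those into offsets into the ordered array. The function names are made up, and the reduction is simulated with plain lists.

```python
from itertools import accumulate


def simulated_allreduce_sum(vectors):
    """Element-wise sum across ranks, standing in for MPI_Allreduce with SUM."""
    return [sum(column) for column in zip(*vectors)]


def sizes_and_offsets(local_sizes):
    """Return (all_sizes, offsets, total) as each rank would see them.

    local_sizes[r] is the number of elements that rank r contributes.
    """
    n_ranks = len(local_sizes)
    # Each rank fills in only its own slot; all other slots stay zero.
    contributions = []
    for rank, size in enumerate(local_sizes):
        vec = [0] * n_ranks
        vec[rank] = size
        contributions.append(vec)
    all_sizes = simulated_allreduce_sum(contributions)
    # Exclusive prefix sum: where each rank's block starts in the final array.
    offsets = [0] + list(accumulate(all_sizes))[:-1]
    return all_sizes, offsets, sum(all_sizes)


if __name__ == "__main__":
    sizes, offsets, total = sizes_and_offsets([3, 0, 5, 2])
    print(sizes, offsets, total)
```

With the sizes and offsets known on every rank, the receive counts and displacements for the MPI_Alltoallv call can be filled in without any further communication.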
I've also added a new binary hierarchy method to devel enzo, and it
provides everything that is necessary for yt to analyze the data. As
such, if a %(basename)s.harrays file exists, it will be used, and yt
will not need to open the .hierarchy file at all. This sped things up
by 100 seconds. I've written a script to create these
(http://www.slac.stanford.edu/~mturk/create_harrays.py), but
outputting them inline in Enzo is the fastest.
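The file selection described above boils down to a check like the following. This is a sketch with a made-up function name, not the actual yt code; the only things taken from the description are the .harrays and .hierarchy suffixes on the output's basename.

```python
import os


def pick_hierarchy_file(basename):
    """Return the hierarchy file to read for the output named basename."""
    harrays = basename + ".harrays"
    # Prefer the binary hierarchy when it exists, so the text
    # .hierarchy file never has to be opened.
    if os.path.exists(harrays):
        return harrays
    return basename + ".hierarchy"
```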
To top this all off, I ran a projection -- start to finish, including
all overhead -- on 16 processors. To project the fields "Density"
(native), "Temperature" (native) and "VelocityMagnitude" (derived,
requires x-, y- and z-velocity) on 16 processors to the finest
resolution (adaptive projection -- to L7) takes 140 seconds, or
roughly 2:20.
I've looked at the profiling outputs, and it seems to me that there
are still some places performance could be squeezed out. That being
said, I'm pretty pleased with these results.
These are all in the named branch hierarchy-opt in mercurial. They
rely on some rearrangement of the hierarchy parsing and whatnot that
has lived in hg for a little while; it will go into the trunk as soon
as I get the all clear about moving to a proper stable/less-stable dev
environment. I also have some other test suites to run on them, and I
want to make sure the memory usage is not excessive.
Best,
Matt

Hi,
This is particularly aimed at Matt since he has experience with
installing mpi4py on an altix :-) I saw your post on the mpi4py list
(http://tinyurl.com/ybnb6p7), but I'm running into another problem.
I'm trying to install the latest svn version with the stock sgimpi
setup in mpi.cfg. I've also tried with intel's v10.1 compilers (v9.1
was the default).
It compiles and installs just fine, but when I try to load mpi4py.MPI
it can't find the MPI libraries.
----
he:jwise>python
Python 2.6.3 (r263:75183, Oct 19 2009, 12:15:19)
[GCC 3.3.3 (SuSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mpi4py.MPI
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: /home/astro/jwise/local/lib/python2.6/site-packages/mpi4py/MPI.so: undefined symbol: MPI_Comm_get_name
----
libmpi and libmpi++ are in /usr/lib, which is in my LD_LIBRARY_PATH
(just in case... it should be included by default).
If you haven't run into this problem before, maybe you (or someone
else) could give me some tips on tracking down the mis-config. I
haven't debugged something like this before...
Thanks!
John