Description

PyICU requires the underlying C++ library to work, and this is in the libicu52 apt package on Debian Jessie. When this is installed, PyICU can then be installed in a virtualenv. Alternatively, the python-pyicu apt package can be installed, which installs PyICU and the underlying C++ library systemwide. One of the above packages would need to be installed in the python 2 Docker image to be used in Kubernetes containers.

The fundamental design of the getAllUsers.py workflow is going to be difficult to scale for multiple concurrent users on tools.wmflabs.org. I would recommend looking into using a JavaScript library and the map tile servers that Wikimedia maintains. See https://www.mediawiki.org/wiki/Maps for more information.

@scfc The previous issue seems to be resolved by installing PyICU locally in the virtual environment (pip install pyicu). However, the following traceback comes up for python getAllUsers.py.

@scfc Have sudo apt-get install python-matplotlib and sudo apt-get install python-mpltoolkits.basemap both been run? I'm not sure whether the latter has been. I get the following when searching for what has already been installed globally:

Just to note, python getAllUsers.py is being run from ~/repo/SPIArticleAnalyzer:
(virtualenv)tools.spiarticleanalyzer@tools-bastion-03:~/repo/SPIArticleAnalyzer$ python getAllUsers.py
The repo is https://github.com/JustBerry/SPIArticleAnalyzer.

You shouldn't be running any scripts on the bastion, please follow the documentation on how to use the grid.

@Legoktm ssh justberry@tools-login.wmflabs.org still brings me back to justberry@tools-bastion-03:~$, and become spiarticleanalyzer yields tools.spiarticleanalyzer@tools-bastion-03:~$. How do I change over to tools-login?

@Legoktm: In this case it was justified to use the bastion as I had only installed the modules there for testing; you're right of course that any resource-intensive or non-interactive use must only happen on the grid.

@JustBerry: No, I'm usually not on IRC; if I'm in a channel I feel compelled to look for and correct any wrong information I see which leaves me no time to do something productive or entertaining :-).

I installed the package python-mpltoolkits.basemap on tools-bastion-03, so you'd need to look for that package.

That package is not installed on grid execution nodes, so if you submit your script as a job to the grid that script cannot use that package and will fail.

Installing the packages on the system is only useful if you use them and not your virtual environment. If you are already using a virtual environment, you should be able to install whatever module you like into that. Looking at https://pypi.python.org/pypi/basemap/, the command would probably be something like pip install basemap.
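To see which of the two is actually in use, a minimal sketch along these lines may help. It reports whether the interpreter belongs to a virtual environment and which file each module would be loaded from. The helper names in_virtualenv and module_origin are made up for illustration; the module names icu and mpl_toolkits.basemap are the import names of PyICU and basemap.

```python
import importlib
import sys


def in_virtualenv():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Classic virtualenv sets sys.real_prefix; the venv module (and newer
    # virtualenv releases) make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def module_origin(name):
    """Return the file a module is loaded from, or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except Exception:  # missing module, or a shared library that fails to load
        return None
    # Built-in modules have no __file__, so None is returned for them too.
    return getattr(module, "__file__", None)


if __name__ == "__main__":
    print("virtualenv active: %s" % in_virtualenv())
    for name in ("icu", "matplotlib", "mpl_toolkits.basemap"):
        print("%s -> %s" % (name, module_origin(name)))
```

A path under the virtual environment's directory means the module came from pip inside the venv; a path under the system's Python directories means the system-wide package is being picked up.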

(virtualenv)tools.spiarticleanalyzer@tools-bastion-03:~$ pip install basemap
Downloading/unpacking basemap
Could not find any downloads that satisfy the requirement basemap
Cleaning up...
No distributions at all found for basemap
Storing debug log for failure in /data/project/spiarticleanalyzer/.pip/pip.log

In Toolforge as a matter of policy we only install Python packages that are shipped as part of Ubuntu (Precise/)Trusty; in this case, they are already installed due to T63445 and T102165:
So this should already work for you. If you need a different version, you'll have to use virtual environments.
For Kubernetes I believe no packages are installed in the container (?), so you'll have to use virtual environments.

@scfc It looks like the packages are currently installed for bastion and not for kubernetes or the grid engine. Is this correct? Also, it appears that the packages will not be installed on kubernetes (non-ubuntu) per policy? Is this also correct?

If both of those are the case, can I request installing the packages on the grid?

There is currently no policy documenting this. The point is, k8s is container-based, and we are trying to make it as lightweight as possible.

@zhuyifei1999 Okay, so the discussion right now seems to be a) use Wikimedia Maps or b) explain why Wikimedia Maps is not sufficient and request installation of basemap on the grid.

Regarding Wikimedia Maps, there is little reference made to an API at https://www.mediawiki.org/wiki/Maps besides "GeoData extension allows articles to specify geographical coordinates, and expose them via search API." The GeoData extension API seems to return articles that are relevant to a specific geographical coordinate, rather than return a world map with dots marking a list of coordinates. Are you referring to https://github.com/kartotherian/kartotherian then (and how it uses .js to make any API calls/map generation on the client side)?

@JustBerry: Sorry, I thought that when I asked you to test on tools-bastion-03 you understood that I meant write a small Python script that does whatever needs to be tested and run that on tools-bastion-03. If instead you run jsub on tools-bastion-03, the script will be run on the grid. I installed the package python-mpltoolkits.basemap only on tools-bastion-03, not on the grid or in Kubernetes.
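A small test script of the kind meant here could look like the following sketch. It only tries to import the modules discussed in this task and prints what fails, so the same file can be run directly on tools-bastion-03 and then submitted to the grid to compare the results. The function name check_modules and the list of modules are illustrative assumptions, not something taken from the tool's repository.

```python
import importlib


def check_modules(names):
    """Try to import each module; map its name to None on success or to the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # a missing shared library also surfaces here
            results[name] = "%s: %s" % (type(exc).__name__, exc)
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    # Import names for PyICU, matplotlib and basemap.
    wanted = ["icu", "matplotlib", "mpl_toolkits.basemap"]
    report = check_modules(wanted)
    for name in wanted:
        print("%-22s %s" % (name, report[name] or "OK"))
```

If a module shows OK on the bastion but an error on the grid, the package is only installed on the bastion.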

When I replace ipaddress in getAllUsersHelper.py with ipaddr, it succeeds without any output.
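If the script should work in both situations, one option is a fallback import, roughly as sketched below. I have not checked how getAllUsersHelper.py actually uses the module, so parse_ip is a hypothetical helper; note that the two packages expose different function names (ipaddress.ip_address versus ipaddr.IPAddress), so a plain rename of the import only works if the calls match as well.

```python
try:
    import ipaddress as _ip  # standard library on Python 3, backport on Python 2

    def parse_ip(text):
        # The ipaddress module expects text (unicode), not bytes.
        if isinstance(text, bytes):
            text = text.decode("ascii")
        return _ip.ip_address(text)
except ImportError:
    import ipaddr as _ip  # older ipaddr package

    def parse_ip(text):
        return _ip.IPAddress(text)


if __name__ == "__main__":
    print(parse_ip("192.0.2.1"))
```

Wrapping the difference in one helper keeps the rest of the script independent of which of the two packages happens to be installed.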

(You had a continuous job webservice running which I stopped. webservice is always executed on a bastion host and never with jstart webservice or similar ways.)

And just to avoid any misunderstanding: It's not that I know how to make your application work and am coy about presenting the solution; I simply can't just look at someone else's application and say pointing at some line: "There's your problem!" If you say you need the package python-mpltoolkits.basemap or another one installed for your application to work, then that's easily done. But finding out what package you need or how to fiddle with your virtual environment is not something I can do.

Regarding the continuous webservice job, I tried doing jstop for that a day or so ago. I was just testing to see if that might have been the reason why the module was not loading. Thanks for clarifying that the module is only installed for the bastion.

That being said, @zhuyifei1999 seems to have been successful at building basemap locally in his venv. I'll have to look into what path variables they might have used during installation.

To clarify, before step 1 above, I performed the following steps to install icu in my venv. Without icu installed in the venv, an error saying that icu could not be found came up in ~/uwsgi.log after running webservice --backend=kubernetes python2 start. I had also created the venv beforehand via virtualenv venv in the Kubernetes shell within ~/www/python:

I'm a bit lost after all this exchange. I only want to say that PyICU is an important package to offer for us, especially since it's one of the ways to ship and use the CLDR data which Wikimedia users contribute. Having the packages python-pyicu and python3-icu installed is good.

So this should already work for you. If you need a different version, you'll have to use virtual environments.
For Kubernetes I believe no packages are installed in the container (?), so you'll have to use virtual environments.

There seem to be two concerns here:

No apparent libicu (icu) installed for python 2 (even for bastion)

Modules are not (frequently) installed on Kubernetes to keep the containers lightweight. Users are asked to build locally in a virtual environment instead. However, when trying to build locally, icu gives errors. After speaking with others, installing the Debian Jessie package of icu will be less error-prone than building binaries locally. Alternatively, if someone is able to get icu installed locally in their venv, feel free to post your steps here.

@scfc webservice --backend=kubernetes python2 start (Kubernetes), though, yields the output posted earlier in the ticket (ImportError: No module named icu). I am seeing if there may be a workaround for installing icu on Kubernetes, as installing it locally in the venv seems to be breaking. A few other people have mentioned that they have also tried installing icu in the venv but were soon confronted with lib issues similar to the ones mentioned earlier.

PyICU requires the underlying C++ library to work, and this is in the libicu52 apt package on Debian Jessie. When this is installed, PyICU can then be installed in a virtualenv. Alternatively, the python-pyicu apt package can be installed, which installs PyICU and the underlying C++ library systemwide. One of the above packages would need to be installed in the python 2 Docker image to be used in Kubernetes containers.

python-pyicu won't be installed - you should install pyicu inside your virtualenv. We don't want to provide python libraries outside of virtualenvs anymore - that doesn't really scale very well, and ties us down to what's in Debian Jessie. So we'll only install devel files and what not, and you should install the libraries themselves directly from pip.