emulab-devel issues
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues

Issue #19: Update contexts for Jacks embedded in Apt and CloudLab
Robert Ricci <ricci@cs.utah.edu> (updated 2016-03-03)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/19

Update the contexts for the Jacks embedding in both Apt and CloudLab to reflect the clusters they actually have available to them - e.g., the one for CloudLab currently does not allow physical nodes or the Moonshot nodes.

[milestone: NSFCloud Workshop; labels: apt, cloudlab, Jacks; assignee: Jonathon Duerig]

Issue #18: Filter aggregate picker list for specific profiles
Robert Ricci <ricci@cs.utah.edu> (updated 2016-04-01)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/18

On the Instantiate page, once the user has picked a profile and sees the aggregate picker, run the constraint engine from Jacks on all aggregates. Show the ones that it will run on at the top of the list, then gray out the ones that (we think) it won't run on at the bottom of the list. For now, let people pick aggregates that are grayed out.

[milestone: NSFCloud Workshop; labels: apt, cloudlab; assignee: Jonathon Duerig]

Issue #16: Add Utah CloudLab cluster to CloudLab backends
Robert Ricci <ricci@cs.utah.edu> (updated 2014-12-06)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/16

Just like the title says, make utah.cloudlab.us one of the clusters that the interface can dispatch to.

[milestone: NSFCloud Workshop; labels: cloudlab; assignee: Leigh Stoller]

Issue #500: DDC Powervault Maintenance
Aleksander Maricq (updated 2019-05-23)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/500

Today we will be destroying the Apt and Utah CloudLab block-stores to rebuild them with more redundancy.
The Powervault contains 30 4 TB drives that were previously arranged in the following manner:
* The 30 disks are split into two 15-drive RAID5 arrays, each on a separate RAID controller.
* Each array is split into two equally-sized logical drives.
* One logical drive from each array is exposed to `dbox1`, and the other on each array is exposed to `dbox2`.
The new configuration will be as follows:
* The 30 disks are split into two 8-drive RAID6 arrays and two 7-drive RAID6 arrays. Each controller will have an 8-drive and 7-drive array.
* Each array contains only one logical drive using the entire space.
* The logical drives on the 8-drive arrays will be exposed to `dbox1`, the logical drives on the 7-drive arrays will be exposed to `dbox2`.
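The redundancy trade-off between the two layouts can be quantified with standard RAID arithmetic; a back-of-the-envelope sketch (drive size taken as a flat 4 TB):

```python
DRIVE_TB = 4

# Old layout: two 15-drive RAID5 arrays (one parity drive each);
# each array tolerates a single drive failure.
old_usable = 2 * (15 - 1) * DRIVE_TB

# New layout: two 8-drive and two 7-drive RAID6 arrays (two parity
# drives each); each array tolerates two drive failures.
new_usable = 2 * (8 - 2) * DRIVE_TB + 2 * (7 - 2) * DRIVE_TB

print(old_usable, new_usable)  # 112 88
```

So the rebuild gives up 24 TB of usable space in exchange for double-failure tolerance on every array.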
We have set aside a locked down Apt node with a RAID1 pair of 10TB drives containing all of the backed up datasets. @hibler ported all of them over earlier and has been applying syncs since. The final update has already been made as of this morning. iSCSI services have been shut down, and @hibler will be destroying the zpool on `dbox1` (`dbox2` is already done) before shutting down `dbox1` and `dbox2`. I'll re-initialize the arrays as described above, and once we confirm things look good from the dbox sides, we'll repopulate the Powervault with the datasets.

[labels: apt, cloudlab, criops2, status:active; assignee: Aleksander Maricq]

Issue #492: Remove `m2crypto` from xmlrpc paths
David Johnson (updated 2019-04-24)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/492

Due to the overall instability of the `m2crypto` codebase, we would like to remove it from our use. There are four tasks:
- [x] Remove from the SSL/TLS Emulab xmlrpc client (13ee8406, 535c8d7a, b1110139)
- [x] Remove from the SSL/TLS Emulab xmlrpc server (fccfee60)
- [x] Remove from the various protogeni test scripts (37ccded4)
- [x] Remove from a few other misc places (ccc44c9f)
Removal from the server is the most complex task, and I expect to work on it only as I have spare time.
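As an illustration of the client-side direction (a hedged sketch, not the actual Emulab code: the URL, certificate path handling, and helper name here are hypothetical), the stdlib `ssl` module can stand in for `m2crypto` when building a certificate-authenticated XML-RPC proxy:

```python
import ssl
import xmlrpc.client

def make_proxy(url, certfile=None):
    """Build an XML-RPC proxy over TLS without m2crypto.

    certfile, if given, is a combined PEM cert+key used to
    authenticate the client to the server.
    """
    ctx = ssl.create_default_context()
    if certfile:
        ctx.load_cert_chain(certfile)
    return xmlrpc.client.ServerProxy(url, context=ctx)

# Hypothetical endpoint; no RPC is issued until a method is called.
server = make_proxy("https://boss.example.net:3069/usr/testbed")
```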

[labels: cloudlab, status:active; assignee: David Johnson]

Issue #487: Moonshot chassis 4 is in hwdown
Aleksander Maricq (updated 2019-02-05)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/487

Many machines in this chassis are in `hwdown` due to presumed multicast issues during frisbee. We've tried rebooting the chassis manager, the chassis switch, and @hibler has power-cycled the whole chassis and re-seated the switch. None of these have done anything, and there are no obvious errors on either the chassis switch or in various boss logs. I've placed the rest of them in `hwdown` pending further investigation into the issue.

[labels: cloudlab, status:active; assignee: Aleksander Maricq]

Issue #483: Apt dbox node won't boot
Mike Hibler (updated 2019-02-15)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/483

After a routine FreeNAS upgrade that required a reboot, the machine would not boot again.
At first I thought this was due to an intermittently failing disk in the RAID; I have seen a situation where, upon reboot, it refused to boot due to a drive with a "foreign config", requiring that I physically type some character on the console to continue. But I pulled the drive and hooked up the VGA, and there is no additional output beyond what I get via the iDRAC virtual console.
I would think it was a failed upgrade of FreeNAS, except that I cannot make the node go into the BIOS or the boot menu or even PXE boot, so there appears to be something wrong before it gets to the actual OS boot. I could be getting fooled, however.
I also hooked up the serial console in case it was switching over and no longer outputting to the VGA. No dice.

[labels: cloudlab, status:active; assignee: Mike Hibler]

Issue #410: Improve portal search/lookup capability for past experiments
Mike Hibler (updated 2018-07-16)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/410

We often have need of going back to see what a specific node was doing at a previous point in time, e.g.:
> So I got email from Campus people today about an open resolver running at CloudLab (hp120) at 2018-06-21T06:30:35Z (UTC). So that is about 11:30pm on the 20th our time. So what is the most expedient way through the portal to "learn more"? I always use the classic interface to look at "Node History" for the node in question, and from that I learn that it was in experiment huygens-PG0/shiyuliu-QV37948 at that time. That is pretty much a dead end through the classic interface, since it was a portal experiment. I can get the UUID but that is about it.
At the moment, this is not easy to do as per @stoller's response:
> @mike I am working on a related item today (@eeide's ticket about email that uses geni slice terminology instead of portal experiments). But at the moment, we do not have a Portal UI to search for and show a specific historical experiment. And remote clusters have no concept of the portal experiments from which they come (I am changing that today). But to answer the question above: on the CloudLab Portal UI, use the Admin menu item Users/Project, click on the search tab and paste in "shiyuliu", then click the link to go to his dashboard page. Then on the admin menu tab, click on Experiment History. Find the line for shiyuliu-QV37948. Now you know the profile, and you can click on that.
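The lookup being asked for is essentially a point-in-time query over node history. A minimal sketch of that query, against a hypothetical simplified `node_history` table (the real Emulab database schema is more involved), looks like:

```python
import sqlite3

# Hypothetical, simplified node-history table for illustration only.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE node_history
              (node_id TEXT, experiment TEXT, alloc_time INT, free_time INT)""")
db.execute("INSERT INTO node_history VALUES "
           "('hp120', 'huygens-PG0/shiyuliu-QV37948', 100, 200)")

# "What was node X running at time T?"
row = db.execute(
    """SELECT experiment FROM node_history
       WHERE node_id = ? AND alloc_time <= ? AND free_time > ?""",
    ("hp120", 150, 150)).fetchone()
print(row[0])  # huygens-PG0/shiyuliu-QV37948
```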

[labels: cloudlab, status:active; assignee: Leigh Stoller]

Issue #403: Make benchmark scheduler into a "production" service
Robert Ricci <ricci@cs.utah.edu> (updated 2018-11-01)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/403

# Prerequisite Tasks:
### Server-level tasks:
- [x] Combine the parts from two PG nodes into one to make a machine for benchmark scheduling/CONFIRM.
- [x] Install base Ubuntu 18.04 image on the machine.
- [x] Install database (mariadb).
- [x] Make non-root user on DB to handle benchmark results inserts.
- [x] Install Python2 and requisite libraries for orchestration.
- [x] Create a user account to be used for benchmark orchestration.
- [ ] Make `c6420` servers able to set CPU frequency scaling.
### Flux-level tasks:
- [x] Move code to the emulab group on gitlab.
- [x] Create a service account for geni-lib.
### Benchmark-level tasks:
##### (The following will be done on the currently non-live code in the emulab repo!)
- [x] Add source for NIC -> NIC latency benchmarks to benchmark repo.
- [x] Add invocation of, and results parsing for, NIC -> NIC latency benchmark to benchmark script (Utah only).
- [x] Check if APT can run NIC -> NIC latency benchmark. If so, add invocation.
- [x] Add source for NASA NPB CPU benchmarks to benchmark repo.
- [x] Add invocation of, and results parsing for, NASA NPB CPU benchmarks to benchmark script.
- [x] Disable DVFS section for `c6420` machines temporarily.
- [x] Fix benchmark script to support Ubuntu 18.04 output for `ifconfig` (or switch to `ip link`/`ip addr`).
- [x] Pick an `r320` machine to be the network destination node for APT.
- [x] Configure the `r320` machine to be the network destination node for APT.
- [x] Pick a `d430` machine to be the network destination node for Emulab.
- [x] Configure the `d430` machine to be the network destination node for Emulab.
- [x] Start dumping OS version (16.04/18.04) in env_info.
- [x] Start dumping CPU model in env_info.
- [x] Do a final pass to make sure we don't need to add any new fields to the DB.
- [x] Tune iperf3 params for `xl170`s (25 Gbps links).
### Orchestration-level tasks:
##### (The following will be done on the currently non-live code in the emulab repo!)
- [x] Add support for emulab/APT in orchestration.
- [x] Add support for CPU benchmarks in results processing.
- [x] Add support for NIC->NIC latency benchmark in results processing.
- [x] Make created experiments use Ubuntu 18.04 instead of 16.04.
- [x] Update the db schema to support the CPU benchmarks.
- [x] Update the db schema to support the NIC->NIC latency benchmark.
- [x] Update the db schema to support OS version.
- [x] Update the db schema to support CPU model.
- [x] Update DB access sections to use non-root DB user.
- [x] Make the code more environment agnostic (move hard-coded vars to cmdline args).
- [x] Rewrite arg parser to use argparse instead of getopt.
- [x] Properly handle thrown exceptions due to resource reservations.
- [x] Make a pass through the code to check refactor potential.
# Verification tasks:
### Benchmark-level tasks:
- [x] Test the benchmark script on all Cloudlab Phase II hardware types.
- [x] Test the benchmark script on all APT hardware types.
- [x] Test the benchmark script on all Emulab hardware types.
### Orchestration-level tasks:
- [x] Test the orchestration script on Cloudlab Utah.
- [x] Test the orchestration script on Cloudlab Wisconsin.
- [x] Test the orchestration script on Cloudlab Clemson.
- [x] Test the orchestration script on APT.
- [x] Test the orchestration script on Emulab.
# Migration tasks:
- [x] Stop collection cronjobs on `ms1102`.
- [x] Pick an `xl170` machine to be the new network destination node for CloudLab Utah.
- [x] Pick a `c220g5` machine to be the new network destination node for CloudLab Wisconsin.
- [x] Pick a `c6420` machine to be the new network destination node for CloudLab Clemson.
- [x] Import old DB data into new database.
- [x] Update NULL fields as needed for old data.
- [x] Set up crontab to run scheduled benchmark orchestration (with proper arguments).
- [x] Find a better system for DB backups.
- [x] Set up cronlog rollover using savelog.
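The `xl170` iperf3 tuning item above comes down to bandwidth-delay-product arithmetic; a quick sketch (the 0.1 ms RTT is an assumed intra-cluster figure, not a measured one):

```python
# Bandwidth-delay product: how much data must be in flight to keep
# a 25 Gbps link full at a given round-trip time.
link_bps = 25e9       # xl170 links: 25 Gbps
rtt_s = 0.0001        # assumed intra-cluster RTT of 0.1 ms

bdp_bytes = link_bps * rtt_s / 8
print(int(bdp_bytes))  # 312500
```

A ~310 KB in-flight budget suggests either a larger TCP window or several parallel iperf3 streams (`-P`) to keep the link busy.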

[labels: cloudlab, status:active; assignee: Aleksander Maricq]

Issue #401: Fix XL170 Shared VLAN issues
Aleksander Maricq (updated 2018-06-25)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/401

Currently, when an XL170 node is hooked into a shared VLAN, it is unable to properly swap out upon experiment expiration (as is any subsequent experiment on that shared VLAN). This appears to be due to an inability to clear port VLANs on the Mellanox switches.

[labels: status:active, cloudlab; assignee: Aleksander Maricq]

Issue #399: Expose more image info to the portal UI
David Johnson (updated 2018-11-01)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/399

At minimum, it seems we should add a description field to the portal image creation dialogue. Users just get copies of the underlying image description by default, and have no way to modify it, I believe.
Maybe we should allow users to toggle the global bit on/off?
Finally, given that I'm now creating a bunch of docker format images via the portal, it would be nice for me if the image format was displayed in a column on the images.php and list-images.php pages. I realize that is wasted space most of the time, so maybe it could be an optional column that is only displayed if there are images of more than one format in the list.

[labels: cloudlab, status:active; assignee: Leigh Stoller]

Issue #398: Update openstack profile to queens
David Johnson (updated 2018-07-06)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/398

This requires the new `UBUNTU18-64-STD` image; see https://gitlab.flux.utah.edu/fluxsysadm/flux-system-admin/issues/156 .
The queens release only lasts until Aug 30 or so, so now that we will shortly have `UBUNTU18-64-STD`, it's time to get cracking.
This is happening in https://gitlab.flux.utah.edu/johnsond/openstack-build-ubuntu/commits/queens .
I hope this will be relatively painless, and that the biggest pain will be to redo the networking configuration for `systemd-networkd`.
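For reference, the shape of that change: under 18.04 interface setup moves from `/etc/network/interfaces` to declarative `.network` units. A minimal sketch (the interface name and DHCP addressing here are assumptions, not the profile's actual values):

```ini
# /etc/systemd/network/10-eno1.network (hypothetical interface name)
[Match]
Name=eno1

[Network]
DHCP=ipv4
```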

[milestone: 2018-07-06; labels: cloudlab, status:active; assignee: David Johnson]

Issue #393: Upgrade storage servers with latest FreeNAS (11-STABLE)
Mike Hibler (updated 2018-06-30)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/393

I arrived at this issue in a circuitous manner...
It started out as trying to fix a race condition that is/was biting a Powder profile. After making some fixes to our latest FreeNAS clientside, I discovered that only one of the six storage servers was running the latest FreeNAS that could use the fixes. So rather than back-porting the fixes, I am upgrading the servers.

[labels: cloudlab, criops2, status:active; assignee: Mike Hibler]

Issue #382: Wire up the CloudLab2 user allocatable switches
Mike Hibler (updated 2018-06-21)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/382

We need a plan for how the user-allocatable switches will be connected to the netscouts and how they might be hardwired to each other. Some of the issues and ideas, in no particular order:
* Do we connect the netscouts to the regular experiment fabric? This would allow us to have a second experimental interface on nodes. One scenario where this is useful came up lately where an experimenter wanted to have a blockstore mounted but also wanted low-level control (DPDK) over the experimental interface. Currently, a blockstore has to share the same physical link as any topology setup by the user.
* The nodes are distributed across the two netscout switches and the two netscouts are not interconnected. Thus to access an arbitrary node from any user alloc switch, the user alloc switches will have to be connected to both netscouts.
* Do we want direct wires between the user alloc switches? While we can interconnect the switches via the netscouts, that limits the interconnects to 40Gb. We could directly connect the Mellanoxes in particular at 100Gb allowing for example a multi-level fat tree topology.
* What do we envision the common user alloc switch topos to be? Do we emphasize more, simple (i.e., single switch) simultaneous topologies over fewer, more complex (e.g., multiple level, multiple switch) topologies? This is largely related to the previous bullet--the only way we are really going to be able to form complex topologies will be to take advantage of the extra "uplink" (40Gb on Dells, 100Gb on Mellanox) ports directly wired to each other.
Obviously, the plan will evolve over time and any plan will not survive first contact with experimenters. For the milestone of getting CloudLab2 resources operational, we need to decide on our initial plan.
Notes:
* Each netscout has 3 blades of 24 40Gb ports each, for a total of 72 40Gb or up to 288 10Gb per switch. So 144 40Gb or 576 10Gb ports. Presumably we can mix and match, with some ports broken out at 10Gb and others not. So we should have plenty of flexibility.
* Six of the 24 ports on each netscout are "Smart Ports". We don't yet know the implications of this, in particular whether it limits what we can do with those ports.
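The port counts in the notes above follow from standard 40Gb-to-4x10Gb breakout arithmetic; a quick sanity check:

```python
blades_per_switch = 3
ports_per_blade = 24      # 40Gb ports per blade
switches = 2
breakout = 4              # one 40Gb port breaks out into 4x 10Gb

p40 = blades_per_switch * ports_per_blade       # 72 per switch
p10 = p40 * breakout                            # 288 per switch
print(p40, p10, switches * p40, switches * p10) # 72 288 144 576
```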

[milestone: Utah CloudLab2 cluster operational!; labels: cloudlab, status:active; assignee: Mike Hibler]

Issue #375: Address Meltdown/Spectre vulnerabilities
Mike Hibler (updated 2018-06-21)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/375

We need to inform users what our position is (@ricci).
We also need to decide how to address this in future images. Since the performance hit for KPTI (and presumably whatever FreeBSD comes up with) is non-negligible, should we warn users in advance that future images will have the mitigations enabled? Do we offer alternative images?

[labels: cloudlab, criops2, status:active]

Issue #371: Add project profiles table to My Profiles page
David Johnson (updated 2018-01-22)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/371

From a Slack conversation:
I realized while I was talking with some unfamiliar Cloudlab users that it is nontrivial to list all profiles in your projects. For instance, this user was trying to find the `vbg-net` profile in the `SafeEdge` project. He went to the `My Profiles` page, and of course it is not there, because I (`dmjuser`) wrote the profile. Of course he realized he could find it by starting an experiment and clicking `Change Profile`, but he really wanted the `Show Profile` page so he could consider copying the profile -- and he missed the `Show Profile` button in the start experiment page.
So I am wondering if it might be a good idea to also show profiles in my projects on the `My Profiles` page, in a secondary table. We have this info already in the UI; if you click Membership and then the project the profile is in, you get that list.

[labels: cloudlab, status:active; assignee: Leigh Stoller]

Issue #364: Test one of the new nodes
Mike Hibler (updated 2018-03-11)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/364

Since we have the management switches and a spare S4048-ON (the control switches), I can get one of the new nodes up, get its information in the DB, and make sure our boot procedures work. We actually have a loaner Mellanox switch too, so I could hook up the experiment interface.

[milestone: Utah CloudLab2 cluster operational!; labels: cloudlab, status:active; assignee: Mike Hibler]

Issue #363: Wire up the nodes
Mike Hibler (updated 2018-05-18)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/363

The funnest task of all! Once the nodes are racked (#359) and the switches racked (#362), we can run all the wires.

[milestone: Utah CloudLab2 cluster operational!; labels: cloudlab, status:active; assignee: Dan Reading]

Issue #362: Rack all the switches
Mike Hibler (updated 2018-02-14)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/362

We have the management switches and layer1 switches, the control net switches are on the way, and we still need to order the remaining switches (#360). Rack 'em as they arrive.

[milestone: Utah CloudLab2 cluster operational!; labels: cloudlab, status:active; assignee: Mike Hibler]

Issue #361: Return incorrect cables and get new ones
Mike Hibler (updated 2018-02-03)
https://gitlab.flux.utah.edu/emulab/emulab-devel/issues/361

HP sent us some cables that were too long or too short. They have agreed to replace them at no cost.

[milestone: Utah CloudLab2 cluster operational!; labels: cloudlab, status:active]