00:11:42 -!- sesuncedu1 [~Adium@cpe-076-182-016-061.nc.res.rr.com] has quit [Quit: Leaving.]
00:23:54 erikc [~erikc@CPE00222d53fe78-CM00222d53fe75.cpe.net.cable.rogers.com] has joined #ccl
00:25:11 DataLinkDroid [~DataLinkD@1.130.56.238] has joined #ccl
00:29:56 -!- DataLinkDroid [~DataLinkD@1.130.56.238] has quit [Ping timeout: 256 seconds]
00:32:44 -!- dmiles_afk [~dmiles@c-71-237-234-93.hsd1.or.comcast.net] has quit [Quit: Read error: 110 (Connection timed out)]
00:33:08 were purify and freeze used in relation with write-elf-symbols-to-file so that functions would be at a fixed address for the sake of OPROFILE ?
00:46:25 sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has joined #ccl
00:47:17 DataLinkDroid [~DataLinkD@1.150.18.96] has joined #ccl
00:50:00 That may have been it. FREEZE has effectively been deprecated for the last few years. It should probably be removed.
00:52:02 -!- DataLinkDroid [~DataLinkD@1.150.18.96] has quit [Ping timeout: 256 seconds]
00:57:20 does write-elf-symbols-to-file still do something useful for OPROFILE ?
00:57:44 does freeze and/or oprofile still work?
01:00:14 PURIFY. (It'd be pretty stupid if it didn't). As I said a minute ago, FREEZE has been deprecated and should be removed. oprofile still works, though a newer profiler (perf) is used on newer Linux kernels. Using perf with CCL also depends on WRITE-ELF-SYMBOLS-TO-FILE.
01:02:42 -!- sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has quit [Quit: Leaving.]
01:05:01 sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has joined #ccl
01:08:12 DataLinkDroid [~DataLinkD@1.145.251.252] has joined #ccl
01:14:58 ok, so remove both purify and freeze with no regret.
01:15:00 thanks.
01:16:50 And I would guess that (save-application ... :purify t) - where T is the default - is perfectly safe. Changing the order of calls to FREEZE and PURIFY may have caused the problem, but I don't think that either call is necessary.
01:18:17 ok
01:18:20 thanks
01:23:47 -!- sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has quit [Quit: Leaving.]
01:24:00 -!- rme [rme@6D10F4DD.4CC8819B.699BA7A6.IP] has quit [Quit: rme]
01:24:01 -!- rme [~rme@50.43.190.179] has quit [Quit: rme]
01:25:18 rme [~rme@50.43.190.179] has joined #ccl
01:26:39 pjb` [~t@AMontsouris-651-1-198-45.w83-202.abo.wanadoo.fr] has joined #ccl
01:27:35 -!- pjb [~t@AMontsouris-651-1-93-50.w82-123.abo.wanadoo.fr] has quit [Read error: Operation timed out]
01:47:34 -!- DataLinkDroid [~DataLinkD@1.145.251.252] has quit [Ping timeout: 256 seconds]
01:56:04 just tried with 1.9 r15784, and it entered the debugger during the save-application :-/
01:57:58 you're using 1.9 now ?
01:58:01 Unhandled exception 11 at 0x41cbc9, context->regs at #x7fff8b9d4058
01:58:01 Exception occurred while executing foreign code
01:58:01 at check_range + 105
01:58:01 received signal 11; faulting address: 0x300000cfc000
01:58:01 invalid permissions for mapped object
01:58:01 ? for help
01:58:03 [9423] Clozure CL kernel debugger:
01:58:05 just tried it
01:58:13 going back to 1.8
02:02:59 check_range is part of the integrity checking code.
02:03:28 should I try w/o check-gc-integrity ?
02:04:37 It wouldn't fault there if you did. I don't know why it would fault there.
02:08:16 DataLinkDroid [~DataLinkD@1.145.128.72] has joined #ccl
02:09:32 it didn't use to fault with r15782
02:10:02 Did you update the kernel a couple of days ago ?
02:10:15 I don't think I did on this machine
02:10:37 up 54 days. kernel 3.2.5-gg987
02:11:06 Um, the CCL kernel ?
02:15:24 oh, well, I just tried 15784.
02:15:52 Previously, I was trying 1.9 15782; and my base version is still 1.8 15490.
02:23:48 indeed on r15784, it manages to dump an image w/o check gc integrity enabled
02:27:24 So, we don't look for problems, and don't find any. Not exactly reassuring.
02:27:24 -!- ipmonger [~IPmonger@c-68-81-244-69.hsd1.pa.comcast.net] has quit [Quit: ipmonger]
02:27:36 Of course it isn't.
02:28:33 didn't claim it was reassuring.
02:28:37 The address that it faulted on is in the readonly area; I just added code to check that area the other day. The function in question is called 'check_readonly_area', not 'check_area', so that doesn't make any sense.
02:37:27 -!- DataLinkDroid [~DataLinkD@1.145.128.72] has quit [Ping timeout: 256 seconds]
02:39:31 consolers [fork@59.92.56.123] has joined #ccl
02:40:29 any *bsd users know how to get alt send meta to emacs on the console ?
02:52:20 I guess not.
02:53:28 -!- consolers [fork@59.92.56.123] has quit []
02:54:03 Fare: can you tell if the image saved with integrity checks off exhibits the same problems with CTYPEs etc ?
02:57:08 DataLinkDroid [~DataLinkD@101.171.200.98] has joined #ccl
02:57:12 -!- alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has quit [Quit: alms_]
02:59:03 alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has joined #ccl
03:01:48 -!- DataLinkDroid [~DataLinkD@101.171.200.98] has quit [Ping timeout: 256 seconds]
03:10:51 probably not.
03:11:06 Since I removed the purify and freeze, I haven't seen the ctype thing again
03:11:25 and my checks say A-OK
03:12:09 OK. So at this point we aren't sure why the GC check faulted and would like to fix that, but other problems seem to be gone ?
03:12:43 yes, it looks like
03:13:12 Good.
03:13:32 I'm seeing some instability, but I don't think it's related to CCL, more like plenty of timeouts connecting to Oracle, and things like that.
03:14:35 Always glad when it's someone else's problem, if indeed it is.
03:14:45 but I need to check each failed test one by one to assess why the hell it failed.
03:14:58 OK.
03:16:12 DataLinkDroid [~DataLinkD@1.147.71.91] has joined #ccl
03:16:30 I'll go eat then. Good night.
03:20:14 The value # is not of the expected type (MEMBER :WITH-INFANT-IN-LAP :INFANT-IN-SEAT ...)
03:20:19 looks like a memory corruption
03:20:34 using 1.9 r15784
03:20:48 Well then I won't go eat then.
03:20:55 so... much rarer, but still something fishy :-(
03:21:17 I'm running another test suite w/ 1.8 15490, but haven't looked at all the failures.
03:21:46 re-running that test on a clean image (also 1.9 r15784) works
03:22:38 42...42 is that B B in UCS4?
03:23:04 Yes. Could be part of a string.
03:23:34 the substring BB appears in one of the strings manipulated by the test.
03:25:02 another failure with fixnum 0 instead of the above in the same type check failure
03:26:14 (this happens while parsing messages)
03:27:12 and various other objects, still with the same typecheck while parsing.
03:27:20 in other similar tests
03:30:21 I'll stop that run. It's discouraging.
04:33:40 on 1.8 15490, with check gc integrity, I get:
04:33:43 Missing memoization in doublenode at 0x302012ff1740
04:33:43 ? for help
04:33:43 [7061] Clozure CL kernel debugger:
04:33:48 in one of the runs.
04:42:29 A good reason to get to 1.9 ASAP.
05:00:08 well, 1.9 is much more unstable, as seen above.
05:00:59 So we need to find out why and fix that. Can you send me the image that gets the test failures and explain how to generate them ?
05:01:01 and 1.8 w/o integrity check doesn't seem to have issues (at least, it's been running rather stably for us for a year)
05:01:39 "run our big bulking test suite against an oracle database" is currently how I reproduce the test -- not very transmissible.
05:02:22 That involves the FFI as well as databases and hulking test suites ?
05:04:18 (:SSR-CHLD 0 #)
05:04:41 yes -- presumably the error could be narrowed down, but at the moment, the field is wide open.
05:05:16 these failures don't happen with sbcl, didn't happen with 1.8 (except that 1.8 seems to fail gc integrity checks once in a while when enabled)
05:06:14 looks like these errors are always around the same part of the code -- and in multiple processes, so it's not just one image being corrupted one way, but multiple being corrupted in similar ways
05:07:08 Well then, they're either CCL bugs or something else. I agree that it'd be necessary to narrow that down further.
05:08:20 Multiple OS-level processes ? Or multiple lisp threads ?
05:08:37 OS-level processes
05:09:02 we are still running one worker thread per process, multiple processes. Sucks.
05:09:45 well, the symptom is pretty low-level and didn't happen with 1.8, so that suggests a CCL bug.
05:10:22 that particular function that appears on a lot of the backtraces uses that evil nconc primitive, in case that matters
05:10:50 oh, and calls a method with a dynamic-extent declaration
05:11:10 Yes. If rme or I came back there later this week and just got locked in a room until this worked, would that work for you ?
05:11:36 probably
05:12:00 meanwhile, I'll try removing the dynamic-extent declarations and see if that works better.
05:12:29 I think that the possibility was discussed. If we wanted to do it this week, we'd need to decide soon.
05:13:49 yes, I'll talk to Allan tomorrow.
05:14:40 OK.
05:15:17 that 1.8 image seems to not have had any ORA-00000 failure. Dunno if it's because running later at night there's less contention, or because it is less buggy.
05:15:59 a lot of process attrition in this 1.9 test
05:16:41 Dunno. I think of 1.9 as being a lot less buggy than 1.8 was, and I also think that a lot of the changes weren't THAT extensive.
05:21:58 yes, all the slaves are dead or stalled
05:22:59 did dynamic-extent treatment change much?
05:23:30 Nope.
05:24:54 At some point in the last year or two, we allowed larger objects to be stack-allocated but IIRC that was before 1.8
05:27:04 Does your test suite allow running a single test ? Multiple times ?
05:28:54 yes, it does
05:29:04 or, well, we can start a REPL and do it.
05:29:12 Good.
05:30:35 which is not a 100% reproduction of the test suite, in that logging doesn't go to the same place, and the cpu pressure is less, and well, slime is running.
05:32:07 And if that behaves differently, that might tell us something. I wouldn't think that the #x4b0000004b and similar things are timing-sensitive, but don't know for sure.
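A value such as #x4b0000004b is what two adjacent 32-bit character codes look like when read as one 64-bit word, which fits the earlier guess that the stray values are pieces of a string. The sketch below assumes 32-bit (UCS-4) code units packed low half first; the function name is hypothetical and this is not CCL code.

```python
def split_ucs4_pair(word):
    """Interpret a 64-bit word as two adjacent 32-bit UCS-4 code units.

    Assumption: the low 32 bits hold the first character and the high
    32 bits hold the second (little-endian layout).
    """
    low = word & 0xFFFFFFFF
    high = (word >> 32) & 0xFFFFFFFF
    return chr(low), chr(high)


# The value quoted in the log decodes to two capital K characters.
print(split_ucs4_pair(0x4B0000004B))  # ('K', 'K')
```

The same decoding applied to a word built from two 0x42 code units gives ('B', 'B'), matching the "B B" reading discussed earlier.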
05:32:43 I'm perfectly willing to blame slime for everything, on general principles.
05:41:29 well, it's sensitive to something, because if I just run the test once at the repl, it works.
05:43:52 -!- DataLinkDroid [~DataLinkD@1.147.71.91] has quit [Ping timeout: 256 seconds]
05:46:04 If you turn on gc integrity checks in this image, does it get a memory fault ?
05:51:05 on the 1.9 image or 1.8 image?
05:51:21 on the 1.8, I'll try when the current run is over -- I bet not
05:51:54 1.9, if that's where you were getting errors and if that's where the image couldn't be saved with checks enabled.
05:52:46 on the 1.9 I'm trying again w/ a few suspect dynamic-extent removed, right now -- and I think it didn't have the gc integrity check, anyway, because I can't even dump an image w/ the integrity check
05:53:04 good question...
05:53:20 Yes. Was wondering if it would find anything now, or if it'd fault again.
05:56:39 (recompiling it)
06:08:27 DataLinkDroid [~DataLinkD@123.208.122.198] has joined #ccl
06:08:54 pjb`` [~t@AMontsouris-651-1-122-197.w83-202.abo.wanadoo.fr] has joined #ccl
06:10:52 -!- pjb` [~t@AMontsouris-651-1-198-45.w83-202.abo.wanadoo.fr] has quit [Ping timeout: 256 seconds]
06:22:24 -!- DataLinkDroid [~DataLinkD@123.208.122.198] has quit [Ping timeout: 256 seconds]
06:23:31 yes, the image fails the integrity check after the dump
06:24:19 http://paste.lisp.org/+2X7C
06:25:49 however, it passes the tests so far.
06:26:03 a few thousand more tests to go
06:26:07 same way as it did at image-save time.
06:26:26 yes, same way, modulo slightly different numbers
06:27:12 (and a slower gc)
06:27:17 check_range checks a range of addresses between X and Y. The address that it's faulting on is less than any value of X that the function is called on.
06:35:06 is it a bug in the check because it's failing to consider purified memory areas, or is it a bug in the purify because it's doing something wrong?
06:35:13 I could retry with :purify nil
06:35:20 (now that I don't have a manual purify)
06:36:42 I don't understand yet how what happens happens. It's as if it found some object whose size was negative and backed up instead of moving forward, but I don't yet see how that could happen.
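The "negative size" hypothesis can be illustrated with a toy heap walker. This is a simplified sketch, not CCL's actual check_range: it assumes a hypothetical heap modelled as a list of words, where each object begins with a header word giving its total size in words.

```python
def check_range(heap, start, end):
    """Walk objects in heap[start:end], advancing by each object's size.

    Assumption (illustrative only): the first word of every object is its
    total size in words, header included.
    """
    pos = start
    while pos < end:
        size = heap[pos]
        if size <= 0:
            # Without this guard, pos + size would move backwards and the
            # walker could touch addresses below `start`.
            raise ValueError(f"non-positive object size {size} at index {pos}")
        if pos + size > end:
            raise ValueError(f"object at index {pos} overruns the range")
        pos += size
    return pos


good_heap = [2, 0, 3, 0, 0]    # a 2-word object, then a 3-word object
print(check_range(good_heap, 0, 5))  # 5

bad_heap = [2, 0, -4, 0, 0]    # second header is corrupt
try:
    check_range(bad_heap, 0, 5)
except ValueError as err:
    print(err)  # non-positive object size -4 at index 2
```

In the corrupt case an unguarded walker would compute 2 + (-4) = -2 as the next position, which is below the start of the range, the same shape of symptom as a fault on an address lower than any X the function was called with.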
06:38:48 if I dump an image w/o purify, maybe after purification, it can reliably display the bug?
06:39:40 Maybe. It's walking another region of memory when it gets confused.
06:41:58 Are tests stilll running ?
06:42:52 However many #\l s in 'still
06:42:56 '
06:43:34 well, removing those particular dynamic-extent declarations didn't help
06:43:48 yes, tests are still running
06:44:05 And you got some of the same kinds of failures.
06:45:47 I stopped the 1.9 run -- it has those same corruptions as before
06:46:01 the 1.8 run is still on. It has a few failures that I have to investigate.
06:46:22 (possibly the oracle connection failure, or maybe something else)
06:46:50 now trying 1.9 w/o purify
06:51:15 Your paste involved a 1.9 image saved without integrity checks, but when you enabled them it faulted for reasons that I don't understand. If we don
06:51:58 your previous message was cut at "If we don"
06:52:10 't make progress in other ways and if that image is still around, it might make sense for me to look at it if possible.
06:52:45 from almost identical source, I just clobbered the image w/o purify
06:52:57 Sorry.
06:53:23 Don't understand that.
06:53:34 even w/o purify, same bug
06:53:45 I can send the image
06:54:03 can you enable incoming like rme did before?
06:54:15 Let me try to recreate a place to put it. Just a sec, yes.
06:56:41 Hope I did that right. Should exist with correct permissions now.
06:57:19 same w/o purify http://paste.lisp.org/+2X7C/1
06:58:20 (bzip2'ing)
06:58:34 (stdin): 6.619:1, 1.209 bits/byte, 84.89% saved, 398971120 in, 60277600 out.
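The figures on the bzip2 line follow directly from the two byte counts. A small sketch of the arithmetic (the function name is hypothetical; only the byte counts come from the log):

```python
def compression_stats(bytes_in, bytes_out):
    """Derive ratio, bits per input byte, and percent saved from byte counts."""
    ratio = bytes_in / bytes_out
    bits_per_byte = 8 * bytes_out / bytes_in
    percent_saved = 100 * (1 - bytes_out / bytes_in)
    return {"ratio": ratio, "bits_per_byte": bits_per_byte, "percent_saved": percent_saved}


stats = compression_stats(398971120, 60277600)
print(f"{stats['ratio']:.3f}:1, {stats['bits_per_byte']:.3f} bits/byte, "
      f"{stats['percent_saved']:.2f}% saved")
# 6.619:1, 1.209 bits/byte, 84.89% saved
```

The compressed size, 60277600 bytes, is about 57.5 MiB, which is the file that then has to be shipped.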
06:59:03 cd: Access failed: 550 Failed to change directory. (/pub/incoming)
06:59:32 try again ?
06:59:50 same
07:01:15 sorry. Our IT guy keeps more normal hours ... try again ?
07:03:08 can cd, can't put
07:04:49 one last try, please.
07:06:26 put: Access failed: 550 Permission denied. (borks2.bz2)
07:08:16 just a sec
07:10:37 once again ?
07:12:54 nope
07:13:07 which ftp server do you use?
07:13:20 vsftpd. I'm not familiar with it.
07:16:40 me neither
07:17:23 Sorry. Can beg rme tomorrow, I guess.
07:17:48 Guy has way too much job security ...
07:17:51 I tried vsftpd loooong ago and didn't like it. muddleftpd is what I ended up using in the end.
07:20:15 on the 1.8 + check, I got two slaves dead with Missing memoization in doublenode at 0x302012ff1740
07:20:42 make that 3
07:20:58 That may be spurious, or may have been fixed. Don't remember.
07:27:23 I could also try to rebuild w/ 1.9 and check the gc after each file.
07:27:25 sigh.
07:28:04 but while I'm sleeping, I'll do a plain 1.8 with purify but no gc check, like I'd like to commit.
07:28:29 If I could guess how to set permissions right, we might be able to make some progress.
07:30:32 not tonight anymore, I fear
07:32:02 Well, I might make progress. As it is, I just have to wimp out and ask rme to do it.
07:33:01 I'll do so, and will try to be up early (relatively) tomorrow.
07:39:19 any other proposed way to ship 58MB around?
07:39:54 Not that I can think of; I don't think about this sort of stuff much anymore.
07:41:22 dropbox
07:42:05 I suppose I could encrypt and dropbox
07:42:08 If I had clue one of how to use dropbox, that'd be a great idea. Thanks, but I am sans clue one.
07:42:18 or encrypt and google doc or something
07:43:38 I greatly enjoy not knowing a damned thing about that sort of thing. I think that the best option may be "wait for rme and try to get up early tomorrow."
07:44:27 creating a google drive document
07:44:31 I hope I can share it
07:45:26 And I hope I can understand how to access it. I'll try ...
07:47:02 message sent
07:48:54 I think that's the unpurified image
07:50:00 downloading it.
07:51:51 thanks
07:52:15 got it. thanks. If the bug
07:52:47 sorry. If the bug is in the integrity checking code I'll try to commit a fix; otherwise, I'll try to identify the bug.
07:53:29 if you need the source, I fear my paranoid overlords will require that you come here -- although I suppose some kind of screen sharing via google hangout could be possible.
07:55:35 I'm tentatively planning on being back there thursday and friday, assuming that others agree.
07:56:38 ok
07:56:41 Krystof [~user@81.174.155.115] has joined #ccl
07:56:43 will talk to Allan tomorrow.
07:56:48 bye!
07:57:20 Bye.
07:58:08 -!- Fare [fare@nat/google/x-gbgnmakmjqxqjkcm] has quit [Quit: Leaving]
08:03:07 DataLinkDroid [~DataLinkD@1.144.70.106] has joined #ccl
09:06:24 -!- PuffTheMagic [uid3325@gateway/web/irccloud.com/x-hrrdccslqgajyzqt] has quit [*.net *.split]
09:06:24 -!- peccu1 [~peccu@KD106179020073.ppp-bb.dion.ne.jp] has quit [*.net *.split]
09:15:12 peccu1 [~peccu@KD106179020073.ppp-bb.dion.ne.jp] has joined #ccl
09:52:13 DataLinkD2 [~DataLinkD@101.175.65.1] has joined #ccl
09:53:46 -!- DataLinkDroid [~DataLinkD@1.144.70.106] has quit [Ping timeout: 256 seconds]
09:55:25 DataLinkDroid [~DataLinkD@101.175.65.1] has joined #ccl
09:55:33 -!- DataLinkD2 [~DataLinkD@101.175.65.1] has quit [Read error: Connection reset by peer]
10:00:00 -!- DataLinkDroid [~DataLinkD@101.175.65.1] has quit [Ping timeout: 256 seconds]
11:07:39 PuffTheMagic [uid3325@gateway/web/irccloud.com/x-guimktxcdemgamps] has joined #ccl
11:33:32 sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has joined #ccl
11:56:51 rme_ [~rme@50.43.190.179] has joined #ccl
12:04:43 -!- rme [~rme@50.43.190.179] has quit [Read error: Connection reset by peer]
12:04:44 -!- Krystof [~user@81.174.155.115] has quit [Ping timeout: 246 seconds]
12:04:44 -!- rme_ is now known as rme
12:40:54 Krystof [~user@81.174.155.115] has joined #ccl
13:41:47 beyeran [~beyeran@p54A90D93.dip0.t-ipconnect.de] has joined #ccl
13:43:15 -!- beyeran [~beyeran@p54A90D93.dip0.t-ipconnect.de] has quit [Client Quit]
13:43:55 -!- sellout- [~Adium@c-98-245-92-119.hsd1.co.comcast.net] has quit [Quit: Leaving.]
13:54:28 -!- alms_ [~alms_@209-6-130-32.c3-0.bkl-ubr1.sbo-bkl.ma.cable.rcn.com] has quit [Quit: alms_]
14:13:29 alms_ [~alms_@173-162-137-153-NewEngland.hfc.comcastbusiness.net] has joined #ccl
14:22:14 sellout- [~Adium@c-50-134-130-65.hsd1.co.comcast.net] has joined #ccl
16:53:29 -!- sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has quit [Quit: Leaving.]
17:15:52 sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has joined #ccl
19:45:01 -!- sellout- [~Adium@c-50-134-130-65.hsd1.co.comcast.net] has quit [Quit: Leaving.]
19:54:07 -!- sesuncedu [~Adium@cpe-076-182-016-061.nc.res.rr.com] has quit [Quit: Leaving.]
20:06:31 -!- pjb`` is now known as pjb
20:20:44 sellout- [~Adium@c-98-245-92-119.hsd1.co.comcast.net] has joined #ccl
20:31:38 Fare [fare@nat/google/x-myimhsmchviquuwz] has joined #ccl
21:17:07 dioxirane [~lqcd@unaffiliated/dioxirane] has joined #ccl
21:32:36 -!- dioxirane [~lqcd@unaffiliated/dioxirane] has quit [Quit: leaving]
21:39:28 DataLinkDroid [~DataLinkD@1.149.61.229] has joined #ccl
21:39:49 -!- Fare [fare@nat/google/x-myimhsmchviquuwz] has quit [Quit: Leaving]
21:54:00 -!- DataLinkDroid [~DataLinkD@1.149.61.229] has quit [Ping timeout: 256 seconds]
21:54:42 DataLinkDroid [~DataLinkD@120.154.131.2] has joined #ccl
21:59:50 -!- alms_ [~alms_@173-162-137-153-NewEngland.hfc.comcastbusiness.net] has quit [Quit: alms_]
22:00:48 -!- DataLinkDroid [~DataLinkD@120.154.131.2] has quit [Ping timeout: 256 seconds]
22:14:54 DataLinkDroid [~DataLinkD@1.149.237.54] has joined #ccl
22:29:08 -!- DataLinkDroid [~DataLinkD@1.149.237.54] has quit [Ping timeout: 256 seconds]
22:43:38 DataLinkDroid [~DataLinkD@123.208.33.105] has joined #ccl
22:50:15 patrickwonders [~Patrick@user-38q42ns.cable.mindspring.com] has joined #ccl
23:00:52 -!- DataLinkDroid [~DataLinkD@123.208.33.105] has quit [Ping timeout: 256 seconds]
23:06:44 -!- patrickwonders [~Patrick@user-38q42ns.cable.mindspring.com] has quit [Quit: Leaving]
23:15:13 DataLinkDroid [~DataLinkD@123.208.83.245] has joined #ccl
23:26:56 -!- DataLinkDroid [~DataLinkD@123.208.83.245] has quit [Ping timeout: 256 seconds]
23:40:44 DataLinkDroid [~DataLinkD@1.148.231.236] has joined #ccl
23:46:46 -!- DataLinkDroid [~DataLinkD@1.148.231.236] has quit [Ping timeout: 256 seconds]