|
Pascal
|
Hi Colin, first of all, welcome aboard! My suggestion: take a look at the speed of your PCs, and you will see why the number of results differs between them. The second factor is the behaviour of each individual simulation. For that, please take a second look at the project's website, www.evolutionary-research.net, and enter 'Muller's Ratchet' into your favourite search engine. The website has compact information about it. I also see quite often that some simulations need only 40 % of their expected time, while others need up to 200 %. So don't worry. One statistic on the website counts GigaIndividuals, and that's the perfect thing if you want to see your computed time in the stats.
|
Laurence Loewe
|
Hi Colin,
Thank you for reporting this really strange behaviour. I must admit that I don't have any idea what could cause this! As far as I understand it, it should have nothing to do with the total number of GigaIndividuals processed or with predicted CPU times.
However, there are two things I would like to know that might contribute to a solution: a) Can you exclude a power failure or someone playing with the console (i.e. all machines started at the same time and then were hit by an external cause)? You can turn off writing these files if you fiddle with the right console parameters. b) Do you use a 64-bit Windows? (Yes, I know, this should not affect the XPs...)
I suggest trying to hit the same barrier again to see if it can be reproduced. This is the first time I have heard such a report.
Thanks for your patience with this.
Laurence
|
ColinDB
|
Hello again!
I understand PC speeds as I build my own computers to cater for specific applications. As most are not extensively used, they may as well keep busy when they're not engaged for their specific purposes - hence E@H gets the CPU cycles. I thank you Pascal for your input. Glad to see someone reads my postings. I'm clocking up significant numbers already and will continue to do so in the name of science and research.
As for other possible causes - power failure can be ruled out as 2 other PCs running separate applications were fine during the same timeslot. While all were started "simultaneously", there was a 2hr difference between the first and last computer being started. The same 2hr differentials were reproduced on the timestamps of the mentioned files on each machine. (approx 595hrs from launch to stop report)
All machines are networked in a home environment. There is only my wife and I who live here. My wife would rather play sports than pay any attention to my activities on my PCs. Only 2 of the 6 PCs can access the net through a 1.5Mb broadband connection. (My problem occurred before I had broadband and was still on dial-up). OS is a mixture of 2000 Pro, XP home and XP Pro (SP1 + SP2).
Since 3 of my machines rarely run other applications, I will leave them running without resetting to see if I encounter a repeat of my findings. My other PCs are constantly in use, so they cannot be allowed to run under the same scenario. I will report back in around 12 days, as one machine already has 13 days of uptime.
This is still fun! I love doing research. Sheer curiosity keeps me wanting to find out what's happening. Perhaps I should get out more!!!
The CPUs will stay warm in the mean time!
Cheers,
Colin.
|
|
Laurence Loewe
|
Hi Colin,
Looking forward to seeing what you report,
Cheers, Laurence
|
ColinDB
|
Hi again!
Time has passed, and I ran 2 of the 3 trial machines without resetting (the third was required for other work for a short time). Both machines stopped reporting eprogress files after 596.5 hours. The first generated a completed run file 8 hours later and then stopped; the other did the same 19 hours later.
I looked back through the reporting files to find out which simulations were run last, and created new run files from the "completed" run files, taking out those simulations already run (in both there were plenty of simulations that had not been looked at yet).
I re-booted one machine only, relaunched the program on both and both are behaving normally again. The second machine has been running for nearly 31 days (about 730hrs) with the only hiccup at 596.5 hrs.
Will this shed any further light on what's going on? While resetting a machine every 25 days isn't a problem, it just breaks up run continuity and leaves a CPU idle when it could be doing research.
Cheers,
Colin.
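One observation that may or may not be relevant: 596.5 hours is very close to the point where a signed 32-bit counter of milliseconds overflows. Nothing in this thread confirms that the simulator keeps such a counter, so the sketch below only does the arithmetic; the helper name `hours_until_overflow` is made up for illustration.

```python
# Arithmetic only: how long until a millisecond counter of a given width overflows?
# Whether the simulator uses such a counter is an ASSUMPTION, not a known fact.

MS_PER_HOUR = 3_600_000


def hours_until_overflow(bits: int, signed: bool = True) -> float:
    """Hours until a millisecond counter with `bits` bits wraps around."""
    max_count = 2 ** (bits - 1) if signed else 2 ** bits
    return max_count / MS_PER_HOUR


signed_32 = hours_until_overflow(32, signed=True)
unsigned_32 = hours_until_overflow(32, signed=False)

print(f"signed 32-bit ms counter:   {signed_32:.2f} h ({signed_32 / 24:.2f} days)")
print(f"unsigned 32-bit ms counter: {unsigned_32:.2f} h ({unsigned_32 / 24:.2f} days)")
```

The signed case comes out at roughly 596.52 hours (about 24.86 days), which matches the reported stall time closely. That is a coincidence worth checking, not a diagnosis.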
|
Barrie
Guest
|
It might be interesting to see if stopping and starting the simulator would be enough to circumvent the problem, i.e. is a re-boot of the OS really needed? If restarting the simulator is enough, it would suggest that the problem is internal to the simulator. If a re-boot is needed, that suggests some interaction between the simulator and the OS is involved.
|
ColinDB
|
I intend to experiment, and I am doing just that. One machine has been restarted without a reboot and will be allowed to run again until the next "incident". It has already been running for 4.5 days and is behaving normally.
Perhaps at this point I should be asking the obvious question - has anyone else run the simulator non-stop past 600 hours (25 days)?
|
Barrie
Guest
|
Looking at the high scores (it's what I live for), there have been simulations longer than 25 days, but single simulations may be a special case that does not hit this problem.
|
ColinDB
|
Just reporting in again to say that all 6 of my machines stall around said 595hrs. All I do now is simply halt the application before this time elapses and relaunch the program (without re-boot) to avoid any lost time. I am surprised that no-one else suffers the same problems or can offer any advice on my findings. I no longer consider it an issue though, I just want to know why for simple curiosity!
Cheers, Colin.
|
Laurence Loewe
|
Hi Colin and others!
To revisit the issue, I had a quick glance at the source code and I did not find any obvious reason for this behaviour (at least it was not what I had thought for a moment).
An additional question though: Your reports seem to suggest that the simulations can easily run for longer than the 596.5 hours and will also write their *final* results to the hard disk, even if they stopped writing eprogress, elastparameters and intermediate results files. Is that correct?
A question for other Windows participants: Did you observe the same behaviour as reported by ColinDB on your machine? If yes, I would be interested to know. I assume that it would have different symptoms on the Mac, if any.
Unfortunately I cannot currently spend much time digging into the details, but I hope that I will be able to fix it once I can focus again on the technical side of the simulators.
Cheers, Laurence
|
Laurence Loewe
|
An additional question that may help track down the bug:
Does this phenomenon depend in any way on the frequency that eProgress is written?
This can be adjusted, if the simulator is not running, by opening the "S005_Preferences.txt" of the evolution@home-client and changing the value after the line "WriteProgressFileIntervallSec (Write one "eProgress.txt" file after how many seconds) :" to 5 (or any other number you want, the default is 120).
If the simulator is running, the menu can be used to change that.
Cheers, Laurence
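For anyone who prefers to script the change, here is a minimal sketch. It ASSUMES that the value sits on the line directly after the "WriteProgressFileIntervallSec ..." label line; the real layout of "S005_Preferences.txt" may differ, so check the file first. The function name `set_progress_interval` and the sample fragment are made up for illustration, and the simulator should not be running when the file is edited.

```python
# Sketch only. ASSUMPTION: the value is on the line directly after the
# "WriteProgressFileIntervallSec ..." label line. Verify against the real file.

KEY = "WriteProgressFileIntervallSec"


def set_progress_interval(prefs_text: str, seconds: int) -> str:
    """Return prefs_text with the line after the KEY label replaced by `seconds`."""
    lines = prefs_text.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith(KEY):
            if i + 1 >= len(lines):
                raise ValueError("label line found, but no value line follows it")
            lines[i + 1] = str(seconds)
            return "\n".join(lines) + "\n"
    raise ValueError(f"{KEY} not found in preferences text")


# Hypothetical fragment, invented for this example (not copied from a real file).
sample = (
    'WriteProgressFileIntervallSec (Write one "eProgress.txt" file after how many seconds) :\n'
    "120\n"
)
print(set_progress_interval(sample, 5))
```

The function works on text rather than on the file itself, so the result can be inspected before anything is written back to disk.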
|
ColinDB
|
Hi All,
Slow checking back, sorry... but feedback is here.
Simulations do run on past 596.5 hrs even if progress files stop being written, and the application will complete its immediate task. It reports a result, changes the run file to a completed run file, then stops. In each case the run file still had more simulations available to run, but it is set to completed regardless.
I will adjust the report intervals on 2 of my high-end machines. I will change one to "5" and another to "600" and monitor performance.
I assume that if the problem were to occur under these new settings and were attributable to this setting, the machine set to 5 would display its problems around 25 hrs after the start of the application. I'll report back in a couple of days.
Keeping on ratcheting!
Cheers, Colin.
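The 25-hour estimate above can be reproduced with a short calculation. It rests on the ASSUMPTION that the stall is triggered by a fixed number of progress-file writes rather than by elapsed time; the 596.5 hours and the 120-second default come from this thread, and the function names are made up for illustration.

```python
# ASSUMPTION: the stall happens after a fixed NUMBER of progress-file writes,
# not after a fixed amount of wall-clock time.

STALL_HOURS_AT_DEFAULT = 596.5  # stall time observed in this thread
DEFAULT_INTERVAL_SEC = 120      # default write interval per Laurence's post


def writes_before_stall(stall_hours: float, interval_sec: float) -> float:
    """Number of progress writes that fit into stall_hours at the given interval."""
    return stall_hours * 3600 / interval_sec


def predicted_stall_hours(interval_sec: float) -> float:
    """Hours until the same number of writes is reached at a different interval."""
    writes = writes_before_stall(STALL_HOURS_AT_DEFAULT, DEFAULT_INTERVAL_SEC)
    return writes * interval_sec / 3600


for interval in (5, 120, 600):
    hours = predicted_stall_hours(interval)
    print(f"{interval:>3} s interval -> stall after about {hours:.1f} h")
```

Under this assumption the machine set to 5 would stall after about 24.9 hours, and the machine set to 600 not until about 2982.5 hours (roughly 124 days). If both machines still stall near 596.5 hours, the write interval is probably not the cause.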
|
Laurence Loewe
|
Quote from ColinDB: "I'll report back in a couple of days."
Hi Colin, Thanks for doing these tests. I'm interested in the results. I definitely want to get to the bottom of this (if I can) to avoid such problems with future simulators. Cheers, Laurence