We are using a Moodle XAMPP installation on Windows 2008 as our credentials-forwarding server, Moodle A. This is within our DMZ. Moodle A is running Moodle version 2.5.2.
We have our learning content on Moodle B which is in the cloud on a Linux server.
We have firewall ports open on 80 and 443 in both directions for communication between Moodle A and Moodle B.
LDAP authentication is working just fine on Moodle A. We've had single sign-on working OK, with users logging on to Moodle A and then getting to courses on Moodle B. Then it stopped working, with users getting an MNET timeout error:
"Ooops! Your MNET communication has failed! Here's that error message to pass on to your administrator: connect() timed out!ERROR 28:
28:connect() timed out!"
Some users get the 'page can't be displayed' instead of the MNET timeout error.
Then it started working again - it wasn't clear why. Then it stopped working again. And it started working and stopped.... alternating over a period of three weeks.
Our supplier for Moodle B in the cloud has a number of customers using a similar credentials-forwarding configuration without problems, but they are all using LAMPP (Linux) for their Moodle A. We're the only customer with these problems.
We find that on Moodle B we have to paste in the public key for the Moodle A peer; it doesn't pick it up automatically. We get a message on the peer page that the public key doesn't match the one on Moodle A, and the public key for Moodle A is reported as being blank:
"The public key you are holding for this host is different from the public key it is currently publishing.
The current public key is: <BLANK>"
However, in spite of the above error message, we find users can still log on to Moodle A and get to content on Moodle B SOME OF THE TIME.
Is anyone aware of similar MNET issues on the Windows platform?
If so is there a resolution to these problems?
Any ideas at all out there?
If you were me would you throw away Windows Moodle A and start over again with a Linux/LAMPP Moodle A?
Thanks for your attention so far.
When I say intermittent I mean it's only working one third of the time. Unacceptable.
We've refreshed and copied the keys in both directions.
First, to your problem …
One item that can cause issues with MNET is the system clocks of the servers.
Is the Moodle ‘in the cloud’ (i.e., remotely hosted) in the same time zone? Is your Moodle B operating system clock synced to an NTP service?
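One rough way to check for clock drift without shell access to the remote host is to compare your own clock with the Date header the remote web server sends back. The sketch below only shows the arithmetic; the header value and local time are made-up sample values, and fetching the real header from Moodle B is left to whatever tool you prefer.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(date_header: str, local_now: datetime) -> float:
    """Return local time minus the remote time from an HTTP Date header, in seconds."""
    remote = parsedate_to_datetime(date_header)  # timezone-aware for a "GMT" header
    return (local_now - remote).total_seconds()


# Sample values only (not taken from the servers in this thread).
header = "Tue, 04 Mar 2014 10:00:00 GMT"
local = datetime(2014, 3, 4, 10, 2, 30, tzinfo=timezone.utc)
print(clock_skew_seconds(header, local))  # 150.0
```

A positive result means the local clock is ahead of the remote one. A skew of more than a few seconds is worth fixing with NTP before looking elsewhere.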
Have you turned on debugging to see if there is anything more reported?
Have you checked (I assume) the Apache error logs?
Have you increased the PHP time limits for a script to run/memory to use?
Line 8 or so, there is this variable: var $timeout = 60
Might change that to 90. No need to restart server.
Might have to increase again if there appears to be a connectivity issue.
Connectivity between the ‘cloud’ server and your Moodle runs through the internet. Have you done things like trace routes and pings at the provider Moodle to see if there isn’t a hop somewhere between that could be causing the issue? How about any changes to your entity's firewall rules?
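Since the failure is intermittent, it can help to log connection attempts over time rather than rely on a one-off ping. Here is a minimal sketch in Python, assuming you can run a script on one of the servers; the function names and the host name in the usage comment are hypothetical, and this tests plain TCP connects only, not MNET itself.

```python
import socket
import time
from typing import List, Tuple


def probe_connect(host: str, port: int, timeout: float = 5.0) -> Tuple[bool, float]:
    """Attempt one TCP connection; return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:  # covers timeouts, refusals and resolution failures
        ok = False
    return ok, time.monotonic() - start


def success_rate(results: List[Tuple[bool, float]]) -> float:
    """Fraction of probes that connected (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(1 for ok, _ in results if ok) / len(results)


# Hypothetical usage, run from Moodle A against Moodle B:
#   results = []
#   for _ in range(20):
#       results.append(probe_connect("moodle-b.example.com", 443))
#       time.sleep(30)
#   print(success_rate(results))
```

If the success rate here tracks the one-third figure mentioned earlier, the problem is likely at the network or firewall level; if connects always succeed, it points back at the MNET layer.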
What does the provider say or recommend you do? Switch to Linux because they can support it best?
Begin opinion ...
This might start an OS war … but … so be it …
MS has extended the life of server 2008 (just last year) so you will be looking at moving upwards soon (and shelling out some more $).
Moving to a long term supported version of Linux (Ubuntu Server LTS or CentOS 6.5) means $0 … no more CALs, licenses, extra ‘taxes’. Will the provider of the service also provide remote support for your Moodle?
Then again, it might mean a learning curve for system admin … I understand that … but, if one looks at the tech world globally … there’s more *nix’s (or at least based on *nix) out there than ever before … iOS, MacOSX, Android, Chromebooks, etc. Don’t know how one can ‘avoid’ them!!! Administering a Linux server, BTW, isn't all that difficult and there are tons of free support forums/blogs etc. to help. Just have to be willing to learn and work a little.
... end opinion.
Again, what does the provider say/recommend?
And a question for the future of whatever remote resources you are using ... is the provider looking at LTI?
'spirit of sharing', Ken
I’m a colleague of Linda’s and have been looking at this with her.
We have captured the packets being sent between servers A and B by running Wireshark on server A. To get them to talk to each other, we deleted the public key on B. A new one was created and shortly afterwards it attempted a keyswap with server A.
When that happens, our server (A) will respond with the following message:
<value><string>Sorry, but that hostname (0) could not be resolved!</string></value>
Based on some of the suggestions at https://mahara.org/interaction/forum/topic.php?id=4584 I added some debug statements to the bootstrap function in moodle\mnet\peer.php. I print the value of the $wwwroot argument on entry into that function and again after the set_wwwroot function call. In both cases the output value is 0. Perhaps unsurprisingly, the result of the call to mnet_get_hostname_from_uri is also 0.
We believe that this is the reason that the UI on server B (under Site Administration >Networking > Peers) says that the currently published public key for server A is <empty string>.
I’m going to follow the chain of function calls backwards but whilst I do that, does anyone have any idea what the problem might be?
Note: It’s worth reiterating that after a period of not talking to each other, the two servers will establish a trust relationship and the system will function normally (exchanging encrypted messages) for a while but will eventually fall out of sync again.
Well, that's a new one! But it does say cannot resolve ... which means DNS.
On the 'troubled' server and from the 'troubled' server, can it resolve its FQDN? Can it resolve the FQDN of the other Moodle to which it's trying to establish an MNet relationship?
Both Moodles should be using a publicly resolvable FQDN.
Check /etc/resolv.conf for the DNS servers the 'troubled' server is pointed to. Do dig queries using those DNS servers specifically. If there are issues resolving, use @8.8.8.8 (a Google DNS server).
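If dig isn't handy (e.g. on a Windows box), a quick resolution check can be sketched in Python. Note this uses whatever resolver the operating system is configured with, so it does not replace querying a specific DNS server the way dig @server does; the function name is made up for the example.

```python
import socket
from typing import List


def resolve(hostname: str) -> List[str]:
    """Return the sorted unique addresses the OS resolver gives, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})


# An IP literal resolves to itself without any DNS lookup.
print(resolve("127.0.0.1"))  # ['127.0.0.1']
```

Run it on each server against both FQDNs; an empty list from either side means that server cannot resolve the name.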
'spirit of sharing', Ken
Thanks for your comments, Ken. They are much appreciated.
Our Moodle server (the Windows server which is handling the auth) can resolve the names of the other Moodle server as well as itself. I've verified this by flushing the DNS records and doing nslookups on both FQDNs. FYI, our server is using Google public DNS. I have no idea about the remote server but it's our server that's reporting the resolution problem. It can certainly resolve our FQDN as it sends requests to us.
I think the problem might be deeper than DNS resolution as peer.php (which is throwing the exception) is trying to parse and resolve a $wwwroot value of 0.
I'll be following that code path back up the stack this morning.
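To illustrate why a $wwwroot of 0 can never resolve, whatever DNS is doing: a URI parser finds no host part in it at all. The sketch below uses Python's standard library, not Moodle's mnet_get_hostname_from_uri (whose internals I'm not assuming anything about), and the example URL is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def hostname_from_uri(uri: str) -> Optional[str]:
    """Return the host part of a URI, or None when the URI has none."""
    return urlparse(uri).hostname


print(hostname_from_uri("https://moodle-a.example.com/moodle"))  # moodle-a.example.com
print(hostname_from_uri("0"))  # None
```

So the "could not be resolved" message looks like a symptom of the bad value arriving in the request, rather than a resolver failure.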
After some more investigation I'm seeing some very strange behaviour. As I said above, I have added error logging statements to peer.php to output the values of some of the variables. As far as I'm aware, those sections of code should always be hit.
I logged onto the admin interface on the cloud/hosted Moodle instance and deleted the public key. This caused that server to talk to our server and do a keyswap. It was successful. None of my logging statements were hit.
I logged onto the admin interface and deleted the public key again. Another keyswap was attempted. It failed. All of my logging statements were hit. The usual info ($wwwroot: 0 etc) was written to the webserver error log.
I'm very confused.
UPDATE: It looks like the successful keyswap might have been going in the other direction (initiated by our server), as I have been unable to recreate what I've described above.
We've decided to go for Linux (Ubuntu). We'll post how this turns out.
Many thanks for your comments Ken.
Thanks Ken for your reply
Kevin's reply above homes in on what we now think is the problem area.
We do have firewall rules set up between the two servers with ports 443 and 80 open in both directions, which we've double-checked. We can ping between the two servers.
Our supplier has been trying to help us through the problem. They've commented so far that their other customers are all on Linux and have no problems.
Re Linux, I would be happy to make the move as I have a background in Unix (HP-UX), but we're not convinced the problem is the OS. What we know is our server (Moodle A) is not able to verify the security certificate that accompanies the messages from the cloud server; therefore it doesn't trust it.