Copying Production Moodledata to Dev Brought Down Production.

by Jeff White -

Hello All,

I ran into a very interesting issue while replicating my production site's moodledata to my dev environment: it led to an outage on production. I am still trying to determine the cause and was wondering if I could get some direction. I copied only the moodledata, not the database or the Moodle code. My dev environment had a few differences in settings and was in maintenance mode. After the moodledata was copied to dev, end users on production were getting the maintenance mode page, and certain settings and pages from dev were showing on production. I checked production's database to see whether the site was in maintenance mode; it was not. My environments are on different hardware, and dev has the same design as production. I could not take production out of maintenance mode through the GUI. I assume the caches caused the problem, but here is where it gets odd:

  • tempdir was not copied over, as it is not in moodledata. Different storage servers.
  • cachedir was not copied over, as it is not in moodledata. Different storage servers.
  • localcachedir was not copied over, as it is not in moodledata. It lives on the Apache servers.
  • MUC caching goes to memcache on port 11211. Different servers for prod and dev.
  • Session handling goes to memcache on port 11212. Different servers for prod and dev.

In the end I had to drop dev completely and purge all the caching on production (through the CLI, plus flush_all on memcache) to fix the issue, but I am still not 100% certain that dev can no longer impact production. Did I overlook something about how caching is set up in Moodle, where something in moodledata could make dev push cache data to the production server instances?
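
One way to check the list of assumptions above is to write down the effective values each site reports and compare them mechanically. The following is only a minimal sketch: the setting names, paths, and host names are hypothetical placeholders, not values from this thread, and the dev entry for MUC is deliberately wrong to show what a hit looks like.

```python
def shared_settings(prod: dict, dev: dict) -> dict:
    """Return the settings whose values are identical in both environments."""
    return {k: prod[k] for k in prod.keys() & dev.keys() if prod[k] == dev[k]}


# Hypothetical values; substitute what each site actually reports.
prod = {
    "dataroot": "/srv/prod/moodledata",
    "cachedir": "/mnt/prod-cache",
    "muc_memcache": "prod-cache.example:11211",
    "session_memcache": "prod-cache.example:11212",
}
dev = {
    "dataroot": "/srv/dev/moodledata",
    "cachedir": "/mnt/dev-cache",
    "muc_memcache": "prod-cache.example:11211",  # deliberately wrong, to show a hit
    "session_memcache": "dev-cache.example:11212",
}

overlap = shared_settings(prod, dev)
print(overlap)
```

Any key that shows up in the result is a resource both sites would be writing to, which is the kind of overlap that could explain dev state appearing on production.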

In reply to Jeff White

Re: Copying Production Moodledata to Dev Brought Down Production.

by Howard Miller -

My strong suspicion is that one or more of your assumptions is untrue. As one of my college lecturers used to say (many years ago), "go and prove those assumptions!"

In reply to Howard Miller

Re: Copying Production Moodledata to Dev Brought Down Production.

by Jeff White -

lol Howard :) Which of my assumptions should I re-examine? The assumption that copying moodledata caused the outage, that caching is the culprit for the end-user issues, or how the environment is set up?

If you say all of them then you ruined my weekend :(

In reply to Jeff White

Re: Copying Production Moodledata to Dev Brought Down Production.

by Emma Richardson -

I would guess that the dev site was still pointing to your live site moodledata folder and that was what caused the issue.

In reply to Jeff White

Re: Copying Production Moodledata to Dev Brought Down Production.

by Visvanath Ratnaweera -
Hi Jeff

Did you find out what happened?

For me your "assumptions" make sense, provided you implemented them correctly!
;)

One thing though:
> I ran into a very interesting issue when I was replicating my production site's moodledata to my dev environment that led to an outage for my production site. I am still trying to determine the cause and was wondering if I could get some direction. I only copied the Moodledata and not the database nor the moodle code.

_I only copied the Moodledata and not the database nor the moodle code._ Now how does that work? If you haven't copied moodle code and the moodle database, you are manipulating the original (production) moodle site! Read https://docs.moodle.org/en/Moodle_migration.

Assuming that is a communication problem, there is a good chance that you left something in config.php unchanged, like the trio $CFG->dbname, $CFG->dbuser, $CFG->dbpass.
In reply to Visvanath Ratnaweera

Re: Copying Production Moodledata to Dev Brought Down Production.

by Jeff White -
Hi Visvanath, 


I am still researching the cause, but I did discover a mistake I made when copying over the moodledata: I left the cron job running, because I had climaintenance.html in dev's moodledata. That is fine as long as climaintenance.html is there in moodledata, as you will just see

CLI maintenance mode active, cron execution suspended.

in the cron logs. But when I did the rsync, that file went away on dev, which let the cron jobs run fully. OOPS. I am not sure whether the cron job running during the copy (something you should never do) caused things to start getting pushed to production, but it is a possibility if something in moodledata pointed to the production instance.
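
A copy that leaves dev's climaintenance.html in place would avoid this particular slip. Below is a minimal sketch that only builds the rsync command line. It assumes the original sync used --delete (which would explain the file disappearing; the thread does not say), and the paths are hypothetical. rsync does not delete excluded files on the receiving side unless --delete-excluded is also given.

```python
import shlex


def build_rsync_command(src: str, dest: str, protect=("climaintenance.html",)) -> list:
    """Build an rsync argument list that leaves the `protect` files in dest alone."""
    cmd = ["rsync", "-a", "--delete"]  # --delete is an assumption about the original run
    # A leading "/" anchors each pattern at the top of the transfer.
    cmd += [f"--exclude=/{name}" for name in protect]
    # A trailing slash on src copies its contents rather than the directory itself.
    cmd += [src.rstrip("/") + "/", dest.rstrip("/") + "/"]
    return cmd


# Hypothetical paths, for illustration only.
cmd = build_rsync_command("/srv/prod/moodledata", "/srv/dev/moodledata")
print(shlex.join(cmd))
```

Stopping the dev cron job before the copy is still the safer habit; the exclude only keeps the maintenance flag from vanishing mid-sync.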

To clarify what I meant by copying just the moodledata:

I was only copying what was in $CFG->dataroot. I keep the Moodle application code in a different directory from moodledata, so config.php, which has all those settings, was not part of the copy, unless it was in the caching. I did see a file called moodledata/MUC/config.php, which has a mess of things in it but nothing that obviously points to specific servers.
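
Rather than eyeballing that file, it can be searched for production host names or ports. This is a minimal sketch under assumptions: the helper name and the host names are hypothetical, and the sample text is a made-up stand-in, not the real format of the MUC config file.

```python
def find_references(text: str, needles: list) -> list:
    """Return (line_number, needle, line) for every line that mentions a needle."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for needle in needles:
            if needle in line:
                hits.append((number, needle, line.strip()))
    return hits


# Made-up stand-in for file contents; not the real file format.
sample = (
    "'name' => 'memcache_store',\n"
    "'servers' => 'prod-cache.example:11211',\n"
    "'prefix' => 'mdl_',\n"
)

hits = find_references(sample, ["prod-cache.example", "11212"])
for number, needle, line in hits:
    print(f"line {number}: found {needle!r} in {line}")
```

To run it against a real copy, read the file with pathlib's `Path(...).read_text()` and pass in the production host names and IP addresses you actually use.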

In reply to Jeff White

Re: Copying Production Moodledata to Dev Brought Down Production.

by Visvanath Ratnaweera -
Hi Jeff

It may be late, nevertheless did you see this: "Multiple installation of Moodle on the same server are overriding the admin settings. Version 2.9+" https://moodle.org/mod/forum/discuss.php?d=328221 ?
In reply to Visvanath Ratnaweera

Re: Copying Production Moodledata to Dev Brought Down Production.

by Jeff White -

Thanks for pointing out this forum post, Visvanath. Looks like someone else was suspicious of that moodledata/muc/config.php file. I find it interesting that this occurred with my newly built environment that uses memcache, and that I never had an issue with any of my other test instances. I guess the Moodle docs need to include deleting the moodledata/muc folder as a step in any upgrade or site migration.
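
The cleanup step suggested here could be scripted so it is not forgotten after a copy. The following is a minimal sketch, demonstrated on a throwaway directory. The function name is hypothetical, and it assumes Moodle will recreate a default cache configuration when the folder is missing, which should be verified on a test site first; any custom cache store setup would then need to be re-entered.

```python
import shutil
import tempfile
from pathlib import Path


def remove_muc_config(dataroot: Path, dirname: str = "muc") -> bool:
    """Delete dataroot/dirname if it exists; return True if something was removed."""
    target = dataroot / dirname
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


# Demonstration on a temporary directory standing in for a copied dataroot.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "muc").mkdir()
    (root / "muc" / "config.php").write_text("<?php // placeholder\n")
    (root / "filedir").mkdir()
    removed = remove_muc_config(root)
    remaining = sorted(p.name for p in root.iterdir())

print(removed, remaining)
```

Run it only against the dev copy, and only while dev's cron is stopped.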

I am still trying to find the time to test this situation again and get consistent results, but I can't really do it in my current dev environment without risking my production instance. I can't really use the excuse "I am applying the scientific method!" if I knock out production again by reproducing the error. The error has yet to occur when I copy an out-of-the-box Moodle instance with memcache and a muc/config.php. It has only happened with my production instance. I had the idea of building a completely isolated LAMP cluster that cannot communicate with anything outside the cluster, plus an isolated client machine, but that takes time and resources I do not have at the moment.