Comments
-
Greg Hurrell
Ok, I've done steps 1 and 2.
GRANT DELETE ON the_db.sessions TO pruner@localhost IDENTIFIED BY 'foobar';
Turns out that
DELETE
privileges alone are not enough due to theWHERE
clause in the query:ERROR 1143 (42000): SELECT command denied to user 'pruner'@'localhost' for column 'updated_at' in table 'sessions'
So we add the minimal
SELECT
privilege on the required column only.GRANT SELECT (updated_at) ON the_db.sessions TO pruner@localhost; FLUSH PRIVILEGES;
Now just have to wait for sufficient records to build up and do some performance testing.
-
Greg Hurrell
Ok, so it looks like automated clearing every 30 days is just not going to be an option; it's just too slow:
mysql> select count(*) from sessions; +----------+ | count(*) | +----------+ | 534861 | +----------+ 1 row in set (1.33 sec)
So that's about 6 weeks' sessions. Pruning sessions older than 30 days would delete just over 200k sessions:
mysql> select count(*) from sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY); +----------+ | count(*) | +----------+ | 201177 | +----------+ 1 row in set (0.27 sec)
So I decided to try the deletion:
mysql> DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY);
After several minutes the query was still running and so I decided to throw up a maintenance webpage on the website to reduce contention for the database:
$ mysqladmin -u root -p processlist Enter password: +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+ | 16 | foo | localhost | foo | Sleep | 17 | | | | 17 | foo | localhost | foo | Sleep | 18 | | | | 18 | foo | localhost | foo | Sleep | 16 | | | | 19 | foo | localhost | foo | Sleep | 17 | | | | 20 | foo | localhost | foo | Query | 496 | updating | DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY) | | 25 | root | localhost | | Query | 0 | | show processlist | +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+
By the time the query had been running for 10 minutes I decided to go even further and shut down the background mongrel processes which were sitting idle (not handling any requests because all queries were being directed to the maintenance page, but still with active connections to the database):
$ mysqladmin -u root -p processlist Enter password: +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+ | 20 | foo | localhost | foo | Query | 649 | updating | DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY) | | 28 | root | localhost | | Query | 0 | | show processlist | +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+
After 1000 seconds the query was still running:
$ mysqladmin -u root -p processlist Enter password: +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+ | 20 | foo | localhost | foo | Query | 1000 | updating | DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY) | | 30 | root | localhost | | Query | 0 | | show processlist | +----+----------+-----------+----------+---------+------+----------+------------------------------------------------------------------------------+
So I decided to abort it:
Query aborted by Ctrl+C
But note that the abort didn't take effect immediately; five minutes later it was still in progress:
$ mysqladmin -u root -p processlist Enter password: +----+----------+-----------+----------+---------+------+-------+------------------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+----------+-----------+----------+---------+------+-------+------------------------------------------------------------------------------+ | 20 | foo | localhost | foo | Query | 1459 | end | DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY) | | 37 | root | localhost | | Query | 0 | | show processlist | +----+----------+-----------+----------+---------+------+-------+------------------------------------------------------------------------------+
So I decided to go a step further and kill the corresponding thread:
$ mysqladmin -u root -p kill 20
After five or six minutes the thread was still lingering:
$ mysqladmin -u root -p processlist Enter password: +----+----------+-----------+----------+---------+------+-------+------------------------------------------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +----+----------+-----------+----------+---------+------+-------+------------------------------------------------------------------------------+ | 20 | foo | localhost | foo | Killed | 1839 | end | DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY) | | 42 | root | localhost | | Query | 0 | | show processlist | +----+----------+-----------+----------+---------+------+-------+------------------------------------------------------------------------------+
But finally exited!
ERROR 1053 (08S01): Server shutdown in progress
So now with the maintenance page up and no other threads in the database I tried once again:
mysql> DELETE FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY); ERROR 2006 (HY000): MySQL server has gone away No connection. Trying to reconnect... Connection id: 44 Current database: foo Query OK, 201177 rows affected (24 min 46.31 sec)
So as you can see, nearly 25 minutes to execute the query, which is clearly too long to ever turn this into an automated process.
I need to consider two options now:
- Keep fewer sessions in the database and purge them more often (for example, every seven days); will need to test if the lower number of rows in the table leads to sufficiently short queries.
- Alternatively, forget about using the MySQL-backed session store and switch to something that either doesn't need purging (ie. the cookie-based store) or is sufficiently fast (not sure what my options might be there).
-
Greg Hurrell
A little over two months since my last post and just checked the session count: 1,294,381 records. Of those, about two thirds are old enough to be pruned:
mysql> SELECT COUNT(*) FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY); +----------+ | count(*) | +----------+ | 865682 | +----------+ 1 row in set (0.90 sec)
I'm going to shut down the server and prune one more time. I am curious to see if a deletion based on the primary key (the
id
) is any faster.mysql> SELECT id FROM sessions WHERE updated_at < DATE_SUB(CURDATE(), INTERVAL 30 DAY) ORDER BY id DESC LIMIT 1; +---------+ | id | +---------+ | 1066871 | +---------+ 1 row in set (0.89 sec) mysql> DELETE FROM sessions WHERE id < 1066871; Query OK, 865688 rows affected (2 hours 6 min 37.96 sec)
Evidently no faster. Going to audit the codebase for possible security implications of moving to the cookie-backed session store.
-
Greg Hurrell
How would I get a histogram of session sizes in my current database? Would the MySQL
LENGTH
function work?Here's a test done on the staging database:
mysql> CREATE TEMPORARY TABLE session_lengths (id int, session_length int); Query OK, 0 rows affected (0.04 sec) mysql> INSERT INTO session_lengths SELECT id, length(data) FROM sessions; Query OK, 425014 rows affected (53.00 sec) Records: 425014 Duplicates: 0 Warnings: 0 mysql> SELECT session_length, COUNT(*) AS frequency FROM session_lengths GROUP BY session_length ORDER BY frequency DESC; +----------------+-----------+ | session_length | frequency | +----------------+-----------+ | 86 | 378495 | | 139 | 46504 | | 106 | 4 | | 212 | 2 | | 159 | 2 | | 171 | 2 | | 122 | 1 | | 143 | 1 | | 114 | 1 | | 167 | 1 | | 163 | 1 | +----------------+-----------+ 11 rows in set (0.21 sec)
Almost all records are 86 chars (that's the marshalled, encrypted form), with a smallish minority 129 chars, and some outliers. Evidently 86 chars is probably the baseline for an "empty" session; whatever Rails happens to be sticking in there by default, plus the overhead of marshalling, encryption, and encoding.
-
Greg Hurrell
Same thing, except done on production database:
mysql> CREATE TEMPORARY TABLE session_lengths (id int, session_length int); Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO session_lengths SELECT id, length(data) FROM sessions; Query OK, 428845 rows affected (2 min 0.11 sec) Records: 428845 Duplicates: 0 Warnings: 0 mysql> SELECT session_length, COUNT(*) AS frequency FROM session_lengths GROUP BY session_length ORDER BY frequency DESC; +----------------+-----------+ | session_length | frequency | +----------------+-----------+ | 86 | 405663 | | 139 | 20927 | | 143 | 451 | | 155 | 339 | | 147 | 302 | | 163 | 183 | | 159 | 142 | | 151 | 131 | | 167 | 119 | | 171 | 114 | | 135 | 107 | | 175 | 83 | | 179 | 60 | | 183 | 56 | | 212 | 28 | | 192 | 24 | | 208 | 21 | | 188 | 21 | | 196 | 18 | | 232 | 16 | | 240 | 7 | | 228 | 6 | | 131 | 5 | | 236 | 4 | | 297 | 4 | | 106 | 3 | | 338 | 2 | | 200 | 2 | | 301 | 1 | | 216 | 1 | | 114 | 1 | | 305 | 1 | | 346 | 1 | | 362 | 1 | | 249 | 1 | +----------------+-----------+ 35 rows in set (0.22 sec)
-
Greg Hurrell
See http://fukamachi.org/wp/2008/09/19/couchdb-session-model-for-rails/ for discussion with Sho about Rails session storage.
-
Greg Hurrell
I'm going to trial the cookie-based store in my staging environment.
Not sure whether this issue will come up, but wanted to save the link here just in case I see any
TamperedWithCookie
errors pop up. -
Greg Hurrell
Yep, first page view was a
TamperedWithCookie
error, seen by the user as an internal server error. Second page view was fine. Existing login session continued uninterrupted because that's not stored in the Rails session anyway. -
Greg Hurrell
To address the
TamperedWithCookie
problem when I go live on the production server will do the following:- Deploy as usual, but before going live...
cap deploy:web:disable
-
Edit
session_key
(config file on server) - Go live as normal
This effectively invalidates the session, but given the things that are stored in the session (flashes etc) this is probably not a problem at all.
-
Greg Hurrell
Doing all this has made me realize that the stuff I am storing in the session is totally transitory in nature (redirection counts, "original URI" for login etc) and keeping hold of it for a month at a time is totally unnecessary.
The only thing that I really care about persisting is the login, and that was never stored in the Rails session. So I think I'm going to do some tests on purging the sessions table with really small record counts (1 hour of sessions, for example). Any user preferences which should persist for longer than an hour will go in the user table anyway.
-
Greg Hurrell
Will be trying out something like:
DELETE FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 1 HOUR);
Hoping that it is fast enough to run without slowing down other requests.
-
Greg Hurrell
One thing which I've discovered is that if you have a page open with a form, wait long enough for your session to be purged on the database, and then try to submit, you'll get the 422.html error page (internally,
InvalidAuthenticityToken
is raised).I'm going to have to clean up the 422.html error template so as not to confuse users and give them a helpful suggestion.
-
Greg Hurrell
I've looked at my web stats to see when the best time of day would be to do this pruning (in the case that I end up doing it once every 24 hours).
For wincent.dev:
- quietest hour 5:00 AM EST, traffic (hits) drops to 82% of baseline (average hits per hour)
- busiest hour 11:00 AM EST, traffic (hits) climbs to 120% of baseline
- record hour for last month, traffic rose to 301% of baseline
And for wincent.dev (non-Rails part of the site):
- quietest, 1:00 AM EST, 75% of baseline
- busiest, 15:00 PM EST, 128% of baseline
- record, peaked at 909% of baseline
So I'll be testing how slow pruning is once 24 hours of records have built up. If its really slow and could interrupt service or impact performance noticeably I'd prefer to take the hit once per day instead of every hour. But if its fast then I'd rather do it every hour (still keeping 24 hours of records).
-
Greg Hurrell
More test results:
mysql> truncate sessions; Query OK, 51 rows affected (0.03 sec) mysql> SELECT COUNT(*) FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR); +----------+ | COUNT(*) | +----------+ | 571 | +----------+ 1 row in set (0.00 sec) mysql> DELETE FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR); Query OK, 571 rows affected (0.75 sec) mysql> SELECT COUNT(*) FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR); +----------+ | COUNT(*) | +----------+ | 9 | +----------+ 1 row in set (0.00 sec) mysql> DELETE FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR); Query OK, 13 rows affected (0.03 sec) mysql> DELETE FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR); Query OK, 1920 rows affected (0.22 sec) mysql> quit
Summarizing the above, you have:
-
We
TRUNCATE
thesessions
table entirely, emptying it. - After just over 24 hours, we check to see how many "stale" sessions (sessions not active in the last 24 hours) exist (971 sessions).
-
We
DELETE
those sessions; the query takes 0.75 seconds. -
We immediately perform another
DELETE
to see if the query runs any faster when the number of stale sessions is lower (13 sessions that had become stale since the previous query); this time it takes 0.03 seconds. -
We wait about three hours and do another
DELETE
; this time we delete 1920 records in 0.22 seconds.
So on the basis of these figures, looks like it will be perfectly safe, performance-wise, to set up an hourly cron job that will prune old sessions (anything over 24 hours without activity).
-
We
-
Greg Hurrell
I have two options here for doing this database maintenance from a cronjob:
Either, set up a limited-privileges account solely for the purposes of cleaning up the database; something like:
mysql> GRANT SELECT, DELETE ON database_name.sessions TO purge_user@localhost identified by 'kewl_pass'; mysql> FLUSH PRIVILEGES;
And run this from cron using:
mysql -u purge_user -pkewl_pass \ -e 'DELETE FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR)' \ database_name
Or, set up a rake task to do the same using the normal database account already used by the application; something which basically does:
SomeModel.connection.execute 'DELETE FROM sessions WHERE updated_at < DATE_SUB(NOW(), INTERVAL 24 HOUR)'
Although I like the idea of keeping things all "in the app" and not having to create a separate user, I don't like the fact that breakage in the app could prevent the second solution from executing properly. Will have to think about it a little.
-
Greg Hurrell
Ok, in the end going with the second solution (the Rake task) because:
- keeps the maintenance code in the repository, so all code related to the project is together
- avoids creation of additional database account
- avoids duplicate storage of database password
- although loading the Rails environment is heavyweight just to run a tiny Rake task, the task only need run once per hour
- although app breakage will prevent the job from running, consequences of not running are insignificant
Add a comment
Comments are now closed for this issue.