MySQL Log Rotation
Ok. So I just looked at two blog posts about MySQL log rotation. While not technically incorrect, I’d like to disagree with the final answer, mainly because recreating the wheel drives me crazy.
Rather than writing a custom script to do this and putting that script in cron, why not use logrotate?
We already ship fully functional log rotate scripts in the Debian MySQL packages. Here’s an example:
/var/log/mysql.log /var/log/mysql/mysql.log /var/log/mysql/mysql-slow.log {
daily
rotate 7
missingok
create 640 mysql adm
compress
sharedscripts
postrotate
test -x /usr/bin/mysqladmin || exit 0
# If this fails, check debian.conf!
export HOME=/etc/mysql/my.cnf
MYADMIN="/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf"
if [ -z "`$MYADMIN ping 2>/dev/null`" ]; then
# Really no mysqld or rather a missing debian-sys-maint user?
# If this occurs and is not a error please report a bug.
if ps cax | grep -q mysqld; then
exit 1
fi
else
$MYADMIN flush-logs
fi
endscript
}
This does the re-creation, flush logs and handles failure conditions gracefully. It also compresses old files. Even better - your system already has logrotate running (unless you run a really bad distro)
Now, if you aren’t on Debian, don’t just copy this verbatim, as you probably don’t have a debian-sys-maint user or an /etc/mysql/debian.cnf file. In Debian, we make this at package install time and refresh it on upgrades. It’s a system user with an autogenerated password put in a file only readable by root, that allows system scripts like this to run unattended.
Once you read up a little on logrotate, you’ll realize it’s really great at just about anything you need to do in the log rotation field. Have fun!
Gambling and calling it investing
I was reading comments on an article at BusinessWeek about Sun and Jonathan Schwartz, when I came across this gem:
Dude, my son’s college funds are tied up in your stock. He keeps getting older but the stock is still hanging. I have two years for that stock to rocket.
Aside from anything useful I might have to offer Sun… I would like to poke this guy in the eyeball. Why would you ever tie up the entirety of something as important as your son’s college fund in a single stock? That’s stupid. As if diversification weren’t a well-known concept already, I would have thought that maybe people would have learned from Enron. Perhaps this is a dedicated employee trusting in and investing in the company he works for. The really naive, since a tank in the company (which happens) means not only is your job but also your source of investment in jeopardy. (That’s what happened at Enron) Or, maybe it’s just a fanboi… not sure.
But please, PLEASE, don’t blame Sun or Jonathan Schwartz if your kids can’t go to college. I recommend talking to a investment broker or something.
Dude.
Powered by ScribeFire.
A little bit of untruthiness about MySQL and Threads
Curt Monash has an interesting post. As with everything on the web, I agree and disagree with various bits - which is one of the great things about blogging … but this comment concerns me enough to respond, since it’s not so much an opinion but incorrect technical information…
As for your distinction between too many connections and memory pools — lack of memory is the reason for limits on connections.
Lack of memory is actually _NOT_ the reason for limits on connections in a well tuned MySQL installation. MySQL is a multi-threaded application and as such in its current implementation allocates a thread to a connection. In the standard web hosting platform, which is Linux, there is a point at which Linux itself can’t actually deal with the threads effectively, thus getting you into an overloaded run queue situation. The answer to high-traffic and high concurrency situations is actually often to lower the number of connections to decrease the amount of time linux spends managing the thread queue and thus lowering your individual thread service time. People fight this a lot, since they think “I’m maxing out my connections, I should increase the setting”, but it is quite easy to demonstrate.
There is a patch coming up to decouple the two which should help the few people out there who cannot, for whatever reason, manage a sensible number of connections. But it is very rare that I see a client for whom this “problem” can’t be managed or fixed by simple education.
In any case, the issue is almost never (I say almost because you can always grossly misconfigure anything) memory related.
What’s more, my web host, who to my knowledge doesn’t handle any terribly busy sites, finds that MySQL will at times eat up all his RAM even so.
In that case, I would suggest that you tell your web host to learn how to configure MySQL properly. MySQL will only eat up all his RAM if he configures it to do so. If he isn’t sure how to do that, I’d be happy to help him out!
I would also like to add that there is often an unfair double standard applied to MySQL as far as this goes. No one seems to doubt that a well functioning Oracle system needs a team of well-trained Oracle DBAs to tune and run. But many times when I suggest that a problem can be solved by learning a little more about how MySQL works, people tell me that I have just some how pointed out a deficiency in MySQL. Try running an top-10 web property with 100+ Oracle databases with a team of 1 DBA and see how long it remains running. I’d be happy to take the challenge of running the same thing on MySQL.
Technorati Tags: mysql threads performance
A good build system is hard to find
I was just reading Jan’s post about how to write a good build system
I’m in the need for a portable, fast build-system.
While I’m not about to write a build system in lua - or anything else, I thought I’d pitch in my thoughts as I’ve been having a similar problem recently. The problem is, autotools is the only system that doesn’t require some prereq to be installed on the system … other than a working build system. The only real downside to autotools (other than needing to learn m4) is that it is useless on Windows.
As Jan says, though, the other systems fall down on the make dist part. This is way important to me, as I also work in my tree alot and I really want to have an automated step to make sure I’m packaging up all the files I need, but not too many. This leads Jan to the inevitable choice of building his own
As a matter of fact here I stand without build-system that meets my needs. My current target is building the MySQL Proxy on Linux, MacOS X and win32.
Throwing lua into the game as a nice side-effect: the scripting language is clean and simple. You can also think about adding the lua-src itself to the build-system to do a full bootstrap in case the user doesn’t have it install
First problem - if the user doesn’t have the build system, and if autotools won’t cut it, then how are they going to build the lua-src.
I’d like to point out that most of this is because Windows is crap. The autotools design goal of not requiring user to install a special build system other than Make is a great one. But because Windows can’t even manage to ship a decent POSIX environment, this is broken. Ever tried sharing source with someone who’s using VisualStudio? Ever get pissed off because there’s no way to have the source figure out where a library dependency is without setting some property in the source by hand?
Anyway - I do think Jan is on to something (as usual) but I don’t think the whole system needs to be rewritten - just autoconf.
Automake runs on the maintainer’s machine, as does autoconf, but the automake output are the Makefile.in files that autoconf replaces stuff in. There’s nothing really broken about this. make, make dist, make distcheck should all still work (ok, except on Windows where they don’t have tar… my sympathy is low) But if you made a program in lua that created something like a configure script that could be run with minimal install requirements on the client (I think needing lua is ok here - as Jan says, it’s easy to install)
That leaves the requirement of producing MSI targets. I think this is a requirement that should be shucked, because it’s not an apples/apples thing. MSI files are the same as RPMs or DEBs. They are packaging, not part of the build, and I think if emitting them is part of the needs, then the build system also needs to be able to make debs. Divorce the work of making MSI files from the build and I think the task is really doable in the short term.
I think that the m4 stuff in autoconf is the only part of autotools that people really get annoyed with. So kill it. Replace it with lua. Or python. Or anything that’s cross platform and low impact. (and by that I do not mean Java - sorry guys, the nightmare that is installing Java environments is a subject of an upcoming rant)
Powered by ScribeFire.
erlang and MySQL Cluster
Ok, in case you just showed up going “finally!”, I’m sorry to let you down - I haven’t yet ported NDB API to erlang.
But I should - and I want to.
Brian was just talking about concurrent program and mentioned erlang. Turns out that when I was starting off working on the NDB/Connectors, Elliot asked me if I’d considered erlang. Always up for learning a new language I did a quick check, but there were no swig bindings, so I put it off until later.
Then later came and I still hadn’t written any code, so I found a book online and started reading. I have to say erlang is very cool.
There is no way on earth I can wrap the NDB API in any meaningful way using erlang. However, I might be able to reimplement the wire protocol in erlang and have the resulting thing be way more stable and scalable. Thing is - it really made me thing we should reimplement all of NDB in erlang. It handles several of the tasks done by the NDB system already by itself - and it’s not C++ so it doesn’t crash and die every three seconds. On the other hand, I don’t think it’s as fast as C++ yet - so our speed-hungry friends might not want to play.
However… hey Brian - wanna sit down an write a new clustered database engine for MySQL using erlang? Could be fun…
Powered by ScribeFire.
MySQL 5.1 auto-inc patch in action: InnoDB scalability test
I’ve recently done a quick scalability test with MySQL 5.0 and 5.1 to check the new auto-inc patch with InnoDB and to see how MySQL 5.1 scales with InnoDB:
New in MySQL 5.1: innodb_autoinc_lock_mode = 1 (“consecutive” lock mode)
With this lock mode, “simple inserts” (only) use a new locking model where a light-weight mutex is used during the allocation of auto-increment values, and no AUTO-INC table-level lock is used, unless an AUTO-INC lock is held by another transaction. If another transaction does hold an AUTO-INC lock, a “simple insert” waits for the AUTO-INC lock, as if it too were a “bulk insert.”
http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html#innodb-auto-increment-configurable
This fixes the Auto-Inc bug with InnoDB
I’ve run sysbench in OTLP mode against MySQL 5.0.45 and 5.1.22-rc. Insert queries with auto_inc field were running in multiple threads: 1, 4, 16, 64, 128 and 256 threads on 4 cores box. Benchmark is CPU bound (except for log flushing/fsync).
Results
With 256 threads MySQL 5.0 starts deadlocking and MySQL 5.1 shows no deadlocks.
However, MySQL 5.1 is still affected by broken group commit bug for InnoDB, so enabling bin-log and using innodb_flush_log_at_trx_commit = 1 with no battery backup cache caused significant decrease in performance: tests showed only around 100 tps constantly with MySQL 5.1 and MySQL 5.0. Waiting on InnoDB logs/bin logs serialized transactions; with this situation speed depends upon hardware. After enabling Battery Backup Cache (BBU cache) on RAID we were able to handle around 2000 tps.
PeterZ originally performed InnoDB scalability tests for MySQL 5.0 to show broken group commit bug.
My fun with MySQL on EC2
Morgan is looking in to EC2 on MySQL, so I thought I’d pipe up about stuff I’ve been playing with.
The ephemeral nature of the data is troubling, because at best you’re going to have some lag before you can back stuff up to S3 or some other place. (Unless that was happening continuously… but we’ll come back to that) On the other hand, if you’re doing app sharding or something similar, this essentially just makes you plan that your machines can all die at any time. If you used Google’s semi-sync replication patch, you could easily spin up little replication clusters as needed.
Hm. Clusters. Well, I’m also a fan of MySQL Cluster. What if you ran MySQL Cluster on a single ec2 node (both data and sql nodes)? What if, further, you wrote (and by you, I mean me… code coming soon, I promise) an AsyncFile implementation for Cluster that read and wrote to S3 instead of local disk. Cluster itself is already decoupled from disk write latency. Sounds like a good UC talk…
Then you could do the same thing with a multi-node cluster, but Amazon doesn’t let you control the network between EC2 nodes, so the latency there could kill you.
I have put together a few scripts for spinning up an EC2 node with MySQL (and Cluster) ready to go, but I’m sitting in an airport right now, so I’ll have to post the code later.
I think the possibilities for scale-out here are fantastic, but like all application partitioning approaches, they do require some engineering of the application to take advantage of it.
Vlad The Enterprising: Automatic deploying large number of MySQL Slaves
This tool allow to deploy large number of MySQL replication Slaves (and Master servers)
It is available on Ruby Forge: http://rubyforge.org/projects/vladenvironment/ The tool is based on Vlad the Deployer and is written on Ruby
Running more than one ndbd on a machine
Personally, I’m not a fan of more than one ndbd per machine…
Diamond Notes » Fun with Running a Cluster on Two Servers
Others might argue with this, but I would never put the SQL nodes on the same servers as the ndbd nodes for production. Some say you can run multiple ndbd nodes on the same server and I am more comfortable with that since I can lock the ndbd daemon into memory and know its not going to change (my ndbd nodes on those two servers have been at exactly 71.3% since I started them up. If I had servers for the ndbd nodes that had 16+ gigs of RAM I might start allocating 4 gigs of RAM to a ndbd daemon with 3+ daemons per node. My understanding is that this helps keep the transactional logs for the nodes under control. When you do a ndbd node restart it takes less time for a node to get up and running because of the smaller files to read. I might be mistaken and its too late for me to look it upAnyone got other reasons or maybe (if I am right) someone can elaborate.
First of all, I’m very excited to see that Cluster is being used for MogileFS here. I’ve actually been thinking it might be fun to write a MogileFS::Store::NDB class to use NDB/Perl … but that’s another story.
One of the things that’s nice about NDB is that it spread across mutliple machines. Now, granted, in development we can’t always do this. I run NDB on my laptop all the time, so I certainly feel the pain there. But with a multi-node design, I say take advantage of it. People on other architecture are always asking how they can spread the load more easily across multiple machines, and here you can. If you need 8 data nodes, get 8 machines.
Practically, there is another reason. If you put 3 data nodes on a single physical machine, then if that machine crashes, you have not a single node failure, but 3 nodes failing at the same time. Although there is no specific reason that this won’t work, it’s also essentially an edge case and not really tested all that well. Other people may disagree with me here, especially as currently running multiple ndbd’s on a single box is the only way to take advantage of more that 2 CPU cores, but I’m just not a fan. For something like this, go get a bunch of 2 CPU boxes.
To address the larger question, though, you can certainly spread the write load and the redo log burden by having more data nodes, and having the data distributed across more data nodes will certainly make the log recovery shorter on any one given node. However, there is a price to pay here in query latency. The more data nodes you have, the more nodes your data might be on. If you are not using the NDB API (or any of the NDB/Connectors) then there is no optimized node selection going on. (I’ve heard someone is working on an Optimized TC Selection patch for mysqld, but it’s not in the mainline yet) So if you go from 2 nodes to 4 nodes, you went from a 100% chance that the TC will have your data and not have to ask someone else, to a 50% chance. At 8 nodes you are at a 25% chance that the TC selection will select a node with your data. That’s for primary key operations. If you’re doing a scan, then your data is going to (most likely) be on all of the nodes, which can be good or bad. But if you’re doing MogileFS queries, perhaps the extra milliseconds of latency isn’t a concern in this case and you’d rather have faster node recovery. I’d test that hypothesis out and see how much better the recovery is.
Another potential reason to have more data nodes even with the higher latency costs (and extra network traffic as that many more nodes have to talk to that many more nodes) is scalability. Of course, more nodes mean more scaling. But as of now ADDING a data node is not an online operation. Adding more memory can be done in a rolling fashion. So if you think you might need 8 data nodes worth of CPU processing, go ahead and get 8 data nodes with 4G of RAM a piece. Then as you need to store more data, stick in more RAM. Before long, you’ll have a nice 256G system…
SO… as with everything cluster, there is certainly room on both side of the argument as to what’s best here. And as always, testing is the best bet to see how it maps to your environment.
(Also, it’s 4AM at the moment, so please forgive me if this rambles a little.)
Join Syntax changes in 5.0 (not a bug)
Our friend Dathan recently suggested that The Quality of mySQL lately sucks.
I’ve just ran into yet another obvious bug that has made it’s way into production.
While I’m not going to argue that more unit tests would be a good thing, I’d like to point out that what he’s referring to is not so much a bug as it is a change that happened (and was documented) in 5.0.12:
Beginning with MySQL 5.0.12, natural joins and joins with USING, including outer join variants, are processed according to the SQL:2003 standard. The changes include elimination of redundant output columns for NATURAL joins and joins specified with a USING clause and proper ordering of output columns. The precedence of the comma operator also now is lower compared to JOIN, LEFT JOIN, and so forth.
These changes make MySQL more compliant with standard SQL. However, they can result in different output columns for some joins. Also, some queries that appeared to work correctly prior to 5.0.12 must be rewritten to comply with the standard. For details about the scope of the changes and examples that show what query rewrites are necessary, see Section 12.2.7.1, “JOIN Syntax”.
What this means is that the query in the bug report:
SELECT p.id, gt.object_id FROM Photos p, PhotosExtra px LEFT JOIN GeoTagged gt ON
gt.object_id=p.id WHERE px.photo_id=p.id AND p.id = 2173;
should be rewritten to either:
SELECT p.id, gt.object_id FROM PhotosExtra px, Photos p LEFT JOIN GeoTagged gt ON
gt.object_id=p.id WHERE px.photo_id=p.id AND p.id = 2173;
or even:
SELECT p.id, gt.object_id FROM , Photos p LEFT JOIN GeoTagged gt ON
gt.object_id=p.id JOIN PhotosExtra px ON px.photo_id=p.id WHERE p.id = 2173;
This is a common thing people have to deal with when upgrading to 5.0.