July 27th, 2010
Me: can I have a Whopper meal with diet coke, but with onion rings instead of chips?
BK: sorry, we cannot do that
Me: OK, can I have a Whopper burger on its own, diet coke, and onion rings?
BK: Ok
…
Price of chips: 1.99
Price of onion rings: 1.99
WTF?
Categories: personal
Tags: being a dumbass
June 6th, 2010
I have launched a new Ruby on Rails website, ChatUp.com. This is a personal project that I’ve been working on in my spare time with a little help from my girlfriend, and it’s great to finally have it out there.
I’ve got some really cool stuff integrated, including Facebook authentication using the new OAuth2 API and HTML5 GeoLocation. Doing GeoLocation *well* and having it work for any user of the site turned out to be a bit tricky; I might post more on that later. Back-end searching is done using the latest Sphinx search daemon and the ThinkingSphinx plugin.
There are still 101 things I want to improve, and I’m working on getting a new design for the blog sorted right now, but as an initial version 1 it does all the things you would expect of any dating site.
I’m going to be writing about my efforts to build out the site further, and probably more importantly how I am actually getting traffic to the website (let’s face it, that’s one of the hardest things to do; programming is easy).
If you want to follow the project along, please check out the ChatUp Blog and grab the RSS feed.
Please check it out, feedback welcome!
Categories: personal
Tags: chatup, interesting ideas, personal
June 3rd, 2010
This is part 2 of a 3 part series on getting the best performance out of Sphinx / Thinking Sphinx. Subscribe to my RSS feed for the last installment.
The previous article was How to configure Thinking Sphinx to index from your Slave MySQL database.
—-
If you have multiple indexes, you need to consider if they all need to be re-indexed at the same time.
For example, you might have a “users” index that changes frequently that you want to re-index from scratch or merge delta changes, and a “spare car parts” index that may only change once or twice a day.
The accepted way to re-index your data is of course to run “rake ts:index”, which will re-create ALL of your indexes, but that assumes all indexes are equal and all need re-indexing at the same time – usually that is not true. When you have to re-index a *huge* index when all you really want is to update the smaller one… it’s inefficient.
So, what does rake ts:index actually do?
The Thinking Sphinx index task works out which config file to use based on your Rails environment and Rails.root directory, re-generates the config file, and then calls the sphinx indexer program with the path to that config file.
The indexer program has a whole range of options which you might like to get familiar with:
$ indexer
Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

Usage: indexer [OPTIONS] [indexname1 [indexname2 [...]]]

Options are:
--config <file>         read configuration from specified file
                        (default is sphinx.conf)
--all                   reindex all configured indexes
--quiet                 be quiet, only print errors
--noprogress            do not display progress
                        (automatically on if output is not to a tty)
--rotate                send SIGHUP to searchd when indexing is over
                        to rotate updated indexes automatically
--buildstops <output.txt> <N>
                        build top N stopwords and write them to given file
--buildfreqs            store words frequencies to output.txt
                        (used with --buildstops only)
--merge <dst-index> <src-index>
                        merge 'src-index' into 'dst-index'
                        'dst-index' will receive merge result
                        'src-index' will not be modified
--merge-dst-range <attr> <min> <max>
                        filter 'dst-index' on merge, keep only those documents
                        where 'attr' is between 'min' and 'max' (inclusive)
--merge-killlists       merge src and dst killlists instead of applying src killlist to dst

Examples:
indexer --quiet myidx1  reindex 'myidx1' defined in 'sphinx.conf'
indexer --all           reindex all indexes defined in 'sphinx.conf'
We are interested in indexing only specific indexes at specific times. Following on from the example above, instead of running rake ts:in from cron to re-index everything every hour (for example), you can split the “users” and “spare car parts” indexes out into separate tasks like this:
indexer --config /var/www/your-website.com/current/config/production_slave.sphinx.conf spare_car_parts_core --rotate
indexer --config /var/www/your-website.com/current/config/production_slave.sphinx.conf users_core --rotate
Put those into cron with your desired time schedule, and you will have a happier database server. Note that you must include the _core suffix, as this is how the ThinkingSphinx gem names the index in the sphinx configuration.
When you deploy, remember to run “rake ts:config” with the correct Rails environment to generate your config.
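If you would rather build those command lines from a script than type them by hand, something like the following rough sketch could work. The helper name indexer_command is made up for this example, and the config path is simply the one used above; it only assembles the string and does not run the indexer.

```ruby
require "shellwords"

# Hypothetical helper (not part of Thinking Sphinx): builds an indexer
# command line for a subset of indexes, appending the _core suffix that
# ThinkingSphinx uses when it names indexes in the generated config.
def indexer_command(config_path, index_names, rotate: true)
  args = ["indexer", "--config", config_path]
  args.concat(index_names.map { |name| "#{name}_core" })
  args << "--rotate" if rotate
  args.shelljoin
end

# Path taken from the example above; adjust for your own deployment.
CONFIG = "/var/www/your-website.com/current/config/production_slave.sphinx.conf"

puts indexer_command(CONFIG, ["spare_car_parts"])
puts indexer_command(CONFIG, ["users"])
```

Each printed line is one cron job, so the slow-changing index and the busy one can be given completely different schedules.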
Categories: performance
Tags: performance, sphinx, thinking sphinx
May 31st, 2010
This is part 1 of a 3 part series on getting the best performance out of Sphinx / Thinking Sphinx. Subscribe to my RSS feed for the next 2 installments.
——
For one of the sites I work with, we are running a load balanced web server setup with separate MySQL database servers in master-slave configuration on the back end. Each of the web servers runs its own Sphinx searchd daemon (to reduce latency of client connection/queries) using a non-distributed sphinx index.
While looking at our Munin graphs I was concerned to see the amount of bandwidth and disk IO on the master MySQL database server caused by periodic re-indexing, and that load increases in line with the number of front-end web servers currently in use.
So that got me thinking… these are huge read-only queries, a small time lag is acceptable, they *should* be happening on the Slave MySQL database.
ThinkingSphinx does not have any official documentation for how to configure this, so here is the secret to setting up the Sphinx indexer to use your slave MySQL database…
1/. Create a Read Only MySQL User
On your Master MySQL server create a read-only MySQL user if you do not have one already. It will be replicated automatically onto the slave servers. This is just good practice: you do NOT want any changes accidentally made on your slave databases, or you run the risk of breaking MySQL replication.
2/. Create new production_slave environment
We will use a separate Rails environment to hold our slave database and sphinx configuration, but you need to get that environment working. To do this in your rails project, copy config/environments/production.rb to config/environments/production_slave.rb.
3/. Configure database.yml
In your database.yml, create a new entry for your production_slave environment, pointing to your slave MySQL database, and use your read-only MySQL user.
4/. Edit your config/sphinx.yml file
Take a copy of your production section and duplicate it under production_slave. It is important that you use the same port number and settings of the production environment.
5/. Commit your changes and deploy live
Push out your code to the servers, so your servers now have access to your new production_slave environment. Edit your yml configs on the servers if your deploy process does not manage this for you.
6/. Set up Sphinx Indexer for new Environment
The sphinx indexer should now use RAILS_ENV=production_slave. As this is the first time, you should now run “RAILS_ENV=production_slave rake ts:in”, which will automatically generate a new config/production_slave.sphinx.conf file for you that searchd and indexer can use, and generate your first set of indexes in db/sphinx/production_slave/.
7/. Restart Sphinx
Stop your production sphinx instance, and restart sphinx with the production_slave environment. It needs to read the same config file as your indexer in case of future schema changes etc.
8/. Update RAILS_ENV for sphinx elsewhere
This will largely be dependent on your setup, but you need to change your cron-jobs, init.d scripts, deploy scripts, process monitoring etc to use the production_slave environment for your sphinx daemon and sphinx indexing tasks.
That’s it!
The reason this works is because of step #4, where we use the same host & port for our sphinx server for both the production and production_slave environments. Although your app continues to run as production, your sphinx server runs as production_slave on the port expected in the production environment, so you are now querying the sphinx daemon that is indexed from the slave MySQL database.
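Since everything hinges on the two environments agreeing on host and port, it may be worth checking that before you deploy. Here is a minimal sketch, assuming your config/sphinx.yml uses address and port keys; the values shown (127.0.0.1 and 9312) and the helper name are placeholders for illustration, not taken from a real setup.

```ruby
require "yaml"

# Stand-in for the contents of config/sphinx.yml. The address and port
# values are assumptions for this example; use whatever production has.
SPHINX_YML = <<~YAML
  production:
    address: 127.0.0.1
    port: 9312
  production_slave:
    address: 127.0.0.1
    port: 9312
YAML

# Hypothetical check: true when both environments point the app and the
# slave-indexed searchd at the same address and port.
def same_searchd_endpoint?(config, env_a, env_b)
  %w[address port].all? do |key|
    config.fetch(env_a)[key] == config.fetch(env_b)[key]
  end
end

config = YAML.safe_load(SPHINX_YML)
puts same_searchd_endpoint?(config, "production", "production_slave")
```

In a real project you would load the file from disk instead of the inline string, and fail the deploy if the check comes back false.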
Categories: performance
Tags: performance, sphinx, thinking sphinx
April 18th, 2010
FFMpeg has multiple problems and bugs with writing out to SWF files. I ran into a few of these at my day-job, wrote a patch for one of the problems and submitted it back to FFMpeg. It was a nice change to go back and do some C programming for a while.
Anyway, list of ffmpeg swf audio problems as follows:
1) Converting to SWF was hard-coding the number of audio frames to 6000 in the generated SWF file, whereas the SWF File Format Specification document from Adobe says this must match the number of frames in the file. It was also hard-coding an arbitrary file size into the SWF header, rather than the correct size of the generated file. These are the problems I fixed.
2) FFMpeg SWF encoder writes too many frames to a file. Yeah, really. It writes all of your audio, then a few minutes of empty sound as well. Haven’t got a fix for it yet, need to step through an FFMpeg run in GDB to figure out why it’s doing it. You can open a SWF in a binary editor and manually fudge the number of frames in the SWF header if you have a scrobbler that is dependent upon the number of frames to determine play duration / progress, or do something a little more automated.
3) The SWF encoder in FFMpeg is writing out audio-only streams using version 4 of the SWF file format, whereas the file format specification is now up to version 10. Not sure why it’s writing out in such an old version (maximum compatibility?). Could probably use some documentation.
This was the patch I submitted for problem #1.
http://lists.mplayerhq.hu/pipermail/ffmpeg-devel/attachments/20100415/c02bf648/attachment.obj
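If you cannot patch FFMpeg and want the "more automated" route for fudging the header, here is a rough Ruby sketch of reading and rewriting the two fields. It assumes an uncompressed SWF (signature "FWS") and the header layout from the Adobe specification: signature, version, 32-bit file length, a bit-packed frame size RECT, an 8.8 fixed-point frame rate, then a 16-bit frame count. The function names are my own, and you still have to supply the real frame count yourself (for example from swfdump).

```ruby
# Byte offset of the frame-rate field. The frame-size RECT starts at
# byte 8: a 5-bit width (Nbits), then four Nbits-wide values, padded
# up to a whole number of bytes.
def swf_frame_rate_offset(bytes)
  raise ArgumentError, "not an uncompressed SWF" unless bytes.byteslice(0, 3) == "FWS"
  nbits = bytes.getbyte(8) >> 3
  8 + (5 + 4 * nbits + 7) / 8
end

# Read the header fields that swfdump reports.
def read_swf_header(bytes)
  offset = swf_frame_rate_offset(bytes)
  version, file_length = bytes.byteslice(3, 5).unpack("CV")
  rate_fraction, rate_whole, frame_count = bytes.byteslice(offset, 4).unpack("CCv")
  { version: version, file_length: file_length,
    frame_rate: rate_whole + rate_fraction / 256.0, frame_count: frame_count }
end

# Return a copy with the file length set to the real byte count and
# the frame count set to the value supplied by the caller.
def fix_swf_header(bytes, frame_count)
  fixed = bytes.b
  fixed[4, 4] = [fixed.bytesize].pack("V")
  fixed[swf_frame_rate_offset(fixed) + 2, 2] = [frame_count].pack("v")
  fixed
end

# Synthetic header-only example using the hard-coded values FFMpeg
# writes (file size 104857600, frame count 6000); the RECT bytes are
# dummy padding, not a real 320x200 frame size.
broken = ["FWS", 4, 104_857_600, 0x78, 0, 10, 6000].pack("a3CVCx8CCv")
p read_swf_header(broken)
p read_swf_header(fix_swf_header(broken, 4596))
```

For a real file you would read it with File.binread, pass the result through fix_swf_header, and write it back with File.binwrite. Compressed SWFs ("CWS") would need inflating first, which this sketch does not attempt.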
——
On the mailing list:
Problem Description:
When converting any Audio to SWF, the File Size is always hard-coded
as 104857600, and Frame Count is always hard coded to 6000. The
incorrect frame count can cause problems when ffmpeg generated swfs
are used inside a flash application and the _totalframes method is
used.
Example command line:
ffmpeg -i 157346.mp3 -ar 44100 -ab 96k -ac 1 -y 157346.swf
When inspecting the generated swf using the swfdump tool
(swftools.org), the following is shown:
==== Error: Real Filesize (1495887) doesn't match header Filesize
(104857600) ====
[HEADER] File version: 4
[HEADER] File size: 104857600
[HEADER] Frame rate: 10.000000
[HEADER] Frame count: 6000
[HEADER] Movie width: 320.00
[HEADER] Movie height: 200.00
[02d] 6 SOUNDSTREAMHEAD2
[013] 317 SOUNDSTREAMBLOCK
[001] 0 SHOWFRAME 1 (00:00:00,000)
[013] 318 SOUNDSTREAMBLOCK
[001] 0 SHOWFRAME 2 (00:00:00,100)
………
[013] 318 SOUNDSTREAMBLOCK
[001] 0 SHOWFRAME 4595 (00:07:39,382)
[013] 317 SOUNDSTREAMBLOCK
[001] 0 SHOWFRAME 4596 (00:07:39,482)
[000] 0 END
As you can see, the last frame is 4596, and the file size specified in
the header does not match the real file size (swfdump gives a warning
about this).
File size should indicate 1495887 bytes:
$ ls -al 157346.swf
-rw-r--r-- 1 boomkat default 1495887 Apr 14 11:25 157346.swf
$
Problem Solution:
Correctly set the File Size in the SWF header section to the real
number of bytes written, and correctly set the number of frames
included in the SWF.
Details of the SWF file format header information can be confirmed on
page 25 of the SWF File Format Specification document from Adobe,
available here: http://www.adobe.com/devnet/swf/
After my patch is applied, using ffmpeg with the same command line
options as above and then inspecting the generated swf with swfdump
shows:
[HEADER] File version: 4
[HEADER] File size: 1495887
[HEADER] Frame rate: 10.000000
[HEADER] Frame count: 4596
[HEADER] Movie width: 320.00
[HEADER] Movie height: 200.00
[02d] 6 SOUNDSTREAMHEAD2
[013] 317 SOUNDSTREAMBLOCK
[001] 0 SHOWFRAME 1 (00:00:00,000)
[013] 318 SOUNDSTREAMBLOCK
[001] 0 SHOWFRAME 2 (00:00:00,100)
………
[013] 318 SOUNDSTREAMBLOCK
[001] 0 SHOWFRAME 4595 (00:07:39,382)
[013] 317 SOUNDSTREAMBLOCK
[001] 0 SHOWFRAME 4596 (00:07:39,482)
[000] 0 END
Note that the Frame Count is correct, and the file size correctly
reflects the total number of bytes in the file:
$ ls -al 157346.swf
-rw-r--r-- 1 boomkat default 1495887 Apr 15 06:58 157346.swf
$
Categories: Uncategorized
Tags: bugs, ffmpeg, swf