I divide my time about equally between GeneSpring GX 11 from Agilent and BioConductor (free, from awesome people) for microarray analysis. The latter for all the neat tools that GeneSpring doesn’t have, the former because sometimes it’s nice to lead a researcher visually through their data, without having to type into a green on black terminal window.
GeneSpring GX 11 is the third iteration after Agilent bought up Silicon Genetics, then decided to throw the unwieldy, quirky, but very functional GeneSpring product in the trash and start again with something built on Strand Life Sciences’ AVADIS platform. We’ve been through versions 9, 10 and now we’re on 11. There have been plenty of bugs on the way, the most serious (to me) being the one where GeneSpring 10 managed to miscall the quality flags on Illumina data in 50% of the cases. Not good, but at least fixed.
Many people have been griping on mailing lists about functionality missing in the new GeneSpring that existed in the old version. I always think it’s a matter of familiarity with the software really. I hadn’t really come across anything I couldn’t do in GeneSpring 11 that I could do in GeneSpring 7. That was until yesterday.
I sat down with a customer yesterday to look at some microbial Nimblegen data. GeneSpring doesn’t really deal with Nimblegen data very well: you are left with the choice of not analysing it in GeneSpring, or accepting there’s going to be a bit of fudging and some extra annotation steps in order to make the data useable as a ‘Custom technology’. The customer, quite reasonably, asked if we could get the biological genome information (effectively gene annotations that are independent of the chip technology you’re using) loaded into GeneSpring. And thus started a morning of fun and games.
GeneSpring 11 has a very handy import feature for biological genomes under Annotations>Create Biological Genome. That is, providing you want to choose one of their predefined organisms and download the information from NCBI. There is *NO* route in the software to add another organism to this list, or to do anything other than use one of their limited set of check-box organisms. This is not a bug apparently, because in a separate part of the software (dealing with Pathways for an organism) you can pull this information directly from NCBI using the Taxon ID of the organism you’re interested in. So why can’t you use it to download a biological genome? Who knows…
One of the things I really liked about the old GeneSpring was the fact that it came with a manual a foot thick. It told you how to do every single operation in the UI; it didn’t tell you anything about the order in which to apply them, but you could generally rely on it for an answer. There was no such answer to this issue in the GeneSpring manual.
It transpires that if you really want to do this, the following, slightly insane process needs to take place:
1) Take this snippet of XMLishness:
<hexff version="1.0">
<object type="cube.plugin" version="1.0" handle="0">
<key>type</key>
<string>plugin.product.TaxID</string>
<key>id</key>
<string>TaxID</string>
<key>data</key>
<!-- taxID.xml for various organisms {{{ -->
<dict>
<key>Homo sapiens</key><string>9606</string>
<key>Mus musculus</key><string>10090</string>
<key>Rattus norvegicus</key><string>10116</string>
<key>Anopheles gambiae</key><string>7165</string>
<key>Arabidopsis thaliana</key><string>3702</string>
<key>Bacillus subtilis</key><string>1423</string>
<key>Bos taurus</key><string>9913</string>
<key>Caenorhabditis elegans</key><string>6239</string>
<key>Canis lupus familiaris</key><string>9615</string>
<key>Citrus sinensis</key><string>2711</string>
<key>Danio rerio</key><string>7955</string>
<key>Drosophila melanogaster</key><string>7227</string>
<key>Equus caballus</key><string>9796</string>
<key>Escherichia coli</key><string>562</string>
<key>Felis catus</key><string>9685</string>
<key>Gallus gallus</key><string>9031</string>
<key>Glycine max</key><string>3847</string>
<key>Gossypium hirsutum</key><string>3635</string>
<key>Hordeum vulgare</key><string>4513</string>
<key>Macaca mulatta</key><string>9544</string>
<key>Magnaporthe grisea</key><string>148305</string>
<key>Medicago sativa</key><string>3879</string>
<key>Medicago truncatula</key><string>3880</string>
<key>Nicotiana tabacum</key><string>4097</string>
<key>Oryctolagus cuniculus</key><string>9986</string>
<key>Oryza sativa</key><string>4530</string>
<key>Ovis aries</key><string>9940</string>
<key>Pan troglodytes</key><string>9598</string>
<key>Plasmodium falciparum</key><string>5833</string>
<key>Pongo abelii</key><string>9601</string>
<key>Poplar mosaic virus</key><string>12166</string>
<key>Populus sp.</key><string>3697</string>
<key>Pseudomonas aeruginosa</key><string>287</string>
<key>Saccharomyces cerevisiae</key><string>4932</string>
<key>Saccharum officinarum</key><string>4547</string>
<key>Salmo salar</key><string>8030</string>
<key>Schizosaccharomyces pombe</key><string>4896</string>
<key>Staphylococcus aureus</key><string>1280</string>
<key>Sus scrofa</key><string>9823</string>
<key>Takifugu rubripes</key><string>31033</string>
<key>Lycopersicon esculentum</key><string>4081</string>
<key>Triticum aestivum</key><string>4565</string>
<key>Vitis vinifera</key><string>29760</string>
<key>Xenopus laevis</key><string>8355</string>
<key>Xenopus tropicalis</key><string>8364</string>
<key>Zea mays </key><string>4577</string>
</dict>
<!-- }}} -->
</object>
</hexff>
2) Add an entry
<key>Your organism name</key><string>NCBI Taxon ID</string>
after the Zea mays line
3) In your GeneSpring directory under this tree:
GeneSpring GX11\bin\packages\marray\project\2.1
Create a folder called ‘plugins’ and save the edited XML above as a file called TaxID.plg
4) Restart GeneSpring and proceed to update your newly added Biological genome, which now appears in the list!
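If you would rather not hand-edit the file, step 2 can be scripted. What follows is only a minimal sketch, not anything Agilent supplies: the function names (`add_organism`, `organisms`) are my own, the sample text is a trimmed copy of the snippet above, and "Example organism" with ID 12345 is a placeholder rather than a real NCBI entry. It works on the text directly so the comments in the snippet survive, then parses the result to check it is still well-formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Trimmed copy of the TaxID snippet from step 1 (two organisms only).
SAMPLE = """<hexff version="1.0">
<object type="cube.plugin" version="1.0" handle="0">
<key>type</key>
<string>plugin.product.TaxID</string>
<key>id</key>
<string>TaxID</string>
<key>data</key>
<dict>
<key>Homo sapiens</key><string>9606</string>
<key>Escherichia coli</key><string>562</string>
</dict>
</object>
</hexff>
"""


def add_organism(plg_text: str, name: str, taxon_id: int) -> str:
    """Return plg_text with one extra <key>/<string> pair just before </dict>."""
    entry = f"<key>{escape(name)}</key><string>{int(taxon_id)}</string>\n"
    head, sep, tail = plg_text.rpartition("</dict>")
    if not sep:
        raise ValueError("no </dict> found; is this the TaxID snippet?")
    if head and not head.endswith("\n"):
        head += "\n"
    updated = head + entry + sep + tail
    ET.fromstring(updated)  # raises ParseError if the result is not well-formed
    return updated


def organisms(plg_text: str) -> dict:
    """Map organism name -> taxon ID string, read from the <dict> element."""
    children = list(ET.fromstring(plg_text).find("object/dict"))
    # The <dict> holds alternating <key> and <string> elements.
    return {k.text: v.text for k, v in zip(children[0::2], children[1::2])}


if __name__ == "__main__":
    # Placeholder name and ID: substitute your organism and its NCBI Taxon ID.
    print(add_organism(SAMPLE, "Example organism", 12345))
```

The output would then be saved as TaxID.plg in the ‘plugins’ folder from step 3. I have only checked this against the snippet as posted here, so treat it as a convenience rather than a supported route.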
Actually, I have to say, I’m not sure I ever want to see that in a manual of a piece of software as expensive as GeneSpring… And besides this still doesn’t work for me as advertised because GeneSpring, whilst aware of what an HTTP proxy might conceivably be, has no concept of what an FTP proxy might be – which is problematic when you need to connect to ftp.ncbi.nlm.nih.gov. Brilliant!
I will eventually write a blog post about Team:Newcastle’s trip to the fantastic and fun iGEM Jamboree in Boston, but in the meantime I spent a while yesterday reviewing some of my photos from the event.
These are now on Flickr and I apologise for the lack of quality, shooting indoors at MIT with no flash and shaky hands – but I hope they convey some of the spirit of the 2 days we spent there.
http://www.flickr.com/photos/eridanus/sets/72157622760324890/
We won a Gold Award!
Posting this for an old colleague of mine from my PhD days. I can’t believe I have colleagues that date back to 1995…
Enquiries about the position to:
Liz Worthey, Senior Staff Scientist at the HMGC (eworthey@mcw.edu)
DESCRIPTION:
A Postdoctoral position is now open within the Human and Molecular Genetics Center (HMGC) at the Medical College of Wisconsin (http://www.mcw.edu/HMGC.htm). HMGC scientists work at the interface of genomics, bioinformatics, and clinical research using genomic approaches to understand disease, and translating this information from the laboratory bench to patients in our affiliated hospitals. The center’s close affiliation with clinicians and researchers at nationally and internationally acclaimed medical centers and universities provides our researchers with a unique opportunity to impact patient care. Diseases under study are varied and include: end stage renal disease, hypertension, insulin-dependent diabetes mellitus, acute liver failure, neurocognitive disabilities, cancer, cystic fibrosis, metabolic syndrome, hereditary cataracts, myocardial infarction and various cardiac malformations. A variety of core resources exist within the HMGC including: Bioinformatics, High-throughput sequencing, Genotyping, Microarray, Proteomics, Gene Therapy, and Transgenics. The HMGC consists of 28.
So this evening I went to my first SuperMondays event. What is SuperMondays you ask? Well it’s a social networking event for geeks in the North East.
One of the things I’ve always been vaguely jealous of is the number of these kinds of events that seem to exist in the USA – there’s a meetup for everything whether you’re interested in tech, science, hacking, or publishing. People get together, talks are given, people interact over food or a coffee (or a beer if you’re lucky).
I used to go to 2600 and alt.ph.uk meetings back in my impressionable younger days so, scientific conferences aside, this is the first opportunity in an awfully long time that I’ve taken to sit in a room with a bunch of like-minded people outside of my day-to-day work and chew the fat on tech. This month’s theme (for the meetings are most definitely monthly) was databases. Now I can’t get terribly excited about databases per se – SQL is fugly, I prefer MySQL over PostgreSQL for ease of use rather than functionality and these days if I could do it in SQLite I probably would, but nevertheless there was a really nice series of three talks in this themed session.
Ross Cooney (SuperMondays organiser extraordinaire and @rosscooney on Twitter) gave a speedy history of the database world, and a quick reminder of the things I have already forgotten about databases after not doing a lot of db development recently (like what ACID stands for – no it’s not an HTML compliance test, or a drug (you crazy Berkeley hippies)) and introduced the other two speakers for the evening.
David Lavery followed next (@dlavery62) with a review of both SimpleDB from Amazon Web Services and Google BigTable, two cloud offerings for the post-RDBMS database world. I particularly enjoyed the SimpleDB part of the talk; anything delivered via a RESTful interface (don’t bother trying to convince me it’s not really RESTful, I could not care less) looks like a good thing to me after trying to deal with the SOAP webservices world last year.
The final talk had a far more academic slant, with David Livingstone of Northumbria University presenting RAQUEL, an open source implementation of some of the ideas in The Third Manifesto, which appears at first glance to be an ‘RDBMS done right’ according to modern relational theory (and not affected by legacy cruft from current popular SQL implementations). Part middleware, part programming language, part educational tool: I would like to have heard a little more about the implementation here. We were treated to a lot of syntactical details (which put me in mind of a cross of SQL, Perl and R and therefore maybe not something you would want to necessarily spend all day doing), but they’ve only just released this to the world and are looking for people to engage and interact with their foray into OSS development. It certainly generated the most questions from the gathered geeks!
After these, a roadmap for the future of SuperMondays was presented. Although this was my first SuperMondays event, it was in fact their 12th. It may have started in a (very nice!) restaurant in Newcastle a year ago around a table, but there were maybe 80 people in the theatre tonight, which suggests it is going from strength to strength. Newly incorporated as a Community Interest Company (saving buckets of paperwork over being a charitable organisation), the future for SuperMondays looks very bright indeed. Very much looking forward to the next one!
Yeah, there’s no oxymoron of a face to face geek event, but if you only saw the tagline in your RSS reader maybe you read a little further because of it ;) I should also say cheers to the Newcastle ARCSOC students who I had a couple of drinks with afterwards too (depriving myself of further SuperMondays sandwiches in the process), it was nice to see you all again!
You can also find SuperMondays on Twitter (@supermondays) and on Facebook too!
It’s no secret that my desire to blog was interrupted by the arrival of FriendFeed and Twitter in my daily routine. The former allows me to aggregate not only content from those people I really like to follow online, but also allows me to push most of my RSS-isable online content to the same place. I should probably just push most of it here (the flickr tab is a new introduction, as is the Twitter feed sidebar on this blog) but I haven’t found a sufficiently pleasant life-streaming theme for WordPress yet. Twitter on the other hand really does allow me to push my ramblings, day to day thoughts, and status updates to the world in a way that is much more immediate (and in some ways pleasing) than blogging. I don’t mind if you don’t get it, Twitter is fun, and a damn sight more useful than MSN Messenger.
The only thing about FriendFeed is that whilst it aggregates a lot of personalised content from my cohort of tech-savvy friends, it hasn’t quite turned into an RSS aggregator par excellence. A vast swathe of my online time is still spent consuming RSS feeds in Google Reader. So as an incentive to start blogging again (with a low barrier of entry ;)) I thought I would share a few of my favourite feeds that keep me coming back again and again, some of which hopefully are novel and useful.
I organise my reads by broad themes, and the first one in Google Reader happens to be ‘Arts’. This covers, for me, contemporary art to music (which I won’t *cough*Torrentech*cough* go into) to literature (and by literature I mean Sci-Fi ;)) and some stuff in between.
First up is the BALTIC forthcoming events feed. The BALTIC is a fantastic contemporary art gallery on the banks of the Tyne, which splits Gateshead (my stomping grounds) from Newcastle (where I actually work). Housed in an old flour mill on the Gateshead side, the BALTIC is one of my favourite places in the North East and every time I go I feel privileged to have such a great gallery so close to home. The feed itself serves up good notice of forthcoming exhibitions, but for the real skinny on lectures, workshops etc. then the Facebook group or mailing list tends to have more details than just the exhibitions. It would have been nice if they hadn’t switched their RSS feed URL with zero warning in August though. A last post detailing the new URL would have been a nice touch. Or maybe a PURL for the feed perhaps? It took me a little while to notice as new installations don’t come around that often!
Charlie’s Diary is the excellent blog of Charles Stross – one of my favourite British SciFi authors. Aside from the fact that he writes great fiction, this is a man who writes his own blog engine, loves vi, predates Linux and once was gainfully employed crafting crufty Perl scripts to provide one of the earliest online payment systems on the internet. This man is not only an author, but a geek’s geek, and politically astute enough to understand the ramifications of advancing technologies. This blog is worth a read, regardless of whether you are a fan or not.
The final two links in this category deal with my current mild obsession which is digital photography. Having recently splashed out on a very low end digital SLR (a Nikon D40 if you care) the chance to marry obscene technological geekery with something vaguely artistic has been a powerful draw for me. It’s led to a number of magazine and book purchases, but there’s nothing like the internet for free and useful information.
So the Digital Photography School feed is first up. For me this is just the best site for a continual stream of information for my new hobby. The site slices and dices its articles into 3 main sections, “Photography Tips and Tutorials”, “Cameras and Equipment” and “Post Production”, and I really only look at the first of these. I won’t be making any more equipment purchases other than a tripod, remote shutter release, graduated neutral density filter, polarising filter, telephoto and macro lenses (and the depressingly expensive 50mm AF-S f1.4G Nikkor that I crave) for a while. OK so it’s basically so that I don’t add any more expensive items to my Amazon wish list, and the fact that I can pretty much work out how to use Lightroom by myself for now. I am completely awash in a world of f-stops, ISOs, apertures and shutter speeds, and it’s this site that is helping me through it currently.
The second is a proper photography blog – the feed of the Newcastle Upon Tyne Daily Photo site. This is a photoblog of local (well to me!) sights, with two main contributors. Now when these girls say ‘Daily’ they mean ‘Daily’: I don’t think in the months that I’ve subscribed they have ever missed an opportunity to post a view of Newcastle and its surrounds. Yes, there’s been an occasional repost, but not one that I’ve seen twice yet! Everything is covered from the Quayside, the city centre, pubs, bridges, the lot. Sometimes there are views of the city and places that I’ve just not noticed, other times I’ve had to grab the camera and go out and grab similar shots. The photos might not always have stunning composition, but it’s just nice to be reminded that Newcastle is a beautiful northern city, proud of its heritage and definitely my favourite city in England. A few of my own photos taken over the last few months have ended up in this gallery on flickr but I need to take some more. I’ve also got some photos of Gateshead here and the Great North Museum (Hancock edition) here. Next time they invite reader submissions, I’m going to see if I can get one of my shots in…
So yes.. my ‘Arts’ section, not so large you see! Wait until I get onto the feeds in the ‘Geek’ section eh?
84 DRIVE FAILURE BOX #1, BAY1
102 VOLUME #0 STATE INTERIM RECOVERY
102 VOLUME #1 STATE INTERIM RECOVERY
102 VOLUME #2 STATE INTERIM RECOVERY
102 VOLUME #3 STATE INTERIM RECOVERY
102 VOLUME #4 STATE INTERIM RECOVERY
102 VOLUME #5 STATE INTERIM RECOVERY
84 DRIVE FAILURE BOX #1, BAY2
84 DRIVE FAILURE BOX #1, BAY3
101 VOLUME #0 STATE FAILED
101 VOLUME #1 STATE FAILED
101 VOLUME #2 STATE FAILED
101 VOLUME #3 STATE FAILED
101 VOLUME #4 STATE FAILED
101 VOLUME #5 STATE FAILED
84 DRIVE FAILURE BOX #1, BAY4
84 DRIVE FAILURE BOX #1, BAY5
84 DRIVE FAILURE BOX #1, BAY6
84 DRIVE FAILURE BOX #1, BAY7
84 DRIVE FAILURE BOX #1, BAY8
84 DRIVE FAILURE BOX #1, BAY9
84 DRIVE FAILURE BOX #1, BAY10
84 DRIVE FAILURE BOX #1, BAY11
84 DRIVE FAILURE BOX #1, BAY12
Pretty ugly eh? It’s the kind of error that brings me out in a cold sweat every time I get emails from our users. Generally complaints that the databases are running slowly, or that files are disappearing from directories, that home directories are empty, reports that the filesystems have become read-only.
Of course when I go to look at the machine the display apparently tells me that an entire box of drives (we have 2 boxes with 12 drives in) has suddenly failed. The RAID volumes can’t survive such a loss of drives, hence we get INTERIM RECOVERY followed by STATE FAILED as more drives drop out of the array.
The weird thing, of course, is that there’s nothing wrong with the drives at all: they’re sat there blinking little green lights at me, telling me they are just fine.
The unit is an HP StorageWorks Modular Smart Array 1000, and I have to doubt the Smart moniker in this case, as it is the single most unreliable piece of hardware we own, apart from perhaps the HP blades it is attached to. Apple RAID units, Transtec RAID units, all the RAID5’d servers seem to pretty much be able to hold themselves together, but not this one.
Every time this happens we get an engineer called out, they plug a serial console into the unit, reset the error states on the drives and volumes, reboot the RAID and everything comes up smelling of roses. However trying to get them to send an engineer out is an exercise in frustration. It would of course be possible to effect this fix ourselves, given a laptop with a serial port, and one of HP’s magical and deeply proprietary 259992-001 console serial cables. Do we have one with our kit? No. How much do they cost? About £120. How much did we spend on the kit in the first place? Well over £100,000.
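For what it’s worth, the progression in that log is easy to tally if you can capture the display text. This is just a rough sketch: the line format is inferred purely from the messages pasted above (it is not HP’s documented format), and `summarise` is a name I have made up for the example.

```python
import re

# A shortened copy of the controller messages shown above.
LOG = """\
84 DRIVE FAILURE BOX #1, BAY1
102 VOLUME #0 STATE INTERIM RECOVERY
102 VOLUME #1 STATE INTERIM RECOVERY
84 DRIVE FAILURE BOX #1, BAY2
101 VOLUME #0 STATE FAILED
101 VOLUME #1 STATE FAILED
"""

# Patterns assumed from the pasted lines: "<code> DRIVE FAILURE BOX #b, BAYn"
# and "<code> VOLUME #v STATE <state>".
DRIVE = re.compile(r"^\d+ DRIVE FAILURE BOX #(\d+), BAY(\d+)$")
VOLUME = re.compile(r"^\d+ VOLUME #(\d+) STATE (.+)$")


def summarise(log_text):
    """Return ({box: [failed bays]}, {volume: last reported state})."""
    failed_bays = {}
    volume_state = {}
    for line in log_text.splitlines():
        line = line.strip()
        if m := DRIVE.match(line):
            failed_bays.setdefault(int(m.group(1)), []).append(int(m.group(2)))
        elif m := VOLUME.match(line):
            # Later messages overwrite earlier ones, so we keep the final state.
            volume_state[int(m.group(1))] = m.group(2)
    return failed_bays, volume_state


if __name__ == "__main__":
    bays, volumes = summarise(LOG)
    for box, bay_list in bays.items():
        print(f"box {box}: {len(bay_list)} bays reported failed")
    for vol, state in sorted(volumes.items()):
        print(f"volume {vol}: {state}")
```

Keeping only the last state per volume mirrors how the display reads: each volume passes through INTERIM RECOVERY and ends up FAILED once more drives drop out.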
I will never, ever buy or recommend the purchase of another bit of HP kit as long as I am in the position to do so. Grr.
When I was a youngster, fascinated with the IBM PC clones that existed in the back pages of my Dad’s electronics catalogues and the amazing specs (and price tags!) they had whilst I toiled away with a Dragon 32 (and later an Atari ST), I used to sit around and lazily spec out machines that could not possibly have existed at the time. Sci-Fi computers if you will. Things with such outrageous memory and disk that I thought I would never see the day.
Now for a paltry £15k we have a Dell R905 in the rack, replacing our 32 CPU G5 Xserve cluster. Screenshots below because they make me happy :)
 
The rest of the day is basically going to be Simon and I pimping this out with software so we can start doing work on it. And we do have a fairly big backlog of memory intensive stuff we want to do!
I’m pretty sure these are student (Master’s) offices. Raised a smile as I wandered past however, something I didn’t think I was going to be able to muster at work today!

Apologies for the quality but I wasn’t toting my dSLR at the time, so the crappy BlackBerry Curve camera had to do.
This came via the “R Project for Statistical Computing” group on LinkedIn, but it definitely seems there is a hashtag for all occasions these days on Twitter. Today it’s largely been connected with the election fallout in Iran on Twitter, with massive trending topics and a significant segment on the Channel 4 news devoted to it.
Getting back to things of a more academic bent: those of us interested in R can tag our 140-character frustrations, hints, tips and questions with the #rstats hashtag, which lets anyone search Twitter traffic for statistical topics of interest with ease.
I seem to be spending a lot more time with my head in R these days, my “BioConductor Case Studies” book has just arrived, and the chunk of the day that wasn’t spent cursing a fritzed ext3 partition on an LVM volume, or installing our new monster machine was spent in R, answering questions from my Maths and Stats summer student.
I don’t know what vexes me more at the moment, the BCA (who I will not deign to link to) suing Simon Singh for libel, or the numpties in my beloved country that voted in two BNP MEPs.
I am however a supporter of the “Sense about Science” campaign. Any organisation dedicated to fighting crackpots, frauds and pseudoscientists is going to find favour with me.
I do urge everyone to go and sign the ‘Keep Libel Laws out of Science’ petition here. Have a read, have a think. Is this the way we want things to continue?
