R

Annotating KEGG compounds to pathway

To annotate a list of KEGG compounds with the KEGG pathways they are involved in, I used the R package KEGGREST from Bioconductor.

library(KEGGREST)

Given a list of KEGG compounds saved in a character vector such as kegg_compounds, we use the function keggGet in batches of at most 10 compounds to annotate them. The following (rudimentary) code queries the database in batches of ten compounds and fills a list (pathways): it creates one entry per pathway and updates that entry's compounds field with the compounds from kegg_compounds that belong to the pathway.
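The post's own code is not shown in this excerpt. As a minimal sketch of the batching idea (not the author's implementation), the function below splits the compound vector into batches of at most ten and groups compounds by pathway. The function name annotate_pathways and the fetch argument are hypothetical; fetch exists only so the logic can be tried offline and defaults to KEGGREST::keggGet. The sketch also assumes that each returned entry exposes ENTRY and a named PATHWAY vector (names are pathway IDs), which is how compound records usually appear.

```r
# Sketch: group compounds by pathway, querying in batches of at most 10.
# `fetch` is a hypothetical hook; by default it calls KEGGREST::keggGet.
annotate_pathways <- function(kegg_compounds, fetch = NULL, batch_size = 10) {
  if (is.null(fetch)) {
    fetch <- function(ids) KEGGREST::keggGet(ids)
  }

  # Split the vector into consecutive batches of at most `batch_size` IDs
  batches <- split(kegg_compounds,
                   ceiling(seq_along(kegg_compounds) / batch_size))

  pathways <- list()
  for (batch in batches) {
    for (entry in fetch(batch)) {
      compound <- entry$ENTRY[[1]]
      # Assumption: PATHWAY is a named character vector (names = pathway IDs).
      # Entries without pathways have PATHWAY = NULL and are skipped.
      for (id in names(entry$PATHWAY)) {
        if (is.null(pathways[[id]])) {
          pathways[[id]] <- list(name = entry$PATHWAY[[id]],
                                 compounds = character(0))
        }
        pathways[[id]]$compounds <- union(pathways[[id]]$compounds, compound)
      }
    }
  }
  pathways
}
```

Calling annotate_pathways(kegg_compounds) would then query KEGG over the network; passing a custom fetch function lets the grouping logic be checked without a connection.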

Boston's Temperature Chart

At the end of January I will be moving to Boston, where I will start my post-doc at the Boston Children’s Hospital. So… I started looking into the weather and temperature conditions. I used Weather Underground to download a weather table for each month in 2016 and 2017. The aim is to create a plot with the daily minimum and maximum temperatures throughout 2017, plus a heat-map indicating the weather condition of each day of the year.

minfi betas and residuals from methylation models

In the HELIX project we decided to use residuals instead of M values for the methylation analyses. So, how do we get the residuals of a basic linear model?

Libraries and Data

First of all we load the libraries we will use in this test:

library( limma )     # We use lmFit to fit the linear model
library( minfi )     # Methylation data is saved as a GenomicRatioSet
library( SmartSVA )  # We want to compute the SVA to correct methylation data
library( isva )      # (as above)
library( Biobase )   # We will save the residuals in an ExpressionSet

Once the libraries are loaded we proceed to obtain the methylation data:
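Independently of how the methylation data are loaded, the residual computation itself can be sketched with base R alone. This is not the post's limma-based code: it swaps in lm.fit, uses simulated values, and the names M, pheno and design are made up for illustration. It fits the same linear model to every probe at once and returns a probes-by-samples matrix of residuals.

```r
# Sketch with simulated data; `M` and `pheno` are hypothetical stand-ins
# for a matrix of M values (probes x samples) and the sample covariates.
set.seed(1)
n_samples <- 20
n_probes  <- 5

pheno <- data.frame(
  age = rnorm(n_samples, mean = 40, sd = 10),
  sex = factor(rep(c("F", "M"), length.out = n_samples))
)
M <- matrix(rnorm(n_probes * n_samples), nrow = n_probes,
            dimnames = list(paste0("cg", seq_len(n_probes)),
                            paste0("s", seq_len(n_samples))))

# Design matrix with the covariates to regress out
design <- model.matrix(~ age + sex, data = pheno)

# lm.fit expects samples in rows, so transpose: one model per probe (column)
fit <- lm.fit(design, t(M))

# Back to probes x samples, keeping the original row and column names
res <- t(fit$residuals)
dimnames(res) <- dimnames(M)
```

By construction the residuals are orthogonal to every column of the design matrix, which is a quick sanity check that the covariates have been removed.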

Exploring public NHANES data using Rcupcake

The Rcupcake package contains functions to query different databases through the BD2K RESTful API. The BD2K RESTful API is an interface that provides access to different data sources, making data access, analysis reproducibility and scalability easier. The package is installed via devtools using its GitHub URL (hms-dbmi/Rcupcake).

library( Rcupcake )

The Rcupcake package follows a four-step process to retrieve the data from a database:

1. Start session
2. Select the variables of interest
3. Build the JSON query
4. Run the query to obtain the data

The start.

Comparing 'user' Internet connection from some Catalan research centers

Using the same technique seen in the old post “Comparing ping time between connections”, I asked some colleagues to run the following command in their research centers.

ping www.google.com -c 200 > ping_google.txt

I then load the multiple ping files to create a data.frame with the icmp_seq number, the time spent per ping and the institution where the ping was run.

ping <- lapply( files, function( file ) { dta <- read.

From Barcelona to Boston in R

Today I am traveling to Boston to attend BioC 2017: Where Software and Biology Connect. On this trip to Boston, I stop in Lisbon to take the transoceanic flight. Let’s see a Boston-Barcelona “centered” map using the package maps:

library( maps )
xlim <- c( -140, 20 )
ylim <- c( 25, 50 )
map( "world", lwd = 0.75, xlim = xlim, ylim = ylim )

[Figure: Map showing Spain and USA]

Comparing ping time between connections

To perform this test I pinged Google 200 times from my PC at ISGlobal (running Linux Mint in a Virtual Machine) and from my laptop (running native Ubuntu). I saved the output in two TXT files with a command like the following one:

ping www.google.com -c 200 > ping_google_work_wifi.txt

I processed both files in R to create a data.frame.

pwm <- read.delim( file1, nrows = 200, skip = 1, header = FALSE, sep = " ", stringsAsFactors = FALSE )
pwm <- pwm[ , c( 6, 8 ) ]
colnames( pwm ) <- c( "icmp_seq", "time" )
pwm$icmp_seq <- as.
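The excerpt above picks the sixth and eighth space-separated fields of each reply line. As an alternative sketch (not the post's code), the hypothetical helper parse_ping below pulls icmp_seq and time out with regular expressions, so it does not depend on the field positions or on knowing how many header and summary lines to skip. The sample lines are made up and use a documentation-only IP address.

```r
# Sketch: extract icmp_seq and time (ms) from Linux-style ping output.
# `parse_ping` is a hypothetical helper name.
parse_ping <- function(lines) {
  # Keep only reply lines, which carry both icmp_seq= and time=
  hits <- grep("icmp_seq=[0-9]+.*time=[0-9.]+", lines, value = TRUE)
  data.frame(
    icmp_seq = as.integer(sub(".*icmp_seq=([0-9]+).*", "\\1", hits)),
    time     = as.numeric(sub(".*time=([0-9.]+).*", "\\1", hits))
  )
}

# Made-up sample in the usual Linux ping format
example <- c(
  "PING www.google.com (192.0.2.1) 56(84) bytes of data.",
  "64 bytes from host.example (192.0.2.1): icmp_seq=1 ttl=54 time=11.3 ms",
  "64 bytes from host.example (192.0.2.1): icmp_seq=2 ttl=54 time=10.8 ms",
  "--- www.google.com ping statistics ---"
)
parsed <- parse_ping(example)
```

For a real file, parse_ping(readLines("ping_google_work_wifi.txt")) would be the equivalent call, assuming the file has the same layout as the sample.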

Creating a jobs time-line for a resume in R

Let’s say we define a data.frame with the jobs I’ve had from 2008 to 2017:

jobs <- data.frame( employer = c( "GICO", "TES", "UAB", "IFAE", "ISGlobal" ), year_start = as.Date( c( "2008-07-01", "2009-11-01", "2010-09-01", "2011-07-01", "2013-09-01" ) ), year_end = as.Date( c( "2009-10-31", "2010-07-31", "2011-06-30", "2012-09-30", "2017-08-31" ) ), id = 1, stringsAsFactors = FALSE )

The content of the data.frame is easy to understand: The employer shows the name of the company/institution that employed me.

R and regex: find all occurrences

In R there are many functions that work with a pattern written as a regular expression. Today I needed to deal with one of these functions: str_locate_all (doc) from stringr. My goal was to find "223777_at [Chip: U133B]" in a series of strings like the following one:

text <- "11753227_s_at [Chip: PrimeView]; 223777_at [Chip: HT_HG-U133B]; 223777_PM_at [Chip: U133_Plus_PM]; 48336_at [Chip: U95B]; 223777_at [Chip: GeneProfilingArray]; g13477210_3p_at [Chip: U133_X3P]; MmugDNA.4759.1.S1_at [Chip: Rhesus]; 11753227_s_at [Chip: HG-U219]; ADXECADA.
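One thing worth noting about a target like this is that the square brackets are regex metacharacters, so the string cannot be used as a pattern unchanged. As a small base R sketch (an analogue of str_locate_all, not the post's solution), gregexpr with fixed = TRUE treats the target literally and reports every occurrence. The short text below is made up for illustration and is not the string from the post.

```r
# Sketch: locate every literal occurrence of a target that contains
# regex metacharacters ("[" and "]"). The sample text is hypothetical.
text   <- "223777_at [Chip: U133B]; 48336_at [Chip: U95B]; 223777_at [Chip: U133B]"
target <- "223777_at [Chip: U133B]"

# fixed = TRUE disables regex interpretation of the pattern
m <- gregexpr(target, text, fixed = TRUE)[[1]]

starts <- as.integer(m)
ends   <- starts + attr(m, "match.length") - 1L

# Start and end positions, similar in spirit to str_locate_all's matrix
positions <- cbind(start = starts, end = ends)

# Recover the matched substrings as a check
found <- substring(text, starts, ends)
```

Without fixed = TRUE, "[Chip: U133B]" would be read as a character class, so the brackets would need to be escaped instead.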

Unload (detach) a loaded R package